Kiến trúc top-down và bottom-up trong trí tuệ nhân tạo

Nhân lục lọi đôi thứ trong archive tôi tìm lại bài tranh luận nhỏ cách đây đúng 5 năm trên VNAI mailing list (của những người bạn VN quan tâm đến AI). Bài này bắt nguồn từ email sau đây của bạn Đặng Việt Dũng về một bài giảng về reactive agent architecture của Rodney Brooks:

On Thu, 4 Jul 2002, Viet Dung Dang wrote:

Cách đây mấy tháng, em có dự 1 bài giảng của Rodney Brooks để quảng cáo cho quyển sách mới của ông tên là “Flesh and Machines”. Như nhiều anh chị đã biết, Rodney Brooks vào năm 1994 đã gây một cơn sốc lớn trong giới nghiên cứu agent, khi ông phủ nhận kiến thức truyền thống (logic-based với knowledge based inference rules) và khai sinh ra kiến trúc Subsumption, mà cốt lõi là purely reactive.
Trong bài giảng em đã được dự, Brooks tiếp tục promote kiến trúc subsumption và nói về việc xây dựng một con humanoid robot dựa trên kiến trúc đó. Một câu hỏi mà chắc mọi người cùng thắc mắc, và em cũng vẫn thắc mắc sau khi dự là theo kiến trúc của Brook, agent hoàn toàn không có internalrepresentation of the world, mà hầu hết chỉ dựa trên các re-active rules. Trong buổi đó, khi trả lời câu hỏi, Brook có nói là nếu xây được một representation như thế thì sai lầm vì model đó sẽ không phản ánh đúng thực chất của thế giới outside.
Tuy nhiên, theo như em hiểu, thực chất con người chúng ta đều có một representation of the world ở trong đầu, và dựa vào đó để suy luận và hành động. Như thế, liệu kiến trúc của Brooks co’ phải là useless khi xây dựng humanoid robot không?
Em cũng nghĩ chắc là không, vì nếu thế thì làm sao Brooks lại đang làm director lab của MIT được ?

Bài bình luận của tôi không đi vào chi tiết về kiến trúc của Brooks, mà chủ yếu tóm tắt một số sự khác biệt giữa hai truờng phái kiến trúc khác nhau trong AI. Trường phái truyền thống có thể nói bắt nguồn từ John McCarthy ở Stanford, người mà chúng ta đã nói qua ở blog này về Common sense và AI.

On Thu, 4 Jul 2002, XuanLong Nguyen wrote:

This is an interesting question that every student of AI should ask himself at some point. It seems that most people would give up after a few thoughts. Those with whom the question stuck still would probably eventually end up in places such as the directorship of MIT AI lab :-). In the mean time, the rest of us are happily working on obscure problems of knowledge-based systems, Strips worlds, Bayesian nets that seem to be forever in the wrong side of the real world ? Still, I think it’s fun to talk about — and before the other more knowledgeable members of the list speak out, hopefully — I’d like to add my humble opinions.
Without getting into hairy discussion of intelligence, and AI and so on, lets call the Brooks approach the bottom-up (or the MIT school), while the traditional approach the top-down (or the Stanford school). Roughly speaking, the most distinctive constrast between the two is that the bottom-up approach is behavior-based, while the top-down is representation-based.
Both approaches bear merits as well as difficulties. For a researcher in AI, the comparison also depends on what your immediate objectives are. If the objective is to build an *autonomously* “intelligent” entity that appears to be interesting and capable of doing something deemed interesting, the answer is not clear. The problem with the traditional top-down approach is that in order to break down a system into modules such as knowledge representation, reasoning, planning, learning, multi-agent coordinating, etc one has to necessarily make a lot of simplifying assumptions. Therefore there is no guarantee that such simplification is harmless. Since we still don’t know what it takes build an intelligent entity, whatever that means, we don’t know if such simplifying assumptions help us reach the essence of intelligence faster, or they simply lead us astray to a dead-end path that would preclude the possibility of ever finding any solution.
So this has been one of the strongest criticism of the traditional top-down approach. And for good reasons. The clearest evidence is that after half a century of research in many areas and subareas and subsubareas and subsubsubareas of AI, the field as a whole hasn’t achieved much, and most of us seem to forget the grandiose objective of the field, i.e., to understand and create *autonomous* intelligent entities that can hold up themselves in the real world. One natural alternative is, since we don’t know what it takes to build intelligent entities, we need to imitate what are considered to be intelligent. There are plenty around, not just in humans, but also animals and insects. So it seems plausible to step back to square one and studied the most primitive mechanism there is in the nature. The good news is that by now many researchers are convinced that seemingly sophisticated behaviors can be built up by very simple (and possibly reactive) rules. It is also hoped that one is able to understand and build progressively more sophisticated and intelligent behavior-based entities in much the same way the evolution works.
Needless to say, the progress has been frustratingly slow, and our understanding of nature, including even some of the most primitive insects, remains very limited. It seems that the breakthrough in AI would probably come from this bottom-up approach, given its very interdisciplinary nature and possible contributions coming from computer science and statistics, as well as other computational sciences and experimental sciences (such as neuroscience, physics and biology).
There have been a very strong movement that is more or less advocating small autonomous robots equipped with limited sensory and actuating capabilities but which are nevertheless useful and can hold up in the real world environment. They go by names such as x-insects, smart dust, robotic fly, etc. One distinctive feature of the bottom-up approach is the emphasis on sensors and sensory data processing (interaction with outside world). By contrast, the top-down approach focuses on representing and manipulating abstract knowledge-base with a strong emphasis on methods for obtaining such knowledge in the first place.
Back to the traditional top-down school. It is not without success, though in a limited way. By abstracting away and breaking down the intelligency into many different disconnected parts such as knowledge representation, reasoning, planning, learning, etc these subfields have been reduced to subclasses of problems, whose techniques prove useful for problem-solving, usually with heavy intervention of humans. Within each of the modules, limited intelligent behavior can be achieved. As we all know, an AI search researcher can boast about Deep Blues, while machine learning and control theorist can talk about autonomous vehicles. Planning researchers talk about automatically controlling a spacecraft for a short period of time. More robust, user-friendly softwares in many applications have been incorporating latest AI research advances without notice.
While both aiming for intelligency, the traditional school has an objective very different from the new bottom-up approach. This is exemplified by a popular AI textbook (by Russell and Norvig) which simply proclaims in chapter 1 that it is more concerned about solving problems intelligently, but not to imitate human intelligence in the first place. Perhaps the most convincing argument of
the traditional school of AI is summed up by Drew McDermott (from Yale), who famously wrote that: To say the Deep Blues is not intelligent is to say that airplanes don’t fly because they don’t flip their wings. However, the price this approach has to pay is that by separating the subfields too far apart, it is very hard to glue them back together to build a coherent working autonomous system. In particular, such systems are hopeless in interacting with the outside world.
Nevertheless, the traditional school of AI will be here to stay, and so will the AI planning, knowledge-representation, machine learning, multi-agent researchers :-). Ultimately, in order to have a truly intelligent autonomous entity (whatever it means, again), one has to have knowledge-representation capability, as well as search, inference, extra/interpolation capability, and so on because the most sophisticated intelligent entity, i.e, human, surely have those. But it is not clear if the knowledge representation and reasoning mechanism of that dream intelligent entity is the same as what we know, or are heading for in our search. Neither do we know how they are glued together.
Ideally, one may hope that both approaches in AI will meet somewhere in the search graph, or they never will if the graph is infinite with loops. We can only wait for many years to come to know the answer. And in the mean time, it’s useful to prove the NP-hardness of whatever task there is in each of our favorite building blocks of what are known to us as AI ?
cheers,
Long [July 04, 2002]

Tóm lại của bài viết dài dòng trên, chúng ta có ba câu hỏi chính mà cả hai anh kiến trúc đều phải đối đầu:

Làm thế nào để thu nhập, cập nhật được kiến thức?
Làm thế nào để mã hóa chúng một cách gọn gàng để có thể truy cập và sử dụng một cách hiệu quả?
Và làm thế nào sử dụng được kiến thức một cách hữu hiệu nhất?

Đó là những câu hỏi lớn không chỉ trong trí tuệ nhân tạo, mà trong công việc và cuộc sống hàng ngày chúng ta luôn phải đối thoải, chẳng hạn như bài blog gần đây của anh Hưng. Trí tuệ nhân tạo là một ngành có tham vọng “nhân tạo” và tự động hóa những giải pháp cho các câu hỏi trên, qua công cụ thuật toán, và các kỹ thuật phần mềm và phần cứng.

5 năm sau, suy nghĩ chung của tôi về vấn đề kiến trúc topdown vs bottom up không thay đổi nhiều. Mặc dù tôi vẫn thiên về kiến trúc topdown hơn, nhưng sự nhận thưc về mặt mạnh và mắt yếu của cả hai kiến trúc đều không thay đổi. Có thể nói trường phái top-down phân tách cả ba câu hỏi một cách tách bạch, và từ đó AI bị chia ra làm nhiều ngả, mỗi ngả tìm cách giải quyết một câu, và tập trung chủ yếu vào câu thứ hai và thứ ba. Còn kiến trúc bottom-up thì tập trung vào câu thứ nhất. Cả hai loại kiến trúc đều rất có ích, nhưng có lẽ một kiến trúc thích hợp nhất sẽ không có sự chia rẽ ba vấn đề trên một cách tách bạch như thế.

Điều đáng nói là ba câu hỏi trên có tính phổ quát không chỉ trong TTNT mà rất nhiều lĩnh vực khác, đặc biệt trong thống kê hiện đại. Ngành thống kê hiện đại vật vã rất nhiều với câu hỏi: Làm thế nào make sense được với data. Data chính là giao diện của bạn, của tôi, của một chính thể trí tuệ nhân tạo trong tương lai, với thế giới bên ngoài. Ngành thống kê cũng phải đối đầu với những câu hỏi như: Khi phải xây dựng một mô hình về thế giới, chúng ta phải bắt đầu từ đâu, bao nhiều prior information thì đủ? Qua đó chúng ta đi đến sự đối đầu giữa Bayesian và frequentists. Hay, làm thế nào để tạo ra các mô hình phức tạp một cách hệ thống? Graphical models, một sự kết hợp của graph theory và probability theory, chính là một giải pháp hướng tới câu trả lời đó. V.v. và v.v. Sự hội tụ giữa TTNT và ngành thống kê góp phần làm ra đời và phát triển lĩnh vực machine learning, và theo chủ quan của tôi, rất có thể trong tương lai không xa machine learning sẽ trở thành một trong những lĩnh vực trung tâm của cả TTNT và thống kê.

Thay cho câu kết, tôi xin trích lại đoạn sau trong Common sense và AI.:

Câu hỏi quan trọng mà tôi quan tâm là: Common sense có phải là khái niệm hữu ích hay không trong việc xây dựng máy tính thông minh? Cụ thể hơn, đó có phải là một khái niệm constructive hay không. Khái niệm này có thể được mổ xẻ và tổng hợp một cách tự động từ giao tiếp của computers với thế giới bên ngoài? Khái niệm đó có thể dễ dàng transfer từ computers này sang computers khác, có thể được tổng quát hoá và suy diễn (inductive and deductive inference) để tạo ra những khái niệm mới có ích?
….
Tôi cho rằng, nhìn từ góc độ mô hình xác suất, rất có thể common sense là một dạng “emerging property/phenomenon”, song không nhất thiết là một khái niệm kiến trúc căn bản của intelligence. Nếu quả thật như thế, thì rất nguy hiểm khi bắt đầu xây dựng một intelligent system bằng cách xây dựng common sense storages. Những systems như vậy sẽ không robust.

Tôi cho rằng vấn đề top-down vs. bottom-up về cơ bản là một trade-off giữa scalability và convergence time, theo nghĩa sau đây.

1 Trong kinh tế: kinh tế tập trung (i.e. top-down) giải quyết được một vấn đề nhỏ và cụ thể trong thời gian ngắn (ví dụ như Castro phủ Cuba bằng các cánh đồng mía ? trong vài năm), nhưng kinh tế thị trường (i.e. bottom-up) mới có đủ scalability để giải quyết các bài toán kinh tế xã hội mà objective functions phức tạp, dễ bị curse of dimensionality. Nhưng để phát triển một nền kinh tế thị trường đến tuổi trưởng thành cần đầu tư thời gian lớn.

2 Trong networking (và distributed systems): hai trường phái thiết kế mạng cơ bản là “smart network” (top-down) và “dumb network” (bottom-up). Internet là một thực thể phức tạp được thiết kế với nguyên tắc end-to-end (dumb network). Sự thành công của Internet mấy mươi năm qua là minh chứng hùng hồn của tính scalability của giải pháp bottom-up. Ngược lại, còn rất nhiều vấn đề với design hiện tại, và còn rất lâu Internet mới tiến hóa đến mức làm chúng ta thỏa mãn.

3 Trong phát triển nhân lực: muốn đào tạo “con người mới XHCN” trong 5, 10 năm thì dùng top-down approach, muốn đào tạo các cá nhân sáng tạo đủ trình độ đáp ứng với thời đại thì phải cho tự do tí toáy mất nhiều thời gian hơn.

4 Ta có thể tìm ti tỉ các ví dụ tương tự khác với các scales khác nhau trong AI, chính trị, đời sống, yêu đương (các bác nghĩ một chốc sẽ ra analogy hay ? ).

Tóm lại: khi chưa có tenure thì bác Long cần theo top-down để xây dựng credential nhanh chóng, có tenure rồi thì sang làm bottom-up như Brooks để thành giám đốc MIT AI Lab :-).

Interesting examples! Nhưng có lẽ khác với các lĩnh vực cụ thể trên, trong AI thế nào là kiến trúc bottom-up còn rất mơ hồ. Sự khó khăn này chắc là do tính đòi hỏi sự khái quát trong set-up của AI. Kiến trúc của Brooks còn quá giản đơn và hình như mấy con humanoids vẫn chưa làm gì ngoài mấy cái xác như trong sci-fi. Có lẽ bức tranh về kiến trúc bottom-up mới chỉ dừng lại ở mức độ “a new kind of science” của Wolfram (dẫu cái này cũng chưa có gì lthật à new hay deep gi` cả).

Tất cả những gì phần lớn chúng ta làm vẫn là những hòn gạch nhỏ trên tường mà thôi. ?

http://www.procul.org/blog/2007/06/25/ki%e1%ba%bfn-truc-top-down-va-bottom-up-trong-tri-tu%e1%bb%87-nhan-t%e1%ba%a1o/