=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
From: Gregory J Chaitin
To: Wablenica
Sent: December 16, 2003, 23:35:15
Subject: Algorithmic complexity in biological systems?
Files: message.htm
--====----====----====----====----====----====----====----====----====----===--

I don't want to analyze your e-mail in detail; that would take too long. However, I would tend to agree with the general thrust of your e-mail, which is that in the areas you are interested in, program-size is not a useful concept.

Indeed, I believe that program-size complexity is of greatest value in connection with Gödel's incompleteness theorem on the limits of formal axiomatic mathematical theories and related epistemological questions. And, as I always say, my theory is totally impractical: indeed, the theory's most important result is to show that it is totally impractical, because we can never calculate the program-size complexity of an individual object, and this shows that there are serious limits to what mathematics can enable us to know.

So, in summary, I agree with you, not in all details, but certainly in your general conclusion that my theory is not useful in the domains in which you work. However, my theory was never intended to be a practical, useful theory. Its goals, at least the way I develop the theory, have always been extremely theoretical and philosophical, not practical.

Let me leave you with one thought regarding the level of abstraction of my theory. My viewpoint is not really so un-biological, because the basic view of biology is that DNA is software for constructing and running an organism. My theory abstracts this: DNA becomes a computer program, and the organism is modeled as the output of the program, as what it computes. Then my complexity measure, the size of the program in bits, is just about the same as the kilo- or megabases of DNA used as a measure of DNA complexity.

Of course you will tell me that DNA has 9 months (in humans) to produce an organism, and that DNA is highly redundant, for example because some genes are repeated. Yes, yes, I agree: my theoretical model is a toy model, much simpler than the real world. But that is the price you have to pay to be able to prove theorems: you need to deal with an ideal, simplified situation, not with all the complexities of the real world.

So forget about my theory, please!

Best wishes,
GJC

Wablenica
12/16/2003 01:16 PM
Please respond to Wablenica
To: Gregory J Chaitin/Watson/IBM@IBMUS
cc:
Subject: Algorithmic complexity in biological systems?

Dear Professor Chaitin:

I'm Constantine Chmielnicki, a biochemist from Moscow, Russia. I'm very interested in the problem of the complexity of biological systems. While studying the materials on this issue, I have seen that some researchers try to use the concept of algorithmic complexity. However, in my view, this concept adds more confusion than it removes. I could not find a satisfactory introduction to the issue either online or offline, so I venture to write to you, as a co-founder of the concept, with two questions. The first question concerns the adequacy of the term "random" with respect to complex strings; the second concerns the vagueness of applying the concept to exploring and modeling biochemical systems.

1. Kolmogorov and you label complex, non-compressible strings "random". You exemplify such random strings with the "canonical" tossing of a die. I think there is a gap here between the probabilistic and algorithmic concepts of "randomness". _Every_ string that arises from random tossing is random in the probabilistic sense, even one that yields "6666666"; such a result is exactly as probable as any other particular string of the same length. On the other hand, many strings that represent natural-language or programming-language sentences are non-random and non-compressible at the same time. If we look at genetic texts, it is common knowledge that the supposedly meaningless non-protein-coding part of the human genome consists of thousands of long and short repeats and is far more compressible than the coding genome. Moreover, the non-coding parts of genes, introns, are equally or more compressible than the coding parts, exons. Can we thus call the coding genome "random" and "chaotic", and the non-coding genome "non-random" and "orderly"? If we can, or must, then the whole concept of randomness becomes meaningless. I think that what we see in die-tossing strings, as well as in linguistic and genetic texts, is regularity, which enables compression algorithms like LZW to decrease their size. However, (un)regularity does not coincide (though it greatly overlaps) with (non-)randomness or (dis)order.
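To make this gap concrete, here is a small Python sketch of mine. The zlib-compressed size is, of course, only a crude upper bound on algorithmic complexity, and the strings are arbitrary examples:

import random
import zlib

def compressed_size(s: str) -> int:
    # zlib-compressed size in bytes: a crude upper bound on the
    # string's algorithmic complexity (plus format overhead).
    return len(zlib.compress(s.encode("ascii"), 9))

# A perfectly legal outcome of fair die tossing, yet maximally regular:
runs = "6" * 1000

# A typical outcome of fair die tossing: incompressible in practice:
tosses = "".join(random.choice("123456") for _ in range(1000))

for name, s in (("runs", runs), ("tosses", tosses)):
    print(f"{name:6s} raw={len(s):4d}  compressed={compressed_size(s):4d}")

# For a short string like "6666666" the "compressed" form comes out
# even longer than the original because of format overhead:
# compressibility separates regular from random only for long strings.
print(len("6666666"), compressed_size("6666666"))

Both strings are equally probable outcomes of the same fair process; only the second is incompressible, i.e. "random" in the algorithmic sense.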
You write: "In summary, the value of a scientific theory is that it enables one to compress many observations into a few theoretical hypotheses. There is a theory only when the string of observations is not random, that is to say, when its complexity is appreciably less than its length in bits. In this case the scientist can communicate his observations to a colleague much more economically than by just transmitting the string of observations. He does this by sending his colleague the program that is his theory, and this program must have much fewer bits than the original string of observations."

This is not clear to me from a practical point of view. Suppose our task is to explore the blood-clotting system. The starting observation string was: "blood plasma forms fibrous clots in vitro upon addition of an excess of calcium ions". And the intermediate "theory string" is presently a long text with a list of a dozen coagulation factors, with their primary, secondary, and tertiary structures, and with the cascade of reactions, positive/negative feedbacks, etc. So we see that the theory is longer than the data :-). Of course, I omitted the intermediate steps in which the researchers acted on the plasma with various chemical and physical agents, gathered information on patients with congenital blood-clotting disorders, etc. But where is the "original data string", and where is the "(un)compressed theory string"?

Let us consider an imaginary, idealized first couple of exploratory steps. Suppose we have a wonder-gadget enabling us to instantaneously scan a volume of blood plasma and record the type and position of every molecule in it. So we have the original data string as a giant succession of type+position doublets, most probably incompressible "dynamic chaos". At this step our "theory" is really the data "as is". At the next step we arbitrarily assume that the spatial positions of the molecules are not important. So we delete the positions and keep the molecule names. This yields a smaller file in which about 95% of the entries are H2O, water. Of course we can compress such a string roughly 10-fold. So is this a "theory" already? I guess not.
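As a toy check of that ten-fold figure, here is a sketch in Python; the molecule names and the 95% proportion are invented for the illustration, and the exact ratio will depend on the compressor:

import random
import zlib

# Toy stand-in for the plasma scan with positions deleted: a long list
# of molecule names, about 95% of which are water. The names and the
# exact proportions are invented for the illustration.
OTHERS = ["Na+", "Cl-", "glucose", "albumin", "fibrinogen"]
molecules = [
    "H2O" if random.random() < 0.95 else random.choice(OTHERS)
    for _ in range(100_000)
]

raw = " ".join(molecules).encode("ascii")
packed = zlib.compress(raw, 9)
print(f"raw: {len(raw)} bytes, compressed: {len(packed)} bytes, "
      f"ratio: {len(raw) / len(packed):.1f}x")

Note that the really big reduction happened one step earlier, when we threw the positions away; and that was our arbitrary modeling decision, not something the compressor discovered.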
A real theory arises from multiple experiments in which we block this or that component and see what happens in the end. As a result we get a graph with a dozen coagulation factors as nodes and activation/inhibition reactions as edges.

The most compact representation of a graph is, as far as I know, a list of edges (granting that there are no isolated nodes in the graph). So we can say that this list is a "compressed" theory of blood-clot formation, can't we? However, can the process of building a theory of blood clotting be described in terms of algorithmic complexity? Hardly. Perhaps only if we start from the string containing the full list of blood ingredients and then erase the ingredients irrelevant to clotting.

But after we have modeled the coagulation system, the main problem arises. Suppose we assign the 13 coagulation factors the identifiers 0, 1, 2, ..., 9, A, B, C. Suppose we represent the edges of the graph as a list of records of the form "ActorNode PatientNode1 PatientNode2 ...". So we get a resulting "theory string" like this: "CB)BA)A9)98)8A)A5)52)2578BD1)7A)D1".
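Here is a toy Python version of this encoding; the edges below are arbitrary placeholders I made up, not the real cascade:

import zlib

# Toy encoding of an activation/inhibition graph as an edge list.
# The factor identifiers and the edges are arbitrary placeholders,
# not the real coagulation cascade.
edges = [
    ("C", "B"), ("B", "A"), ("A", "9"), ("9", "8"),
    ("8", "A"), ("A", "5"), ("5", "2"), ("7", "A"), ("D", "1"),
]

# One record per actor node: the actor followed by its patient nodes,
# with records separated by ")" as in the string above.
records = {}
for actor, patient in edges:
    records.setdefault(actor, []).append(patient)
theory = ")".join(actor + "".join(ps) for actor, ps in records.items())

print(theory)  # -> CB)BA)A95)98)8A)52)7A)D1
data = theory.encode("ascii")
print(len(data), len(zlib.compress(data, 9)))
# For a string this short, the "compressed" form is longer than the
# original: far too little material for (in)compressibility, and hence
# "randomness", to mean anything.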
Most interesting of all, some researchers claim that strings such as the one above can be analyzed with the concept of algorithmic complexity! Of course, most such strings will be "complex" and consequently "random". But this is nonsense! One of my opponents introduces the "complexity of a system": "Complexity of a system is defined in ATP as the minimum size of a program (or of an algorithm) that is capable of encoding the system." Moreover, he claims that "if the size of the minimal encoding program cannot be reduced below the size of the system itself, it is irreducible", and further: "... a very important consequence of the basic theorems of ATP is as follows: if a system is indeed irreducibly complex, it is necessarily random".

I think this is a real misuse of the concept of algorithmic complexity! Or, in the worst case, an example of the concept's inborn defects reduced ad absurdum. Here the mythical "size of the system" is compared to the "minimal encoding program". The absurdity of this corollary is evident if we substitute, say, "bear" for "system": "if the size of the program encoding a bear cannot be reduced below the size of the bear itself, it is irreducible", and with a further substitution: "if the algorithmic complexity of a bear equals the size of the bear, its complexity is irreducible (and the bear is random!)". However, if a bear has multiple identical components (such as water, which is about 80% of the bear's content), it is less random - yet still more chaotic than pure water!

To sum up, I believe that:

a) the complexity of a system - at least of a system in the sense of "a set of elements with an integrative property that is missing in every element and is not a sum or mean of the properties of all the elements" - is a concept different from algorithmic complexity, at least in the vagueness of the "original data string" (although both concepts may yield roughly the same results, say, the sum of the nodes and edges of the system's graph);

b) the randomness of a string in the sense of its (in)compressibility is not a very good label, at least because it does not coincide with the usual notion of randomness in probability theory.

By the way, I found similar doubts in the book by Nicolis & Prigogine, who claim that the real complexity of a system is something intermediate between the dynamic chaos of a solution and the still order of a crystal. I have also read the viewpoint of a physicist who claimed that the problem of filtering noise from signal has not been solved (or even tackled) in algorithmic probability theory, so that when "measuring the complexity of a system" we actually measure the "complexity" of the noise.

As for me, I consider the dynamic chaos of a gas or a solution irrelevant to the notion of simple vs. complex in systems analysis - irrelevant as long as the order or orientation of the molecules is irrelevant.

I would be much obliged if you could kindly show me where I'm wrong, or where my opponent is wrong.

Thank you for your attention and patience,

Best wishes,
Constantine.

=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
From: Gregory J Chaitin
To: Wablenica
Sent: December 17, 2003, 0:46:06
Subject: Fw: Algorithmic complexity in biological systems?
Files: message.htm
--====----====----====----====----====----====----====----====----====----===--

Another way to put it is that my theory is an existence proof that a mathematical theory of complexity is possible, at least in the domain of metamathematics. However, other, more practical fields will require different kinds of complexity concepts...

=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
From: Gregory J Chaitin
To: Wablenica
Sent: December 17, 2003, 1:15:32
Subject: Algorithmic complexity in biological systems?
Files: message.htm
--====----====----====----====----====----====----====----====----====----===--

If you do send copies of my e-mail to other people, I would appreciate it if you could send them the complete text rather than quoting limited parts of it. Thank you!

Wablenica
12/16/2003 04:50 PM
Please respond to Wablenica
To: Gregory J Chaitin/Watson/IBM@IBMUS
cc:
Subject: Re[2]: Algorithmic complexity in biological systems?

Thank you, Professor Chaitin, for your answer.

<...>

Again, thank you very much.

Constantine.

P.S. May I cite your answer in my polemics with opponents?