=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
From: Gregory J Chaitin
To: Wablenica
Sent: December 16, 2003, 23:35:15
Subject: Algorithmic complexity in biological systems?
Files: message.htm
--====----====----====----====----====----====----====----====----====----===--

I don't want to analyze your e-mail in detail; that would take too long. However, I would tend to agree with the general thrust of your e-mail, which is that in the areas you are interested in, program-size is not a useful concept.

Indeed, I believe that program-size complexity is of greatest value in connection with Gödel's incompleteness theorem on the limits of formal axiomatic mathematical theories and related epistemological questions. And, as I always say, my theory is totally impractical: indeed, the theory's most important result is to show that it is totally impractical, because we can never calculate the program-size complexity of an individual object, and this shows that there are serious limits to what mathematics can enable us to know.

So, in summary, I agree with you, not in all details, but certainly in your general conclusion that my theory is not useful in the domains in which you work. However, my theory was never intended to be a practical, useful theory. Its goals, at least the way I develop the theory, have always been extremely theoretical and philosophical, not practical.

Let me leave you with one thought regarding the level of abstraction of my theory. My viewpoint is not really so un-biological, because the basic view of biology is that DNA is software for constructing and running an organism. My theory abstracts this: DNA becomes a computer program, and the organism is modeled as the output of the program, as what it computes. Then my complexity measure, the size of the program in bits, is just about the same as the kilo- or megabases of DNA used as a measure of DNA complexity.

Of course you will tell me that DNA has 9 months (in humans) to produce an organism, and that DNA is highly redundant, for example because some genes are repeated. Yes, yes, I agree: my theoretical model is a toy model, much simpler than the real world. But that is the price you have to pay to be able to prove theorems: you need to deal with an ideal, simplified situation, not with all the complexities of the real world.

So forget about my theory, please!

Best wishes,
GJC

Wablenica
12/16/2003 01:16 PM
Please respond to Wablenica
To: Gregory J Chaitin/Watson/IBM@IBMUS
cc:
Subject: Algorithmic complexity in biological systems?

Dear Professor Chaitin:

I'm Constantine Chmielnicki, a biochemist from Moscow, Russia. I'm very interested in the problem of the complexity of biological systems. While studying the materials on this issue, I have seen that some researchers try to use the concept of algorithmic complexity. However, in my view, this concept adds more confusion than it removes. I could not find a satisfactory introduction to the issue either online or offline, so I venture to write to you, as a co-founder of the concept, with two questions. The first question concerns the adequacy of the term "random" with respect to complex strings; the second concerns the vagueness of applying the concept to exploring and modeling biochemical systems.

1. Kolmogorov and you label complex, non-compressible strings "random". You exemplify such random strings with the "canonical" tossing of a die. I think there is a gap here between the probabilistic and algorithmic concepts of "randomness". _Every_ string that arises from random tossing is random in the probabilistic sense, even one that yields "6666666"; such a result is exactly as probable as any other particular string of the same length. On the other hand, many strings that represent natural-language or programming-language sentences are non-random and non-compressible at the same time. If we look at genetic texts, it is common knowledge that the supposedly meaningless non-protein-coding part of the human genome consists of thousands of long and short repeats and is far more compressible than the coding genome. Moreover, the non-coding parts of genes, introns, are equally or more compressible than the coding parts, exons. Can we thus call the coding genome "random" and "chaotic", and the non-coding genome "non-random" and "orderly"? If we can, or must, then the whole concept of randomness becomes meaningless. I think that what we see in die-tossing strings, as well as in linguistic and genetic texts, is regularity, which enables compression algorithms like LZW to decrease their size. However, (un)regularity does not coincide (though it greatly overlaps) with (non-)randomness or (dis)order.
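To make this gap concrete, here is a small Python sketch of mine. The zlib-compressed size is, of course, only a crude upper bound on algorithmic complexity, and the strings are arbitrary examples:

import random
import zlib

def compressed_size(s: str) -> int:
    # zlib-compressed size in bytes: a crude upper bound on the
    # string's algorithmic complexity (plus format overhead).
    return len(zlib.compress(s.encode("ascii"), 9))

# A perfectly legal outcome of fair die tossing, yet maximally regular:
runs = "6" * 1000

# A typical outcome of fair die tossing: incompressible in practice:
tosses = "".join(random.choice("123456") for _ in range(1000))

for name, s in (("runs", runs), ("tosses", tosses)):
    print(f"{name:6s} raw={len(s):4d}  compressed={compressed_size(s):4d}")

# For a short string like "6666666" the "compressed" form comes out
# even longer than the original because of format overhead:
# compressibility separates regular from random only for long strings.
print(len("6666666"), compressed_size("6666666"))

Both strings are equally probable outcomes of the same fair process; only the second is incompressible, i.e. "random" in the algorithmic sense.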
You write: "In summary, the value of a scientific theory is that it enables one to compress many observations into a few theoretical hypotheses. There is a theory only when the string of observations is not random, that is to say, when its complexity is appreciably less than its length in bits. In this case the scientist can communicate his observations to a colleague much more economically than by just transmitting the string of observations. He does this by sending his colleague the program that is his theory, and this program must have much fewer bits than the original string of observations."

This is not clear to me from a practical point of view. Suppose our task is to explore the blood-clotting system. The starting observation string was: "blood plasma forms fibrous clots in vitro upon addition of an excess of calcium ions". And the intermediate "theory string" is presently a long text with a list of a dozen coagulation factors, with their primary, secondary, and tertiary structures, and with the cascade of reactions, positive/negative feedbacks, etc. So we see that the theory is longer than the data :-). Of course, I omitted the intermediate steps in which the researchers acted on the plasma with various chemical and physical agents, gathered information on patients with congenital blood-clotting disorders, etc. But where is the "original data string", and where is the "(un)compressed theory string"?

Let us consider an imaginary, idealized first couple of exploratory steps. Suppose we have a wonder-gadget enabling us to instantaneously scan a volume of blood plasma and record the type and position of every molecule in it. So we have the original data string as a giant succession of type+position doublets, most probably incompressible "dynamic chaos". At this step our "theory" is really the data "as is". At the next step we arbitrarily assume that the spatial positions of the molecules are not important. So we delete the positions and keep the molecule names. This yields a smaller file in which about 95% of the entries are H2O, water. Of course we can compress such a string roughly 10-fold. So is this a "theory" already? I guess not.
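As a toy check of that ten-fold figure, here is a sketch in Python; the molecule names and the 95% proportion are invented for the illustration, and the exact ratio will depend on the compressor:

import random
import zlib

# Toy stand-in for the plasma scan with positions deleted: a long list
# of molecule names, about 95% of which are water. The names and the
# exact proportions are invented for the illustration.
OTHERS = ["Na+", "Cl-", "glucose", "albumin", "fibrinogen"]
molecules = [
    "H2O" if random.random() < 0.95 else random.choice(OTHERS)
    for _ in range(100_000)
]

raw = " ".join(molecules).encode("ascii")
packed = zlib.compress(raw, 9)
print(f"raw: {len(raw)} bytes, compressed: {len(packed)} bytes, "
      f"ratio: {len(raw) / len(packed):.1f}x")

Note that the really big reduction happened one step earlier, when we threw the positions away; and that was our arbitrary modeling decision, not something the compressor discovered.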
A real theory arises from multiple experiments in which we block this or that component and see what happens in the end. As a result we get a graph with a dozen coagulation factors as nodes and activation/inhibition reactions as edges.

The most compact representation of a graph is, as far as I know, a list of edges (granting that there are no isolated nodes in the graph). So we can say that this list is a "compressed" theory of blood-clot formation, can't we? However, can the process of building a theory of blood clotting be described in terms of algorithmic complexity? Hardly. Perhaps only if we start from the string containing the full list of blood ingredients and then erase the ingredients irrelevant to clotting.

But after we have modeled the coagulation system, the main problem arises. Suppose we assign the 13 coagulation factors the identifiers 0, 1, 2, ..., 9, A, B, C. Suppose we represent the edges of the graph as a list of records of the form "ActorNode PatientNode1 PatientNode2 ...". So we get a resulting "theory string" like this: "CB)BA)A9)98)8A)A5)52)2578BD1)7A)D1".
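Here is a toy Python version of this encoding; the edges below are arbitrary placeholders I made up, not the real cascade:

import zlib

# Toy encoding of an activation/inhibition graph as an edge list.
# The factor identifiers and the edges are arbitrary placeholders,
# not the real coagulation cascade.
edges = [
    ("C", "B"), ("B", "A"), ("A", "9"), ("9", "8"),
    ("8", "A"), ("A", "5"), ("5", "2"), ("7", "A"), ("D", "1"),
]

# One record per actor node: the actor followed by its patient nodes,
# with records separated by ")" as in the string above.
records = {}
for actor, patient in edges:
    records.setdefault(actor, []).append(patient)
theory = ")".join(actor + "".join(ps) for actor, ps in records.items())

print(theory)  # -> CB)BA)A95)98)8A)52)7A)D1
data = theory.encode("ascii")
print(len(data), len(zlib.compress(data, 9)))
# For a string this short, the "compressed" form is longer than the
# original: far too little material for (in)compressibility, and hence
# "randomness", to mean anything.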
Most interesting of all, some researchers claim that strings such as the one above can be analyzed with the concept of algorithmic complexity! Of course, most such strings will be "complex" and consequently "random". But this is nonsense! One of my opponents introduces the "complexity of a system": "Complexity of a system is defined in ATP as the minimum size of a program (or of an algorithm) that is capable of encoding the system." Moreover, he claims that "if the size of the minimal encoding program cannot be reduced below the size of the system itself, it is irreducible", and further: "... a very important consequence of the basic theorems of ATP is as follows: if a system is indeed irreducibly complex, it is necessarily random".

I think this is a real misuse of the concept of algorithmic complexity! Or, in the worst case, an example of the concept's inborn defects reduced ad absurdum. Here the mythical "size of the system" is compared to the "minimal encoding program". The absurdity of this corollary is evident if we substitute, say, "bear" for "system": "if the size of the program encoding a bear cannot be reduced below the size of the bear itself, it is irreducible", and with a further substitution: "if the algorithmic complexity of a bear equals the size of the bear, its complexity is irreducible (and the bear is random!)". However, if a bear has multiple identical components (such as water, which is about 80% of the bear's content), it is less random - yet still more chaotic than pure water!

To sum up, I believe that:

a) the complexity of a system - at least of a system in the sense of "a set of elements with an integrative property that is missing in every element and is not a sum or mean of the properties of all the elements" - is a concept different from algorithmic complexity, at least in the vagueness of the "original data string" (although both concepts may yield roughly the same results, say, the sum of the nodes and edges of the system's graph);

b) the randomness of a string in the sense of its (in)compressibility is not a very good label, at least because it does not coincide with the usual notion of randomness in probability theory.

By the way, I found similar doubts in the book by Nicolis & Prigogine, who claim that the real complexity of a system is something intermediate between the dynamic chaos of a solution and the still order of a crystal. I have also read the viewpoint of a physicist who claimed that the problem of filtering noise from signal has not been solved (or even tackled) in algorithmic probability theory, so that when "measuring the complexity of a system" we actually measure the "complexity" of the noise.

As for me, I consider the dynamic chaos of a gas or a solution irrelevant to the notion of simple vs. complex in systems analysis - irrelevant as long as the order or orientation of the molecules is irrelevant.

I would be much obliged if you could kindly show me where I'm wrong, or where my opponent is wrong.

Thank you for your attention and patience,

Best wishes,
Constantine.

=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
From: Gregory J Chaitin
To: Wablenica
Sent: December 17, 2003, 0:46:06
Subject: Fw: Algorithmic complexity in biological systems?
Files: message.htm
--====----====----====----====----====----====----====----====----====----===--

Another way to put it is that my theory is an existence proof that a mathematical theory of complexity is possible, at least in the domain of metamathematics. However, other, more practical fields will require different kinds of complexity concepts...

=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
From: Gregory J Chaitin
To: Wablenica
Sent: December 17, 2003, 1:15:32
Subject: Algorithmic complexity in biological systems?
Files: message.htm
--====----====----====----====----====----====----====----====----====----===--

If you do send copies of my e-mail to other people, I would appreciate it if you could send them the complete text rather than quoting limited parts of it. Thank you!

Wablenica
12/16/2003 04:50 PM
Please respond to Wablenica
To: Gregory J Chaitin/Watson/IBM@IBMUS
cc:
Subject: Re[2]: Algorithmic complexity in biological systems?

Thank you, Professor Chaitin, for your answer.

<...>

Again, thank you very much.

Constantine.

P.S. May I cite your answer in my polemics with opponents?