We use to think that "logic" is "self-evidently" correct, for instance the rule of "deduction by contradiction":

Proposition 3: (~p -> ~q) -> (q -> p)

People would nod when they hear me say "because... thus... so... leads to... contradict with... suppose... must..." etc. However, do we "leap" somewhere above? Which of them are the real "atoms" of logic? Suppose two people write down two proofs of a theorem, almost any human can immediately see they are "equivalent" or not. But how can a machine do that? In other words, we should port our usual free-style of proof on paper, to a maximally restricted "action set", at least once for a lifetime. The formalism which I'll describe in the next paragraph is Symbolic Logic. The reason for following this procedure is that only then can we fully drive "human intuitions" or "common sense" out of our proofs - it might be wrong, just like the "axiom" of 3D Euclidean space. Only through the uncomfortable procedure of symbolic logic can we clearly chart out the hierarchical structure of the Knowledge Tree in which all conscious entities reside; Russell proposed that the human consciousness subtree's root node be put on set theory followed by symbolic logic - and that the entire subtree be called Mathematics - but the real human experience in this world just follows one (maybe unbounded) traversal of the subtree. And it will be always "exciting", as proved by Goedel, in the sense that we will always encounter, formalize and try out (traverse all possibilities of) new propositions that cannot be proven by the previous set of axioms and the symbolic logic procedure, no matter how deep we have traversed. In other words, we are bound to discover and be given the choice of picking new axiom to believe in - accepting it, or its converse, are both OK as far as mathematics is concerned - but which choice of the mathematical branch is relevant to this human world will depend on physical measurements, which is not pre-ordained by math.

Returning to symbolic logic (L), it turns out that the "logic" humans use (HL), is only one, albeit an important, branch of L (HL can be shown in L to achieve the largest set of provable propositions). L is a good container to put almost all mathematics in, and it stresses step-by-step simple mechanical operation, by which I mean a "logic" machine (it could be a mathematician sitting in front of his desk) that only do three kinds of operations, on a Sheet of Formal Proof:

L0: p->q and p have both appeared previously on the proof sheet, then write down q; for any choice of p,q. In effect this is the definition of "->", coming from ancient Greeks.

L1: write down p -> (q->p); for any choice of p,q

L2: write down (p->(q->r)) -> ((p->q) -> (p->r)); for any choice of p,q,r

A Formal Proof is any sequence of propositions a machine can generate on the proof sheet, whereby proposition P_n is the result of applying L1 or L2 operations to P_{n-1}, by substituting to p, q or r certain groups of the symbols in P_{n-1}; or by using L0 from two previous propositions.

We see that the actions of symbolic logic, after given a certain choice of axiom set besides L0, L1 and L2, looks very similar to a string production operation based on a phrase-structure, or Type 0, grammar, where the terminal symbols are the pre-selected axioms and there are only three, though quite complicated, production rules. (Noam Chomsky: type I, context-sensitive, aAb->longer; type I, context-free, A->anything; type III, regular expr., A->aB). Turing proved that this is indeed the case, that Formal Proofs which mathematicians wrote on their books, such as Principles of Mathematics, are all described by a Type 0 grammar; Turing then proved that any language (subset of all strings on an alphabet) is recognizable by a mechanical invention called the Turing machine if and only if that language is described by a Type 0 grammar. Turing thus proved the fact that his machine can do whatever a mathematician can do using Formal Logic.

As an example of "doing it", let us formally prove the proposition "p->p" in symbolic logic (don't laugh - apes cann't even do it). A Formal Proof sequence could be

p1: use L1, let q = p:

p -> (p->p)
p2: use L2, let q = p->p, r = p:
(p -> ((p->p)->p)) -> ((p->(p->p)) -> (p->p))
p3: use L1, let q = p->p:
p -> ((p->p)->p)
p4: use L0 on p2, let p = p3:
(p->(p->p)) -> (p->p)
p5: use L0 on p4, let p = p1:
p -> p
We see that this particular proof of p->p can be uniquely denoted by an instruction sequence L1-L2-L1-L0-L0 that even a machine could check. It fulfills in some way the dreams of ancient mathematicians who hoped that "proof can be done in the same way as arithmetic." However, Hilbert hoped for more: he wanted to see if one could construct a machine that proves true or false every proposition. This is already history due to the work of Goedel and Turing: one sees that no matter how many axioms one already possesses, there always exists propositions that cannot be proven. So it is not hard to imagine that there exists a language (not belonging to Type 0 grammar) that cannot be recognized by any Turing machines. Put it another way, any function f(x) where x is an integer ranging from 0 to infinity can be classified into two classes, according to whether it can be computed by any Turing machine with pre-fixed number of states (# of states varies from problem to problem: but for a given problem, it must be fixed), or not. The first class is called computable functions, and the problem is said to have an effective algorithm; otherwise it cannot be computed by any fixed-state Turing machines, and cannot be effectively computed by any computing machines, including human, if one believes the Church-Turing thesis, which says that a human is no better than a Turing machine; since giving up the fixed-state requirement means that the amount of calculation is blowing up incredibly; and since all "effective algorithms" up to now (+, *, sin, max, FFT) can all be modelled by Turing machines. An example of such a problem is the so-called "busy beaver function" B(n), which is how many 1's a n-state Turing machine can maximally generate starting from a blank tape.

Human logic comes by adding Proposition 3 to L0, L1 and L2, which is actually a definition of the ~ (NOT) operator. Using the Formal Proof procedures above, we can arrive at the following "well-known" theorems:

~~p -> p (p -> q) -> (~q -> ~p) etc.
and in fact define operators AND, OR, = from them, and all results of of Boolean algebra. It is easy to "prove" L0, L1, L2 using Boolean algebra formula

A->B = (~A) OR B
But the important conclusion is that even without Proposition 3, one can still have "non-contradictory" logical systems - it was proven solely by using set theory and L0, which is the outermost wrapper of all mathematics, that for any logic system belonging to L (a set), either it never reaches both p and ~p, or it reaches both p and ~p for any proposition p. In other word, a logic system either makes sense by not contradicting itself, or it is completely rubbish which can be then easily identified.

Set theory is the mother of all mathematics: it contains 12 or so axioms proposed by Russell and Whitehead in pains-taking carefulness to avoid his own "I am lying" paradox, by demanding any set not having itself as an element, or n-body recursions for that matter (the collection of all sets is called a class). In c++ "class" notation, one would write down the following sketchy hierarchy:

Set theory + L0
      Logic (L1, L2)
  Human_Logic(Proposition 3), or Boolean algebra
         /       \
        /         \         
  Algebra         Geometry
    |              /    \ 
    |             /      \
    |     Euclidean     Non-Euclidean
    |        |               |
    |        |               |
  Classical_Physics ~~~~~ General_Relativity
Russell and Whitehead hoped that set theory will be used as a solid foundation to prove true or false all mathematical propositions, based on axioms one chooses at finite levels of the hierarchy (by axioms we mean they are minimal in the deduction process). For instance, integers are defined as sets:

0: e (empty set)

1: {0}

2: {0, 1}

3: {0, 1, 2}


and then set "and/or" become max/min on integers, and special operations are defined for +/-/* on integers. It came as a shock then, when Goedel and later Turing proved it cannot be done: that all logical systems with a finite axiom set are incomplete in a world of infinite discourse (a finite-discourse universe, like one having only one proposition, can easily be made complete by making that an axiom!). Given any non-contradictory axiom set A, there bound to exist a proposition p which can never be generated as a leaf node (result) of the Formal Proof tree, and neither could ~p. That means incorporating p as an axiom in A generates a new non-contradictory axiom set A+, and incorporating ~p as an axiom into A generates A- which is also non-contradictory. And one can do this FOREVER.

What are the "paradoxical" propositions under Human Logic? It seems that some of them could be pure mathematical arguments that may never have anything to do with human welfare, like some dealing with infinite sets. But consider the following propositions,

p1: through one point you can draw one and only one straight line that does not cross a given straight line

p2: any two straight lines cross

p3: through one point you can draw infinitely many straight lines that does not cross a given straight line

Clearly, Human Logic does not tell us which of the propositions are correct; creatures with "human logic" can live perfectly happy lives in all three worlds, namely Euclidean, parabolic and hyperbolic, without feeling any "logical contradictions" as defined by L. In fact, even nowadays we don't know for sure (from Hubble's constant measurement) which world WE live in. Even if we know, it does not mean we should not study the MATH of "other worlds", because MATH is a universal thing that does not specifically have human world in mind. If I bred 100 kids and lock them up on the surface of a globe - they are going to use parabolic geometry, and happily report to me their world has no such thing as an infinite area - they probably don't know what is "infinite" at all (you see, accepting or dis-accepting p also changes the range of the universe of discourse); but if they are smart, they will also tell me they've come up with Euclidean and hyperbolic geometries too :). Who knows, maybe we are someone's kids?

The fact that most mathematicians acknowledge (Set theory + L0) as the root node of human knowledge tree means that, at least for now, we are only willing to entertain "alien"/"god" worlds that have THAT much common ground with us (if they start with L or HL or General_Relativity, that's even better), so we can still communicate with them on the conscious level even if we don't live their lives. Beyond that root node maybe human don't even have enough faculty to understand, so perhaps part of the knowledge tree is forever dark to us. The entire subtree starting from Set theory is called MATH; the majority of MATH starting from L is called LOGIC. The task of human scientists is to go through MATH, proposing p's along the way, maybe putting more emphasis on the route which is believed to be OUR world but illuminating close-by branches anyway. Goedel simply proved the fact that no matter which tree-traversal route a logic entity picks starting from Set theory, be it earthmen's or the last-big-bang-Alpha-centaurians' (we trust them to be always with us on the traversal route with the same "universal physical laws", but who knows?), the scientists can NEVER stop proposing new p's and make trial traversals - given that the knowledge tree is infinitely deep. The reasons for trying different p-branches are 1) experiments can't tell which is the real one yet. 2) the other p-branches might turn out useful in different contexts. Thus Goedel simply showed that very probably earth/centaurian scientists will never be able to lose their jobs. That sounds about right, when you think about it.

How did Goedel and Turing proved this "never-say-die" property? Just remember that in our "p->p" Formal Proof, the entire proof can be expressed as a unique string, something like "L1-L2-L1-L0-L0" - a piece of machine code that is equivalent to an integer. In general, suppose we have done certain traversal on the knowledge tree and have decided on a set of n axioms. Then each proposition we can prove formally based on these n axioms corresponds to a unique integer, and the set of propositions we CAN prove is "no larger" than the set of integers. The gist of the proof is then similar to showing that the set of real numbers cannot be labelled by integers, because for any labelling scheme, I can always construct a real number that is not labelled by ANY finite integer index (just list them and take the iith field\pm 1). In the same way, I can list all propositions you can prove and come up with a composite proposition p that surely you cannot prove (or disprove). So it doesn't matter mathematically whether you believe in p or ~p; either is OK for your "logical brain"; which one finally you find useful depends on physical measurements such as the Hubble's constant. That judgment call is PHYSICS; this is MATH.

We thus see that a "Formal Proof" could be done by a machine, which is called the Turing-machine, and let me define it properly: given an alphabet, say of English+Greek+German+Japanese characters, there is a set S which is the set of all strings that can be composed from this alphabet. A language G is defined as any subset of S. A human expert on language G is responsible for one task: given any element x in S, he should tell whether x is a member of G or not, in finite but unbounded time, based on some rules he remembers and granting him infinite "deductive capability" like Sherlock Holms, meaning actually infinite memory storage spaces. Now it may not surprise you that there exist certain G and certain string in S which even an expert on G cannot tell whether it belongs to G or not. Turing also convinced us that such a human expert on G is no better than a specifically designed Turing-machine: a pre-arranged state-transition machine based on input (string x, say coming from a modem) and having infinite write/read space (scratch paper, memory - whichever you call it; originally Turing calls it tape, which is perhaps misleading).

Clearly, Turing-machines are better than some other machines, like finite-state automaton with finite storage (FSA) or that with a infinite push stack (PSA), for Turing-machines can recognize more languages. In fact, FSA only recognizes regular expressions like a*, while PSA only recognizes languages with LALR grammar like a^nb^n. Turing-machines can be made to recognize all languages that human can recognize, like a^nb^nc^n.

Also, since Turing-machines can be made to recognize certain languages, they eventually can be made to recognize programming languages (works like escape sequences in a terminal) that enables it to emulate other Turing-machines, be it the chessing-playing Turing-machine or the pi-calculating Turing-machine. Such general purpose Turing machines are called Universal Turing machines, and modern day computers actually work very much like that, besides not literally having infinite memory resources.