BOOK III - Page 4

Semantical Systems: Information Theory

In 1953 Carnap and Yehoshua Bar-Hillel, professor of logic and philosophy of science at the Hebrew University of Jerusalem, Israel, jointly published “Semantic Information” in the British Journal for the Philosophy of Science.  A more elaborate statement of the theory may be found in chapters fifteen through seventeen of Bar-Hillel’s Language and Information (1964).  This semantical theory of information is based on Carnap’s Logical Foundations of Probability and on Shannon’s theory of communication.  In the introductory chapter of Language and Information Bar-Hillel states that Carnap’s Logical Syntax of Language was the most influential book he had ever read, and that he regards Carnap as one of the greatest philosophers of all time.  In 1951 Bar-Hillel received a research associateship in the Research Laboratory of Electronics at the Massachusetts Institute of Technology, and he took the occasion to visit Carnap at the Institute for Advanced Study in Princeton.

In his “Introduction” to Studies in Inductive Logic and Probability, Volume I, Carnap states that during this time he told Bar-Hillel about his ideas on a semantical concept of content measure or amount of information based on the logical concept of probability.  This is an alternative to Shannon’s statistical concept of the amount of information.  Carnap notes that the two concepts are frequently confused, and that while both the logical and the statistical concepts are objective concepts of probability, only the second is related to the physical concept of entropy.  He also reports that he and Bar-Hillel had some discussions with John von Neumann, who asserted that the basic concepts of quantum theory are subjective and that this holds especially for entropy, since that concept is based on probability and amount of information.  Carnap states that he and Bar-Hillel tried in vain to convince von Neumann of the differences in each of these two pairs of concepts: objective and subjective, logical and physical.  As a result of the discussions at Princeton, Carnap and Bar-Hillel undertook the joint paper on semantic information.  Bar-Hillel reports that most of the paper was dictated by Carnap.  The paper was originally published as a Technical Report of the MIT Research Laboratory of Electronics in 1952.

In the opening statements of “Semantic Information” the authors observe that the measures of information developed by Claude Shannon have nothing to do with the semantics of the symbols, but only with the frequency of their occurrence in a transmission.  This deliberate restriction of the scope of mathematical communication theory was of great heuristic value and enabled the theory to achieve important results in a short time.  But it often turned out that impatient scientists in various fields applied the terminology and the theorems of the theory to fields in which the term “information” was used presystematically in a semantic sense.  Clarification of the semantic sense of information is therefore very important, and in this paper Carnap and Bar-Hillel set out to exhibit a semantical theory of information of a kind that cannot be developed from the concepts of information and amount of information used in Shannon’s theory.  Notably, Carnap and Bar-Hillel’s equation for the amount of information has a mathematical form very similar to that of Shannon’s equation, even though the interpretations of the two equations are not the same.  A brief summary of Shannon’s theory of information is therefore in order before further discussion of Carnap and Bar-Hillel’s theory.

Claude E. Shannon published his “Mathematical Theory of Communication” in the Bell System Technical Journal (July and October, 1948).  The papers are reprinted together with an introduction to the subject in The Mathematical Theory of Communication (Shannon and Weaver, 1964).  Shannon states that his purpose is to address what he calls the fundamental problem of communication, namely that of reproducing at one point either exactly or approximately a message selected at another point.  He states that the semantical aspects of communication are irrelevant to this engineering problem; the relevant aspect is the selection of the correct message by the receiver from a set of possible messages in a system that is designed to operate for all possible selections.  If the number of messages in the set of all possible messages is finite, then this number or any monotonic function of it can be regarded as a measure of the information produced when one message is selected from the set, with all selections being equally likely.  Shannon uses a logarithmic measure with the base of the logarithm serving as the unit of measure.  His paper considers the capacity of the channel through which the message is transmitted, but the discussion is focused on the properties of the source.  Of particular interest is a discrete source, which generates the message symbol by symbol, choosing successive symbols according to probabilities.  The generation of the message is therefore a stochastic process, and even if the originator of the message is not behaving as a stochastic process (and he probably is not), the recipient must treat the transmitted signals in such a fashion.  A discrete Markov process can be used to simulate this effect, and linguists have used it to approximate an English-language message.  The approximation to English is more successful if the units of the transmission are words instead of letters of the alphabet.
During the years immediately following the publication of Shannon’s theory linguists attempted to create constructional grammars using Markov processes.  These grammars are known as finite-state Markov process grammars.  However, after Noam Chomsky published his Syntactic Structures in 1957, linguists were persuaded that natural-language grammars are not finite-state grammars, but are potentially infinite-state grammars.

In the Markov process there exists a finite number of possible states of the system together with a set of transition probabilities, such that for any one state there is an associated probability for every successive state to which a transition may be made.  To make a Markov process into an information source, it is necessary only to assume that a symbol is produced in the transition from one state to another.  There exists a special case called an ergodic process, in which every sequence produced by the process has the same statistical properties.  Shannon proposes a quantity that will measure how much information is produced by an information source that operates as a Markov process: given n events with each having probability p(i), then the quantity of information H is:

H = - Σ p(i) log p(i).
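
As an illustration, Shannon’s H can be computed directly from a source’s symbol probabilities.  The following sketch (in Python, using base-2 logarithms so that H is measured in bits; the function name is our own) shows that a source of four equally likely symbols yields two bits per symbol, while a skewed distribution yields less:

```python
import math

def shannon_entropy(probs, base=2):
    """Shannon's H = -sum p(i) log p(i), in units fixed by the log base
    (bits for base 2).  Terms with p = 0 contribute nothing."""
    assert abs(sum(probs) - 1.0) < 1e-9, "probabilities must sum to 1"
    return -sum(p * math.log(p, base) for p in probs if p > 0)

# Four equally likely symbols carry log2(4) = 2 bits per symbol;
# a skewed source carries less information per symbol.
print(shannon_entropy([0.25] * 4))         # 2.0
print(shannon_entropy([0.5, 0.25, 0.25]))  # 1.5
```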

In their “Semantic Information” Carnap and Bar-Hillel introduce the concepts of the information content of a statement and of the content element.  Bar-Hillel notes that the content of a statement is what is also meant by the Scholastic adage omnis determinatio est negatio.  It is the class of those possible states of the universe which are excluded by the statement.  When expressed in terms of state descriptions, the content of a statement is the class of all state descriptions excluded by the statement.  The concept of state description had been defined previously by Carnap as a conjunction containing as components, for every atomic statement in a language, either the statement or its negation but not both, and no other statements.  The content element is the opposite in the sense that it is a disjunction instead of a conjunction.  The truth condition for the content element is therefore much weaker than that for the state description: in the state description all the constituent atomic statements must be true for the conjunction to be true, while for the content element only one of the constituent statements must be true for the disjunction to be true.  The content elements are therefore the weakest possible factual statements that can be made in the object language.  The only factual statement that is L-implied by a content element is the content element itself.  The authors then propose an explicatum for the ordinary concept of “the information conveyed by the statement i” taken in its semantical sense: the content of a statement i, denoted Cont(i), is the class of all content elements that are L-implied by the statement i.
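
For a finite language the content of a statement can be enumerated mechanically.  The sketch below (Python; the toy language with the two atomic statements p and q is our own choice) represents each state description as an assignment of truth values to the atoms.  Since each content element is the negation of exactly one state description, the content of a statement can be computed as the class of state descriptions it excludes:

```python
from itertools import product

ATOMS = ("p", "q")

def state_descriptions():
    """Every state description fixes a truth value for each atom: 2^n in all."""
    return [dict(zip(ATOMS, values))
            for values in product([True, False], repeat=len(ATOMS))]

def content(statement):
    """The content of a statement: the class of state descriptions it
    excludes (equivalently, the content elements -- negations of those
    state descriptions -- that the statement L-implies)."""
    return [sd for sd in state_descriptions() if not statement(sd)]

# 'p' excludes the two states in which p is false; a tautology excludes
# none; the conjunction 'p and q' excludes three of the four states.
print(len(content(lambda sd: sd["p"])))                 # 2
print(len(content(lambda sd: sd["p"] or not sd["p"])))  # 0
print(len(content(lambda sd: sd["p"] and sd["q"])))     # 3
```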

The concept of the measure of the information content of a statement is related to Carnap’s concept of a measure over the range of a statement.  Carnap’s measure functions are meant to explicate the presystematic concept of logical or inductive probability.  For every measure function a corresponding function can be defined that measures the content of any given statement, such that the greater the logical probability of a statement, the smaller its content measure.  Let m(i) be the logical probability of the statement i.  Then the quantity 1 - m(i) is the measure of the content of i, which may be called the “content measure of i”, denoted cont(i).  Thus:

cont(i) = 1 - m(i).

However, this measure does not have the desired additivity properties, because cont is not additive under inductive independence.  The cont value of a conjunction is smaller than the sum of the cont values of its components whenever the two conjoined statements are not content exclusive.  Insisting on additivity on the condition of inductive independence, the authors therefore propose another set of measures for the amount of information, which Carnap and Bar-Hillel call “information measures” for the idea of the amount of information in the statement i, denoted inf(i), and which they define as:

inf(i) = log {1/[1 - cont(i)]}

which by substitution transforms into:

inf(i) = - log m(i).
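
Under the simplest kind of measure function, which assigns equal weight to every state description (the two-atom language below is our own toy example), cont and inf can be computed directly, and the computation also exhibits why the authors wanted inf: inf is additive for inductively independent statements while cont is not:

```python
import math
from itertools import product

ATOMS = ("p", "q")
STATES = [dict(zip(ATOMS, values))
          for values in product([True, False], repeat=len(ATOMS))]

def m(statement):
    """Logical probability: the fraction of state descriptions in which
    the statement holds (all state descriptions weighted equally)."""
    return sum(statement(sd) for sd in STATES) / len(STATES)

def cont(statement):
    return 1 - m(statement)            # content measure: cont(i) = 1 - m(i)

def inf(statement):
    return -math.log2(m(statement))    # amount of information: inf(i) = -log m(i)

p = lambda sd: sd["p"]                      # m = 1/2
q = lambda sd: sd["q"]                      # m = 1/2
p_and_q = lambda sd: sd["p"] and sd["q"]    # m = 1/4

print(cont(p), inf(p))              # 0.5 1.0
print(cont(p_and_q), inf(p_and_q))  # 0.75 2.0
# inf(p and q) = inf(p) + inf(q) for the inductively independent p and q,
# whereas cont(p and q) < cont(p) + cont(q): cont is not additive here.
```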

This is analogous to the amount of information in Shannon’s mathematical theory of communication but with inductive probability instead of statistical probability.  They make their use of the logical concept of probability explicit when they express it as:

inf(h/e) = - log c(h,e)

where c(h,e) is defined as the degree of confirmation and inf(h/e) means the amount of information in hypothesis h given evidence e.  Bar-Hillel says that cont may be regarded as a measure of the “substantial” aspect of a piece of information, while inf may be regarded as a measure of its “surprise” value or, in less psychological terms, of its “objective unexpectedness.”  Bar-Hillel believed that their theory of semantic information might be fruitfully applied in various fields.  However, neither Carnap nor Bar-Hillel followed up with any investigations of the applicability of their semantical concept of information to scientific research.  Later, when Bar-Hillel’s interests turned to the analysis of natural language, he noted that linguists did not accept Carnap’s semantical views.
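
The conditional form can likewise be illustrated with the same equal-weight toy measure, taking the degree of confirmation as the ratio c(h,e) = m(h and e)/m(e); the two-atom language is again our own illustration:

```python
import math
from itertools import product

ATOMS = ("p", "q")
STATES = [dict(zip(ATOMS, values))
          for values in product([True, False], repeat=len(ATOMS))]

def m(statement):
    """Logical probability with equal weights on the state descriptions."""
    return sum(statement(sd) for sd in STATES) / len(STATES)

def c(h, e):
    """Degree of confirmation of hypothesis h on evidence e:
    c(h, e) = m(h and e) / m(e)."""
    return m(lambda sd: h(sd) and e(sd)) / m(e)

def inf_given(h, e):
    return -math.log2(c(h, e))     # inf(h/e) = -log c(h, e)

h = lambda sd: sd["p"] and sd["q"]
e = lambda sd: sd["p"]
# The evidence p halves the unexpectedness of 'p and q': two bits drop to one.
print(inf_given(h, e))  # 1.0
```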

Shreider’s Semantic Theory of Information

Carnap’s semantic theory of information may be contrasted with a more recent semantic information theory proposed by the Russian information scientist Yu. A. Shreider (also rendered from the Russian as Ju. A. Srejder).  In his “Basic Trends in the Field of Semantics” in Statistical Methods in Linguistics (1971) Shreider distinguishes three classifications or trends in works on semantics, and he relates his views to Carnap’s in this context.  The three classifications are ontological semantics, logical semantics, and linguistic semantics.  He says that all three try to solve the same problem: to ascertain what meaning is and how it can be described.  The first classification, ontological semantics, is the study of the various philosophical aspects of the relation between sign and signified.  He says that it inquires into the very nature of existence, into the degrees of reality possessed by signified objects, classes and situations, and that it is closely related to the logic and methodology of science and to the theoretical foundations of library classification.

The second classification, logical semantics, studies formal sign systems as opposed to natural languages.  This is the trend in which he locates Carnap, as well as Quine, Tarski, and Bar-Hillel.  The semantical systems considered in logical semantics are basic to the metatheory of the sciences.  The meaning postulates determine the class of permissible models for a given system of formal relations.  A formal theory fixes a class of syntactical relations, whence there arises a fixed system of semantic relations within a text describing a possible world.

The third classification, linguistic semantics, seeks to elucidate the inherent organization in a natural language, to formulate the inherent regularities in texts and to construct a system of basic semantic relations.  The examination of properties of extralinguistic reality, which determines permissible semantic relations and the ways of combining them, is carried considerably farther in linguistic semantics than in logical semantics, where the question is touched upon only in the selection of meaning postulates.  However, linguistic semantics is still rather vague and inexact, being an auxiliary investigation in linguistics used only as necessity dictates.  Shreider locates his work midway between logical and linguistic semantics, because it involves the examination of natural language texts with logical calculi.

Shreider’s theory is a theory of communication that explains phenomena not explained by Shannon’s statistical theory.  Bibliographies in Shreider’s English-language articles contain references to Carnap’s and Bar-Hillel’s 1953 paper, and Shreider explicitly advocates Carnap’s explication of intensional synonymy in terms of L-equivalence.  But Shreider’s theory is more accurately described as a development of Shannon’s theory, even though Shreider’s theory is not statistical.  English-language works by Shreider include “On the Semantic Characteristics of Information” in Information Storage and Retrieval (1965), which is also reprinted in Introduction to Information Science (ed. Tefko Saracevic, 1970), and “Semantic Aspects of Information Theory” in On Theoretical Problems On Information (Moscow, 1969).  Furthermore, comments on Shreider and other contributors to Russian information science (or “informatics” as it is called in Russia) can be found in “Some Soviet Concepts of Information for Information Science” in the Journal of the American Society for Information Science (1975) by Nicholas J. Belkin.

Like many information scientists who take up semantical considerations, Shreider notes that there are many situations involving information in which one may wish to consider the content of the message signals instead of the statistical frequency of signal transmission considered by Shannon’s theory.  But Shreider furthermore maintains that a semantical concept of information implies an alternative theory of communication, in contrast to Shannon’s “classical” theory.  Shannon’s concept pertains only to the potential ability of the receiver to determine from a given message text a quantity of information; it does not account for the information that the receiver can effectively derive from the message, that is, the receiver’s ability to “understand” the message.  In Shreider’s theory the knowledge possessed by the receiver prior to receiving the message is considered, in order to determine the amount of information effectively communicated.

More specifically, in Shannon’s probability-theoretic approach, before even considering the information contained in a message about some event, it is necessary to consider the a priori probability of the event.  Furthermore, according to Shannon’s first theorem, in the optimum method of coding a statement containing more information requires more binary symbols or bits.  In Shreider’s view, however, a theory of information should be able to account for cases that do not conform to this theorem.  For example, much information is contained in a statement describing a newly discovered chemical element, which could be coded in a small number of binary symbols, and for which it would be meaningless to speak of an a priori probability.  On the other hand, a statement describing the measurements of the well-known physicochemical properties of some substance may be considerably less informative, while it may need a much more extensive description for its coding.  The newly discovered element will change our knowledge about the world much more than measurements of known substances.  Shreider maintains that a theory of information that can take into account the receiver’s ability to “understand” a message must include a description of the receiver’s background knowledge.  For this reason his information theory includes a thesaurus, by which is meant a unilingual dictionary showing the semantic connections among its constituent words.  Shreider’s concept of information is thus consistent with Hickey’s thesis of communication constraint.

Let T denote such a thesaurus, representing a guide in which there is recorded our knowledge about the real world.  The thesaurus T can be in any one of various states, and it can change or be transformed from one state to another.  Let M represent a received message, which can transform the thesaurus T.  Then the concept of the amount of information, denoted L(T,M), may be defined as the degree of change in the thesaurus T under the action of a given statement M.  For each admissible text M expressed in a certain code or language, there corresponds a certain transformation operator that acts on the thesaurus T.  The salient point is that the amount of information contained in the statement M relative to the thesaurus T is characterized by the degree of change in the thesaurus under the action of the communicated statement.  And the understanding of the communicated statement depends on the state of the receiver’s thesaurus.  Accordingly the thesaurus T can understand some statements and not others.  There are some statements that cannot be understood by a given thesaurus, and for such a thesaurus the information is zero, which is to say L(T,M) = 0, because the thesaurus T is not transformed at all.  One such case is that of a student or a layman who does not have the background to understand a transmitted message about a specialized subject.  Another case is that of someone who already knows the transmitted information, so that it is redundant to what the receiver already knows.  In this case too there is no information communicated, and again L(T,M) = 0, but here it is because the thesaurus T is transformed into a state identical with its initial state.
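
Shreider gives no algorithm for L(T,M), but the two zero-information cases just described can be mimicked in a deliberately crude model: the thesaurus as a store of facts plus a vocabulary, with the degree of change measured simply as the number of newly recorded facts.  All names and the change measure itself are our own assumptions, not Shreider’s formalism:

```python
def receive(thesaurus, vocabulary, message):
    """Toy model of Shreider's L(T, M): the amount of information is the
    degree of change in the receiver's thesaurus, here the count of newly
    learned facts.  A message using terms outside the vocabulary is not
    understood: L = 0.  A message already known leaves the thesaurus in
    its initial state: L = 0."""
    terms, fact = message
    if not terms <= vocabulary:      # receiver cannot decode the message
        return thesaurus, 0
    if fact in thesaurus:            # redundant: thesaurus transformed into itself
        return thesaurus, 0
    return thesaurus | {fact}, 1

T = {"water boils at 100 C"}
vocab = {"water", "boils", "element"}
T, L = receive(T, vocab, ({"element"}, "element 104 synthesized"))
print(L)  # 1: the thesaurus is transformed into a new state
T, L = receive(T, vocab, ({"element"}, "element 104 synthesized"))
print(L)  # 0: already known, so the thesaurus is unchanged
T, L = receive(T, vocab, ({"quark"}, "quarks carry color charge"))
print(L)  # 0: a term outside the vocabulary, so the message is not understood
```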

The interesting situation is that in which the receiver’s thesaurus is sufficiently developed that he understands the transmitted message, but still finds his thesaurus transformed into a new and different state as a result of the receipt of the new information.  If the rules of construction of the transformation operator are viewed as external to the thesaurus T, then the quantity L(T,M) depends on these rules.  And when the transformation operator is also revised, a preliminary increase of the knowledge stored in the thesaurus T may not only decrease the quantity of information L(T,M), but can also increase it.  Thus someone who has learned a branch of a science will derive more information from a special text in that branch than he would before he had learned it.  This peculiar property of the semantic theory of information basically distinguishes it from Shannon’s classical theory, in which an increase in a priori information always decreases the amount of information from a message statement M.  In the classical theory there is no question of a receiver’s degree of “understanding” of a statement; it is always assumed that he is “tuned.”  But in the semantic theory the essential rôle is played by the very possibility of correct “tuning” of the receiver.

In his 1975 article Belkin reports that Shreider further developed his theory of information to include the idea of “meta-information.”  Meta-information is information about the mode of the coding of information, i.e., the knowledge about the relation between information and the text in which it is coded.  In this sense of meta-information the receiver’s thesaurus must contain meta-information in order to understand the information in the received message text, because it enables the receiver to analyze the organization of the semantic information, such as that which reports scientific research findings.  Shreider maintains that informatics, the Russian equivalent of information science, is concerned not with information as such, but rather with meta-information, and specifically with information as to how scientific information is distributed and organized.

Therefore, with his concept of meta-information Shreider has reportedly modified his original theory of communication by analyzing the thesaurus T into two components, such that T = (Tm,To).  The first component Tm consists of the set of rules needed for extracting elementary messages from the text M, while the second component To consists of the factual information that relates those elementary messages systematically and enables the elements to be integrated in T.  The relationship between Tm and To is such that a decrease in the redundancy of the coding of To requires an increase of the meta-information in Tm for the decoding of the coding system used for To.  Hence the idea of meta-information may be a means of realizing some limiting efficiency laws for information, by analyzing the dependency relation between information and the amount of meta-information necessary to comprehend that information.
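
The division T = (Tm,To) can also be mimicked in a deliberately crude sketch, with Tm as a code book of decoding rules (the meta-information) and To as the factual store.  The function and data names here are our own illustration, not Shreider’s formalism:

```python
def receive_meta(T, text):
    """Toy model of T = (Tm, To): Tm is a code book mapping code words to
    elementary messages (meta-information about how information is coded),
    To is the factual store.  Without the right rule in Tm a code word
    cannot be decoded at all, however informative its content would be."""
    Tm, To = T
    decoded = [Tm[word] for word in text.split() if word in Tm]
    new_facts = {fact for fact in decoded if fact not in To}
    return (Tm, To | new_facts), len(new_facts)

Tm = {"E104": "element 104 synthesized", "H2O": "water is H2O"}
T = (Tm, {"water is H2O"})

T, change = receive_meta(T, "E104 H2O")
print(change)  # 1: only the E104 message changes the store; H2O is redundant
T, change = receive_meta(T, "Qbar")
print(change)  # 0: no decoding rule for 'Qbar', so nothing is understood
```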

It would appear that if the coding system is taken as a language, then Shreider’s concept of meta-information might include the idea of a metalanguage as used by Carnap, Hickey and other analytical philosophers, or it might be incorporated into the metalanguage.  Then the elements Tm and To are distinguished as metalanguage and object language respectively.
