INTRODUCTION TO PHILOSOPHY OF SCIENCE
Book I Page 6
4.10 Scientific Discovery
“Discovery” refers to the development of new and empirically superior theories.
Much has already been said in the above discussions of philosophy of scientific language in Chapter 3 about the pragmatic basis for the definition of theory language, about the semantic basis for the individuation of theories, and about state descriptions. Those discussions will be assumed in the following comments about the mechanized development of new theories.
Discovery is the first step toward realizing the aim of science. The problem of scientific discovery for contemporary pragmatist philosophers of science is to proceduralize and then to mechanize the development of universally quantified statements for empirical testing with nonfalsifying test outcomes, thereby making laws for use in explanations and test designs. Contemporary pragmatism is consistent with the techniques of computerized discovery systems.
4.11 Discovery Systems
A discovery system produces a transition from an input-language state description containing currently available language to an output-language state description containing generated and tested new theories.
In the “Introduction” to his Models of Discovery (1977) Simon, one of the founders of artificial intelligence, wrote that dense mists of romanticism and downright know-nothingism have always surrounded the subject of scientific discovery and creativity. The most significant development addressing the problem of scientific discovery has therefore been the relatively recent mechanized discovery systems in a new specialty called “computational philosophy of science”.
The ultimate aim of the computational philosopher of science is to facilitate the advancement of contemporary sciences by participating in and contributing to the successful basic-research work of the scientist. The contemporary pragmatist philosophy of science thus carries forward the classical pragmatist John Dewey’s emphasis on participation. Unfortunately few academic philosophers have the requisite computer skills, much less a working knowledge of any empirical science, for participation in basic research. Hopefully that will change in future Ph.D. dissertations in philosophy of science, which are likely to be interdisciplinary endeavors.
Every useful discovery system to date has contained procedures both for constructional theory creation and for critical theory evaluation for quality control of the generated output and for quantity control of the system’s otherwise unmanageably large output. Theory creation introduces new language into the current state description to produce a new state description, while falsification in empirical tests eliminates language from the current state description to produce a new state description. Thus both theory development and theory testing enable a discovery system to offer a specific and productive diachronic dynamic procedure for linguistic change to advance empirical science.
The discovery systems do not merely implement an inductivist strategy of searching for repetitions of individual instances, notwithstanding that statistical inference is employed in some system designs. The system designs are mechanized procedural strategies that search for patterns in the input information. Thus they implement Hanson’s thesis in Patterns of Discovery that in a growing research discipline inquiry seeks the discovery of new patterns in data. They also implement Feyerabend’s “plea for hedonism” in Criticism and the Growth of Knowledge (1970) to produce a proliferation of theories. But while many theories are made by these systems, mercifully few are chosen, thanks to the systems’ empirical-testing routines, which control the quality of the outputted equations.
4.12 Types of Theory Development
In his Introduction to Metascience Hickey distinguishes three types of theory development, which he calls theory extension, theory elaboration and theory revision. This classification is vague and the types may overlap in some cases, but it reflects three alternative discovery strategies and therefore implies different discovery-system designs.
Theory extension is the use of a currently tested and nonfalsified explanation to address a new scientific problem.
The extension could be as simple as adding hypothetical statements to make a general explanation more specific for a new problem at hand. A more complex strategy for theory extension is analogy. In his Computational Philosophy of Science (1988) Thagard describes this strategy for mechanized theory development, which consists in patterning a proposed solution to a new problem by analogy with a successful explanation originally developed for a different subject. His discovery system based on this strategy, called PI (an acronym for “Process of Induction”), reconstructed the development of the theory of sound waves by analogy with the description of water waves. The system was developed for his Ph.D. dissertation in philosophy of science at the University of Toronto, Canada.
In his Mental Leaps: Analogy in Creative Thought (1995) Thagard further explains that analogy is a kind of nondeductive logic, which he calls “analogic”. It firstly involves the “source analogue”, which is the known domain that the investigator already understands in terms of familiar patterns, and secondly involves the “target analogue”, which is the unfamiliar domain that the investigator is trying to explain. Analogic is the strategy whereby the investigator understands the targeted domain by seeing it in terms of the source domain. Analogic requires a “mental leap”, because the two analogues may initially seem unrelated; and it remains a “leap”, because analogic, unlike deduction, is not conclusive.
It may be noted that if the output state description generated by analogy, as in the PI system, is radically different from anything previously seen by the affected scientific profession containing the target analogue, then the members of that profession may experience the communication constraint to the high degree usually associated with theory revision. The communication constraint is discussed below (Section 4.26).
Theory elaboration is the correction of a currently falsified theory to create a new theory by adding new factors or variables that correct the falsified universally quantified statements and erroneous predictions of the old theory.
The new theory has the same test design as the old theory. The correction is not merely an ad hoc exclusion of individual exceptional cases, but rather a change in the universally quantified statements. This process is often misrepresented as “saving” a falsified theory, but in fact it creates a new one.
For example the introduction of a variable for the volume quantity and the development of a constant coefficient for the particular gas could elaborate Gay-Lussac’s law for gases into the combined gas law uniting Gay-Lussac’s, Boyle’s and Charles’s laws. Similarly Friedman’s macroeconomic quantity theory might be elaborated into a Keynesian hyperbolic liquidity-preference function by the introduction of an interest rate, both to account for the cyclicality manifest in an annual time series describing the calculated velocity parameter and to display the liquidity-trap phenomenon, which occurred both in the Great Depression (1929-1933) and in the recent Great Recession (2007-2009).
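The gas-law elaboration can be illustrated numerically. The following sketch is a hypothetical illustration, not from Hickey's text: the original fixed-volume law is corrected by introducing a volume variable and an amount coefficient, yielding the combined (ideal) gas law.

```python
R = 0.082057  # universal gas constant in L·atm/(mol·K)

def gay_lussac_pressure(T, k):
    # Original law at fixed volume: P = k*T
    # (falsified once the volume is allowed to vary)
    return k * T

def elaborated_pressure(T, V, n=1.0):
    # Elaborated law: a volume variable V and an amount n are introduced,
    # yielding the combined gas law P = n*R*T/V
    return n * R * T / V

# One mole at standard conditions: 273.15 K in 22.414 L gives about 1 atm
P = elaborated_pressure(273.15, 22.414)
```

The elaboration keeps the old variables P and T but changes the universally quantified equation itself, which is why it creates a new theory rather than saving the old one.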
Pat Langley’s BACON discovery system exemplifies mechanized theory elaboration. It is named after the English philosopher Francis Bacon (1561-1626), who thought that scientific discovery can be routinized. BACON is a set of successive and increasingly sophisticated discovery systems that make quantitative laws and theories from input measurements. Langley designed and implemented BACON in 1979 as his Ph.D. dissertation, written in the Carnegie Mellon department of psychology under the direction of Simon. A description of the system is in Simon’s Scientific Discovery: Computational Explorations of the Creative Processes (1987).
BACON uses Simon’s heuristic-search design strategy, which may be construed as a sequential application of theory elaboration. Given sets of observation measurements for several variables, BACON searches for functional relations among the variables. Langley reports that BACON has simulated the discovery of several historically significant empirical laws including Boyle’s law of gases, Kepler’s third planetary law, Galileo’s law of motion of objects on inclined planes, and Ohm’s law of electrical current.
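BACON's actual heuristics are incremental, but their effect can be suggested by a brute-force stand-in (a hypothetical sketch, not Langley's code) that searches a small grid of monomial laws of the form x**a * y**b = k for one that is invariant across the observations. Run on planetary data it recovers Kepler's third law.

```python
def invariant_search(x, y, max_exp=3, tol=1e-2):
    # Try candidate laws x**a * y**b = k over a small exponent grid and
    # return the first combination that is nearly constant across the data.
    for a in range(-max_exp, max_exp + 1):
        for b in range(-max_exp, max_exp + 1):
            if a == 0 and b == 0:
                continue
            vals = [xi ** a * yi ** b for xi, yi in zip(x, y)]
            mean = sum(vals) / len(vals)
            if max(abs(v - mean) for v in vals) / abs(mean) < tol:
                return a, b, mean
    return None

# Orbital distances (AU) and periods (years): Venus, Earth, Mars, Jupiter
d = [0.723, 1.000, 1.524, 5.203]
p = [0.615, 1.000, 1.881, 11.862]
a, b, k = invariant_search(d, p)   # finds p**2 / d**3 nearly constant
```

The returned exponents (-3, 2) express Kepler's third law: the squared period divided by the cubed distance is the same for every planet.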
Theory revision is the reorganization of currently existing information to create a new theory.
The results of theory revision may be radically different from any current theory, and may thus be said to occasion a “paradigm change”. It might be undertaken after repeated attempts at both theory extension and theory elaborations have failed. The source for the input state description for mechanized theory revision presumably consists of the descriptive vocabulary from any currently untested theories addressing the problem at hand. The descriptive vocabulary from previously falsified theories may also be included as inputs to make an accumulative state description, because the vocabularies in rejected theories can be productively cannibalized for their scrap value. In fact even terms and variables from tested and nonfalsified theories could also be included, just to see what new proposals come out; empirical underdetermination permits scientific pluralism, and reality is full of surprises. Hickey notes that a mechanized discovery system’s newly outputted theory is most likely to be called revolutionary if the revision is great, because theory revision typically produces greater change to the current language state than does theory extension or theory elaboration thus producing psychologically disorienting semantical dissolution in the transition.
The thesis of historian of science Herbert Butterfield (1900-1979) in his Origins of Modern Science: 1300-1800 (p. 1) is that the type of transition known as a “scientific revolution” was not brought about by new observations or additional evidence, but rather by transpositions in the minds of the scientists. Specifically he maintains that the type of mental activity that produced the historic scientific revolutions is the art of handling the same bundle of data as known before, but placing the data in a new system of relations with one another.
Hickey found this “art” in the history of economics. The applicability of his METAMODEL discovery system to theory revision such as Keynes’ was already evident in retrospect from the fact that, as the 1980 Nobel-laureate econometrician Lawrence Klein wrote in his Keynesian Revolution (1949, pp. 13 & 124), all the important parts of Keynes’ theory can be found in the works of one or another of his predecessors. Thus Keynes used the same bundle of information that was known before, but placed it in a new system of relations, such as his aggregate consumption function and his demand for money with its speculative-demand component.
In 1976 Hickey also used his METAMODEL system to develop a post-classical macrosociometric functionalist model of the American national society with fifty years of historical time-series data. The generated sociological model disclosed an intergenerational negative feedback that sociologists would call a “macrosocial integrative mechanism”, in which an increase in social disorder indicated by a rising homicide rate calls forth a delayed intergenerational stabilizing reaction by the socializing institution indicated by the high school completion rate, which tends to restore order by reinforcing compliance with criminal law. To the shock, chagrin and dismay of complacent academic sociologists it is not a social-psychological theory, and four sociological journals therefore rejected Hickey’s paper, which describes the model and its findings about the American national society’s dynamics and stability characteristics. The paper is reprinted in “Appendix I” to BOOK VIII at the free web site www.philsci.com and in the e-book Twentieth-Century Philosophy of Science: A History, which is available from most Internet booksellers.
The provincial academic sociologists’ a priori ontological commitments to romanticism and to social-psychological reductionism rendered the editors and their chosen referees invincibly obdurate. The editors’ favorite referees also exhibited their Luddite mentality toward mechanized theory development. The referee criticisms and Hickey’s rejoinders are given in “Appendix II” to BOOK VIII at the free web site www.philsci.com and in the e-book Twentieth-Century Philosophy of Science: A History, which is available from most Internet booksellers.
Later in the mid-1980’s Hickey integrated his macrosociometric model into a Keynesian macroeconometric model to produce an institutionalist macroeconometric model while he was employed as Deputy Director and Senior Economist for the Indiana Department of Commerce, Division of Economic Analysis during the Orr-Mutz Administration (1981-1988). The report of the findings was read to the Indiana Legislative Assembly by the Speaker of the House in support of the Governor’s “A-plus” successful legislative initiative for increased State-government spending for primary and secondary public education.
4.13 Examples of Successful Discovery Systems
There are several examples of successful discovery systems in use. John Sonquist developed his AID system for his Ph.D. dissertation in sociology at the University of Chicago. His dissertation was written in 1961, when William F. Ogburn was department chairman, which was before Edward O. Laumann and the romantics took over the University of Chicago sociology department. He described the system in his Multivariate Model Building: Validation of a Search Strategy (1970). The system has long been used at the University of Michigan Survey Research Center. Now modified as the CHAID system using the chi-squared (χ²) statistic, Sonquist’s discovery system is available commercially in both the SAS and SPSS software packages. Its principal commercial application is in list-processing scoring models for commercial market analysis and for credit risk analysis, as well as in academic investigations in social science. It is not only the oldest mechanized discovery system but also the most widely used in practical applications.
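The chi-squared criterion at the heart of AID's successor CHAID can be sketched as follows (a simplified illustration, not Sonquist's code): for each candidate predictor, cross-tabulate it against the outcome and prefer the split whose contingency table has the largest Pearson chi-squared statistic.

```python
def chi_squared(table):
    # Pearson chi-squared statistic for an r x c contingency table,
    # the splitting criterion in CHAID-style tree building.
    row_tot = [sum(row) for row in table]
    col_tot = [sum(col) for col in zip(*table)]
    n = sum(row_tot)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_tot[i] * col_tot[j] / n
            stat += (observed - expected) ** 2 / expected
    return stat

# A strongly associated 2x2 split scores high...
strong = chi_squared([[30, 10], [10, 30]])
# ...while an uninformative split scores zero
weak = chi_squared([[20, 20], [20, 20]])
```

Repeating this comparison recursively over the resulting subgroups yields the familiar segmentation tree used in scoring models.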
Robert Litterman developed his BVAR (Bayesian Vector Autoregression) system for his Ph.D. dissertation in economics at the University of Minnesota. He described the system in his Techniques for Forecasting Using Vector Autoregressions (1984). The economists at the Federal Reserve Bank of Minneapolis have used his system for macroeconomic and regional economic analysis. The State of Connecticut and the State of Indiana have also used it for regional economic analysis.
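The flavor of Litterman's approach can be suggested by a heavily simplified sketch (hypothetical, not his system): a first-order vector autoregression estimated with ridge-style shrinkage toward a random-walk prior, which is the core idea behind the "Minnesota prior".

```python
import numpy as np

def bvar1(Y, lam=0.1):
    # Y: (T, k) array of time series; returns B with y_t ~= B @ y_{t-1}.
    # Ridge solution shrinking each equation toward the random walk
    # y_t = y_{t-1}:  minimize ||Z - X B'||^2 + lam * ||B' - I||^2
    X, Z = Y[:-1], Y[1:]          # lagged and current observations
    k = Y.shape[1]
    A = X.T @ X + lam * np.eye(k)
    B = np.linalg.solve(A, X.T @ Z + lam * np.eye(k)).T
    return B

# A univariate series decaying at rate 0.5 recovers a coefficient near 0.5
Y = np.array([[1.0], [0.5], [0.25], [0.125], [0.0625], [0.03125]])
B = bvar1(Y, lam=0.01)
```

Larger values of lam pull the estimated coefficients toward the random-walk prior, which is what makes such models usable on the short macroeconomic samples Litterman worked with.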
Having previously received an M.A. degree in economics Hickey had intended to develop his METAMODEL computerized discovery system for a Ph.D. dissertation in philosophy of science while a graduate student in the philosophy department of the University of Notre Dame, South Bend, Indiana. But the Notre Dame Philosophy Department is an intolerant obdurate backwater, and its faculty was obstructionist to Hickey’s views. Hickey therefore dropped out without a Ph.D. He then developed his computerized discovery system as a nondegree student at San Jose City College in San Jose, CA, a two-year associate-arts degree community college, which has a better computer and better teachers than Notre Dame’s graduate school of philosophy.
For the next thirty years Hickey used his discovery system occupationally, working as a research econometrician in both business and government. For six of those years he used his system for institutionalist macroeconometric modeling and regional econometric modeling for the State of Indiana Department of Commerce. He also used it successfully for econometric market analysis and risk analysis for various business corporations including USX/United States Steel Corporation, BAT(UK)/Brown and Williamson Company, Pepsi/Quaker Oats Company, Altria/Kraft Foods Company, Allstate Insurance Company, and TransUnion LLC. In 2004 TransUnion’s Analytical Services Group purchased a perpetual license to use his METAMODEL system for their consumer credit risk analyses using their proprietary TrenData aggregated quarterly time series extracted from their national database of consumer credit files. Hickey used the models generated by the discovery system to forecast payment delinquency rates, bankruptcy filings, average balances and other consumer borrower characteristics that affect risk exposure for lenders. He also used the system for Quaker Oats and Kraft Foods to discover the sociological and demographic factors responsible for the secular long-term market dynamics of food products and other nondurable consumer goods.
In 2007 Michael Schmidt, a Ph.D. student in computational biology at Cornell University, and his dissertation director, Hod Lipson, developed their EUREQA system at Cornell’s Artificial Intelligence Lab. The system automatically develops predictive analytical models from data using a strategy they call an “evolutionary search” to find invariant relationships, which converges on the simplest and most accurate equations fitting the inputted data. The system has been used by many business corporations, universities and government agencies including Alcoa, California Institute of Technology, Cargill, Corning, Dow Chemical, General Electric, Amazon, Shell and NASA.
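The accuracy-versus-simplicity trade-off that drives such convergence can be suggested by a toy sketch (hypothetical, not Schmidt and Lipson's algorithm, which evolves expressions rather than enumerating a fixed list): score each candidate functional form by its squared error plus a parsimony penalty, and keep the best.

```python
import math

# Hypothetical candidate forms, each with a complexity score
CANDIDATES = [
    ("a*x", 1, lambda a, x: a * x),
    ("a*x**2", 2, lambda a, x: a * x ** 2),
    ("a*sin(x)", 2, lambda a, x: a * math.sin(x)),
    ("a*exp(x)", 2, lambda a, x: a * math.exp(x)),
]

def fit(xs, ys, penalty=0.01):
    # Score = squared error + parsimony penalty; smaller is better.
    best = None
    for name, complexity, f in CANDIDATES:
        # least-squares estimate of the single coefficient a for this form
        num = sum(f(1.0, x) * y for x, y in zip(xs, ys))
        den = sum(f(1.0, x) ** 2 for x in xs)
        a = num / den
        err = sum((f(a, x) - y) ** 2 for x, y in zip(xs, ys))
        score = err + penalty * complexity
        if best is None or score < best[0]:
            best = (score, name, a)
    return best

xs = [0.5, 1.0, 1.5, 2.0]
ys = [3.0 * x ** 2 for x in xs]      # true law: y = 3*x**2
best = fit(xs, ys)                   # selects "a*x**2" with a near 3.0
```

The penalty term is what makes the search prefer the simplest equation among those that fit the data comparably well.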
For more about discovery systems and computational philosophy of science readers are referred to BOOK VIII at the free web site www.philsci.com and in the e-book Twentieth-Century Philosophy of Science: A History, which is available from most Internet booksellers.
4.14 Scientific Criticism
Criticism pertains to the criteria for the acceptance or rejection of theories. The only criterion for scientific criticism that is acknowledged by the contemporary pragmatist is the empirical criterion.
The philosophical literature on scientific criticism has little to say about the specifics of experimental design as might be found in college-level laboratory manuals. Most often philosophical discussion of criticism pertains to the criteria for acceptance or rejection of theories and more recently to the decidability of empirical testing.
In earlier times when the natural sciences were called “natural philosophy” and social sciences were called “moral philosophy”, nonempirical considerations operated as criteria for the criticism and acceptance of descriptive narratives. Even today some philosophers and scientists have used their semantical and ontological preconceptions as criteria for the criticism of theories including preconceptions about causality or specific causal factors. Such semantical and ontological preconceptions have misled them to reject new empirically superior theories. In his Against Method Feyerabend noted that the ontological preconceptions used to criticize new theories have often been the semantical and ontological claims expressed by previously accepted and since falsified theories.
What historically has separated the empirical sciences from their origins in natural and moral philosophy is the empirical criterion. This criterion is responsible for the advancement of science and for its enabling practicality in application. Whenever in the history of science there has been a conflict between the empirical criterion and any nonempirical criteria for the evaluation of new theories, it is the empirical criterion that eventually decides theory selection.
Contemporary pragmatists accept relativized semantics, scientific realism, and thus ontological relativity, and they therefore reject all prior semantical or ontological criteria for scientific criticism including the romantics’ mentalistic ontology requiring social-psychological or any other kind of reductionism.
4.15 Logic of Empirical Testing
An empirical test is:
(1) an effective decision procedure that can be schematized as a modus tollens logical deduction from a set of one or several universally quantified theory statements expressible in a nontruth-functional hypothetical-conditional schema
(2) together with a particularly quantified antecedent description of the initial test conditions as defined in the test design
(3) that jointly yield a consequent particularly quantified description of a produced (predicted) test-outcome event
(4) that is compared with the observed test-outcome description.
In order to express explicitly the dependency of the produced effect upon the realized initial conditions in an empirical test, the universally quantified theory statements can be schematized as a nontruth-functional hypothetical-conditional schema, i.e., as a statement with the logical form “For every A if A, then C.”
This hypothetical-conditional schema “For every A if A, then C.” represents a system of one or several universally quantified related theory statements or equations that describe a dependency of the occurrence of events described by “C” upon the occurrence of events described by “A”. In some cases the dependency is expressed as a bounded stochastic density function for the values of predicted probabilities. For advocates who believe in the theory, the hypothetical-conditional schema is the theory-language context that contributes meaning parts to the complex semantics of the theory’s constituent descriptive terms including the terms common to the theory and test design. But the theory’s semantical contribution cannot be operative in a test for the test to be independent of the theory, since the test outcome is not true by definition; it is empirically contingent.
The antecedent “A” includes the set of universally quantified statements of test design that describe the initial conditions that must be realized for execution of an empirical test of the theory including the statements describing the procedures needed for their realization. These statements constituting “A” are always presumed to be true or the test design is rejected as invalid, as is any test made with it. The test-design statements are semantical rules that contribute meaning parts to the complex semantics of the terms common to theory and test design, and do so independently of the theory’s semantical contributions. The universal logical quantification indicates that any execution of the test is but one of an indefinitely large number of possible test executions, whether or not the test is repeatable at will.
When the test is executed, the logical quantification of “A” is changed from universal to particular quantification to describe the realized initial conditions in the individual test execution. When the universally quantified test-design and test-outcome statements have their logical quantification changed to particular quantification, the belief status and thus definitional rôle of the universally quantified test-design confer upon their particularly quantified versions the status of “fact” for all who decided to accept the test design. The theory statements in the hypothetical-conditional schema are also given particular quantification for the test execution. In a mathematically expressed theory the test execution consists in measurement actions and assignment of the resulting measurement values to the variables in “A”. In a mathematically expressed single-equation theory, “A” includes the independent variables in the equation of the theory and the test procedure. In a multi-equation system whether recursively structured or simultaneous, all the exogenous variables are assigned values by measurement, and are included in “A”. In longitudinal models with dated variables the lagged-values of endogenous variables that are the initial condition for a test and that initiate the recursion through successive iterations to generate predictions, must also be in “A”.
The consequent “C” represents the set of universally quantified statements of the theory that correctly predict the outcome of every correct execution of a test design. Its logical quantification is changed from universal to particular quantification to describe the predicted outcome for the individual test execution. In a mathematically expressed single-equation theory the dependent variable of the theory’s equation is in “C”. When no value is assigned to any variable, the equation is universally quantified. When the predicted value of a dependent variable is calculated from the measurement values of the independent variables, it is particularly quantified. In a multi-equation theory, whether recursively structured or a simultaneous-equation system, the solution values for all the endogenous variables are included in “C”. In longitudinal models with dated variables the current-dated values of endogenous variables for each iteration of the model, which are calculated by solving the model through successive iterations, are included in “C”.
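The recursion through successive iterations described for longitudinal models can be sketched generically (an illustrative fragment with a hypothetical one-equation model, not drawn from the source):

```python
def iterate_model(model, exog, y_init):
    # "A" supplies the exogenous values and the lagged endogenous value;
    # each iteration solves for the current endogenous value in "C".
    y, path = y_init, []
    for x in exog:
        y = model(y, x)   # current value from lagged value and exogenous input
        path.append(y)
    return path

# hypothetical model: y_t = 0.5*y_{t-1} + x_t, initialized at y_0 = 0
predictions = iterate_model(lambda y, x: 0.5 * y + x, [1.0, 1.0, 1.0], 0.0)
```

The initial value y_init plays the rôle of the lagged endogenous variable in "A" that initiates the recursion, and each element of the returned path is a current-dated value in "C".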
The conditional statement of theory does not say “For every A and for every C if A, then C”. It only says “For every A if A, then C”. In other words the conditional statement of theory expresses only a sufficient condition for the correct prediction made in “C” upon realization of the test conditions described in “A”, and not a necessary condition. Such is the case when scientific pluralism (see below, Section 4.20) occasions multiple theories proposing alternative causal factors for the same outcome correctly predicted in “C”, or when there are equivalent measurement procedures or instruments described in “A” that produce alternative measurements, each having values falling within the range of the other’s measurement error.
Let another particularly quantified statement denoted “O” describe the observed test outcome of an individual test execution. The report of the test outcome “O” shares vocabulary with the prediction statements in “C”. But the semantics of the terms in “O” is determined exclusively by the universally quantified test-design statements rather than by the statements of the theory, and thus for the test its semantics is independent of the theory’s semantical contribution. In an individual test execution “O” represents observations and/or measurements made and measurement values assigned apart from the prediction in “C”, and it too has particular logical quantification to describe the observed outcome resulting from the individual execution of the test. There are three possible outcome scenarios:
Scenario I: If “A” is false in an individual test execution, then regardless of the truth of “C” the test execution is simply invalid due to a scientist’s failure to comply with the agreed test design, and the empirical adequacy of the theory remains unaffected and unknown. The empirical test is conclusive only if it is executed in accordance with its test design. Contrary to the logical positivists, the truth table for truth-functional logic is therefore not applicable to testing in empirical science, because in science a false antecedent “A” does not make the hypothetical-conditional statement of the test true.
Scenario II: If “A” is true and the consequent “C” is false, as when the theory conclusively makes erroneous predictions, then the theory is falsified, because the hypothetical conditional “For every A if A, then C” is false. Falsification occurs when the prediction statements in “C” and the observation reports in “O” are not accepted as describing the same thing within the range of vagueness and/or measurement error, which are manifestations of empirical under-determination. The falsifying logic of the test is the modus tollens argument form, according to which the conditional-hypothetical schema expressing the theory is falsified, when one affirms the antecedent clause and denies the consequent clause. This is the falsificationist philosophy of scientific criticism advanced by Peirce, the founder of classical pragmatism, and later advocated by Popper.
For more on Popper readers are referred to BOOK V at the free web site www.philsci.com and in the e-book Twentieth-Century Philosophy of Science: A History, which is available from most Internet booksellers.
The response to a conclusive falsification may or may not be attempts to develop a new theory. Responsible scientists will not deny a falsifying outcome of a test, so long as they accept its test design and test execution. Characterization of falsifying anomalous cases is informative, because it contributes to articulation of a new problem that a new and more empirically adequate theory must solve. Some scientists may, as Kuhn said, simply believe that the anomalous outcome is an unsolved problem for the tested theory without attempting to develop a new theory. But such a response is either an ipso facto rejection of the tested theory, a de facto rejection of the test design or simply a disengagement from attempts to solve the new problem. And contrary to Kuhn this procrastinating response to anomaly need not imply that the falsified theory has been given institutional status, unless the science itself is institutionally retarded.
For more on Kuhn readers are referred to BOOK VI at the free web site www.philsci.com or in the e-book Twentieth-Century Philosophy of Science: A History, which is available from most Internet booksellers.
Scenario III: If “A” and “C” are both true, then the hypothetical-conditional schema expressing the tested theory is validly accepted as asserting a causal dependency between the phenomena described by the antecedent and consequent clauses. The nontruth-functional hypothetical-conditional statement does not merely assert a Humean psychological constant conjunction. Causality is an ontological category describing a real dependency, and the causal claim is asserted on the basis of ontological relativity due to the empirical adequacy demonstrated by the nonfalsifying test outcome. Because the nontruth-functional hypothetical-conditional statement is empirical, causality claims are always subject to future testing, falsification, and then revision. This is also true when the conditional represents a mathematical function.
But if the test design is afterwards modified such that it changes the characterization of the subject of the theory, then a previous nonfalsifying test outcome should be reconsidered and the theory should be retested for the new definition of the subject. If the retesting produces a falsifying outcome, then the new information in the modification of the test design has made the terms common to the two test designs equivocal and has contributed parts to alternative meanings. But if the test outcome is not falsification, then the new information is merely new parts added to the meaning of the univocal terms common to the old and new test-design description. Such would be the case for a new and additional way to measure temperature for extreme values that cannot be measured by the old measurement procedure, but which yields the same temperature values within the range of measurement errors, where the alternative procedures produce overlapping results.
On the contemporary pragmatist philosophy a theory that has been tested is no longer theory, once the test outcome is known and the test execution is accepted as correct. If the theory has been falsified, it is merely rejected language unless the falsified theory is still useful for the lesser truth it contains. But if it has been tested with a nonfalsifying test outcome, then it is empirically warranted and thus deemed a scientific law until it is later tested again and falsified. The law is still hypothetical because it is empirical, but it is less hypothetical than it had previously been as a theory proposed for testing. The law may thereafter be used either in an explanation or in a test design for testing some other theory.
For example the elaborate engineering documentation for the Large Hadron Collider at CERN, the Conseil Européen pour la Recherche Nucléaire, is based on previously tested science. After installation of the collider is complete and it is known to function successfully, the science in that engineering is not what is tested when the particle accelerator is operated for the microphysical experiments, but rather the employed science is presumed true and contributes to the test design semantics for experiments performed with the accelerator.
4.16 Test Logic Illustrated
Consider the simple heuristic case of Gay-Lussac’s law for a fixed amount of gas in an enclosed container as a theory proposed for testing. The container’s volume is constant throughout the experimental test, and therefore is not represented by a variable. The theory is (T'/T)*P = P', where the variable P means the gas pressure, the variable T means the gas temperature, and the variables T' and P' are incremented values for T and P in a controlled experimental test, where T' = T ± ΔT, and P' is the pressure value that the theory predicts for the incremented temperature T'.
The statement of the theory may be schematized in the nontruth-functional hypothetical-conditional form “For every A if A, then C”, where “A” includes (T'/T)*P, and “C” states the calculated prediction value of P', when temperature is incremented by ΔT from T to T'. The theory is universally quantified, and thus claims to be true for every execution of the experimental test. And for proponents of the theory, who are believers in the theory, the semantics of T, P, T' and P' are mutually contributing to the semantics of each other, a fact exhibited explicitly in this case, because the equation is monotonic, such that each variable can be expressed as a mathematical function of all the others by simple algebraic transformations.
“A” also includes the universally quantified test-design statements. These statements describe the experimental setup, the procedures for executing the test and the initial conditions to be realized for execution of a test. They include descriptions of the equipment used, including the container, the heat source and the instrumentation used to measure the magnitudes of temperature and pressure, and the units of measurement for the magnitudes involved, namely the pressure units in atmospheres and the temperature units in degrees Kelvin (°K). And they describe the procedure for executing the repeatable experiment. This test-design language is also universally quantified and thus also contributes meaning components to the semantics of the variables P, T and T' in “A” for all interested scientists who accept the test design.
The procedure for performing the experiment must be executed as described in the test-design language, in order for the test to be valid. The procedure will include first measuring and recording the initial values of T and P. For example let T = 200°K and P = 1.6 atmospheres. Let the incremented measurement value be recorded as ΔT = 200°K, so that the measurement value for T' is made to be 400°K. The description of the execution of the procedure and the recorded magnitudes are expressed in particularly quantified test-design language for this particular test execution. The value of P' is then calculated.
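The calculation of P' described above is simple arithmetic. The following sketch (illustrative only; the function name is not part of the test design) computes the predicted pressure from the recorded values in the example:

```python
def predict_pressure(p_initial, t_initial, t_final):
    """Gay-Lussac's law at constant volume: P/T = P'/T',
    so the predicted pressure is P' = (T'/T) * P."""
    return (t_final / t_initial) * p_initial

# Recorded values from the example: T = 200 deg K, P = 1.6 atm, T' = 400 deg K.
p_predicted = predict_pressure(p_initial=1.6, t_initial=200.0, t_final=400.0)
print(p_predicted)  # 3.2 (atmospheres)
```

Doubling the absolute temperature doubles the predicted pressure, which is why the example yields 3.2 atmospheres.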
The test outcome consists of measuring and recording the resulting observed incremented value for pressure. Let this outcome be represented by the particularly quantified statement O using the same vocabulary as in the test design. But only the universally quantified test-design statements define the semantics of O, so that the test is independent of the theory. In this simple experiment one can denote the measured value for the resulting observed pressure by the variable O. The test execution would also likely be repeated to enable estimation of the range of measurement error in T, T', P and O, and of the measurement error propagated into P' by calculation. A mean of the measurement values from repeated executions would be calculated for each of these variables. Deviations from the mean are estimates of the amounts of measurement error, and statistical standard deviations could summarize the dispersion of measurement errors about the means.
The mean of the test-outcome measurements for O is compared to the mean of the calculated predictions for P' to determine the test outcome. If the values of P' and O are equivalent within their estimated ranges of measurement error, i.e., are sufficiently close to 3.2 atmospheres as to be within the measurement errors, then the theory is deemed not to have been falsified. After repetitions with more extreme incremented values and no falsifying outcome, the theory will likely be deemed sufficiently warranted empirically to be called a law, as it is today.
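The comparison that decides the test outcome can be sketched as a simple tolerance check. The tolerance value here is a hypothetical stand-in for the estimated combined measurement error, chosen only for illustration:

```python
def not_falsified(p_predicted, o_mean, tolerance):
    """Deem the theory not falsified when the predicted and observed mean
    pressures agree within the estimated measurement-error tolerance."""
    return abs(p_predicted - o_mean) <= tolerance

# Hypothetical numbers: prediction 3.2 atm, observed mean 3.21 atm,
# estimated combined measurement error 0.05 atm.
print(not_falsified(3.2, 3.21, 0.05))  # True
```

A falsifying outcome is simply the case where the observed mean falls outside the tolerance, e.g. `not_falsified(3.2, 3.5, 0.05)` is `False`.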
4.17 Semantics of Empirical Testing
Much has already been said about the artifactual character of semantics, about componential semantics, and about semantical rules. In the semantical discussion that follows, these concepts are brought to bear upon the discussion of the semantics of empirical testing and of test outcomes.
The ordinary semantics of empirical testing is as follows:
If a test has a nonfalsifying outcome, then for the theory’s developer and its advocates the semantics of the tested theory is unchanged.
Since they had proposed the theory in the belief that it would not be falsified, their belief in the theory makes it function for them as a set of semantical rules. Thus for them both the theory and the test design are accepted as true, and after the nonfalsifying test outcome both the theory and test-design statements continue to contribute parts to the complex meanings of the descriptive terms common to both theory and test design, as before the test.
But if the test outcome is a falsification, then there is a semantical change produced in the theory for the developer and the advocates of the tested theory who accept the test outcome as a falsification.
The unchallenged test-design statements continue to contribute semantics to the terms common to the theory and test design by contributing their parts to the meaning complexes of each of those common terms. But the component parts of those meanings contributed by the falsified theory statements are excluded from the semantics of those common terms for the proponents who no longer believe in the theory due to the falsifying test, because the falsified theory statements are no longer deemed to be semantical rules.
4.18 Test Design Revision
Empirical tests are conclusive decision procedures only for scientists who agree on which language is proposed theory and which language is presumed test design, and who furthermore accept both the test design and the test-execution outcomes produced with the accepted test design.
The decidability of empirical testing is not absolute. Popper had recognized that the statements reporting the observed test outcome, which he called “basic statements”, require agreement by the cognizant scientists, and that those basic statements are subject to future reconsideration.
All universally quantified statements are hypothetical, but theory statements are relatively more hypothetical than test-design statements, because the interested scientists agree that in the event of a falsifying test outcome, revision of the theory will likely be more productive than revision of the test design.
But a dissenting scientist who does not accept a falsifying test outcome of a theory has either rejected the report of the observed test outcome or reconsidered the test design. If he has rejected the outcome of the individual test execution, he has merely questioned whether or not the test was executed in compliance with its agreed test design. Independent repetition of the test with conscientious fidelity to the design may answer such a challenge to the test’s validity one way or the other.
But if in response to a falsifying test outcome the dissenting scientist has reconsidered the test design itself, he has thereby changed the semantics involved in the test in a fundamental way. Such reconsideration amounts to rejecting the design as if it were falsified, and letting the theory define the subject of the test and the problem under investigation – a rôle reversal in the pragmatics of test-design language and theory language that makes the original test design and the falsifying test execution irrelevant.
In his “Truth, Rationality, and the Growth of Knowledge” (1961), reprinted in Conjectures and Refutations (1963), Popper rejects such a dissenting response to a test, calling it a “content-decreasing stratagem”. He admonishes that the fundamental maxim of every critical discussion is that one should “stick to the problem”. But as Conant recognized to his dismay in his On Understanding Science: An Historical Approach (1947), the history of science is replete with such prejudicial responses to scientific evidence that have nevertheless been productive and strategic to the advancement of basic science in historically important episodes. The prejudicially dissenting scientist may decide that the design for the falsifying test supplied an inadequate description of the problem that the tested theory is intended to solve, especially when he himself developed the theory but did not develop the test design. The semantical change produced for such a recalcitrant believer in the theory affects the meanings of the terms common to the theory and test-design statements. The parts of the meaning complex that had been contributed by the rejected test-design statements are excluded from the semantics of one or several of the descriptive terms common to the theory and test-design statements. Such a semantical outcome can indeed be said to be “content decreasing”, as Popper said.
But a scientist’s prejudiced or “tenacious” (per Feyerabend) rejection of an apparently falsifying test outcome may have a contributing function in the development of science. It may function as what Feyerabend called a “detecting device”, a practice he called “counterinduction”, which is a strategy that he illustrated in his examination of Galileo’s arguments for the Copernican cosmology. Galileo used the apparently falsified heliocentric theory as a “detecting device” by letting his prejudicial belief in the heliocentric theory control the semantics of the apparently falsifying observational description. This enabled Galileo to reinterpret observations previously described with the equally prejudiced alternative semantics built into the Aristotelian geocentric cosmology. Counterinduction was also the strategy used by Heisenberg, when he reinterpreted the observational description of the electron track in the Wilson cloud chamber using Einstein’s aphorism that the theory decides what the physicist can observe, and Heisenberg reports that he then developed his indeterminacy relations using his matrix-mechanics quantum concepts.
Another historic example of using an apparently falsified theory as a detecting device is the discovery of the planet Neptune. In 1821, when Uranus happened to pass Neptune in its orbit – an alignment that had not occurred since 1649 and was not to occur again until 1993 – Alexis Bouvard (1767-1843) developed calculations predicting future positions of the planet Uranus using Newton’s celestial mechanics. But observations of Uranus showed significant deviations from the predicted positions.
A first possible response would have been to dismiss the deviations as measurement errors and preserve belief in Newton’s celestial mechanics. But astronomical measurements are repeatable, and the deviations were large enough that they were not dismissed as observational errors. The deviations were recognized to have presented a new problem.
A second possible response would have been to give Newton’s celestial mechanics the hypothetical status of a theory, to view Newton’s law of gravitation as falsified by the anomalous observations of Uranus, and then to attempt to revise Newtonian celestial mechanics. But by then confidence in Newtonian celestial mechanics was very high, and no alternative to Newton’s physics had yet been proposed. Therefore there was great reluctance to reject Newtonian physics.
A third possible response, which was historically taken, was to preserve belief in the Newtonian celestial mechanics, to modify the test-design language by proposing a new auxiliary hypothesis of a gravitationally disturbing planet, and then to reinterpret the observations by supplementing the description of the deviations using the auxiliary hypothesis. Disturbing phenomena can “contaminate” even supposedly controlled laboratory experiments. The auxiliary hypothesis changed the semantics of the test-design description with respect to what was observed.
In 1845 both John Couch Adams (1819-1892) in England and Urbain Le Verrier (1811-1877) in France, independently using the apparently falsified Newtonian physics as a detecting device, made calculations of the positions of the postulated disturbing planet to guide future observations, in order to detect the disturbing body by telescope. On 23 September 1846, using Le Verrier’s calculations, Johann Galle (1812-1910) observed the postulated planet with the telescope of the Royal Observatory in Berlin.
Theory is language proposed for testing, and test design is language presumed for testing. But here the pragmatics of the discourses was reversed. In this third response the Newtonian gravitation law was not deemed a tested and falsified theory, but rather was presumed to be true and used for a new test design. The new test-design language was actually given the relatively more hypothetical status of theory by the auxiliary hypothesis of the postulated planet thus newly characterizing the observed deviations in the positions of Uranus. The nonfalsifying test outcome of this new hypothesis was Galle’s observational detection of the postulated planet, which Le Verrier had named Neptune.
But counterinduction is after all just a strategy, and it is more an exceptional practice than a routine one. Le Verrier’s counterinduction strategy failed to explain a deviant motion of the planet Mercury when its orbit comes closest to the sun, a deviation known as its perihelion precession. In 1859 Le Verrier postulated a gravitationally disturbing planet that he named Vulcan and predicted its orbital positions. However unlike Le Verrier, Einstein had given Newton’s celestial mechanics the more hypothetical status of theory language, and he viewed Newton’s law of gravitation as having been falsified by the anomalous perihelion precession. He had initially attempted a revision of Newtonian celestial mechanics by generalizing on his special theory of relativity. This first attempt is known as his Entwurf version, which he developed in 1913 in collaboration with his mathematician friend Marcel Grossmann. But working in collaboration with his friend Michele Besso he found that the Entwurf version had clearly failed to account accurately for Mercury’s orbital deviations; it showed only 18 seconds of arc per century instead of the observed 43 seconds.
In 1915 he finally abandoned the Entwurf version, and under prodding from the mathematician David Hilbert (1862-1943) he turned exclusively to mathematics to produce his general theory of relativity. He then developed his general theory, and announced his correct prediction of the deviations in Mercury’s orbit to the Prussian Academy of Sciences on 18 November 1915. He received a congratulatory letter from Hilbert on “conquering” the perihelion motion of Mercury. After years of delay due to World War I his general theory was further vindicated by Arthur Eddington’s (1888-1944) historic eclipse test of 1919. As for Vulcan, some astronomers had reported observing a transit of a planet across the sun’s disk, but these claims were found to be spurious when larger telescopes were used, and Le Verrier’s postulated planet has never been observed. MIT professor Thomas Levenson relates the history of the futile search for Vulcan in his The Hunt for Vulcan (2015).
Le Verrier’s response to Uranus’ deviant orbital observations was the opposite to Einstein’s response to the deviant orbital observations of Mercury. Le Verrier reversed the rôles of theory and test-design language by preserving his belief in Newton’s physics and using it to revise the test-design language with his postulate of a disturbing planet. Einstein viewed Newton’s celestial mechanics to be hypothetical, because he believed that the Newtonian theory statements were more likely to be productively revised than test-design statements, and he took the anomalous orbital observations of Mercury to falsify Newton’s physics, thus indicating that theory revision was needed. Empirical tests are conclusive decision procedures only for scientists who agree on which language is proposed theory and which is presumed test design, and who furthermore accept both the test design and the test-execution outcomes produced with the accepted test design.
For more about Feyerabend on counterinduction readers are referred to BOOK VI at the free web site www.philsci.com or in the e-book Twentieth-Century Philosophy of Science: A History, which is available from most Internet booksellers.
Finally there are more routine cases of test-design revision that do not occasion counterinduction. In such cases there is no rôle reversal in the pragmatics of theory and test design, but there may be an equivocating revision in the test-design semantics, depending on the test outcome, due to a new observational technique or instrumentality, which may have originated in what Feyerabend called “auxiliary sciences”, e.g., development of a superior microscope or telescope. If retesting a previously nonfalsified theory with the new test design does not produce a falsifying outcome, then the result is merely a refinement of the semantics in the test-design language. But if the new test design occasions a falsification, then it has produced a semantical equivocation between the statements of the old and new test designs, and has redefined the subject of the tested theory.