Financial Mathematics Text

Monday, September 3, 2012

The Problem of Induction

This is a continuation of a series devoted to the question of what an "empirical question" is. See Part I and Part II if you're interested.

One tool that is considered essential to the empirical sciences is induction. Its importance is, in my opinion, overstated by most philosophers (I firmly believe that abduction rarely gets the credit it deserves). In spite of that, I do not dismiss its important role in scientific inquiry.

Broadly speaking, I will take induction to be a process by which one infers features, properties, relationships, etc. about a population from the features, properties, and relationships of a sample.

I have some general comments about this definition. I referred to induction as a "process", although in some cases there may be no such process. For example, I may simply observe that all of the swans in my sample are white and then conclude that all swans in the population are white as well.

I have used the terms "sample" and "population" which I have borrowed from statistics. I consider statistics to be the primary set of inductive techniques employed by the sciences. It allows me to consider such propositions as "the mean life expectancy of adult males is 73 years" which are not the characteristic "All S are P" statements that are generally regarded as conclusions of inductions.

The "problem of induction" is actually a set of problems. Many of them take the following form:

Consider a population set we'll call U and a sample (which is a subset of U) we'll call S. Now let's suppose that there is a relationship that seems to hold for S. Why ought we believe it also applies to U?

Here's an example to illustrate. Suppose that a coin is tossed 32 times (30+ is generally regarded as a good sample size and 32 makes my table pretty). I had EXCEL mimic the process:

Suppose we play a game in which if you select the outcome correctly you receive $ \$1 $ but if you don't you have to pay $ \$1$. Can you develop a winning strategy (one in which the expected value is greater than $ \$0$)?
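The condition in parentheses can be made explicit. As a small sketch, let $p$ (a symbol introduced here for illustration) be the probability that a strategy's guess matches the toss. Then the expected value of one round is

$$ E[\text{payoff}] = p \cdot (+1) + (1 - p) \cdot (-1) = 2p - 1 $$

so a strategy is winning exactly when $p > 1/2$.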

Based upon this data set I have developed three strategies:
1) Always pick TAILS: Success rate of 59.4%
2) Pick TAILS unless the toss is a prime number: Success rate of 62.5%
3) Pick TAILS when the toss is even, pick HEADS when the toss is odd: Success rate of 65.6%

These are likely not the only strategies (you may be able to find infinitely many more). These are features of this particular data set; these strategies "work" on this data set. Yet, what reason do we have to believe that any of these strategies will work on any other data set?

This illustrates two features of the problem of induction. The first is that properties of one sample from a population may not apply to any other sample (or the population as a whole). The second is that the sample gives "evidence" for many different strategies. In the above example, I provided 3 strategies that would enable you to guess correctly more often than not but it's not clear which (if any) will apply to future data sets.
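To get a feel for the first feature, here is a minimal Python sketch that scores the three strategies on several freshly simulated samples of 32 tosses. It does not reproduce my EXCEL data; the seed, the helper names, and the reading of strategy 2 as "pick HEADS on prime-numbered tosses" are all assumptions made for illustration.

```python
import random

def is_prime(n):
    """Return True if n is a prime number."""
    if n < 2:
        return False
    return all(n % d for d in range(2, int(n ** 0.5) + 1))

# Each strategy maps the 1-based toss number to a guess, "H" or "T".
# Assumption: strategy 2 guesses HEADS on prime-numbered tosses.
STRATEGIES = {
    "always tails": lambda i: "T",
    "tails unless prime": lambda i: "H" if is_prime(i) else "T",
    "tails on even, heads on odd": lambda i: "T" if i % 2 == 0 else "H",
}

def success_rate(strategy, tosses):
    """Fraction of tosses the strategy guesses correctly."""
    hits = sum(strategy(i) == outcome
               for i, outcome in enumerate(tosses, start=1))
    return hits / len(tosses)

def simulate_tosses(n, rng):
    """Simulate n tosses of a fair coin."""
    return [rng.choice("HT") for _ in range(n)]

rng = random.Random(2012)  # arbitrary seed, chosen only for repeatability
samples = [simulate_tosses(32, rng) for _ in range(5)]
for name, strategy in STRATEGIES.items():
    rates = [success_rate(strategy, s) for s in samples]
    print(name, [f"{r:.1%}" for r in rates])
```

Each row shows one strategy's success rate on five independent samples, which makes it easy to see how much a rate found on one sample can move on the next.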

Green or Grue?

This latter feature of the problem of induction is exemplified by the property "grue". An object is said to be grue if it is green prior to time t and blue after time t. Time t is usually some specified date (in the future). The idea behind this example is this.

Suppose that I have a sample of emeralds. I observe all of these emeralds and conclude that they are all green. But since we are before time t, it follows that they may also be "grue". So what will happen after time t? Will the emeralds still be green or will they be blue? The properties of "greenness" and "grueness" both apply to our data set. Will they still apply in the future? Will they apply to other data sets?

It should be clear that our data set here is insufficient to answer these questions. I think there are at least two conclusions that are warranted here:

1) Since we are dealing with only one data set, we have no reason to believe that what holds for it will hold for any other.
2) Since we do not have any other data, we have no reason to rule out either "greenness" or "grueness".

This is just like the case of my EXCEL coin toss, where I found three strategies that worked on that particular data set. Looking at other data sets may help me to rule out some of them, but I do not have other data sets. So all of those strategies, as far as my information tells me, are perfectly consistent with the data. (The data also happen to be consistent with the hypothesis that the coin is "fair" and that no strategy would give me an advantage.)
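The parenthetical remark can be checked with a short calculation. The sketch below assumes the quoted success rates correspond to 19, 20, and 21 correct guesses out of 32 (inferred from the percentages, not read from the data) and computes how often a fair coin would do at least that well for a single strategy fixed in advance; `tail_probability` is a helper name introduced here.

```python
from math import comb

def tail_probability(n, k, p=0.5):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, j) * p**j * (1 - p)**(n - j)
               for j in range(k, n + 1))

# Assumed hit counts: 19/32 = 59.4%, 20/32 = 62.5%, 21/32 = 65.6%
for hits in (19, 20, 21):
    print(hits, round(tail_probability(32, hits), 3))
```

For 19 correct guesses the probability is roughly 0.19, so a result like "always pick TAILS wins 59.4% of the time" is unremarkable for a fair coin. Note that this calculation treats each strategy as if it had been chosen before seeing the data; strategies picked after the fact will look better than this suggests.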

And given that any data set has a potentially infinite number of possible descriptions that can be given to it, that puts a significant limitation on how much we can infer from induction. Just as I can come up with many more profitable strategies for the coin toss, I can come up with many more properties for emeralds (perhaps they are grellow?).

So where does that leave us?

If this is damaging for induction, what does that mean for the "empirical sciences"? The thing that gives this problem so much attention, in my opinion, is that it appears to question the "validity of the sciences" and, as a result, a response is required. Many such responses attempt to refute the criticisms. Others concede defeat. My own take is to question the presumption that induction is the cornerstone of science, or rather, that induction is the only tool available for empirical questions. I conclude that this is not the case.

I acknowledge that there are severe limitations to induction. But there are at least two avenues available to us.

First, we have the avenue of acquiring other data sets; we are not restricted to just one. In the case of my coin toss experiment, I could repeat the experiment and see which, if any, of the strategies I have come up with apply to other data sets.

This approach has been used to study stock market trading strategies. If I look at a price series for stocks, of course I can find strategies that will work on that data set. But how do I know whether they will apply to a different data set? One research technique is to develop strategies on one data set and then test them on a different data set. This helps to rule out strategies that are anomalies of a particular data set.
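The develop-then-test idea can be sketched on the coin toss game itself. The code below is only an illustration under stated assumptions: it uses simulated fair-coin tosses rather than price data, and the family of candidate strategies (repeating HEADS/TAILS patterns up to a hypothetical `max_period`) is invented for the example.

```python
import random
from itertools import product

def simulate_tosses(n, rng):
    """Simulate n tosses of a fair coin."""
    return [rng.choice("HT") for _ in range(n)]

def success_rate(pattern, tosses):
    """Score a strategy whose guesses repeat the given pattern."""
    hits = sum(pattern[i % len(pattern)] == t for i, t in enumerate(tosses))
    return hits / len(tosses)

def candidate_patterns(max_period):
    """Yield every repeating guess pattern with period 1..max_period."""
    for period in range(1, max_period + 1):
        yield from product("HT", repeat=period)

def best_in_sample(tosses, max_period=6):
    """Pick the pattern with the highest success rate on the given tosses."""
    return max(candidate_patterns(max_period),
               key=lambda p: success_rate(p, tosses))

rng = random.Random(7)  # arbitrary seed, chosen only for repeatability
train = simulate_tosses(32, rng)         # data used to develop the strategy
holdout = simulate_tosses(100_000, rng)  # fresh data used only for testing

best = best_in_sample(train)
print("pattern:", "".join(best))
print("in-sample rate:", success_rate(best, train))
print("out-of-sample rate:", success_rate(best, holdout))
```

Because the search looks at many patterns, the winner will usually look good on the 32 tosses it was fitted to. The second data set is what reveals whether that edge was a feature of the coin or only of the sample.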

The second avenue is to acknowledge that there are other tools available to us. The particular tool I have in mind is abduction. In particular, we may have theoretical reasons for choosing one inductive conclusion over another.

Let's return to our emeralds. Are emeralds green or grue (or for that matter neither)?

Just as we can acquire other data sets, we can utilize other sources of knowledge. After all, we can acquire information about the physical structure of emeralds and we can acquire information on how light interacts with different structures. From this we could develop a theory (i.e. a systematic set of hypotheses) which, if granted true, would explain our inductive observations.

This can easily apply to other situations. In the case of our coin toss experiment, we already have good information on how the coin works (I used EXCEL's random number generator). Since we understand how that works, we have every reason to believe that no strategies can be developed which will give one an advantage, in spite of the fact that we may be able to find some that work on particular data sets.

In closing, let's consider two additional "inductive arguments".

1) In the past, the sun has come up every day. Therefore, the sun will come up every day in the future.
2) In the past, every day I have not died. Therefore, I will not die any day in the future.

To be clear, I don't think these are the greatest inductive arguments. I prefer having a known population and then taking a random sample from that population. Our sample here is a specific sample from all days, not a random sample. That's a clear deficiency. But this is a common type of inductive argument (whether you want to count it as "valid" or not I will leave to you).

Is my knowledge that the sun will come up or that I will not die dependent solely on these inductive arguments, or are there other sources of knowledge that support (or deny) these conclusions?



