Statistics

=Probability=
Statistical probability gives the scientist a form of power through which to predict future events that dramatically affect our lives. A 90% chance of rain leads us to take an umbrella. Probability is a way of measuring uncertainty: if an experiment is performed many times, what fraction of the outcomes are of a given kind? According to the Law of Large Numbers, a large number of trials is required before the observed fraction settles near the true probability.

Experiment: hand out 100 pennies to the class and count the number of heads and tails.
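The penny experiment can also be tried on a computer. The following is a minimal sketch, assuming a fair coin and using Python's pseudo-random generator in place of real pennies; the function name `flip_pennies` and the seed are illustrative choices, not part of the original notes.

```python
import random

def flip_pennies(n_flips, seed=None):
    """Return (heads, tails) counts for n_flips simulated fair coin flips."""
    rng = random.Random(seed)  # seeded so the run is repeatable
    heads = sum(rng.random() < 0.5 for _ in range(n_flips))
    return heads, n_flips - heads

# The fraction of heads should drift toward 1/2 as the number of flips grows.
for n in (100, 1_000, 100_000):
    heads, tails = flip_pennies(n, seed=1)
    print(f"{n:>7} flips: fraction of heads = {heads / n:.4f}")
```

With only 100 flips the fraction of heads can sit noticeably away from 0.5; the larger runs illustrate why the Law of Large Numbers asks for many trials.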

**The Importance of Asking the Right Question**

A woman says she has two children. What is the probability that both are boys? Two follow-up questions lead to two different answers:

Asking the right question can improve your chances of determining the correct outcome.
||= # ||= Question ||= Response ||= Probability of two boys ||
||= 1. ||= Is at least one of them a boy? ||= Yes ||= 1/3 ||
||= 2. ||= Is the older one a boy? ||= Yes ||= 1/2 ||
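The two answers in the table can be checked by listing the four equally likely families. This is a small sketch that assumes boys and girls are equally likely and that the first question means "at least one boy"; the helper `prob_two_boys` is a hypothetical name used only here.

```python
from fractions import Fraction
from itertools import product

# Sample space: (older child, younger child), each "B" or "G", all equally likely.
families = list(product("BG", repeat=2))

def prob_two_boys(condition):
    """P(both boys | condition), by counting equally likely families."""
    matching = [f for f in families if condition(f)]
    both_boys = [f for f in matching if f == ("B", "B")]
    return Fraction(len(both_boys), len(matching))

p_at_least_one_boy = prob_two_boys(lambda f: "B" in f)    # question 1
p_older_is_boy = prob_two_boys(lambda f: f[0] == "B")     # question 2
print(p_at_least_one_boy, p_older_is_boy)
```

Question 1 leaves three families (BB, BG, GB), while question 2 leaves only two (BB, BG), which is why the answers differ.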

**Games**

1. FERMAT GAME. Roll a die 4 times. If you get at least one 6, you win. Chance of getting a six (using the principle of opposites): 1 - (chance of missing every time) = 1 - (5/6)^4 ≈ 51.8% > 50%

2. Roll two dice 24 times. If a double six is rolled, the other player wins. The chance of getting at least one double six is 1 - (35/36)^24 ≈ 49.1% < 50%
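Both games use the same "principle of opposites" calculation, which can be sketched as follows. The function name `at_least_once` is an illustrative assumption; exact fractions are used so that no rounding enters the comparison with 50%.

```python
from fractions import Fraction

def at_least_once(p_single, n_trials):
    """P(event happens at least once in n independent trials) = 1 - (1 - p)^n."""
    return 1 - (1 - p_single) ** n_trials

p_six_in_4 = at_least_once(Fraction(1, 6), 4)            # game 1: one die, 4 rolls
p_double_six_in_24 = at_least_once(Fraction(1, 36), 24)  # game 2: two dice, 24 rolls

print(float(p_six_in_4), p_six_in_4 > Fraction(1, 2))
print(float(p_double_six_in_24), p_double_six_in_24 < Fraction(1, 2))
```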

Equally Likely Outcomes
P(E) = Number of outcomes where E occurs/Number of possible outcomes

Roll a die:
 * P(1) = 1/6
 * P(even) = 3/6
 * P(X > 4) = 2/6

The number of outcomes from flipping a coin n times is 2^n.

Independence
Two events are independent if they do not affect one another, e.g. flipping a coin and rolling a die.

P(A AND B) = P(A) * P(B)
 * P(flip head and roll 3) = 1/2 * 1/6 = 1/12
 * P(roll 3, 5 times in a row) = (1/6)^5
 * P(roll 3, 1, 2, 1, 5 in that order) = (1/6)^5
 * P(1, 2, 3, 4, 5 in any order) != (1/6)^5, since the event now includes more outcomes, i.e. 1,2,3,4,5; 5,3,4,2,1; 2,3,4,1,5 ... There are 5! = 120 such orderings, so the probability is 120 * (1/6)^5.
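The difference between one specific sequence and "any order" can be verified by brute force over all 6^5 sequences of five rolls. This is a sketch under the assumption of a fair die; `prob` is a hypothetical helper.

```python
from fractions import Fraction
from itertools import product

rolls = list(product(range(1, 7), repeat=5))  # all 6^5 equally likely sequences

def prob(event):
    """Fraction of the equally likely sequences for which event(sequence) is True."""
    return Fraction(sum(1 for r in rolls if event(r)), len(rolls))

p_exact_order = prob(lambda r: r == (3, 1, 2, 1, 5))
p_any_order = prob(lambda r: sorted(r) == [1, 2, 3, 4, 5])
print(p_exact_order, p_any_order)
```

The second event contains 5! = 120 sequences instead of one, so it is 120 times as likely.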

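Binomial and geometric random variables
A binomial random variable counts the number of successes in n independent Bernoulli trials; with n = 1 the binomial probability is the same as the Bernoulli probability function. A geometric random variable counts the number of trials up to and including the first success.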

Dependence
P(A|B) = P(A AND B)/P(B), the probability of A given that B has occurred

P(X is even | X >= 4), the probability that a roll is even given that the roll is greater than or equal to 4, = P(4 or 6)/P(X >= 4) = (2/6)/(3/6) = 2/3
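The conditional probability for the die can be computed directly from the definition by counting faces. This is a minimal sketch assuming a fair six-sided die; `cond_prob` is an illustrative name.

```python
from fractions import Fraction

faces = range(1, 7)

def cond_prob(a, b):
    """P(A | B) = P(A AND B) / P(B) for a fair die, by counting faces."""
    n_b = sum(1 for x in faces if b(x))
    n_a_and_b = sum(1 for x in faces if a(x) and b(x))
    return Fraction(n_a_and_b, n_b)

p_even_given_at_least_4 = cond_prob(lambda x: x % 2 == 0, lambda x: x >= 4)
print(p_even_given_at_least_4)
```

Because every face is equally likely, the common factor of 1/6 cancels and only the counts matter: two even faces (4, 6) out of three faces (4, 5, 6).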

Expected Value
The expected value of X is the sum of each possible outcome multiplied by its probability.

If X could be 0, 1 or 2 and the probabilities for each are 1/2, 1/3, 1/6, then E[X] = 0(1/2) + 1(1/3) + 2(1/6) = 2/3

The expected value E[X] measures the average value of X. The variance measures the spread of X: Var(X) = E[(X - mean)^2]. Variance is the "expected squared distance from the mean." To get the same units as the expected value, we take the square root, which is the standard deviation.

Handy Formula for Variance
Var(X) = E[X^2] - (E[X])^2

If X could be 0, 1 or 2 and the probabilities for each are 1/2, 1/3, 1/6, then

 * E[X^2] = 0^2(1/2) + 1^2(1/3) + 2^2(1/6) = 1
 * E[X] = 0(1/2) + 1(1/3) + 2(1/6) = 2/3
 * Var(X) = E[X^2] - (E[X])^2 = 1 - (2/3)^2 = 5/9
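The handy formula can be checked against the definition of variance with exact fractions. This sketch assumes the outcomes 0, 1, 2 have probabilities 1/2, 1/3, 1/6, which sum to 1 and reproduce the stated results 2/3, 1 and 5/9; the names `dist` and `expectation` are illustrative.

```python
from fractions import Fraction

# Assumed distribution: outcome -> probability (must sum to 1).
dist = {0: Fraction(1, 2), 1: Fraction(1, 3), 2: Fraction(1, 6)}
assert sum(dist.values()) == 1

def expectation(dist, f=lambda x: x):
    """E[f(X)]: each value f(x) weighted by the probability of x."""
    return sum(f(x) * p for x, p in dist.items())

mean = expectation(dist)                                       # E[X]
second_moment = expectation(dist, lambda x: x ** 2)            # E[X^2]
var_shortcut = second_moment - mean ** 2                       # handy formula
var_definition = expectation(dist, lambda x: (x - mean) ** 2)  # E[(X - mean)^2]
print(mean, second_moment, var_shortcut, var_definition)
```

Both routes to the variance give the same value, which is the point of the shortcut.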

Finally, many natural phenomena are approximately normally distributed. For a normal distribution, about 68% of values fall within 1 S.D. of the mean and about 95% fall within 2 S.D. of the mean.

After flipping a coin 100 times, we can expect heads about 50 times, with a variance of a quarter of the number of flips (25). The S.D. is therefore 5. This means we have about 68% confidence that we'll have between 45 and 55 heads, and about 95% confidence that we'll have between 40 and 60 heads.
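The coin-flip numbers follow from the binomial formulas mean = n*p and variance = n*p*(1 - p). The sketch below assumes n = 100 fair flips (the case that gives a mean of 50 and a variance of 25) and compares the 68%/95% rule of thumb with exact binomial probabilities; `prob_between` is a hypothetical helper.

```python
from math import comb, sqrt

n, p = 100, 0.5               # assumed: 100 flips of a fair coin
mean = n * p                  # expected number of heads
variance = n * p * (1 - p)    # a quarter of the number of flips
sd = sqrt(variance)

def prob_between(lo, hi):
    """Exact P(lo <= heads <= hi) for n fair flips: sum of C(n, k) / 2^n."""
    return sum(comb(n, k) for k in range(lo, hi + 1)) / 2 ** n

within_1_sd = prob_between(45, 55)
within_2_sd = prob_between(40, 60)
print(mean, variance, sd)
print(within_1_sd, within_2_sd)
```

The exact probabilities come out somewhat above 68% and 95% because the head count is a whole number and both endpoints of each range are included; the normal rule is an approximation.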

Law of Total Probability
For n mutually exclusive and exhaustive events B_1, B_2, ..., B_n: P(A) = P(A|B_1)*P(B_1) + ... + P(A|B_n)*P(B_n)

If school may be cancelled when it snows, then P(cancel) = P(cancel | snow) * P(snow) + P(cancel | no snow) * P(no snow)

The school says the chance of cancelling school given that it snows is 70%. The chance of school cancelling without snow is 5%. The weatherman says there's a 20% chance of snow. We can calculate the chance of school being cancelled by taking a //weighted average//.

P(cancel) = 0.70*0.20 + 0.05*0.80 = 0.14 + 0.04 = 0.18

There is an 18% chance of school being cancelled.
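The weighted average can be written as a small function that works for any number of cases. This is a sketch; the function name `total_probability` and the tuple layout are assumptions made for illustration.

```python
def total_probability(cases):
    """cases: list of (P(A | B_i), P(B_i)) pairs over a partition B_1..B_n."""
    # The B_i must cover every possibility exactly once, so P(B_i) sums to 1.
    assert abs(sum(p_b for _, p_b in cases) - 1.0) < 1e-9, "P(B_i) must sum to 1"
    return sum(p_a_given_b * p_b for p_a_given_b, p_b in cases)

p_cancel = total_probability([
    (0.70, 0.20),  # P(cancel | snow),    P(snow)
    (0.05, 0.80),  # P(cancel | no snow), P(no snow)
])
print(round(p_cancel, 2))
```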

This section is a reminder of fundamental statistical tests.

The **null hypothesis** states that the observed effect results from random variation between the sample and the population, and thus there is no difference between the means. The **alternative hypothesis** states that the indicated effect is real. The null hypothesis is the initial claim based on research or common knowledge. The alternative hypothesis is what you believe to be true, so the "desired result" should be associated with the alternative hypothesis. The objective is to test the null hypothesis: you assume the null hypothesis is true and calculate the probability (the p-value) of observing a result at least as extreme as the one measured. If the p-value is less than the assigned alpha level (e.g. 5%), you can reject the null hypothesis in favor of the alternative hypothesis. If the null hypothesis is not rejected (p-value greater than alpha), the null hypothesis //may// be true; the test has not proven it.
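As one concrete illustration of the reject / fail-to-reject decision, the sketch below runs an exact two-sided binomial test of the null hypothesis "the coin is fair". The observed count of 61 heads in 100 flips is hypothetical data invented for this example, and `two_sided_binomial_p_value` is an illustrative name rather than a library function.

```python
from math import comb

def two_sided_binomial_p_value(heads, n_flips):
    """Exact two-sided p-value for H0: the coin is fair (p = 1/2).

    Adds up the probabilities of every head count that lies at least as far
    from n_flips / 2 as the observed count does.
    """
    observed_distance = abs(heads - n_flips / 2)
    extreme = sum(
        comb(n_flips, k)
        for k in range(n_flips + 1)
        if abs(k - n_flips / 2) >= observed_distance
    )
    return extreme / 2 ** n_flips

alpha = 0.05
p_value = two_sided_binomial_p_value(heads=61, n_flips=100)  # hypothetical data
reject_null = p_value < alpha
print(p_value, reject_null)
```

Here the p-value falls below alpha, so the null hypothesis of a fair coin would be rejected; a count closer to 50 would give a larger p-value and no rejection.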

Common terminology
 * Response: measured or observed values; the dependent variable.
 * Factor: a categorical (e.g. specific temperatures) or continuous (e.g. temperature range) independent variable that affects a response.
 * Residual: the distance between an observed value and the regression line.

See below for examples:



"Defects" is the response that varies freely and is dependent. "Factory" is one factor, an independent variable. Notice with this one factor (single color), three groups (A, B, C) are compared.



Again "Defects" is the response that varies freely and is dependent. Now, two factors are considered, "Factory" and "Year". Within "Factory" are three groups (A, B, C). Within the factor "Year" are three groups (1994, 1995, 1996).


One-Way ANOVA
A One-Way ANOVA can be used to determine whether there is a statistical difference between the means of two or more groups within a single factor. It does this by comparing the variance between groups to the variance within groups. For exactly two groups, a t-test is another common technique to assess the difference between means.

Example: A student has measured the fractal dimensional increment D* (response) for two types or "groups" of ceramics (factor): monoliths and laminates. He is interested to see if there is a statistical difference between the average D* for each type.

 * response (dependent variable): D*
 * factor (independent variable): ceramic (monoliths, laminates)
 * groups: monoliths, laminates
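To make the one-way ANOVA calculation concrete, here is a minimal sketch of the F statistic (between-group mean square divided by within-group mean square). The D* values are hypothetical numbers invented for illustration, not measurements from the example, and `one_way_anova_f` is an assumed helper name.

```python
def one_way_anova_f(groups):
    """Return the one-way ANOVA F statistic for a list of groups of numbers."""
    all_values = [y for g in groups for y in g]
    grand_mean = sum(all_values) / len(all_values)
    group_means = [sum(g) / len(g) for g in groups]

    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, group_means))
    # Variation of each observation around its own group mean.
    ss_within = sum((y - m) ** 2
                    for g, m in zip(groups, group_means) for y in g)

    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)

# Hypothetical D* measurements, invented for illustration only.
monoliths = [0.10, 0.12, 0.11, 0.13]
laminates = [0.15, 0.17, 0.16, 0.18]
f_stat = one_way_anova_f([monoliths, laminates])
print(f_stat)
```

A large F means the group means differ by more than the scatter within groups would explain. Turning F into a p-value requires the F distribution with (df_between, df_within) degrees of freedom, which a statistics library such as SciPy (`scipy.stats.f_oneway`) provides.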

N-Way ANOVA
An N-Way ANOVA is used to assess the statistical effect of two or more factors on a response.

Example: From the example above...

 * response (dependent variable): D*
 * factor 1 (independent variable 1): system (monoliths, laminates)
 * factor 2 (independent variable 2): fracture origin (mirror radius, initiated crack)
 * group 1: monoliths, laminates
 * group 2: mirror radius, initiated crack
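For two factors, the total variation splits into a part for each factor, an interaction part and an error part. The sketch below handles only the simplest case, a balanced design with the same number of replicates in every cell; the D* values are hypothetical and `two_way_anova` is an illustrative helper, not a library routine.

```python
# Hypothetical D* values: (system, fracture origin) -> replicate measurements.
data = {
    ("monolith", "mirror radius"):   [0.10, 0.12],
    ("monolith", "initiated crack"): [0.14, 0.16],
    ("laminate", "mirror radius"):   [0.15, 0.17],
    ("laminate", "initiated crack"): [0.22, 0.24],
}

def mean(values):
    values = list(values)
    return sum(values) / len(values)

def two_way_anova(data):
    """Sums of squares and F statistics for a balanced two-factor design."""
    levels_a = sorted({a for a, _ in data})
    levels_b = sorted({b for _, b in data})
    r = len(next(iter(data.values())))  # replicates per cell (assumed equal)
    assert all(len(v) == r for v in data.values()) and r > 1

    grand = mean(y for v in data.values() for y in v)
    mean_a = {a: mean(y for b in levels_b for y in data[a, b]) for a in levels_a}
    mean_b = {b: mean(y for a in levels_a for y in data[a, b]) for b in levels_b}
    cell_means = {k: mean(v) for k, v in data.items()}

    ss = {
        "A": len(levels_b) * r * sum((m - grand) ** 2 for m in mean_a.values()),
        "B": len(levels_a) * r * sum((m - grand) ** 2 for m in mean_b.values()),
        "AB": r * sum((cell_means[a, b] - mean_a[a] - mean_b[b] + grand) ** 2
                      for a in levels_a for b in levels_b),
        "error": sum((y - cell_means[k]) ** 2 for k, v in data.items() for y in v),
        "total": sum((y - grand) ** 2 for v in data.values() for y in v),
    }
    df = {"A": len(levels_a) - 1, "B": len(levels_b) - 1}
    df["AB"] = df["A"] * df["B"]
    df["error"] = len(levels_a) * len(levels_b) * (r - 1)
    ms_error = ss["error"] / df["error"]
    f_stats = {k: (ss[k] / df[k]) / ms_error for k in ("A", "B", "AB")}
    return ss, f_stats

ss, f_stats = two_way_anova(data)
print(ss)
print(f_stats)
```

In this sketch "A" is the system factor and "B" is the fracture origin factor. Unbalanced designs need a more general method, so a statistics package is the safer choice for real data.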

Post-hoc Analysis
After ANOVA has been performed, further comparative analyses can be performed to determine which groups are //statistically different// from one another, including **Scheffé's Test, Fisher's LSD Test, Tukey's Test, Dunnett's Test and Hsu's MCB**.

Residual plots are used to check statistical assumptions. They include the following:
 * Normal probability plot: Straight line --> residuals are normally distributed
 * Histogram of residuals: symmetric bell-shape --> normal. Check for multiple peaks and outliers.
 * Residuals versus fitted values: randomly scattered across zero. Checks non-constant variance, missing higher order terms, outliers.
 * Residuals versus order: no clear pattern. Detects time dependence of variables.
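The quantities behind these plots are straightforward to compute. The sketch below fits a least-squares line to hypothetical data and lists the fitted value and residual for each observation in run order; plotting is left out, and `fit_line` is an illustrative helper.

```python
def fit_line(xs, ys):
    """Ordinary least-squares slope and intercept for y = intercept + slope * x."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    slope = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
             / sum((x - x_bar) ** 2 for x in xs))
    return slope, y_bar - slope * x_bar

# Hypothetical data, listed in the order the observations were taken.
xs = [1, 2, 3, 4, 5, 6]
ys = [2.1, 3.9, 6.2, 7.8, 10.1, 11.9]

slope, intercept = fit_line(xs, ys)
fitted = [intercept + slope * x for x in xs]
residuals = [y - f for y, f in zip(ys, fitted)]  # observed minus fitted

for order, (f, r) in enumerate(zip(fitted, residuals), start=1):
    print(f"order {order}: fitted = {f:.2f}, residual = {r:+.3f}")
```

Plotting `residuals` against `fitted` gives the residuals-versus-fitted plot, and plotting them against the run order gives the residuals-versus-order plot. For a least-squares line with an intercept the residuals always sum to zero, so it is the pattern, not the average, that carries the information.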

**Buffon's needle**

When needles are dropped onto parallel lines spaced one needle length apart, the probability that a needle crosses a line is 2/pi.
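The 2/pi result can be checked with a quick Monte Carlo simulation. This is a sketch that assumes needle length and line spacing are both 1, describes each needle by the distance from its center to the nearest line and its angle to the lines, and uses a seeded pseudo-random generator; the function name is illustrative.

```python
import math
import random

def buffon_crossing_fraction(n_needles, seed=None):
    """Estimate P(needle crosses a line) with needle length = line spacing = 1."""
    rng = random.Random(seed)
    crossings = 0
    for _ in range(n_needles):
        center = rng.uniform(0.0, 0.5)         # distance from needle center to nearest line
        angle = rng.uniform(0.0, math.pi / 2)  # acute angle between needle and the lines
        # The needle reaches 0.5 * sin(angle) from its center toward the line.
        if center <= 0.5 * math.sin(angle):
            crossings += 1
    return crossings / n_needles

estimate = buffon_crossing_fraction(200_000, seed=0)
print(estimate, 2 / math.pi)
```

Run the other way round, 2 divided by the observed crossing fraction gives a rough experimental estimate of pi.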

**Random Walk**

A person can only move either left or right. Any move away from the starting point must be counterbalanced by a move in the opposite direction; therefore there must be an even number of coin flips. When heads = tails, the person returns home. With probabilistic certainty the person returns home, and with probabilistic certainty she will hit every point. The same applies to a walk on a plane. It is not probabilistically certain she will return home in 3D due to more degrees of freedom. P = 0.6...
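For the one-dimensional walk, the chance of not yet having returned home can be computed exactly, which shows how the return becomes certain in the limit. The sketch relies on a standard result for the fair coin-flip walk: the probability of no return during the first 2n steps equals C(2n, n) / 2^(2n). The function name is an illustrative assumption.

```python
from math import comb

def prob_not_yet_home(n_flips):
    """P(a 1D walk has never returned to its start during the first n_flips steps).

    For an even number of steps 2n this equals C(2n, n) / 2^(2n), which is also
    the probability of being home exactly at step 2n.
    """
    assert n_flips % 2 == 0, "returns can only happen after an even number of flips"
    half = n_flips // 2
    return comb(n_flips, half) / 2 ** n_flips

for steps in (2, 10, 100, 1000, 10000):
    print(steps, prob_not_yet_home(steps))
```

The probability shrinks toward zero as the number of flips grows, roughly like 1 / sqrt(pi * n) for 2n flips, which is the sense in which the return home is probabilistically certain.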