Hypothesis Testing#

Inferential Statistics#

Sometimes, an analysis may require a very large amount of data, which would take too much time and too many resources to acquire. In such situations, you are forced to work with a smaller sample of the data instead of the entire population.

Situations like these arise all the time at big companies like Amazon. For example, say the Amazon QC department wants to know what proportion of the products in its warehouses are defective. Instead of going through all of its products (which would be a lot!), the Amazon QC team can just check a small sample of 1,000 products and then find, for this sample, the defect rate (i.e. the proportion of defective products). Then, based on this sample’s defect rate, the team can “infer” what the defect rate is for all the products in the warehouses. This process of “inferring” insights from sample data is called “Inferential Statistics”.
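As a minimal sketch of this idea, suppose (hypothetically) that 25 of the 1,000 sampled products turn out to be defective; the sample defect rate and a rough 95% interval for the warehouse-wide rate could be computed like this:

```python
# Inferring a population defect rate from a sample.
# The counts below are made-up illustrative numbers, not real Amazon data.
from math import sqrt

n = 1000          # sample size
defective = 25    # hypothetical number of defective units found

p_hat = defective / n                         # sample defect rate = 0.025
se = sqrt(p_hat * (1 - p_hat) / n)            # standard error of the estimate
ci = (p_hat - 1.96 * se, p_hat + 1.96 * se)   # rough 95% confidence interval

print(f"defect rate ~ {p_hat:.3f}, 95% CI ~ ({ci[0]:.4f}, {ci[1]:.4f})")
```

The interval quantifies how far the true warehouse-wide rate could plausibly be from the sample's rate, which is the core of inferential statistics.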


Hypothesis Testing#

Hypothesis testing is used to evaluate claims about a population parameter. It tells us whether the sample data provide enough evidence to reject an existing assumption about the population.

  • \(H_0 =\) null hypothesis: the existing claim or status quo; always uses one of the signs =, ≤, or ≥

  • \(H_1 =\) alternate hypothesis: a challenge to the null hypothesis; always uses one of the signs ≠, >, or <

Steps#

📖Source

  • There is an initial research hypothesis of which the truth is unknown.

  • The first step is to state the relevant null and alternative hypotheses. This is important, as mis-stating the hypotheses will muddy the rest of the process.

  • The second step is to consider the statistical assumptions being made about the sample in doing the test; for example, assumptions about the statistical independence or about the form of the distributions of the observations. This is equally important as invalid assumptions will mean that the results of the test are invalid.

  • Decide which test is appropriate, and state the relevant test statistic \(T\).

  • Derive the distribution of the test statistic under the null hypothesis from the assumptions. In standard cases this will be a well-known result. For example, the test statistic might follow a Student’s t distribution with known degrees of freedom, or a normal distribution with known mean and variance. If the distribution of the test statistic is completely fixed by the null hypothesis we call the hypothesis simple, otherwise it is called composite.

  • Select a significance level (\(\alpha\)), a probability threshold below which the null hypothesis will be rejected. Common values are \(5\%\) and \(1\%\).

  • The distribution of the test statistic under the null hypothesis partitions the possible values of T into those for which the null hypothesis is rejected—the so-called critical region—and those for which it is not. The probability of the critical region is \(\alpha\). In the case of a composite null hypothesis, the maximal probability of the critical region is \(\alpha\).

  • Compute from the observations the observed value \(t_{obs}\) of the test statistic \(T\).

  • Decide to either reject the null hypothesis in favor of the alternative or not reject it. The decision rule is to reject the null hypothesis \(H_0\) if the observed value \(t_{obs}\) is in the critical region, and not to reject the null hypothesis otherwise.

A common alternative formulation of this process goes as follows:

  • Compute from the observations the observed value \(t_{obs}\) of the test statistic \(T\).

  • Calculate the \(p\)-value. This is the probability, under the null hypothesis, of sampling a test statistic at least as extreme as that which was observed (the maximal probability of that event, if the hypothesis is composite).

  • Reject the null hypothesis, in favor of the alternative hypothesis, if and only if the \(p\)-value is less than (or equal to) the significance level (the selected probability) threshold (\(\alpha\)), for example \(0.05\) or \(0.01\).

The former process was advantageous in the past when only tables of test statistics at common probability thresholds were available. It allowed a decision to be made without the calculation of a probability. It was adequate for classwork and for operational use, but it was deficient for reporting results. The latter process relied on extensive tables or on computational support not always available. The explicit calculation of a probability is useful for reporting. The calculations are now trivially performed with appropriate software.

The difference between the two processes can be seen in the radioactive-suitcase example (below):

  • “The Geiger-counter reading is \(10\). The limit is \(9\). Check the suitcase.”

  • “The Geiger-counter reading is high; \(97\%\) of safe suitcases have lower readings. The limit is \(95\%\). Check the suitcase.”

The former report is adequate; the latter gives a more detailed explanation of the data and of the reason why the suitcase is being checked.

Example#

A manufacturer claims that the average life of its products is \(36\) months. An auditor selects a sample of \(49\) units of the product and calculates the average life to be \(34.5\) months. The population standard deviation is \(4\) months. Test the manufacturer’s claim at \(3\%\) significance level.

So as per the above problem:

  • \(H_0 : \mu = 36\) months

  • \(H_1 : \mu \neq 36\) months

\(\alpha = 3\%\) or \(0.03\), \(\sigma = 4\), \(N = 49\)

First we will take a look into the Critical-value method:

In this case we will have critical region at both sides with total area of \(0.03\).

So the area to the right \(= 0.015\), which means that the cumulative probability up to the UCV \(= 1 - 0.015 = 0.985\)

\(z\)-score of cumulative probability of UCV (\(Z_c\) in this case) \(= z\)-score of \(0.985 = 2.17\)

Calculating the critical values UCV/LCV \(= \mu \pm Z_c * \frac{\sigma}{\sqrt{N}} \)

UCV/LCV \(= 36 \pm 2.17 * \frac{4}{\sqrt{49}} = 37.24 \text{ and } 34.76\)

Since the sample mean \(34.5\) does not lie between the LCV and the UCV, we reject the null hypothesis.
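The critical-value calculation above can be reproduced numerically; scipy's `norm.ppf` plays the role of the \(z\)-table lookup:

```python
# Critical-value method for the manufacturer example:
# H0: mu = 36, H1: mu != 36, sigma = 4, N = 49, alpha = 0.03.
from math import sqrt
from scipy.stats import norm

mu, sigma, n = 36, 4, 49
alpha = 0.03

z_c = norm.ppf(1 - alpha / 2)   # z-score of 0.985, about 2.17
se = sigma / sqrt(n)            # standard error = 4/7
ucv = mu + z_c * se             # upper critical value, about 37.24
lcv = mu - z_c * se             # lower critical value, about 34.76

x_bar = 34.5                    # observed sample mean
reject = not (lcv <= x_bar <= ucv)
print(f"LCV = {lcv:.2f}, UCV = {ucv:.2f}, reject H0: {reject}")
```

The sample mean \(34.5\) falls below the LCV, so the code reaches the same conclusion as the hand calculation.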


Now let’s solve it using the \(p\)-value method:

Calculate the \(z\)-score of \(34.5\): \(z = \frac{\bar{x} - \mu}{\sigma/\sqrt{N}} = \frac{34.5 - 36}{4/\sqrt{49}} = -2.62\)

Calculate the \(p\)-value from the \(z\)-table \(= 0.0044\), which is the cumulative area up to the sample point.

As this value lies on the left-hand side, there is no need to subtract it from \(1\).

Since we are checking for inequality, this is a 2-tailed test, so we multiply by \(2\), giving \(0.0088\). (Remember: a 1-tailed test provides more power to detect an effect, because the entire significance level \(\alpha\) is allocated to one direction only.)

As the \(p\)-value is \(< \alpha\), we reject the null hypothesis.

Errors in Hypothesis Testing#


Types of Errors in Hypothesis testing (📖Source)#


Questions#