An outlier is an observation that appears
out of place in a sample. There may be several
reasons why outliers are present. It may be
an error in the data. It may be an unlikely but
random variation. Or it may be an indication
that some model assumptions are incorrect.
This section describes two tests for outliers
in normal samples.
Grubbs' test is the test of choice for a single outlier.
The test has one-sided and two-sided versions. In the one-sided
version, the null hypothesis is that the smallest (one-tailed lower)
or largest (one-tailed upper) value in the sample is not an outlier.
The alternative hypothesis is that this value is an outlier.
In the two-sided version, the null hypothesis is that neither
the smallest nor the largest value is an outlier. The alternative
hypothesis is that one of these values is an outlier.
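For reference, the textbook form of the Grubbs statistic and its rejection region is sketched below. This is the standard definition, not a statement of how the library computes its values internally. Here n is the sample size, \bar{x} the sample mean, s the sample standard deviation (n - 1 divisor), and t_{p,\nu} the upper critical value of Student's t distribution with \nu degrees of freedom at tail probability p.

```latex
% Test statistic for each variant
G_{\text{two-sided}} = \frac{\max_i \lvert x_i - \bar{x} \rvert}{s}, \qquad
G_{\text{lower}} = \frac{\bar{x} - x_{\min}}{s}, \qquad
G_{\text{upper}} = \frac{x_{\max} - \bar{x}}{s}

% Reject the null hypothesis at significance level \alpha when
G > \frac{n-1}{\sqrt{n}} \sqrt{\frac{t^2}{n - 2 + t^2}}, \qquad
t = \begin{cases}
  t_{\alpha/(2n),\, n-2} & \text{two-sided} \\
  t_{\alpha/n,\, n-2}    & \text{one-sided}
\end{cases}
```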
Grubbs' test is implemented by the GrubbsTest class. It has three constructors.
The first constructor has no arguments. The sample and other test properties
must be set manually. A two-tailed test with a significance level of 0.05
is assumed by default.
The second constructor takes as its only argument a vector that contains
the sample that is to be tested for outliers. The third constructor
takes a second argument that specifies whether the test
is one-tailed or two-tailed.
The example below tests whether the largest value in a sample of
8 mass spectrometer measurements of a uranium isotope is an outlier:
var grubbsSample = Vector.Create(199.31, 199.53,
200.19, 200.82, 201.92, 201.95, 202.18, 245.57);
var grubbs = new GrubbsTest(grubbsSample, HypothesisType.OneTailedUpper);
Console.WriteLine("Grubbs G: {0:F5}", grubbs.Statistic);
Console.WriteLine(" Crit.: {0:F5}", grubbs.GetUpperCriticalValue());
Console.WriteLine(" Reject:{0}", grubbs.Reject());
Dim grubbsSample = Vector.Create(199.31, 199.53,
200.19, 200.82, 201.92, 201.95, 202.18, 245.57)
Dim grubbs = New GrubbsTest(grubbsSample, HypothesisType.OneTailedUpper)
Console.WriteLine("Grubbs G: {0:F5}", grubbs.Statistic)
Console.WriteLine(" Crit.: {0:F5}", grubbs.GetUpperCriticalValue())
Console.WriteLine(" Reject:{0}", grubbs.Reject())
let grubbsSample = Vector.Create([| 199.31; 199.53; 200.19;
200.82; 201.92; 201.95; 202.18; 245.57 |])
let grubbs = new GrubbsTest(grubbsSample, HypothesisType.OneTailedUpper)
printfn "Grubbs G: %.5f" grubbs.Statistic
printfn " Crit.: %.5f" (grubbs.GetUpperCriticalValue())
printfn " Reject: %A" (grubbs.Reject())
The value of the test statistic is 2.46876, which is
greater than the critical value of 2.03165. We therefore
reject the null hypothesis and conclude that the largest value is an outlier
at the 0.05 significance level.
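As a cross-check, the test statistic can be recomputed without the library. The sketch below assumes the usual definition of the upper one-tailed statistic, the distance from the largest value to the sample mean divided by the sample standard deviation. The helper name UpperGrubbsStatistic is hypothetical and is not part of the library.

```csharp
using System;
using System.Linq;

// Hypothetical helper, not part of the library.
// Upper one-tailed Grubbs statistic: G = (max - mean) / s,
// where s is the sample standard deviation (n - 1 divisor).
static double UpperGrubbsStatistic(double[] x)
{
    double mean = x.Average();
    double sumOfSquares = x.Sum(v => (v - mean) * (v - mean));
    double s = Math.Sqrt(sumOfSquares / (x.Length - 1));
    return (x.Max() - mean) / s;
}

// Same eight measurements as in the example above.
double[] sample =
{
    199.31, 199.53, 200.19, 200.82, 201.92, 201.95, 202.18, 245.57
};

Console.WriteLine("G = {0:F5}", UpperGrubbsStatistic(sample));
```

Under that assumption, the printed value should agree with the statistic of 2.46876 reported above. The critical value is not reproduced here because it requires a quantile of Student's t distribution, which the .NET base class library does not provide.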
Grubbs' test can only test for a single outlier. When multiple
outliers may be present, the Generalized Extreme Studentized Deviate
(ESD) test is appropriate. It consists of a sequence
of tests similar to Grubbs' test, one for each candidate number of outliers
from 1 up to a supplied maximum.
Like Grubbs' test, the generalized ESD test has one-sided and two-sided
versions. In the one-sided version, the null hypothesis is that
the smallest (one-tailed lower) or largest (one-tailed upper) values
are not outliers. The alternative hypothesis is that there are up to
the specified number of outliers.
In the two-sided version, the null hypothesis is that there are
no outliers at either end of the sample. The alternative
hypothesis is that there are up to
the specified number of outliers.
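For reference, the two-sided procedure as it is usually stated (Rosner's formulation) is sketched below. It is given as background and is an assumption about, not a description of, the library's internals. Here n is the sample size, r the maximum number of outliers, and t_{p,\nu} the 100p percentage point of Student's t distribution with \nu degrees of freedom.

```latex
% For i = 1, ..., r: compute the statistic on the n - i + 1 observations
% that remain after removing the i - 1 most extreme ones.
R_i = \frac{\max_j \lvert x_j - \bar{x} \rvert}{s}

% Critical value for the i-th test at significance level \alpha
\lambda_i = \frac{(n - i)\, t_{p,\, n-i-1}}
                 {\sqrt{\left(n - i - 1 + t_{p,\, n-i-1}^2\right)(n - i + 1)}},
\qquad p = 1 - \frac{\alpha}{2\,(n - i + 1)}

% Number of outliers: the largest i for which R_i > \lambda_i (0 if there is none)
```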
The generalized ESD test is implemented by the GeneralizedEsdTest class. It has three constructors.
The first constructor has no arguments. The sample and other test properties
must be set manually. A two-tailed test with a significance level of 0.05
is assumed by default. The number of outliers to test for is the smaller of
10 and half the number of observations in the sample.
The second constructor takes two arguments. The first is a vector that contains
the sample that is to be tested for outliers. The second argument is
the number of outliers to test for. This must be at least 1.
The third constructor takes a third argument that specifies whether the test
is one-tailed or two-tailed.
The example below tests a sample of 54 observations for up to 10 outliers.
var sample = Vector.Create(
-0.25, 0.68, 0.94, 1.15, 1.20, 1.26, 1.26, 1.34,
1.38, 1.43, 1.49, 1.49, 1.55, 1.56, 1.58, 1.65,
1.69, 1.70, 1.76, 1.77, 1.81, 1.91, 1.94, 1.96,
1.99, 2.06, 2.09, 2.10, 2.14, 2.15, 2.23, 2.24,
2.26, 2.35, 2.37, 2.40, 2.47, 2.54, 2.62, 2.64,
2.90, 2.92, 2.92, 2.93, 3.21, 3.26, 3.30, 3.59,
3.68, 4.30, 4.64, 5.34, 5.42, 6.01);
var test = new GeneralizedEsdTest(sample, 10, HypothesisType.TwoTailed);
Dim sample = Vector.Create(
-0.25, 0.68, 0.94, 1.15, 1.2, 1.26, 1.26, 1.34,
1.38, 1.43, 1.49, 1.49, 1.55, 1.56, 1.58, 1.65,
1.69, 1.7, 1.76, 1.77, 1.81, 1.91, 1.94, 1.96,
1.99, 2.06, 2.09, 2.1, 2.14, 2.15, 2.23, 2.24,
2.26, 2.35, 2.37, 2.4, 2.47, 2.54, 2.62, 2.64,
2.9, 2.92, 2.92, 2.93, 3.21, 3.26, 3.3, 3.59,
3.68, 4.3, 4.64, 5.34, 5.42, 6.01)
Dim gESD = New GeneralizedEsdTest(sample, 10, HypothesisType.TwoTailed)
let sample = Vector.Create(
[|
-0.25; 0.68; 0.94; 1.15; 1.20; 1.26; 1.26; 1.34;
1.38; 1.43; 1.49; 1.49; 1.55; 1.56; 1.58; 1.65;
1.69; 1.70; 1.76; 1.77; 1.81; 1.91; 1.94; 1.96;
1.99; 2.06; 2.09; 2.10; 2.14; 2.15; 2.23; 2.24;
2.26; 2.35; 2.37; 2.40; 2.47; 2.54; 2.62; 2.64;
2.90; 2.92; 2.92; 2.93; 3.21; 3.26; 3.30; 3.59;
3.68; 4.30; 4.64; 5.34; 5.42; 6.01
|])
let test = new GeneralizedEsdTest(sample, 10, HypothesisType.TwoTailed)
Because the generalized ESD test is, in fact, a collection of tests,
the interpretation of the results is slightly different.
Running the test consists of running individual tests for a specific number
of outliers, from 1 to the specified maximum. When a test is found to be
significant, its number of outliers is retained, and the results of the test
with the largest such number are used as the overall test results.
The NumberOfOutliers property returns the number of outliers that was found.
Its value ranges from 0 to the maximum.
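The selection rule can be sketched as follows, assuming the usual convention for this test: the reported number of outliers is the largest k whose individual test is significant, even if a test for a smaller k is not. The helper name CountOutliers and the numbers below are made up for illustration; they are neither library code nor library output.

```csharp
using System;

// Hypothetical helper, not part of the library.
// Returns the largest k (1-based) whose statistic exceeds its critical value,
// or 0 when no individual test is significant.
static int CountOutliers(double[] statistics, double[] criticalValues)
{
    int count = 0;
    for (int k = 0; k < statistics.Length; k++)
    {
        if (statistics[k] > criticalValues[k])
        {
            count = k + 1; // keep the largest significant k, not the first
        }
    }
    return count;
}

// Made-up values for illustration only.
double[] statistics = { 3.2, 2.9, 3.3, 2.7 };
double[] criticalValues = { 3.1, 3.1, 3.0, 3.0 };

// The tests for 1 and 3 outliers are significant; the tests for 2 and 4 are not.
Console.WriteLine(CountOutliers(statistics, criticalValues)); // 3
```

Scanning every test rather than stopping at the first non-significant one matters because several outliers close together can mask each other in the tests for smaller counts.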
The GetOutlierIndexes method returns a vector containing the indexes in the sample
of any outliers that were found. The method optionally takes a significance level
(for example, 0.05) to use for the detection. Below we print the number of outliers
and the values of the outliers from the example above:
Console.WriteLine("Number of outliers: {0}", test.NumberOfOutliers);
var outliers = sample[test.GetOutlierIndexes()];
Console.WriteLine("Outliers: {0}", outliers);
Console.WriteLine("Number of outliers: {0}", gESD.NumberOfOutliers)
Dim theOutliers = sample(gESD.GetOutlierIndexes())
Console.WriteLine("Outliers: {0}", theOutliers)
printfn "Number of outliers: %d" test.NumberOfOutliers
let outliers = sample.[test.GetOutlierIndexes()]
printfn "Outliers: %O" outliers
Individual tests can be accessed through the GetTest(Int32) method,
which takes as its only argument the number of outliers.
Since we detected 3 outliers in this sample, we might want to look at the test
for 4 outliers:
var test4 = test.GetTest(4);
Console.WriteLine(test4.Summarize());
Dim test4 = gESD.GetTest(4)
Console.WriteLine(test4.Summarize())
let test4 = test.GetTest(4)
printfn "%s" (test4.Summarize())
This shows that for 4 outliers, the value of the test statistic (2.8102)
is less than the critical value (3.1362), so the test for 4 outliers is not significant.
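The statistic for 4 outliers can also be recomputed without the library. The sketch below assumes the usual two-sided procedure: at each step, find the observation farthest from the mean of the remaining data, divide its distance by the sample standard deviation, then remove it. The helper name EsdStatistics is hypothetical and is not part of the library.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper, not part of the library.
// Returns the two-sided ESD statistics R_1, ..., R_maxOutliers.
static double[] EsdStatistics(IEnumerable<double> sample, int maxOutliers)
{
    var remaining = sample.ToList();
    var result = new double[maxOutliers];
    for (int i = 0; i < maxOutliers; i++)
    {
        double mean = remaining.Average();
        double sumOfSquares = remaining.Sum(v => (v - mean) * (v - mean));
        double s = Math.Sqrt(sumOfSquares / (remaining.Count - 1));
        // Observation farthest from the mean of the remaining data.
        double extreme = remaining.OrderByDescending(v => Math.Abs(v - mean)).First();
        result[i] = Math.Abs(extreme - mean) / s;
        remaining.Remove(extreme); // drop it before computing the next statistic
    }
    return result;
}

// Same 54 observations as in the example above.
double[] sample =
{
    -0.25, 0.68, 0.94, 1.15, 1.20, 1.26, 1.26, 1.34,
    1.38, 1.43, 1.49, 1.49, 1.55, 1.56, 1.58, 1.65,
    1.69, 1.70, 1.76, 1.77, 1.81, 1.91, 1.94, 1.96,
    1.99, 2.06, 2.09, 2.10, 2.14, 2.15, 2.23, 2.24,
    2.26, 2.35, 2.37, 2.40, 2.47, 2.54, 2.62, 2.64,
    2.90, 2.92, 2.92, 2.93, 3.21, 3.26, 3.30, 3.59,
    3.68, 4.30, 4.64, 5.34, 5.42, 6.01
};

double[] r = EsdStatistics(sample, 10);
Console.WriteLine("R4 = {0:F4}", r[3]);
```

Under that assumption, the fourth statistic should agree with the value of 2.8102 shown in the summary for 4 outliers. The critical values are not reproduced because they require quantiles of Student's t distribution.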