A13 – My stochastic distribution generator

Create a distribution representation (histogram, or CDF …) to represent the following:
– Realizations taken from a Normal(0,1)
– Realizations of the mean, obtained by averaging several times (say m times, m large) n of the above realizations
– Realizations of the variance, obtained by computing several times (say m times, m large) the variance of n of the above realizations
– Realizations taken from exp(N(0,1))
– Realizations taken from N(0,1) squared
– Realizations taken from a (squared N(0,1)) divided by another (squared N(0,1))

Highlights of my solution

A short video to show the behavior of the application:

Follow this project on

DOWNLOAD | C#

In collaboration with Laura Vagnetti & Giuseppe Di Naso

RA9 – An overview between theory and simulation

Try to find on the web the names of the random variables that you just simulated in the applications, and see if the means and variances that you obtain in the simulation are compatible with the “theory”. If not, fix the possible bugs.

  • Realizations taken from a Normal(0,1):

The standard normal distribution is a special case of the normal distribution. For the standard normal distribution, the value of the mean is equal to zero (μ = 0), and the value of the standard deviation is equal to 1 (σ = 1). Thus, by plugging μ = 0 and σ = 1 into the PDF of the normal distribution, the equation simplifies to

f(z) = (1 / √(2π)) · e^(−z²/2)

The random variable that possesses the standard normal distribution is denoted by z. Consequently, units for the standard normal distribution curve are denoted by z and are called the z-values or z-scores. They are also called standard units or standard scores.

The cumulative distribution function (CDF) of the standard normal distribution, corresponding to the area under the curve over the interval (−∞, z], is usually denoted by the capital Greek letter Φ (Phi) and is given by

Φ(z) = (1 / √(2π)) · ∫_(−∞)^(z) e^(−t²/2) dt

where e ≈ 2.71828 and π ≈ 3.14159. [1]
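To make the generation concrete, here is a minimal sketch of a standard normal generator based on the Box–Muller transform (a hypothetical helper class written for illustration, not the code of the downloadable C# application):

```csharp
using System;

// Minimal sketch: generating N(0,1) realizations with the Box–Muller transform.
// "StandardNormal" is a hypothetical helper class, not the one used in the application.
static class StandardNormal
{
    private static readonly Random Rng = new Random();

    public static double Next()
    {
        // Two independent U(0,1) draws; 1 - NextDouble() avoids taking the log of zero.
        double u1 = 1.0 - Rng.NextDouble();
        double u2 = Rng.NextDouble();
        // Box–Muller: sqrt(-2 ln u1) * cos(2 pi u2) has the N(0,1) distribution.
        return Math.Sqrt(-2.0 * Math.Log(u1)) * Math.Cos(2.0 * Math.PI * u2);
    }
}
```

Repeated calls to StandardNormal.Next() produce the realizations whose histogram should match the bell-shaped density above.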

  • Realizations of the mean, obtained by averaging several times (say m times, m large) n of the above realizations:

The mean of the sampling distribution of the mean is the mean of the population from which the scores were sampled. Therefore, if a population has a mean μ, then the mean of the sampling distribution of the mean is also μ. The symbol μM is used to refer to the mean of the sampling distribution of the mean. Therefore, the formula for the mean of the sampling distribution of the mean can be written as μM = μ. The variance of the sampling distribution of the mean is computed as follows:

σ²M = σ² / N

That is, the variance of the sampling distribution of the mean is the population variance divided by N, the sample size (the number of scores used to compute a mean). Thus, the larger the sample size, the smaller the variance of the sampling distribution of the mean. Recall, in fact, that the Central Limit Theorem states: [2]

“Given a population with a finite mean μ and a finite non-zero variance σ², the sampling distribution of the mean approaches a normal distribution with a mean of μ and a variance of σ²/N as N, the sample size, increases.”

  • Realizations of the variance, obtained by averaging several times (say m times, m large) n of the above realizations:

To better explain the result, let’s take an example. Consider all 36 possible samples of size 2, drawn with replacement from the population 1, 2, 3, 4, 5, and 6: from them you can construct the sampling distribution of the sample variance. Simply square each of the sample standard deviations and pair these variances with their probabilities.

The expected value of this sampling distribution is (0)(6/36) + (0.5)(10/36) + (2)(8/36) + (4.5)(6/36) + (8)(4/36) + (12.5)(2/36) ≈ 2.92, which equals the variance of the population. The variance of this sampling distribution can be computed by finding the expected value of the square of the sample variance and subtracting the square of 2.92, which gives about 11.6.

The probability distribution for the sample variances shows no negative values on the horizontal axis. This is always true for variances because variances can’t be negative. Secondly, the graph does not have the symmetric look of the graph of sample means. In fact, the graph of the sample variance distribution will always be skewed to the right.

From this sampling distribution of sample variances, the only conclusion that can be made is that the expected or mean value of sample variances is the population variance. In order to make further statements about the sampling distribution of sample variances, the population from which samples are selected must have a normal distribution. In that case, it can be shown that the sampling distribution of sample variances has a special form called a chi-square distribution with one parameter, the parameter being the sample size minus one (n-1). This parameter is called the degrees of freedom of the chi-square distribution.

In general, when samples of size n are taken from a normal distribution with variance σ², the quantity (n − 1)S²/σ² has a chi-square distribution with n − 1 degrees of freedom. [3]
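A minimal simulation sketch of both sampling distributions, assuming N(0,1) data so that μ = 0 and σ² = 1 (hypothetical code, written independently of the downloadable application):

```csharp
using System;
using System.Linq;

// Hypothetical check of the two sampling distributions for N(0,1) data:
// the sample mean should have mean 0 and variance 1/n, and the sample variance
// should have mean 1 and variance 2/(n - 1) (a rescaled chi-squared with n - 1 d.f.).
class SamplingDistributions
{
    static readonly Random Rng = new Random();

    static double Normal()
    {
        double u1 = 1.0 - Rng.NextDouble(), u2 = Rng.NextDouble();
        return Math.Sqrt(-2.0 * Math.Log(u1)) * Math.Cos(2.0 * Math.PI * u2);
    }

    static double Var(double[] x)
    {
        double mean = x.Average();
        return x.Sum(v => (v - mean) * (v - mean)) / (x.Length - 1);
    }

    static void Main()
    {
        int m = 50_000, n = 30;                      // m repetitions of samples of size n
        var means = new double[m];
        var variances = new double[m];
        for (int k = 0; k < m; k++)
        {
            var sample = Enumerable.Range(0, n).Select(_ => Normal()).ToArray();
            double mean = sample.Average();
            means[k] = mean;
            variances[k] = sample.Sum(v => (v - mean) * (v - mean)) / (n - 1);
        }
        Console.WriteLine($"sample means:     mean = {means.Average():F4}, variance = {Var(means):F4} (theory: 0 and {1.0 / n:F4})");
        Console.WriteLine($"sample variances: mean = {variances.Average():F4}, variance = {Var(variances):F4} (theory: 1 and {2.0 / (n - 1):F4})");
    }
}
```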

  • Realizations taken from exp(N(0,1)):

If Z is a random variable with the standard normal distribution, then exp(Z) follows the standard lognormal distribution: the distribution of a positive random variable whose logarithm is N(0,1). Its density is f(x) = (1 / (x√(2π))) · e^(−(ln x)²/2) for x > 0, its realizations are strictly positive and strongly right-skewed, and its mean and variance are e^(1/2) ≈ 1.649 and (e − 1)e ≈ 4.671, which are the values the simulation should reproduce. The lognormal should not be confused with the exponential distribution, which instead arises in connection with Poisson processes: the exponential distribution is often concerned with the amount of time until some specific event occurs. For example, the amount of time (beginning now) until an earthquake occurs has an exponential distribution. Other examples include the length, in minutes, of long-distance business telephone calls, and the amount of time, in months, a car battery lasts. It can be shown, too, that the value of the change that you have in your pocket or purse approximately follows an exponential distribution. [4]

The exponential probability density function is

f(x) = \begin{cases}\lambda e^{-\lambda x} & x \ge 0\\0 & x < 0\end{cases}

Values for an exponential random variable occur in the following way: there are fewer large values and more small values. As λ is decreased, the distribution is stretched out to the right, and as λ is increased, the distribution is pushed toward the origin. This distribution has no shape parameter, as it has only one shape (the exponential), and its only parameter is the failure rate λ. The density starts at x = 0 at the level f(0) = λ, decreases thereafter exponentially and monotonically as x increases, and is convex. In the PDF above, λ is the rate parameter and x is the random variable. [5]


  • Realizations taken from N(0,1) squared:

Suppose that Z is a random variable sampled from the standard normal distribution, where the mean is 0 and the variance is 1: Z ∼ N(0, 1). Now, consider the random variable Q = Z². The distribution of the random variable Q is an example of what is called a chi-squared distribution:

Q ∼ χ₁²

The subscript 1 indicates that this particular chi-squared distribution is constructed from only 1 standard normal distribution. A chi-squared distribution constructed by squaring a single standard normal variable is said to have 1 degree of freedom; its mean is 1 and its variance is 2, which are the values the simulation should reproduce. As the number of degrees of freedom increases, the chi-squared distribution itself approaches a normal distribution. Just as extreme values of the normal distribution have low probability (and give small p-values), extreme values of the chi-squared distribution have low probability.

More generally, the chi-squared distribution with k degrees of freedom is the distribution of the sum of the squares of k independent random variables, each with a normal distribution with zero mean and unit variance. It has the property that the sum of two or more independent random variables with such distributions also has a chi-squared distribution, and it is widely used in testing statistical hypotheses, especially hypotheses about the agreement between theoretical and observed values of a quantity and about population variances and standard deviations. [6] [7] [8]

  • Realizations taken from a (squared N(0,1)) divided by another (squared N(0,1)):

The ratio of two independent chi-squared variates has a beta-prime distribution (also sometimes called a “Beta distribution of the second kind”). If you divide each of the chi-squared variates by its degrees of freedom, the ratio has an F-distribution. In our case both numerator and denominator are squared N(0,1) variables, i.e. chi-squared with 1 degree of freedom, so the ratio follows an F(1, 1) distribution; note that this distribution is so heavy-tailed that its mean and variance do not exist, so the simulated sample moments will not stabilize as the number of realizations grows.

The Beta distribution is a continuous probability distribution often used to model the uncertainty about the probability of success of an experiment. It can be used to analyze probabilistic experiments that have only two possible outcomes, success, with probability X, and failure, with probability 1 – X. These experiments are famously called Bernoulli experiments. [9]

Also called the variance-ratio distribution, the F-distribution is a continuous probability distribution used especially in the analysis of variance; it is the distribution of the ratio of two independent chi-squared random variables, each divided by its number of degrees of freedom. The F-distribution is named after R. A. Fisher, who first studied it in 1924. [10][11]
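To make the comparison with theory concrete, the sketch below (hypothetical code, not the application’s own) generates each of the simulated variables and prints the empirical mean and variance next to the theoretical values; for the ratio of squared normals the printout only recalls that the F(1,1) moments do not exist:

```csharp
using System;
using System.Linq;

// Hypothetical check of empirical moments against theory (not the application's own code).
class MomentCheck
{
    static readonly Random Rng = new Random();

    static double Normal()
    {
        double u1 = 1.0 - Rng.NextDouble(), u2 = Rng.NextDouble();
        return Math.Sqrt(-2.0 * Math.Log(u1)) * Math.Cos(2.0 * Math.PI * u2);
    }

    static void Report(string name, double[] x, string theory)
    {
        double mean = x.Average();
        double variance = x.Sum(v => (v - mean) * (v - mean)) / (x.Length - 1);
        Console.WriteLine($"{name}: mean = {mean:F3}, variance = {variance:F3}   ({theory})");
    }

    static void Main()
    {
        const int m = 200_000;
        var z     = Enumerable.Range(0, m).Select(_ => Normal()).ToArray();
        var expZ  = z.Select(Math.Exp).ToArray();
        var z2    = z.Select(v => v * v).ToArray();
        var ratio = z2.Select(v => v / Math.Pow(Normal(), 2)).ToArray();   // independent denominator

        Report("N(0,1)     ", z,     "theory: mean 0, variance 1");
        Report("exp(N(0,1))", expZ,  "theory: mean e^0.5 ≈ 1.649, variance (e−1)e ≈ 4.671");
        Report("N(0,1)^2   ", z2,    "theory: mean 1, variance 2 (chi-squared, 1 d.f.)");
        Report("Z1^2/Z2^2  ", ratio, "theory: F(1,1), mean and variance do not exist");
    }
}
```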

REFERENCES

[1] geo.fu-berlin.de, “The Standard Normal Distribution”, URL.
[2] D. M. Lane, “Sampling Distribution of the Mean”, URL.
[3] csus.edu, “Sampling Distributions”, URL.
[4] P. Woolf, “Continuous Distributions – normal and exponential”, URL.
[5] unf.edu, “Distributions”, URL.
[6] merriam-webster.com, “chi-square distribution”, URL.
[7] jmp.com, “The Chi-Square Distribution”, URL.
[8] en.wikipedia.org, “Chi-squared distribution”, URL.
[9] M. Taboga, “Beta Distribution”, URL.
[10] merriam-webster.com, “F-distribution”, URL.
[11] businessjargons.com, “F-Distribution”, URL.

A12 – A new important stochastic process

Discover one of the most important stochastic processes by yourself! Consider the general scheme we have used so far to simulate stochastic processes (such as the relative frequency of success in a sequence of trials, the sample mean, the random walk, the Poisson point process, etc.) and now add this new process to our simulator. Starting from value 0 at time 0, for each of m paths, at each new time compute P(t) = P(t−1) + Random step(t), for t = 1, …, n, where Random step(t) is now σ * sqrt(1/n) * Z(t), and Z(t) is a N(0,1) random variable (the “diffusion” σ is a user parameter, used to scale the process dispersion). At time n (the last time) and at one (or more) other chosen inner time 1 < j < n (j is a program parameter), create and represent with a histogram the distribution of P(t). Observe the behavior of the process for large n.
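A minimal sketch of the simulation loop described above, with illustrative parameter values (hypothetical code, simplified with respect to the actual C# application):

```csharp
using System;

// Hypothetical sketch of the diffusion process P(t) = P(t-1) + sigma * sqrt(1/n) * Z(t).
class DiffusionSketch
{
    static readonly Random Rng = new Random();

    static double Normal()
    {
        double u1 = 1.0 - Rng.NextDouble(), u2 = Rng.NextDouble();
        return Math.Sqrt(-2.0 * Math.Log(u1)) * Math.Cos(2.0 * Math.PI * u2);
    }

    static void Main()
    {
        int m = 10_000, n = 1_000, j = 250;   // paths, time steps, inner time (program parameters)
        double sigma = 1.0;                   // diffusion, user parameter
        double step = sigma * Math.Sqrt(1.0 / n);

        var atJ = new double[m];
        var atN = new double[m];
        for (int path = 0; path < m; path++)
        {
            double p = 0.0;                   // P(0) = 0
            for (int t = 1; t <= n; t++)
            {
                p += step * Normal();         // P(t) = P(t-1) + Random step(t)
                if (t == j) atJ[path] = p;
            }
            atN[path] = p;
        }
        // atJ and atN hold the realizations to be binned into the histograms;
        // for large n, P(j) should look N(0, sigma^2 * j/n) and P(n) should look N(0, sigma^2).
        Console.WriteLine($"P(j={j}) realizations: {atJ.Length}, P(n={n}) realizations: {atN.Length}");
    }
}
```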

Highlights of my solution

A short video to show the behavior of the application:

Follow this project on

DOWNLOAD | C#

In collaboration with Laura Vagnetti & Giuseppe Di Naso

R13 – The standard Wiener process as “scaling limit” of a random walk

An “analog” of the CLT for stochastic processes: the standard Wiener process as the “scaling limit” of a random walk, and the functional CLT (Donsker’s theorem), or invariance principle. Explain the intuitive meaning of this result and how you have already illustrated it in your homework.

Recalling the idealized assumptions that define the Brownian motion process, the underlying dynamics of the Brownian particle being knocked about by molecules suggest a random walk as a possible model, but with tiny time steps and tiny spatial jumps. Let X = (X0, X1, X2, …) be the symmetric simple random walk. Thus, Xn = U1 + U2 + ⋯ + Un, where U = (U1, U2, …) is a sequence of independent variables with P(Ui = 1) = P(Ui = −1) = 1/2 for each i ∈ N+. Recall that E(Xn) = 0 and var(Xn) = n for n ∈ N. Also, since X is the partial sum process associated with an IID sequence, X has stationary, independent increments (but of course in discrete time). Finally, recall that by the central limit theorem, the distribution of Xn/√n converges to the standard normal distribution as n → ∞. Now, for h, d ∈ (0, ∞) the continuous-time process

Xh,d(t) = d · X⌊t/h⌋,  t ∈ [0, ∞),

is a jump process with jumps at times {0, h, 2h, …} and with jumps of size ±d. Basically, we would like to let h ↓ 0 and d ↓ 0, but this cannot be done arbitrarily. Note that E[Xh,d(t)] = 0 but var[Xh,d(t)] = d²⌊t/h⌋. Thus, by the central limit theorem, if we take d = √h, then the distribution of Xh,d(t) will converge to the normal distribution with mean 0 and variance t as h ↓ 0. More generally, we might hope that all of the requirements in the definition of Brownian motion are satisfied by the limiting process, and if so, we have a standard Brownian motion. [1][2]
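A small hypothetical check of this scaling, independent of the homework application: simulate the ±1 walk, rescale space by d = √h, and verify that the empirical variance of Xh,d(t) approaches t as h ↓ 0:

```csharp
using System;
using System.Linq;

// Hypothetical sketch: rescaled simple random walk with d = sqrt(h),
// checking that the empirical variance of X_{h,d}(t) approaches t.
class ScalingLimitSketch
{
    static void Main()
    {
        var rng = new Random();
        double t = 1.0;
        foreach (double h in new[] { 0.1, 0.01, 0.001 })
        {
            double d = Math.Sqrt(h);                     // spatial step tied to the time step
            int steps = (int)Math.Floor(t / h);
            int paths = 20_000;
            var values = new double[paths];
            for (int p = 0; p < paths; p++)
            {
                int walk = 0;                            // symmetric simple random walk X_n
                for (int i = 0; i < steps; i++)
                    walk += rng.Next(2) == 0 ? 1 : -1;   // U_i = +1 or -1, each with probability 1/2
                values[p] = d * walk;                    // X_{h,d}(t) = d * X_{floor(t/h)}
            }
            double mean = values.Average();
            double variance = values.Sum(v => (v - mean) * (v - mean)) / (paths - 1);
            Console.WriteLine($"h = {h}: empirical variance = {variance:F3} (should approach t = {t})");
        }
    }
}
```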

[Figure: sample paths of Brownian motion (Wiener process), GlynHolton.com]

REFERENCES

[1] stats.libretexts.org, “Standard Brownian Motion”, URL.
[2] en.wikipedia.org, “Brownian motion”, URL.

R12 – The Brownian motion and the Wiener process

What is “Brownian motion” and what is a Wiener process? History, importance, definition and applications (Bachelier, Wiener, Einstein, …).

In 1827, the botanist Robert Brown noticed that tiny particles from pollen, when suspended in water, exhibited continuous but very jittery and erratic motion. In his miracle year in 1905, Albert Einstein explained the behavior physically, showing that the particles were constantly being bombarded by the molecules of the water, and thus helping to firmly establish the atomic theory of matter. Brownian motion as a mathematical random process was first constructed in a rigorous way by Norbert Wiener in a series of papers starting in 1918. For this reason, the Brownian motion process is also known as the Wiener process.

Along with the Bernoulli trials process and the Poisson process, the Brownian motion process is of central importance in probability. Each of these processes is based on a set of idealized assumptions that lead to a rich mathematical theory. In each case also, the process is used as a building block for a number of related random processes that are of great importance in a variety of applications. In particular, Brownian motion and related processes are used in applications ranging from physics to statistics to economics.

The French mathematician Louis Bachelier, in his doctoral thesis (1900) on the “Théorie de la spéculation”, developed a theory, based on a statistical approach, with the aim of accounting for the trend in the prices of securities on the Paris Stock Exchange. The mathematical tools he used are very similar to those used by Einstein in the analysis of Brownian motion, and they share the fundamental assumptions: that the variations of the quantity in question (the prices of the securities in one case, the displacements of the particles in the other) are independent of the previous ones, and that the probability distribution of these variations is Gaussian.

For this work, which represents the first mathematical representation of the evolution over time of economic and financial phenomena, Bachelier is considered the father of mathematical finance: in his honor, the Croatian-American mathematician William Feller proposed calling the Wiener process the Bachelier–Wiener process.

Definition

A standard Brownian motion is a random process X = {Xt : t ∈ [0,∞)} with state space R that satisfies the following properties:

  • X0 = 0 (with probability 1).
  • X has stationary increments. That is, for s, t ∈ [0,∞) with s < t, the distribution of Xt−Xs is the same as the distribution of Xt−s.
  • X has independent increments. That is, for t1, t2, …, tn ∈ [0,∞) with t1 < t2 < ⋯ < tn, the random variables Xt1, Xt2−Xt1, …, Xtn−Xtn−1 are independent.
  • Xt is normally distributed with mean 0 and variance t for each t ∈ (0,∞).
  • With probability 1, t ↦ Xt is continuous on [0,∞).

To understand the assumptions, let’s take them one at a time.

Suppose that we measure the position of a Brownian particle in one dimension, starting at an arbitrary time which we designate as t = 0, with the initial position designated as x = 0. Then this assumption is satisfied by convention. Indeed, occasionally, it’s convenient to relax this assumption and allow X0 to have other values.

This is a statement of time homogeneity: the underlying dynamics (namely the jostling of the particle by the molecules of water) do not change over time, so the distribution of the displacement of the particle in a time interval [s,t] depends only on the length of the time interval. This is an idealized assumption that would hold approximately if the time intervals are large compared to the tiny times between collisions of the particle with the molecules.

This is another idealized assumption, based on the central limit theorem (CLT): the position of the particle at time t is the result of a very large number of collisions, each making a very small contribution. The fact that the mean is 0 is a statement of spatial homogeneity: the particle is no more or less likely to be jostled to the right than to the left. Next, recall that the assumptions of stationary, independent increments imply that var(Xt) = σ²t for some positive constant σ². By a change of time scale, we can assume σ² = 1. [1][2][3]

REFERENCES

[1] stats.libretexts.org, “Standard Brownian Motion”, URL.
[2] en.wikipedia.org, “Brownian motion”, URL.
[3] Focus, “Che cosa è il moto browniano?”, URL.

RA8 – A well known distribution: Poisson

Find out on the web what you have just generated in the previous application. Can you find out about all the well-known distributions that “naturally arise” in this process?

From the previous application, we can say that the random variable follows the Poisson distribution.

In statistics, a Poisson distribution is a probability distribution that is used to show how many times an event is likely to occur over a specified period; it is useful for characterizing events with very low probabilities of occurrence within some definite time or space. In other words, it is a count distribution. Poisson distributions are often used to understand independent events that occur at a constant rate within a given interval of time.


The Poisson probability mass function is

P(X) = (λ^X · e^(−λ)) / X!

where:

  • e is Euler’s number (e = 2.71828…)
  • X is the number of occurrences
  • X! is the factorial of X
  • λ is equal to the expected value (EV) of X, which for the Poisson distribution is also equal to its variance

The French mathematician Siméon-Denis Poisson developed his function in 1830 to describe the number of times a gambler would win a rarely won game of chance in a large number of tries. Letting p represent the probability of a win on any given trial, the mean, or average, number of wins (λ) in n tries will be given by λ = np. Using the Swiss mathematician Jakob Bernoulli’s binomial distribution, Poisson showed that the probability of obtaining k wins is approximately λ^k · e^(−λ) / k!, where e ≈ 2.71828 is the base of the natural exponential and k! = k(k − 1)(k − 2)⋯2·1. Noteworthy is the fact that λ equals both the mean and the variance (a measure of the dispersal of data away from the mean) for the Poisson distribution.

Many economic and financial data appear as count variables, such as how many times a person becomes unemployed in a given year, thus lending themselves to analysis with a Poisson distribution. [1][2][3]
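As a quick numerical illustration of the formula (a hypothetical snippet, not part of the application), the probabilities can be computed with the recurrence P(X = x + 1) = P(X = x) · λ/(x + 1); summing them recovers a total of 1 and a mean and variance both equal to λ:

```csharp
using System;

// Hypothetical check that the Poisson PMF sums to 1 and has mean = variance = lambda.
class PoissonPmf
{
    static void Main()
    {
        double lambda = 4.0;
        double total = 0, mean = 0, meanSq = 0;
        double p = Math.Exp(-lambda);         // P(X = 0) = e^(-lambda)
        for (int x = 0; x < 100; x++)
        {
            total += p;
            mean += x * p;
            meanSq += (double)x * x * p;
            p *= lambda / (x + 1);            // P(X = x+1) = P(X = x) * lambda / (x + 1)
        }
        Console.WriteLine($"sum = {total:F6}, mean = {mean:F4}, variance = {meanSq - mean * mean:F4}");
    }
}
```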

REFERENCES

[1] Adam Hayes, “Poisson Distribution”, URL.
[2] probabilitycourse.com, “Basic Concepts of the Poisson Process”, URL.
[3] Encyclopedia Britannica, “Poisson distribution”, URL.

A11 – A new stochastic process (Poisson distribution)

Discover a new important stochastic process by yourself! Consider the general scheme we have used so far to simulate some stochastic processes (such as the relative frequency of success in a sequence of trials, the sample mean and the random walk) and now add this new process to our process simulator.
Same scheme as the previous program (random walk), except changing the way the values of the paths are computed at each time. Starting from value 0 at time 0, for each of m paths, at each new time compute N(i) = N(i−1) + Random step(i), for i = 1, …, n, where Random step(i) is now a Bernoulli random variable with success probability equal to λ * (1/n) (where λ is a user parameter, e.g. 50, 100, …).
At time n (the last time) and at one (or more) other chosen inner time 1 < j < n (j is a program parameter), create and represent with a histogram the distribution of N(i). Represent also the distributions of the following quantities (and any other quantity that you think is of interest), as sketched in the code after this list:
– Distance (time elapsed) of individual jumps from the origin
– Distance (time elapsed) between consecutive jumps (the so-called “holding times”)
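A minimal sketch of this scheme, with illustrative parameter values (hypothetical code, not the downloadable application):

```csharp
using System;
using System.Collections.Generic;

// Hypothetical sketch of the counting process N(i) = N(i-1) + Bernoulli(lambda / n),
// recording jump times and holding times.
class PoissonProcessSketch
{
    static void Main()
    {
        var rng = new Random();
        int n = 10_000;                        // time steps
        double lambda = 50;                    // user parameter
        double p = lambda / n;                 // success probability per step

        int count = 0;
        var jumpTimes = new List<double>();    // distance (time elapsed) of jumps from the origin
        var holdingTimes = new List<double>(); // distance between consecutive jumps
        double lastJump = 0;

        for (int i = 1; i <= n; i++)
        {
            if (rng.NextDouble() < p)          // Bernoulli step with success probability lambda / n
            {
                count++;
                double time = (double)i / n;   // rescale so the whole path lives on [0, 1]
                jumpTimes.Add(time);
                holdingTimes.Add(time - lastJump);
                lastJump = time;
            }
        }
        // N(n) should be approximately Poisson(lambda); holding times approximately Exponential(lambda).
        Console.WriteLine($"N(n) = {count} (expected about {lambda}), jumps recorded: {jumpTimes.Count}");
    }
}
```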

Highlights of my solution

A short video to show the behavior of the application:

Follow this project on

DOWNLOAD | C#

In collaboration with Laura Vagnetti & Giuseppe Di Naso

R11 – Correlation Coefficient and the most common indices

Do some research about the general correlation coefficient for ranks and the most common indices that can be derived from it. Give one example of the computation of these correlation coefficients for ranks.

Correlation is a statistical method used to assess a possible linear association between two continuous variables. It is simple both to calculate and to interpret. However, misuse of correlation is so common among researchers that some statisticians have wished that the method had never been devised at all. [1]

A correlation coefficient is a number between -1 and 1 that tells you the strength and direction of a relationship between variables. In other words, it reflects how similar the measurements of two or more variables are across a dataset. Correlation coefficients are unit-free, which makes it possible to directly compare coefficients between studies.

The sign of the coefficient reflects whether the variables change in the same or opposite directions: a positive value means the variables change together in the same direction, while a negative value means they change together in opposite directions.

The absolute value of a number is equal to the number without its sign. The absolute value of a correlation coefficient tells you the magnitude of the correlation: the greater the absolute value, the stronger the correlation.

Before going on, let’s remember that the correlation coefficient is a bivariate statistic when it summarizes the relationship between two variables, and a multivariate statistic when you have more than two variables. It is also an effect size measure, which tells you the practical significance of a result.

Visualizing linear correlations

The correlation coefficient tells you how closely your data fit on a line. If you have a linear relationship, you’ll draw a straight line of best fit that takes all of your data points into account on a scatter plot. The closer your points are to this line, the higher the absolute value of the correlation coefficient and the stronger your linear correlation.

If all points lie exactly on this line, you have a perfect correlation; if the points are spread far from the line, the absolute value of your correlation coefficient is low.

Note that the steepness or slope of the line isn’t related to the correlation coefficient value. The correlation coefficient doesn’t help you predict how much one variable will change based on a given change in the other because two datasets with the same correlation coefficient value can have lines with very different slopes.

Types of correlation coefficients

You can choose from many different correlation coefficients based on the linearity of the relationship, the level of measurement of your variables, and the distribution of your data.

Pearson’s r

The Pearson’s product-moment correlation coefficient, also known as Pearson’s r, describes the linear relationship between two quantitative variables.

These are the assumptions your data must meet if you want to use Pearson’s r:

  • Both variables are on an interval or ratio level of measurement
  • Data from both variables follow normal distributions
  • Your data have no outliers
  • Your data is from a random or representative sample
  • You expect a linear relationship between the two variables

Pearson’s r is a parametric test, so it has high power. But it’s not a good measure of correlation if your variables have a non-linear relationship, or if your data have outliers, skewed distributions, or come from categorical variables. If any of these assumptions are violated, you should consider a rank correlation measure.

The formula for Pearson’s r is complicated, but most computer programs can quickly churn out the correlation coefficient from your data. In its simpler form, the formula divides the covariance between the variables by the product of their standard deviations. A common computational version of the sample formula is

rxy = [n∑XY − (∑X)(∑Y)] / √([n∑X² − (∑X)²] · [n∑Y² − (∑Y)²])

with:

  • rxy= strength of the correlation between variables x and y
  • n = sample size
  • ∑ = sum of what follows…
  • X = every x-variable value
  • Y = every y-variable value
  • XY = the product of each x-variable score and the corresponding y-variable score

When using the Pearson correlation coefficient formula, you’ll need to consider whether you’re dealing with data from a sample or the whole population. The sample and population formulas differ in their symbols and inputs. A sample correlation coefficient is called r, while a population correlation coefficient is called rho, the Greek letter ρ.

The sample correlation coefficient uses the sample covariance between variables and their sample standard deviations, while the population correlation coefficient uses the population covariance between variables and their population standard deviations (a small computational sketch follows the list below):

rxy = cov(x, y) / (sx · sy)        ρXY = cov(X, Y) / (σX · σY)

with:

  • ρXY, rxy  = strength of the correlation between variables x and y
  • cov(X,Y), cov(x,y) = covariance of x and y
  • sx = sample standard deviation of x
  • sy = sample standard deviation of y
  • σX = population standard deviation of X
  • σY = population standard deviation of Y
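For example, the sample version of the formula translates almost literally into code; the helper below is a hypothetical sketch written for illustration, with made-up data in Main:

```csharp
using System;
using System.Linq;

// Hypothetical sketch: sample Pearson correlation as cov(x, y) / (s_x * s_y).
static class Pearson
{
    public static double R(double[] x, double[] y)
    {
        int n = x.Length;
        double mx = x.Average(), my = y.Average();
        // Sample covariance and sample standard deviations (dividing by n - 1).
        double cov = Enumerable.Range(0, n).Sum(i => (x[i] - mx) * (y[i] - my)) / (n - 1);
        double sx = Math.Sqrt(x.Sum(v => (v - mx) * (v - mx)) / (n - 1));
        double sy = Math.Sqrt(y.Sum(v => (v - my) * (v - my)) / (n - 1));
        return cov / (sx * sy);
    }

    static void Main()
    {
        // Made-up data, purely illustrative.
        double[] x = { 1, 2, 3, 4, 5 };
        double[] y = { 2, 4, 5, 4, 5 };
        Console.WriteLine($"r = {R(x, y):F3}");
    }
}
```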
Spearman’s rho

Spearman’s rho, or Spearman’s rank correlation coefficient, is the most common alternative to Pearson’s r. It’s a rank correlation coefficient because it uses the rankings of data from each variable (e.g., from lowest to highest) rather than the raw data itself.

You should use Spearman’s rho when your data fail to meet the assumptions of Pearson’s r. This happens when at least one of your variables is on an ordinal level of measurement or when the data from one or both variables do not follow normal distributions.

While the Pearson correlation coefficient measures the linearity of relationships, the Spearman correlation coefficient measures the monotonicity of relationships. In a linear relationship, each variable changes in one direction at the same rate throughout the data range. In a monotonic relationship, each variable also always changes in only one direction but not necessarily at the same rate.

  • Positive monotonic: when one variable increases, the other also increases.
  • Negative monotonic: when one variable increases, the other decreases.
  • Monotonic relationships are less restrictive than linear relationships.

The symbols for Spearman’s rho are ρ for the population coefficient and rs for the sample coefficient. The formula calculates the Pearson’s r correlation coefficient between the rankings of the variable data. To use this formula, you’ll first rank the data from each variable separately from low to high: every data point gets a rank (first, second, third, and so on). Then, you’ll find the differences (di) between the ranks of your variables for each data pair and take those as the main input for the formula.

rs = 1 − (6∑di²) / (n(n² − 1))

with:

  • rs = strength of the rank correlation between variables
  • di = the difference between the x-variable rank and the y-variable rank for each pair of data
  • ∑di² = the sum of the squared differences between the x- and y-variable ranks
  • n = sample size

If you have a correlation coefficient of 1, all of the rankings for each variable match up for every data pair. If you have a correlation coefficient of -1, the rankings for one variable are the exact opposite of the ranking of the other variable. A correlation coefficient near zero means that there’s no monotonic relationship between the variable rankings.

Other coefficients

When you square the correlation coefficient, you end up with the coefficient of determination (r²). This is the proportion of common variance between the variables. The coefficient of determination is always between 0 and 1, and it’s often expressed as a percentage.

A high r² means that a large amount of variability in one variable is determined by its relationship to the other variable. A low r² means that only a small portion of the variability of one variable is explained by its relationship to the other variable; relationships with other variables are more likely to account for the variance in that variable.

When you take away the coefficient of determination from unity (one), you’ll get the coefficient of alienation. This is the proportion of common variance not shared between the variables, the unexplained variance between the variables.

A high coefficient of alienation indicates that the two variables share very little variance in common. A low coefficient of alienation means that a large amount of variance is accounted for by the relationship between the variables. [2]

An example of calculating Spearman’s correlation

To calculate a Spearman rank-order correlation on data without any ties, consider the marks that 10 students obtained in a Maths exam and in an English exam. Each variable is ranked separately from low to high, the difference d between the two ranks is found for each student, and each difference is squared (d²). Summing the squared differences and substituting ∑d² and n = 10 into

rs = 1 − (6∑d²) / (n(n² − 1))

gives rs = 0.67. This indicates a strong positive relationship between the ranks individuals obtained in the Maths and English exams: that is, the higher you ranked in Maths, the higher you ranked in English as well, and vice versa. [3][4]
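The same calculation is easy to script. The sketch below is hypothetical code with made-up exam scores (not the data of the cited example); it ranks each array, applies rs = 1 − 6∑d²/(n(n² − 1)), and prints the result:

```csharp
using System;
using System.Linq;

// Hypothetical sketch: Spearman's rho for data without ties.
static class Spearman
{
    // Rank from 1 (smallest) to n (largest); assumes no tied values.
    static double[] Ranks(double[] x) =>
        x.Select(v => (double)(x.Count(w => w < v) + 1)).ToArray();

    public static double Rho(double[] x, double[] y)
    {
        int n = x.Length;
        double[] rx = Ranks(x), ry = Ranks(y);
        double sumD2 = Enumerable.Range(0, n).Sum(i => Math.Pow(rx[i] - ry[i], 2));
        return 1.0 - 6.0 * sumD2 / (n * (n * n - 1.0));
    }

    static void Main()
    {
        // Made-up marks for 10 students in two exams (illustrative only).
        double[] maths   = { 65, 78, 52, 81, 70, 60, 74, 58, 68, 85 };
        double[] english = { 60, 72, 50, 79, 68, 55, 70, 62, 58, 88 };
        Console.WriteLine($"rs = {Rho(maths, english):F2}");
    }
}
```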

REFERENCES

[1] MM Mukaka, “A guide to appropriate use of Correlation coefficient in medical research”, URL.
[2] Pritha Bhandari, “A guide to correlation coefficients”, URL.
[3] Lund Research, “Spearman’s Rank-Order Correlation”, URL.
[4] questionpro.com, “Spearman correlation coefficient: Definition, Formula and Calculation with Example”, URL.

R10 – Distributions of the order statistics

Distributions of the order statistics: look on the web for the most simple (but still rigorous) and clear derivations of the distributions, explaining in your own words the methods used.

Order statistics are a very useful concept in statistical sciences. They have a wide range of applications, including modeling auctions, car races, and insurance policies, optimizing production processes, and estimating parameters of distributions, among others. Through this article, we’ll understand the idea of order statistics. We’ll first understand its meaning and gradually proceed to its distribution, eventually covering more advanced concepts.

Suppose we have a set of random variables X1, X2, …, Xn, which are independent and identically distributed (i.i.d). By independence, we mean that the value taken by a random variable is not influenced by the values taken by other random variables. By identical distribution, we mean that the probability density function (PDF) (or equivalently, the Cumulative distribution function, CDF) for the random variables is the same. The kth order statistic for this set of random variables is defined as the kth smallest value of the sample.

To better understand this concept, we’ll take 5 random variables X1, X2, X3, X4, X5. We’ll observe a random realization/outcome from the distribution of each of these random variables. Suppose we get the following values:

X1 = 4, X2 = 2, X3 = 7, X4 = 11, X5 = 5

The kth order statistic for this experiment is the kth smallest value from the set {4, 2, 7, 11, 5}. So, the 1st order statistic is 2 (smallest value), the 2nd order statistic is 4 (next smallest), and so on. The 5th order statistic is the fifth smallest value (the largest value), which is 11. We repeat this process many times i.e., we draw samples from the distribution of each of these i.i.d random variables, & find the kth smallest value for each set of observations. The probability distribution of these values gives the distribution of the kth order statistics.

In general, if we arrange random variables X1, X2, …, Xn in ascending order, then the kth order statistic is shown as:

X(1) ≤ X(2) ≤ ⋯ ≤ X(k) ≤ ⋯ ≤ X(n)

The general notation of the kth order statistic is X(k). Note X(k) is different from Xk. Xk is the kth random variable from our set, whereas X(k) is the kth order statistic from our set. X(k) takes the value of Xk if Xk is the kth random variable when the realizations are arranged in ascending order.

The 1st order statistic X(1) is the minimum of the realizations of the set of ‘n’ random variables. The nth order statistic X(n) is the maximum (the nth smallest value). They can be expressed as:

X(1) = min(X1, X2, …, Xn),   X(n) = max(X1, X2, …, Xn)

Distribution of Order Statistics

We’ll now try to find out the distribution of order statistics. We’ll first describe the distribution of the nth order statistic, then the 1st order statistic & finally the kth order statistic in general.

A) Distribution of the nth Order Statistic:

Let the probability density function (PDF) and cumulative distribution function (CDF) of our random variables be fX(x) and FX(x) respectively. By definition of the CDF,

FX(x) = P(X ≤ x)

Since our random variables are identically distributed, they have the same PDF fx(x) & CDF Fx(x). We’ll now calculate the CDF of nth order statistic (Fn(x)) as follows:

Fn(x) = P(X(n) ≤ x) = P(X1 ≤ x, X2 ≤ x, …, Xn ≤ x)

The random variables X1, X2, …, Xn are also independent. Therefore, by the property of independence,

Fn(x) = P(X1 ≤ x) · P(X2 ≤ x) ⋯ P(Xn ≤ x) = [FX(x)]^n

The PDF of the nth order statistic (fn(x)) is calculated as follows:

fn(x) = d/dx Fn(x) = n · [FX(x)]^(n−1) · fX(x)

Thus, the expression for the PDF & CDF of nth order statistic has been obtained.

B) Distribution of the 1st Order Statistic:

The CDF of a random variable can also be calculated as one minus the probability that the random variable X takes a value greater than x. Mathematically,

FX(x) = P(X ≤ x) = 1 − P(X > x)

We’ll determine the CDF of the 1st order statistic (F1(x)) as follows:

F1(x) = P(X(1) ≤ x) = 1 − P(X(1) > x) = 1 − P(X1 > x, X2 > x, …, Xn > x)

Once again, using the property of independence of random variables,

F1(x) = 1 − P(X1 > x) · P(X2 > x) ⋯ P(Xn > x) = 1 − [1 − FX(x)]^n

The PDF of the 1st order statistic (f1(x)) is calculated as follows:

f1(x) = d/dx F1(x) = n · [1 − FX(x)]^(n−1) · fX(x)

Thus, the expression for PDF & CDF of 1st order statistic has been obtained.

C) Distribution of the kth Order Statistic:

For kth order statistic, in general, the following equation describes its CDF (Fk(x)):

Fk(x) = P(X(k) ≤ x) = Σ_{j=k}^{n} C(n, j) · [FX(x)]^j · [1 − FX(x)]^(n−j)

The PDF of kth order statistic (fk(x)) is expressed as:

fk(x) = [n! / ((k − 1)! (n − k)!)] · [FX(x)]^(k−1) · [1 − FX(x)]^(n−k) · fX(x)

To avoid confusion, we’ll use geometric proof to understand the equation. As discussed before, the set of random variables have the same PDF (fX(x)). The following graph shows a sample PDF with the kth order statistic obtained from random sampling:

[Figure: a sample PDF fX(x) on an interval [a, b], with the kth order statistic marked by a red line and the other realizations shown as small black ticks on the x-axis]

So, the PDF of the random variables fX(x) is defined between the interval [a,b]. The kth order statistic for a random sample is shown by the red line. The other variable realizations (for the random sample) are shown by the small black lines on the x-axis.

There are exactly (k − 1) random-variable observations that fall in the yellow region of the graph (the region between a and the kth order statistic). The probability that a particular observation falls in this region is given by the CDF of the random variables, FX(x). Since (k − 1) observations fall in this region, independence gives the term [FX(x)]^(k−1).

There are exactly (n − k) random-variable observations that fall in the blue region of the graph (the region between the kth order statistic and b). The probability that a particular observation falls in this region is 1 − FX(x), one minus the CDF. Since (n − k) observations fall in this region, independence gives the term [1 − FX(x)]^(n−k).

Finally, exactly one observation falls at the kth order statistic itself, contributing the density fX(x). Thus, the product of the three terms gives the geometric meaning of the equation for the PDF of the kth order statistic. But where does the factorial term come from? The above scenario shows just one of many possible orderings, and the total number of such combinations is

n! / [(k − 1)! · 1! · (n − k)!]

Thus, the product of all of these terms gives us the general distribution of the kth order statistic.

Useful Functions of Order Statistics

Order statistics give rise to various useful functions. Among them, the notable ones include sample range and sample median.

1) Sample range: It is defined as the difference between the largest and smallest value. It is expressed as follows:

R = X(n) − X(1)

2) Sample median: The sample median divides the random sample (realizations from the set of random variables) into two halves, one that contains samples with lower values, and the other that contains the samples with higher values. It’s like the middle/central order statistic. It is mathematically defined as:

M = X((n+1)/2) if n is odd;   M = [X(n/2) + X(n/2 + 1)] / 2 if n is even

Joint PDF of Order Statistics

A joint probability density function can help us better understand the relationship between two random variables (two order statistics
in our case). The joint PDF for any 2 order statistics X(a) & X(b), such that 1 ≤ a < b ≤ n, is given by the following equation:

f(x, y) = [n! / ((a − 1)! (b − a − 1)! (n − b)!)] · [FX(x)]^(a−1) · [FX(y) − FX(x)]^(b−a−1) · [1 − FX(y)]^(n−b) · fX(x) · fX(y),  for x ≤ y

Example

We’ll use a very simple example to illustrate the distribution of order statistics: the standard uniform distribution (U[0, 1] distribution). We’ll take 5 random variables X1, X2, X3, X4, X5, all having the U[0, 1] distribution. For this set of random variables, we’ll calculate & plot the 1st, 3rd (the sample median) & 5th (nth) order statistics. The following figure shows the U[0, 1] distribution:

[Figure: the PDF of the U[0, 1] distribution, equal to 1 on the interval [0, 1]]

We’ll draw many random samples and find the 1st, 3rd & 5th order statistics for each sample.

The PDF & CDF of the standard uniform distribution are given by:

fX(x) = 1 and FX(x) = x, for 0 ≤ x ≤ 1

We’ll use this information and calculate the densities of X(1), X(3) & X(5) using the formulas we derived. We’ll consider only the case where x is between 0 and 1 (outside this interval the densities are zero, since fX is zero there). [1][2][3]

A) For 1st order statistics:

f1(x) = 5 · (1 − x)^4,  0 ≤ x ≤ 1

B) For 3rd order statistics:

f3(x) = [5! / (2! · 2!)] · x² · (1 − x)² = 30 · x² · (1 − x)²,  0 ≤ x ≤ 1

C) For 5th order statistics:

f5(x) = 5 · x^4,  0 ≤ x ≤ 1
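As a hypothetical simulation check of these densities (code written for illustration only), we can draw many samples of 5 uniforms, keep the 1st, 3rd and 5th smallest values, and compare the empirical means with the theoretical ones, which are k/(n + 1) = 1/6, 1/2 and 5/6 for k = 1, 3, 5:

```csharp
using System;
using System.Linq;

// Hypothetical check: order statistics of 5 independent U(0,1) draws.
// E[X(k)] should equal k / (n + 1), i.e. 1/6, 1/2 and 5/6 for k = 1, 3, 5.
class OrderStatisticsSketch
{
    static void Main()
    {
        var rng = new Random();
        int trials = 100_000, n = 5;
        double sum1 = 0, sum3 = 0, sum5 = 0;
        for (int t = 0; t < trials; t++)
        {
            var sample = Enumerable.Range(0, n).Select(_ => rng.NextDouble()).OrderBy(v => v).ToArray();
            sum1 += sample[0];   // X(1), the minimum
            sum3 += sample[2];   // X(3), the sample median
            sum5 += sample[4];   // X(5), the maximum
        }
        Console.WriteLine($"E[X(1)] ≈ {sum1 / trials:F3}, E[X(3)] ≈ {sum3 / trials:F3}, E[X(5)] ≈ {sum5 / trials:F3}");
    }
}
```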

REFERENCES

[1] Naman Agrawal, “Introduction to Order Statistics”, URL.
[2] colorado.edu, “Order Statistics”, URL.
[3] statisticshowto.com, “Order Statistics: Simple Definition, Examples”, URL.