Lab 5

1 Uniform Distribution

Continuous probability distributions in Python can be treated similarly to the discrete distributions. For example, for the uniform distribution we start with

from scipy.stats import uniform

We know that a uniform random variable \(X\) depends on two parameters \(a\) and \(b\): \(X\sim U(a,b)\) is distributed uniformly on the segment \([a,b]\), whose length is \(b-a\). In Python, the uniform distribution also depends on two parameters: loc = a and scale = b - a. The default values (used if these parameters are omitted) are loc = 0 and scale = 1, which correspond to \(U(0,1)\).
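The loc/scale convention is easy to get wrong, so it may help to check it numerically. The following is a minimal sketch for \(U(1,6)\); the variable names (rv_kwargs, wrong_right, and so on) are our own and are not part of scipy.

```python
from scipy.stats import uniform

a, b = 1, 6  # endpoints of the segment [a, b]

# SciPy's parameterisation of U(a, b): loc is the left endpoint,
# scale is the LENGTH of the segment, not the right endpoint.
rv_kwargs = {"loc": a, "scale": b - a}

# The 0- and 1-quantiles are the endpoints of the segment.
left, right = uniform.ppf([0, 1], **rv_kwargs)
print(left, right)

# The mean should be the midpoint (a + b) / 2.
mean = uniform.mean(**rv_kwargs)
print(mean)

# A common slip: passing scale = b describes U(a, a + b) instead.
wrong_right = uniform.ppf(1, loc=a, scale=b)
print(wrong_right)
```

Here the right endpoint obtained with scale = b is \(a+b=7\) rather than \(6\), which is why scale must be set to b - a.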

Probability Density Function

The PDF \(f_X(x)\) is then 1/scale on \([a,b]\) and \(0\) otherwise. For example, for \(X\sim U(1,6)\), we can calculate

uniform.pdf(3, loc = 1, scale = 5)

0.2

that is \(\frac15\), whereas,

uniform.pdf(7, loc = 1, scale = 5)

0.0

as \(7\notin[1,6]\).

We can also plot the graph of the PDF for \(X\sim U(1,6)\), e.g. we plot it for \(x\in[-2,9]\):

import matplotlib.pyplot as plt
import numpy as np
x = np.linspace(-2, 9, 1000)
plt.plot(x, uniform.pdf(x, loc = 1, scale = 5))
plt.show()

Note that the jumps at \(x=1\) and \(x=6\) are drawn as vertical segments, because plt.plot joins consecutive points with straight lines.

Cumulative Distribution Function

Similarly, the CDF \(F_X(x)\) of \(X\sim U(a,b)\) can be calculated using the command uniform.cdf(x, loc = a, scale = b - a).
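As a quick check of this command, we can compare it with the explicit formula \(F_X(x)=\frac{x-a}{b-a}\) on \([a,b]\), with \(F_X(x)=0\) to the left of \(a\) and \(F_X(x)=1\) to the right of \(b\). This is only a sketch; the test points in x and the variable names are chosen for illustration.

```python
import numpy as np
from scipy.stats import uniform

a, b = 1, 6
x = np.array([0.0, 1.0, 3.0, 6.0, 7.5])  # illustrative points inside and outside [a, b]

# CDF as computed by scipy
cdf_scipy = uniform.cdf(x, loc=a, scale=b - a)

# CDF from the explicit formula, clipped to [0, 1] outside the segment
cdf_formula = np.clip((x - a) / (b - a), 0.0, 1.0)

print(cdf_scipy)
print(np.allclose(cdf_scipy, cdf_formula))
```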

1.1

Plot the graph of \(F_X(x)\) for \(X\sim U(1,6)\) on the interval \([-2,9]\). Check the output:

Generating Random Values

\(n\) values of the random variable \(X\sim U(a,b)\) can be generated using the command uniform.rvs(size = n, loc = a, scale = b - a). As we discussed in Lab 3, if we want to fix the output, we use the random_state argument; for example, the following output

uniform.rvs(size = 3, loc = 1, scale = 5, random_state = 1)

array([3.08511002, 4.60162247, 1.00057187])

will be the same each time you run the code, whereas if you omit random_state = 1, the result will differ on every run (try it!).
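The effect of random_state can also be checked directly in code. The sketch below compares two samples generated with the same seed and one generated with a different seed; the names first, second and other are ours.

```python
import numpy as np
from scipy.stats import uniform

# Same seed twice: the two samples should coincide.
first = uniform.rvs(size=3, loc=1, scale=5, random_state=1)
second = uniform.rvs(size=3, loc=1, scale=5, random_state=1)

# A different seed gives a different sample.
other = uniform.rvs(size=3, loc=1, scale=5, random_state=2)

print(np.array_equal(first, second))
print(np.array_equal(first, other))
```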

1.2

Assign to the variable x an array of \(10^6\) random values uniformly distributed on \([0,1]\), with the random state fixed to 123. Calculate the mean of \(x\). Check the output.

0.49993343872814583

As you can see, the output is very close to \(0.5 = \frac{0+1}{2}=\mathbb{E}(X)\).

2 Exponential Distribution

Recall that, for \(X\sim \mathrm{Exp}(\lambda)\), the PDF is \[ f_X(x) = \begin{cases} \lambda e^{-\lambda x}, & \text{if } x \geq 0 \\ 0, & \text{otherwise}, \end{cases} \] and the corresponding CDF is \[ F_X(x) = \begin{cases} 1- e^{-\lambda x}, & \text{if } x \geq 0 \\ 0, & \text{otherwise}. \end{cases} \] We know also that \[ \mathbb{E}(X) = \frac1\lambda, \quad \mathrm{Var}(X) = \mathbb{E}(X^2)- \bigl(\mathbb{E}(X)\bigr)^2= \frac1{\lambda^2}. \]

In Python, we first import the corresponding distribution object

from scipy.stats import expon

Next, we use the same commands as before, e.g. expon.pdf, expon.cdf, and expon.rvs. Note, however, that the scale parameter of these commands is the reciprocal of \(\lambda\): scale = 1/lambda.

For example, to calculate \(f_X(1)\) for \(X\sim \mathrm{Exp}(0.2)\), we use the command

expon.pdf(1, scale = 5)

0.16374615061559636

since \(1/0.2=5\). (Of course, you can also write scale = 1/0.2 in your code.)
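To see that scale = 1/lambda is the right convention, we can compare expon.pdf with the formula \(\lambda e^{-\lambda x}\) directly. This is a small sketch; lam is our own name for \(\lambda\) (lambda is a reserved word in Python).

```python
import numpy as np
from scipy.stats import expon

lam = 0.2  # rate parameter lambda
x = 1.0

# PDF via scipy, with scale = 1/lambda
pdf_scipy = expon.pdf(x, scale=1 / lam)

# PDF via the explicit formula lambda * exp(-lambda * x)
pdf_formula = lam * np.exp(-lam * x)

print(pdf_scipy, pdf_formula)

# Passing scale = lambda by mistake gives the density of Exp(1/lambda) = Exp(5) instead.
pdf_wrong = expon.pdf(x, scale=lam)
print(pdf_wrong)
```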

2.1

Generate \(10^6\) values of the random variable \(X\sim \mathrm{Exp}(0.2)\), with random_state = 12. Calculate the variance of the generated values using the formula \[ \mathrm{Var}(X) = \mathbb{E}(X^2)- \bigl(\mathbb{E}(X)\bigr)^2. \] Check the answer.

25.04882818201701

You can also use the np.var command. Check the output in this case:

25.048828182017044

The two results agree up to floating-point rounding error in the last digits. Moreover, the result is close to the theoretical value \(\frac1{0.2^2}=25\).

Percentiles

Another important function available for all random variables in the scipy.stats module is ppf (percent point function), which provides percentiles. By definition, for a random variable \(X\) and for any \(q\in[0,1]\), the \(q\)-percentile of \(X\) is the number \(a\) such that \[ F_X(a) = \mathbb{P}(X\leq a) = q. \] In other words, the percentile function is the inverse of the CDF.

For example, for \(X\sim \mathrm{Exp}(0.2)\),

a = expon.ppf(0.3, scale = 1/0.2)
a

1.7833747196936622

is the \(0.3\)-percentile of \(X\), and then we can see that

expon.cdf(a, scale = 1/0.2)

0.30000000000000004

recovers the initial \(0.3\), up to floating-point rounding.
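For the exponential distribution the percentile can also be found by hand: solving \(1-e^{-\lambda a}=q\) for \(a\) gives \(a=-\frac{\ln(1-q)}{\lambda}\). The sketch below compares this closed form with expon.ppf; the names lam, a_formula and a_scipy are ours.

```python
import numpy as np
from scipy.stats import expon

lam, q = 0.2, 0.3

# Closed form obtained by inverting the CDF 1 - exp(-lambda * a) = q
a_formula = -np.log(1 - q) / lam

# The same percentile computed by scipy
a_scipy = expon.ppf(q, scale=1 / lam)

print(a_formula, a_scipy)

# Applying the CDF to the percentile returns q (up to rounding).
print(expon.cdf(a_scipy, scale=1 / lam))
```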

2.2

Let \(X\sim \mathrm{Exp}(0.7)\). Find \(b\) such that \[ \mathbb{P}(1 \leq X \leq b) = 0.4. \] Hint: use first the formula \[ \mathbb{P}(a \leq X \leq b) = F_X(b) - F_X(a). \] Check the answer:

3.3390409771454475

3 Normal Distribution

Recall that \(X\sim\mathcal{N}(\mu,\sigma^2)\) has the PDF given by \[ f_X(x) = \frac{1}{\sqrt{2\pi}\sigma} e^{-\frac{(x-\mu)^2}{2\sigma^2}}. \] We also know that \[ \begin{aligned} \mathbb{E}(X)&=\mu,\\ \mathrm{Var}(X)&=\sigma^2. \end{aligned} \]

We start with

from scipy.stats import norm

The PDF can be used in the code with the command norm.pdf(x, loc, scale), where loc stands for \(\mu\) and scale stands for \(\sigma\) (the standard deviation, not the variance \(\sigma^2\)). Similarly, one can use norm.cdf(x, loc, scale), norm.ppf(q, loc, scale), and norm.rvs(loc, scale, size) as above. The default values are loc = 0 and scale = 1, which correspond to the standard normal distribution \(Z\sim\mathcal{N}(0,1)\).

For example, the following code plots the graph of \(f_X(x)\) for \(X\sim \mathcal{N}(2,3^2)\) on the interval \(x\in(-10,14)\). The graph is drawn in red and the axes are labelled.

# Assuming that you previously loaded numpy and matplotlib
x = np.linspace(-10, 14, 1000)
y = norm.pdf(x, loc = 2, scale = 3)
plt.plot(x, y, color = 'red')
plt.xlabel('x')
plt.ylabel('y=f(x)')
plt.show()
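The loc and scale conventions for norm can also be checked through standardisation: if \(X\sim\mathcal{N}(\mu,\sigma^2)\), then \(Z=\frac{X-\mu}{\sigma}\sim\mathcal{N}(0,1)\), so \(\mathbb{P}(\mu-\sigma\leq X\leq\mu+\sigma)=\mathbb{P}(-1\leq Z\leq 1)\approx 0.6827\). The following is a minimal sketch with illustrative values \(\mu=2\), \(\sigma=3\); the variable names are ours.

```python
from scipy.stats import norm

mu, sigma = 2, 3  # scale is the standard deviation sigma, not sigma**2

# Probability of falling within one standard deviation of the mean,
# computed with the general normal distribution ...
p_general = (norm.cdf(mu + sigma, loc=mu, scale=sigma)
             - norm.cdf(mu - sigma, loc=mu, scale=sigma))

# ... and with the standard normal distribution after standardisation.
p_standard = norm.cdf(1) - norm.cdf(-1)

print(p_general, p_standard)
```

Both values should agree and be approximately \(0.6827\), independently of the chosen \(\mu\) and \(\sigma\).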

3.1

Plot the graph of the CDF for \(X\sim \mathcal{N}(2,3^2)\) (also on the interval \(x\in(-10,14)\)). Use green colour and label the axes with \(x\) and \(y=F(x)\).

3.2

Calculate the probability that a randomly selected individual has a height between \(160\) cm and \(170\) cm, given that the population mean height is \(165.5\) cm and the standard deviation is \(10.2\) cm, and that the heights follow the normal distribution.

Sub-task 1: calculate the answer using functions for a general normal random variable \(X\sim \mathcal{N}(\mu,\sigma^2)\). Check the answer:

0.37558835807069463

Sub-task 2: recalculate the answer using functions for the standard normal random variable \(Z\sim \mathcal{N}(0,1)\). Check the answer:

0.37558835807069463

3.3

Let \(X\sim\mathcal{N}(12,5^2)\). Find \(c\) such that \[ \mathbb{P}(c \leq X \leq 15) = 0.5. \] Check the answer.

8.235364852055193

3.4

Generate \(10^6\) values of a standard normal random variable. Check their standard deviation (use the np.std function) and ensure that the result is close to \(1\).

0.9988975242430829

(You may get a different answer, as we didn't fix random_state here.)

If you have time

As you have seen, we often need to evaluate several functions of a distribution for the same parameter values, e.g. in the example we considered earlier:

a = expon.ppf(0.3, scale = 1/0.2)
expon.cdf(a, scale = 1/0.2)

0.30000000000000004

In this case, we can assign the distribution with the required parameter values to a variable, and then call all methods of the distribution on that variable. For example, the code above can be rewritten as

rv = expon(scale = 1/0.2)
a = rv.ppf(0.3)
rv.cdf(a)

0.30000000000000004
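Such a variable with fixed parameters (often called a frozen distribution) exposes the other methods as well, for example pdf, rvs, mean and var. The sketch below illustrates this for \(\mathrm{Exp}(0.2)\); the seed 0 and the sample size 5 are arbitrary choices of ours.

```python
from scipy.stats import expon

lam = 0.2
rv = expon(scale=1 / lam)  # Exp(0.2) with the parameters fixed once

# Theoretical mean 1/lambda = 5 and variance 1/lambda**2 = 25
print(rv.mean(), rv.var())

# Same value as expon.pdf(1, scale = 5)
print(rv.pdf(1))

# Random values are generated without repeating the parameters
sample = rv.rvs(size=5, random_state=0)
print(sample)
```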