Lab 3

1 Discrete probability distributions in Python

Module scipy.stats of the famous library scipy provides various tools to work with probability and statistics in Python. Today, we consider several discrete distributions; to work with them will use the same structure of commands. In particular, any distribution will have the following methods:

rvs - to generate (several) values of a random variable \(X\) (see below in this Lab)
pmf - to calculate probability mass function at any \(k\in\mathbb{R}\) \[ p_X(k)=\mathbb{P}(X=k) \]
cdf - to calculate the cumulative distribuion function \[ F_X(x)=\sum_{k\leq x} p_X(k)=\mathbb{P}(X\leq x) \] recall that \[ \mathbb{P}(a<X\leq b) = F_X(b)-F_X(a) \]
mean - to calculate the mathematical expectation (mean) \(\mathbb{E}(X)\)
var - to calculate the varaince \(\mathrm{Var}(X)\)
std - to calculate the standard deviation \(\sigma(X)=\sqrt{\mathrm{Var}(X)}\)

2 Binomial distribution

First, one has to import binom class.

from scipy.stats import binom

For \(X\sim Bin(n, p)\), one can use binom.pmf(k, n, p) to calculate, \[ \mathbb{P}(X=k) = \binom{n}{k}p^k(1-p)^{n-k}, \] and one can use binom.cdf(k, n, p) to calculate \[ \mathbb{P}(X\leq k) = \sum_{j\leq k}\mathbb{P}(X=j). \]

Remark. Note that \(X\sim Bin(n, p)\) can take only values \(0, 1, 2, \ldots, n\). In particular, for example, for \(X\sim Bin(5, 0.1)\),

[binom.pmf(6, 5, 0.1), binom.pmf(-1, 5, 0.1), binom.pmf(2.5, 5, 0.1)]

[0.0, 0.0, 0.0]

since neither of 6, -1, 2.5 belongs to \(\{0, 1, 2, 3, 4, 5\}\).

However, cdf is the cumulative probability, e.g. \[ \mathbb{P}(X\leq 1.5)=\mathbb{P}(X=0)+\mathbb{P}(X=1), \] and hence, it’s not \(0\), namely,

binom.cdf(1.5, 5, 0.1)

0.9185399999999999

Remark. Note also that, e.g. \[ \mathbb{P}(X\leq 1)=\mathbb{P}(X\leq 1.5)=\mathbb{P}(X<2) \] since in all three cases it will be \(X=0\) or \(X=1\).

We know from lectures that \[ \mathbb{E}(X) = np, \qquad \mathrm{Var}(X) = np(1-p). \] We may also get these numbers from binom.mean(n, p) and binom.var(n, p). We can also get \[ \sigma(X)=\sqrt{\mathrm{Var}(X)}=\sqrt{np(1-p)} \] from binom.std(n, p).

2.1

Let \(X\sim Bin(27, 0.45)\).

Calculate \(\mathbb{P}(X\leq 20)\). Check the answer:

0.9994575733214396

Calculate \(\mathbb{P}(X < 10)\). Check the answer:

0.15256903022954094

Calculate \(\mathbb{P}(7<X\leq 13)\). Check the answer:

0.666671672666459

Calculate \(\mathbb{P}(7\leq X\leq 13)\). Check the answer:

0.6879613472859423

2.2

A company manufactures light bulbs, and \(95\)% of them are of good quality, while the rest are defective. If a customer buys \(50\) light bulbs, what is the probability that:

Exactly \(5\) of them are defective? Check the answer:

0.06584063715436628

At most \(5\) of them are defective? Check the answer:

0.9622238270102227

At least \(5\) of them are defective? Check the answer:

0.10361681014414348

Find the expected value of the number of defective bulbs and the standard deviation of this quantity. Check the answer:

[2.5, 1.541103500742244]

3 Geometric distribution

Recall that, for \(X\sim Geom(p)\), \[ \mathbb{P}(X = k) = (1-p)^{k-1} p, \] and this the probability that the first “success” happened in the \(k\)-th Bernoulli trial.

To work with the geometric distribution in Python, one has to import geom class.

from scipy.stats import geom

The functions are similar to the binomial case: geom.pmf(k, p), geom.cdf(k, p), geom.mean(p), geom.var(p), geom.std(p).

3.1

Let \(X\sim Geom(0.4)\). Calculate \(\mathbb{P}(7\leq X <10)\). Check the answer.

0.03657830400000006

3.2

A lazy student has to take a quiz, where each question may have as an answer an integer number from \(1\) to \(100\) (different questions may have equal answers). Instead of preparation, the student is going to guess the answers. The quiz contains \(20\) questions. As soon as the student gives a correct answer, the quiz stops, and it is considered as a passed one. What is the probability that the student will pass the quiz?

Hint: the quiz may stop after either of \(1,2,3,\ldots,20\) questions.

Check the answer:

0.18209306240276918

4 Negative Binomial Distribution

Recall that \(X\sim NB(r, p)\) describes the number \(k\) of failures in a series of trials before the \(r\)-th success occurs for the first time. The total number of trials is then, hence, \(n=r+k\). We know that \[ \mathbb{P}(X = k) = \binom{k+r-1}{k} p^r (1-p)^{k}. \]

To work with the negative binomial distribution in Python, one has to import nbinom class.

from scipy.stats import nbinom

The previously discussed functions can be used similarly: nbinom.pmf(k, r, p), nbinom.cdf(k, r, p), nbinom.mean(r, p), nbinom.var(r, p), nbinom.std(r, p).

4.1

Let \(X\sim NB(5, 0.3)\). Find \(\mathbb{P}(X>10)\). Check the answer.

0.5154910592268431

4.2

A lazy student has to take a quiz, where each question may have as an answer an integer number from \(1\) to \(100\) (different questions may have equal answers). Instead of preparation, the student is going to guess the answers. The quiz contains \(20\) questions. The quiz stops as soon as the student answers correctly \(3\) questions. What is the probability to pass the test for this student?

Hint: think on how many wrong answers could be made by the student to still pass the test.

Check the answer.

0.0010035761681001162

5 Poisson Distribution

Recall that \(X\sim Po(\lambda)\) models the number of independent events occurring in a fixed interval of time or space. It is characterized by \(\lambda>0\) that the average rate of events per interval of the same size.

We know that \[ P(X = k) = \frac{\lambda^k}{k!}e^{-\lambda}. \]

To work with the Poisson distribution in Python, one has to import poisson class.

from scipy.stats import poisson

and the functions are poisson.pmf(k, lambda), poisson.cdf(k, lambda), poisson.mean(lambda), poisson.var(lambda), poisson.std(lambda).

5.0.0.1

Let \(X\sim Po(0.4)\). Find \[ \sum_{n=4}^\infty \mathbb{P}(X=n). \] Check the answer.

0.0007762513762070711

5.1

In an insurance company, customers’ claims are raised at an average rate of \(5\) claims per working day. Calculate the probability that

Exactly \(30\) claims will be raised in one working week (Monday – Friday). Check the answer.

0.04541278513011904

At least \(8\) claims will be raised in the next \(2\) working days. Check the answer.

0.7797793533983011

6 Generated data

We can also generate data, in particular, random data. For example, we can use for this scipy library. Recall that each of the random variable classes in this library has method rvs which can be used to generate (several) random values distributed according to the corresponding probability distribution. The general form of the command is distribution_name.rvs(distribution_parameters, size = number_of_genereted_values), where distribution_name is one of e.g. binom, geom, nbinom, poisson, and instead of number_of_genereted_values you write a number, whereas you keep size = as it is (see below). The distribution_parameters will be all parameters needed for the given distribution, i.e. to generate \(5\) binomial random variables you would use binom.rvs(n, p, size = 5) with certain n and p (and similarly for the negative binomial), but for \(5\) geometrical random variables you would use geom.rvs(p, size = 5) with certain p only since there is only one parameter for the geometrical distribution.

For example, in Task 2.2 above we considered a company which manufactures light bulbs, and \(95\)% of them are of good quality, while the rest are defective. If a customer buys \(50\) light bulbs, the random number \(X\) of defective bulbs there has a binomial distribution, namely, \(X\sim Bin(50,0.05)\), as \(p=0.05\) is the probability of a defective bulb. To generate e.g. \(7\) random values of \(X\), we use the commands

from scipy.stats import binom
binom.rvs(50, 0.05, size = 7)

array([4, 2, 0, 2, 1, 3, 4], dtype=int64)

If you run this command several times, you will probably get each time a different result, as the numbers are random. To fix an output, we should set a random_state:

from scipy.stats import binom # If you kept the previous cell, there is no need in this line
c = binom.rvs(50, 0.05, size = 7, random_state = 123)
c

array([3, 2, 1, 3, 3, 2, 6], dtype=int64)

Now you may run this command again and again, with the same output. If you change 123 by another natural number, you will get another output (which will remain fixed unless you change the value of random_state again).

6.1

Let \(X\sim Po(3)\) be a Poisson random variable with the parameter \(\lambda=3\). Generate \(100\) random values of \(X\) fixing random_state = 111, and assign the resulting Numpy array to a variable f. Calculate the mean and the (population) variance of f. Check your answer. Don’t forget to load numpy first.

[2.9, 2.87]

We know that the theoretical mean (expected value) \(\mathbb{E}(X)\) and variance \(\mathrm{Var}(X)\) for a Poisson random variable are equal to \(\lambda\):

\[ \mathbb{E}(X)=\mathrm{Var}(X)=\lambda. \]

Clearly, \(2.9\neq 3\neq 2.87\). To make the statistics more “matching” the probability, we need to increase the size of the data: let’s generate \(10^6\) random variables (keeping random_state = 111). Check the answers.

[2.998107, 2.9899274165509997]