\[ \renewcommand{\P}{\mathbb{P}} \renewcommand{\E}{\mathbb{E}} \newcommand{\R}{\mathbb{R}} \newcommand{\var}{\mathrm{Var}} \newcommand{\cov}{\mathrm{cov}} \newcommand{\corr}{\mathrm{corr}} \newcommand{\dx}{\,\mathrm{d}x} \newcommand{\dy}{\,\mathrm{d}y} \newcommand{\eps}{\varepsilon} \]
We have discussed discrete and continuous random variables. For a random variable \(X\), you now know how to calculate some of its characteristics: the expected value \(\E(X)\) and the variance \(\var(X)\). Now we consider how to characterise a pair of random variables.
Remark. In some cases, the joint CDF can be calculated manually from the description of the problem. However, in general, to calculate the joint CDF, we need additional information: the joint probability mass function in the discrete case and the joint probability density function in the continuous case.
Example 6.1 If the joint PMF (in the discrete case) or the joint PDF (in the continuous case) is not given explicitly, the joint CDF can usually be calculated only in very special cases, e.g. when one of the variables is defined in terms of the other. For example, consider \(X\sim U(0,1)\) and \(Y=X^2\); then \(F_{X,Y}(x,y)=0\) if \(x<0\) or \(y<0\), and for \(x\geq0, y\geq0\), we have \[ \begin{aligned} F_{X,Y}(x,y)& = \P(X\leq x, X^2\leq y)=\P(0\leq X\leq x, X^2\leq y) \\ &= \P(0\leq X\leq x, -\sqrt{y}\leq X\leq \sqrt{y})\\ &=\P(0\leq X\leq \min\{x,\sqrt{y}\})\\ & = \begin{cases} 1, & \text{if } \min\{x,y\}\geq 1,\\ \min\{x,\sqrt{y}\}, & \text{if } \min\{x,y\}< 1. \end{cases} \end{aligned} \] However, if we just have two random variables, e.g. \(X\sim U(0,1)\) and \(Y\sim U(0,1)\), then we can’t calculate \(F_{X,Y}\) unless we explicitly define the function \(f_{X,Y}\).
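The formula from Example 6.1 can be checked numerically. The sketch below (assuming NumPy is available) estimates \(F_{X,Y}(x,y)=\P(X\leq x, Y\leq y)\) by simulation at an arbitrarily chosen test point and compares it with \(\min\{x,\sqrt{y}\}\):

```python
import numpy as np

# Monte Carlo check of the joint CDF from Example 6.1:
# X ~ U(0,1), Y = X^2, so F_{X,Y}(x, y) = min(x, sqrt(y)) when min(x, y) < 1.
rng = np.random.default_rng(0)
x_samples = rng.uniform(0, 1, size=1_000_000)
y_samples = x_samples**2

x, y = 0.7, 0.36  # test point chosen for illustration; sqrt(0.36) = 0.6 < 0.7
empirical = np.mean((x_samples <= x) & (y_samples <= y))
theoretical = min(x, np.sqrt(y))
print(empirical, theoretical)  # both close to 0.6
```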
Example 6.2 Let \(X\sim U(-1,1)\) and \(Y=X^2\). Then \(XY=X^3\), and hence \[ \cov(X,Y)=\E(XY)-\E(X)\E(Y)=\E(X^3)-\E(X)\E(X^2). \] We know that \[ \E(X)= \frac{(-1)+1}{2}=0. \] Next, since \(f_X(x)=\frac12\) for \(x\in(-1,1)\) and \(f_X(x)=0\) otherwise, we have \[ \E(X^3)=\int_{-\infty}^\infty x^3 f_X(x)\dx = \frac12\int_{-1}^1 x^3\dx=\frac12\biggl[ \frac{x^4}{4}\biggr]_{-1}^1 = 0. \] Therefore, \(\cov(X,Y)=0\), i.e. \(X\) and \(Y\) are uncorrelated. However, clearly, \(X\) and \(Y=X^2\) are not independent.
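The conclusion of Example 6.2 is easy to confirm empirically. A minimal sketch (assuming NumPy) estimates the covariance of \(X\) and \(Y=X^2\) from a large sample:

```python
import numpy as np

# Empirical check of Example 6.2: X ~ U(-1,1) and Y = X^2 are
# uncorrelated (covariance close to 0) even though Y is a function of X.
rng = np.random.default_rng(1)
x = rng.uniform(-1, 1, size=1_000_000)
y = x**2

cov_xy = np.mean(x * y) - np.mean(x) * np.mean(y)
print(cov_xy)  # close to 0
```

The estimate is close to zero, while the dependence is obvious: knowing \(X\) determines \(Y\) completely.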
Remark. In other words, the bigger \(n\) you take, the smaller the chances are for the event \(\{|\bar{X}_n-\mu|>\varepsilon\}\). Equivalently, one can state that \[ \lim_{n\to\infty} \P(|\bar{X}_n-\mu|\leq \varepsilon) = 1, \] i.e. the bigger \(n\) you take, the larger the chances are for the event \(|\bar{X}_n-\mu|\leq \varepsilon\), which is equivalent to \(\mu-\varepsilon<\bar{X}_n<\mu+\varepsilon\). Thus, informally speaking, as \(n\) grows, there are good chances to find \(\bar{X}_n\) around \(\mu\). We can choose \(\varepsilon\) arbitrarily small, i.e. we can require that \(\bar{X}_n\) be very close to \(\mu\), and the law of large numbers states that there is a high probability (close to \(1\)) of achieving this if we take \(n\) large enough.
Example 6.3 Consider many rolls of a fair six-sided die. Let \(X_j\) be the score of the \(j\)-th roll, and \(S_n=X_1+\ldots+X_n\) be the sum of the scores in the first \(n\) rolls. All \(X_j\) are i.i.d. r.v.s with \[ \E(X)=1\cdot \frac16 + 2\cdot \frac16 + \ldots + 6\cdot \frac16=\frac{7\cdot 6}{2}\cdot \frac16 = \frac72. \] Therefore, by the LLN, for any small \(\eps>0\), \[ \lim_{n\to\infty}\P\Biggl( \biggl\lvert \frac{S_n}{n} - \frac72\biggr\rvert \leq \eps \Biggr) = 1, \] or equivalently, \[ \lim_{n\to\infty}\P\biggl( \Bigl(\frac72-\eps\Bigr)n \leq S_n\leq \Bigl(\frac72+ \eps \Bigr)n \biggr) = 1. \]
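Example 6.3 can be illustrated by simulation. The sketch below (assuming NumPy) tracks the running average \(S_n/n\) of simulated die rolls, which settles near \(\E(X)=7/2\) as \(n\) grows:

```python
import numpy as np

# Simulating Example 6.3: the running average S_n / n of fair die rolls
# approaches E(X) = 7/2 = 3.5 as n grows (law of large numbers).
rng = np.random.default_rng(2)
rolls = rng.integers(1, 7, size=100_000)  # scores 1..6, equally likely
running_mean = np.cumsum(rolls) / np.arange(1, len(rolls) + 1)

for n in (10, 1_000, 100_000):
    print(n, running_mean[n - 1])  # drifts towards 3.5
```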
As we could see, LLN states that, for i.i.d. r.v. \(X_n\sim X\), \(n\geq1\), \[ \overline{X}_n=\frac{X_1+\ldots+X_n}{n}\to \E(X) \] stochastically (in probability) as \(n\to\infty\). We have also shown that \(\E(\overline{X}_n)=\E(X)\) for each \(n\), i.e. we can reformulate LLN as follows: \[ \overline{X}_n - \E(\overline{X}_n)\to 0, \qquad n\to\infty. \]
Remark. The central limit theorem shows, in particular, that \(\overline{X}_n\) fluctuates around its expected value \(\E(\overline{X}_n)=\mu\) with the standard deviation \(\sigma(\overline{X}_n)=\frac{\sigma}{\sqrt{n}}\), which is significantly less than the standard deviation \(\sigma\) of each \(X_n\) around its expected value \(\E(X_n)=\mu\). And this is true regardless of the distribution of \(X_n\). Let us consider an example.
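The claim that \(\sigma(\overline{X}_n)=\sigma/\sqrt{n}\) holds for any distribution can be illustrated numerically. As a sketch (assuming NumPy; the exponential distribution here is chosen only as an example of a non-normal law with \(\sigma=1\)), we simulate many sample means of size \(n=100\) and look at their spread:

```python
import numpy as np

# Illustrating the remark: the sample mean of n i.i.d. draws fluctuates
# with standard deviation sigma / sqrt(n), whatever the distribution.
# Exponential with scale 1 has sigma = 1, so sigma / sqrt(100) = 0.1.
rng = np.random.default_rng(3)
n = 100
means = rng.exponential(scale=1.0, size=(10_000, n)).mean(axis=1)

print(means.std())  # close to 0.1
```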
Example 6.4 The average teacher’s salary in New Jersey in 2023 is $63178. Suppose that the salaries are distributed normally with standard deviation $7500. Hence, we have \(X\sim \mathcal{N}(63178,7500^2)\).
Let’s first find the probability that a randomly selected teacher makes less than $60000 per year. We have
\[ \begin{aligned} \P(X<60000)&=\P\biggl(\frac{X-63178}{7500}<\frac{60000-63178}{7500}\biggr)\\ &=\P(Z<-0.42)=\Phi(-0.42), \end{aligned} \] where \(Z=\frac{X-63178}{7500}\sim \mathcal{N}(0,1)\).
Using statistical tables (and the equality \(\Phi(-0.42)=1-\Phi(0.42)\)) or Python commands
from scipy.stats import norm
norm.cdf(-0.42)
0.3372427268482495
we conclude that \[ \P(X<60000)\approx 0.337, \] i.e. roughly one out of three randomly picked teachers may have a salary less than $60000.
Consider now a sample of \(100\) teacher salaries. The sample mean (the average salary) is then \[ \overline{X}_{100}=\frac{X_1+\ldots+X_{100}}{100} \] where all \(X_j\sim X\) are i.i.d. r.v.s. We know that \[ \E(\overline{X}_{100}) = \E(X)=63178 \] and \[ \sigma(\overline{X}_{100})=\frac{\sigma(X)}{\sqrt{100}}=750. \] Therefore, the probability that the average salary of any sample of \(100\) teachers is less than $60000 per year is \[ \begin{aligned} \P(\overline{X}_{100}<60000)&= \P\biggl( \frac{\overline{X}_{100}-63178}{750} < \frac{60000-63178}{750} \biggr) \\ &= \P(\overline{Z}_{100}<-4.2)\approx\Phi(-4.2) \end{aligned} \] where the latter approximate equality is according to the CLT. Since
norm.cdf(-4.2)
1.3345749015906314e-05
we have that \[ \P(\overline{X}_{100}<60000)\approx 0.0000133, \] i.e., informally speaking, this is very unlikely.
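Both probabilities in Example 6.4 can also be computed in one step, since `norm.cdf` accepts the mean and standard deviation directly via its `loc` and `scale` parameters. A sketch (the results differ slightly from the values above because no rounding of the \(z\)-scores is involved):

```python
from scipy.stats import norm

# Recomputing Example 6.4 without manual standardisation:
# norm.cdf(x, loc=mu, scale=sigma) evaluates P(N(mu, sigma^2) < x).
p_single = norm.cdf(60000, loc=63178, scale=7500)   # one teacher
p_mean100 = norm.cdf(60000, loc=63178, scale=750)   # mean of 100 salaries

print(p_single)   # about 0.336
print(p_mean100)  # about 1.1e-05
```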