Probability and Statistics for Finance

Quantitative finance is built on a foundation of probability and statistics. From modeling asset returns to estimating risk measures, statistical tools are indispensable for anyone building financial systems or analyzing market data.

Probability Fundamentals

Random Variables and Distributions

A random variable maps outcomes of a random process to numerical values. In finance, returns, prices, and trading volumes are all random variables.

Expected value (mean): $E[X] = \mu = \sum_{i} x_i P(x_i)$ in the discrete case, and $E[X] = \mu = \int_{-\infty}^{\infty} x f(x)\, dx$ in the continuous case.

Variance measures dispersion around the mean: $Var(X) = \sigma^2 = E[(X - \mu)^2] = E[X^2] - (E[X])^2$

Standard deviation is $\sigma = \sqrt{Var(X)}$.
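As a minimal sketch of the discrete formulas above, the following computes the mean, variance, and standard deviation of a small return distribution. The outcomes, probabilities, and function names are illustrative assumptions, not values from any real asset.

```python
import math

# Hypothetical discrete distribution of one-period returns (illustrative values).
outcomes = [-0.10, 0.00, 0.05, 0.20]
probs = [0.10, 0.40, 0.40, 0.10]

def expected_value(xs, ps):
    """E[X] = sum_i x_i * P(x_i)."""
    return sum(x * p for x, p in zip(xs, ps))

def variance(xs, ps):
    """Var(X) = E[X^2] - (E[X])^2."""
    mean = expected_value(xs, ps)
    second_moment = expected_value([x * x for x in xs], ps)
    return second_moment - mean ** 2

mu = expected_value(outcomes, probs)
var = variance(outcomes, probs)
sigma = math.sqrt(var)

# Cross-check with the definition E[(X - mu)^2].
var_by_definition = expected_value([(x - mu) ** 2 for x in outcomes], probs)
```

Both variance expressions should agree up to floating-point rounding, which is a quick sanity check when implementing either form.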

Key Distributions in Finance

Normal (Gaussian) Distribution: $f(x) = \frac{1}{\sigma\sqrt{2\pi}} e^{-\frac{(x-\mu)^2}{2\sigma^2}}$

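The normal distribution is central to finance largely because of the Central Limit Theorem: the sum of many independent shocks with finite variance is approximately normal. Log-returns, which add across time periods, are therefore often assumed to be normally distributed.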

Log-Normal Distribution: If $\ln(X)$ is normal, then $X$ is log-normal. Stock prices are often modeled as log-normal since they cannot be negative.
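To illustrate why the log-normal assumption keeps prices positive, here is a small sketch that converts prices to log-returns and compounds them back. The price series and the helper name `rebuild_prices` are hypothetical.

```python
import math

# Hypothetical daily closing prices (illustrative values, not market data).
prices = [100.0, 101.5, 99.8, 102.3, 103.0]

# Log-return over each step: r_t = ln(P_t / P_{t-1}).
log_returns = [math.log(p1 / p0) for p0, p1 in zip(prices, prices[1:])]

def rebuild_prices(start, returns):
    """Compound log-returns back into a price path: P_t = P_{t-1} * exp(r_t)."""
    path = [start]
    for r in returns:
        path.append(path[-1] * math.exp(r))
    return path

rebuilt = rebuild_prices(prices[0], log_returns)

# Even a very negative log-return leaves the price strictly positive,
# because exp(r) > 0 for every real r.
crash_price = 100.0 * math.exp(-5.0)
```

Log-returns also add across periods: their sum equals the log of the total price ratio, which is what makes them convenient to model as a sum of shocks.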

Student's t-Distribution: Has heavier tails than the normal distribution, so it better captures the fat tails observed in financial returns: $f(x) = \frac{\Gamma(\frac{\nu+1}{2})}{\sqrt{\nu\pi}\Gamma(\frac{\nu}{2})}\left(1 + \frac{x^2}{\nu}\right)^{-\frac{\nu+1}{2}}$

where $\nu$ is the number of degrees of freedom.
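The two densities above can be compared directly. The sketch below evaluates both at a point far from the center; the choice of $\nu = 3$ and the evaluation point 4 are arbitrary assumptions made for illustration. Note that the standard t-distribution with $\nu = 3$ has variance $\nu/(\nu-2) = 3$, so this is not a variance-matched comparison.

```python
import math

def normal_pdf(x, mu=0.0, sigma=1.0):
    """Density of the normal distribution N(mu, sigma^2)."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def student_t_pdf(x, nu):
    """Density of the standard Student's t-distribution with nu degrees of freedom."""
    coeff = math.gamma((nu + 1) / 2) / (math.sqrt(nu * math.pi) * math.gamma(nu / 2))
    return coeff * (1 + x * x / nu) ** (-(nu + 1) / 2)

# How much more density the t-distribution places at x = 4 than the standard normal.
ratio_at_4 = student_t_pdf(4.0, 3) / normal_pdf(4.0)

# For large nu the t-density approaches the standard normal density.
gap_at_center = abs(student_t_pdf(0.0, 200) - normal_pdf(0.0))
```

The ratio is well above 1 in the tail, while the gap at the center shrinks as $\nu$ grows, which matches the usual description of the t-distribution as a fat-tailed relative of the normal.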

Statistical Moments and Financial Metrics

Skewness

Measures asymmetry of the distribution: $\text{Skewness} = E\left[\left(\frac{X - \mu}{\sigma}\right)^3\right]$

Negative skew: Long left tail (crash risk)
Positive skew: Long right tail (upside potential)

Kurtosis

Measures tail heaviness: $\text{Kurtosis} = E\left[\left(\frac{X - \mu}{\sigma}\right)^4\right]$

Excess kurtosis = Kurtosis - 3 (since the normal distribution has a kurtosis of 3).

Financial returns typically exhibit leptokurtosis (excess kurtosis > 0), meaning extreme events occur more often than a normal distribution predicts.
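A minimal sketch of sample skewness and excess kurtosis, computed as standardized moments with the population (divide-by-n) variance. The return series is hypothetical and deliberately includes one crash-like observation; bias-corrected estimators exist but are not shown here.

```python
import math

def standardized_moment(xs, k):
    """Sample analogue of E[((X - mu) / sigma)^k], using the population variance."""
    n = len(xs)
    mean = sum(xs) / n
    sigma = math.sqrt(sum((x - mean) ** 2 for x in xs) / n)
    return sum(((x - mean) / sigma) ** k for x in xs) / n

def skewness(xs):
    return standardized_moment(xs, 3)

def excess_kurtosis(xs):
    return standardized_moment(xs, 4) - 3.0

# Hypothetical daily returns: mostly small gains plus one large loss.
returns = [0.010, 0.012, 0.008, 0.011, 0.009, 0.010, -0.080]

skew = skewness(returns)         # negative: long left tail
ex_kurt = excess_kurtosis(returns)  # positive: heavier tails than normal
```

A single large loss is enough to push skewness negative and excess kurtosis positive in a short sample, which is why these estimates are sensitive to outliers.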

Covariance and Correlation

Covariance measures how two variables move together: $Cov(X, Y) = E[(X - \mu_X)(Y - \mu_Y)]$

Correlation normalizes covariance to [-1, 1]: $\rho_{X,Y} = \frac{Cov(X, Y)}{\sigma_X \sigma_Y}$

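The correlation matrix for multiple assets is crucial for portfolio construction: $\mathbf{C} = \begin{pmatrix} 1 & \rho_{12} & \cdots \\ \rho_{21} & 1 & \cdots \\ \vdots & \vdots & \ddots \end{pmatrix}$ The covariance matrix, usually written $\Sigma$, has entries $\Sigma_{ij} = \rho_{ij}\sigma_i\sigma_j$.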
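The covariance and correlation definitions above translate directly into code. This sketch uses population (divide-by-n) moments and three hypothetical return series constructed so that the expected correlations are known in advance.

```python
import math

def covariance(xs, ys):
    """Population covariance: E[(X - mu_X)(Y - mu_Y)]."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n

def correlation(xs, ys):
    """Covariance divided by the product of the standard deviations."""
    return covariance(xs, ys) / math.sqrt(covariance(xs, xs) * covariance(ys, ys))

def correlation_matrix(series):
    """Pairwise correlations for a list of equally long return series."""
    return [[correlation(a, b) for b in series] for a in series]

# Hypothetical returns: asset_b is a leveraged copy of asset_a, asset_c its mirror image.
asset_a = [0.010, -0.020, 0.015, 0.030, -0.010]
asset_b = [2 * r for r in asset_a]
asset_c = [-r for r in asset_a]

corr = correlation_matrix([asset_a, asset_b, asset_c])
```

Scaling a series changes its covariance but not its correlation, so `asset_a` and `asset_b` are perfectly correlated while `asset_a` and `asset_c` are perfectly negatively correlated.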

Hypothesis Testing

Framework

State null hypothesis $H_0$ and alternative $H_1$
Choose significance level $\alpha$ (typically 0.05 or 0.01)
Calculate test statistic
Compare to critical value or compute p-value
Reject or fail to reject $H_0$

Common Tests in Finance

t-test for means: $t = \frac{\bar{x} - \mu_0}{s / \sqrt{n}}$
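The testing framework can be sketched with a one-sample t-test of $H_0: \mu = 0$ against a two-sided alternative. The return sample is hypothetical, and the critical value 2.262 is the two-sided 5% value for 9 degrees of freedom taken from standard t-tables rather than computed here.

```python
import math

def t_statistic(sample, mu0):
    """t = (x_bar - mu0) / (s / sqrt(n)), with s the sample standard deviation."""
    n = len(sample)
    x_bar = sum(sample) / n
    s2 = sum((x - x_bar) ** 2 for x in sample) / (n - 1)  # unbiased sample variance
    return (x_bar - mu0) / math.sqrt(s2 / n)

# Hypothetical daily returns (n = 10, so 9 degrees of freedom).
daily_returns = [0.004, -0.002, 0.006, 0.001, 0.003,
                 -0.001, 0.005, 0.002, 0.000, 0.004]

# Steps 1-2: H0 is "mean return = 0", tested at alpha = 0.05 (two-sided).
T_CRIT_DF9_5PCT = 2.262  # assumption: value read from a standard t-table

# Step 3: compute the test statistic.
t_stat = t_statistic(daily_returns, mu0=0.0)

# Steps 4-5: compare with the critical value and decide.
reject_h0 = abs(t_stat) > T_CRIT_DF9_5PCT
```

In practice a statistics library would supply the critical value or p-value for arbitrary degrees of freedom; the hard-coded constant only keeps the sketch dependency-free.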

F-test for comparing variances or in regression analysis.

Jarque-Bera test for normality: $JB = \frac{n}{6}\left(S^2 + \frac{(K-3)^2}{4}\right)$

where $S$ is skewness and $K$ is kurtosis.
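A rough sketch of the Jarque-Bera statistic follows. Under the null of normality the statistic is asymptotically chi-square with 2 degrees of freedom, whose survival function is $e^{-x/2}$, so an approximate p-value needs no special library. The sample data is hypothetical, and the asymptotic approximation can be poor for small samples like these.

```python
import math

def jarque_bera(xs):
    """Return (JB statistic, asymptotic p-value) using population moments."""
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m3 = sum((x - mean) ** 3 for x in xs) / n
    m4 = sum((x - mean) ** 4 for x in xs) / n
    S = m3 / m2 ** 1.5   # skewness
    K = m4 / m2 ** 2     # kurtosis (not excess)
    jb = n / 6.0 * (S ** 2 + (K - 3.0) ** 2 / 4.0)
    p_value = math.exp(-jb / 2.0)  # chi-square(2) survival function
    return jb, p_value

# Hypothetical returns: an evenly spread grid plus one crash-like outlier.
calm = [0.001 * i for i in range(-10, 11)]
with_crash = calm + [-0.15]

jb_calm, p_calm = jarque_bera(calm)
jb_crash, p_crash = jarque_bera(with_crash)
```

The outlier inflates both the skewness and kurtosis terms, so `jb_crash` lands far above 5.99, the 5% critical value of the chi-square distribution with 2 degrees of freedom.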

Regression Analysis

Ordinary Least Squares (OLS)

Linear regression models the relationship between variables: $Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots + \epsilon$

OLS minimizes the sum of squared residuals: $\min_{\beta} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$

In matrix form: $\hat{\beta} = (X^TX)^{-1}X^Ty$
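As an illustration of the matrix formula, the sketch below forms $X^TX$ and $X^Ty$ and solves the resulting normal equations by Gauss-Jordan elimination instead of computing an explicit inverse. The data is synthetic and noise-free, built from assumed coefficients (1, 2, -3) so the result can be checked; all function names are illustrative.

```python
def solve_linear_system(a, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting.

    Assumes A is square and non-singular; no singularity check is performed.
    """
    n = len(a)
    # Augmented matrix [A | b], copied so the inputs are not modified.
    aug = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]

def ols(X, y):
    """Solve the normal equations (X^T X) beta = X^T y."""
    k = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
    return solve_linear_system(xtx, xty)

# Synthetic, noise-free data generated from y = 1 + 2*x1 - 3*x2.
x1 = [1.0, 2.0, 3.0, 4.0, 5.0]
x2 = [2.0, 1.0, 4.0, 3.0, 6.0]
y = [1.0 + 2.0 * a - 3.0 * b for a, b in zip(x1, x2)]

# Design matrix with a leading column of ones for the intercept.
X = [[1.0, a, b] for a, b in zip(x1, x2)]
beta_hat = ols(X, y)
```

Forming $X^TX$ squares the condition number of the problem, so numerical libraries typically solve least squares via QR or SVD factorizations instead; the normal equations are used here only because they mirror the formula in the text.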

Evaluation Metrics

R-squared ($R^2$): Proportion of variance explained: $R^2 = 1 - \frac{SS_{res}}{SS_{tot}}$

Adjusted R-squared: Penalizes additional predictors.

Standard errors: Measure precision of coefficient estimates.
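A short sketch of the first two metrics. The adjusted form used here is the common $1 - (1 - R^2)\frac{n-1}{n-p-1}$, where $p$ is the number of predictors excluding the intercept; the observed and fitted values are hypothetical.

```python
def r_squared(y, y_hat):
    """R^2 = 1 - SS_res / SS_tot."""
    y_bar = sum(y) / len(y)
    ss_res = sum((a - b) ** 2 for a, b in zip(y, y_hat))
    ss_tot = sum((a - y_bar) ** 2 for a in y)
    return 1.0 - ss_res / ss_tot

def adjusted_r_squared(r2, n, p):
    """Penalize R^2 for the number of predictors p (intercept not counted)."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)

# Hypothetical observations and fitted values from a one-predictor model.
y_obs = [1.0, 2.0, 3.0, 4.0, 5.0]
y_fit = [1.1, 1.9, 3.2, 3.8, 5.0]

r2 = r_squared(y_obs, y_fit)
adj_r2 = adjusted_r_squared(r2, n=len(y_obs), p=1)
```

Adjusted $R^2$ is never larger than $R^2$ when $p \geq 1$, and the gap widens as predictors are added without a matching improvement in fit.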

Financial Applications

CAPM regression: $R_i - R_f = \alpha + \beta(R_m - R_f) + \epsilon$
Factor models: Multiple factors explaining returns
Pairs trading: Cointegration analysis
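For the CAPM regression listed above, the single-regressor OLS solution reduces to $\beta = Cov(R_i - R_f, R_m - R_f) / Var(R_m - R_f)$, with $\alpha$ recovered from the means. The sketch below uses synthetic returns constructed from an assumed alpha of 0.002 and beta of 1.5 with no noise, so the estimates can be verified; the per-period risk-free rate is also an assumption.

```python
def capm_alpha_beta(asset_returns, market_returns, risk_free):
    """Estimate alpha and beta by regressing asset excess returns on market excess returns."""
    ex_asset = [r - risk_free for r in asset_returns]
    ex_market = [r - risk_free for r in market_returns]
    n = len(ex_asset)
    mean_a = sum(ex_asset) / n
    mean_m = sum(ex_market) / n
    cov = sum((a - mean_a) * (m - mean_m) for a, m in zip(ex_asset, ex_market))
    var = sum((m - mean_m) ** 2 for m in ex_market)
    beta = cov / var
    alpha = mean_a - beta * mean_m
    return alpha, beta

# Synthetic per-period returns (illustrative, not market data).
RISK_FREE = 0.001
market = [0.020, -0.010, 0.030, 0.000, 0.015]
asset = [RISK_FREE + 0.002 + 1.5 * (m - RISK_FREE) for m in market]

alpha_hat, beta_hat = capm_alpha_beta(asset, market, RISK_FREE)
```

With real data the residual term $\epsilon$ is non-zero, so the estimates carry standard errors and alpha is usually tested against zero rather than read off directly.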

Time Series Analysis

Financial data is inherently sequential. Key concepts:

Autocorrelation: Correlation of a series with its lagged values: $\rho_k = \frac{E[(X_t - \mu)(X_{t-k} - \mu)]}{\sigma^2}$
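A minimal sketch of the sample autocorrelation at lag $k$, using the common estimator that divides the lagged cross-products by the full-sample sum of squared deviations. The alternating series is a hypothetical example chosen because its lag-1 autocorrelation is easy to verify by hand.

```python
def autocorrelation(xs, k):
    """Sample autocorrelation at lag k (k >= 0)."""
    n = len(xs)
    mean = sum(xs) / n
    denom = sum((x - mean) ** 2 for x in xs)
    num = sum((xs[t] - mean) * (xs[t - k] - mean) for t in range(k, n))
    return num / denom

# Hypothetical series that flips sign every period (strong mean reversion).
alternating = [1.0, -1.0] * 10

rho_0 = autocorrelation(alternating, 0)
rho_1 = autocorrelation(alternating, 1)
rho_2 = autocorrelation(alternating, 2)
```

The lag-1 value is strongly negative and the lag-2 value strongly positive, reflecting the period-2 pattern; with only $n - k$ cross-products in the numerator, the estimates shrink slightly toward zero as the lag grows.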

Stationarity: Statistical properties (mean, variance, autocovariance) do not change over time. Most financial price series are non-stationary in levels, while their returns are approximately stationary.

Autoregressive (AR) Model: $X_t = c + \phi_1 X_{t-1} + \phi_2 X_{t-2} + \cdots + \epsilon_t$
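To make the AR model concrete, this sketch simulates an AR(1) process and recovers its coefficients by regressing $X_t$ on $X_{t-1}$. The parameter values, the Gaussian noise assumption, and the function names are all illustrative choices.

```python
import random

def simulate_ar1(c, phi, sigma, n, seed=0):
    """Simulate X_t = c + phi * X_{t-1} + eps_t with eps_t ~ N(0, sigma^2).

    Assumes |phi| < 1 so the process has a finite unconditional mean.
    """
    rng = random.Random(seed)
    x = c / (1.0 - phi)  # start at the unconditional mean
    path = []
    for _ in range(n):
        x = c + phi * x + rng.gauss(0.0, sigma)
        path.append(x)
    return path

def fit_ar1(path):
    """Estimate (c, phi) by OLS of X_t on X_{t-1}."""
    prev, nxt = path[:-1], path[1:]
    mean_prev = sum(prev) / len(prev)
    mean_next = sum(nxt) / len(nxt)
    cov = sum((p - mean_prev) * (q - mean_next) for p, q in zip(prev, nxt))
    var = sum((p - mean_prev) ** 2 for p in prev)
    phi_hat = cov / var
    c_hat = mean_next - phi_hat * mean_prev
    return c_hat, phi_hat

path = simulate_ar1(c=0.5, phi=0.6, sigma=1.0, n=5000, seed=0)
c_hat, phi_hat = fit_ar1(path)
```

With a few thousand observations the estimate of $\phi$ should land close to the value used in the simulation, although the exact figure depends on the random seed.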

GARCH Models: Capture volatility clustering, where large moves tend to be followed by large moves. The GARCH(1,1) specification is: $\sigma_t^2 = \omega + \alpha \epsilon_{t-1}^2 + \beta \sigma_{t-1}^2$
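The variance recursion can be sketched as a simple filter over a series of residuals. The parameters below are assumed for illustration rather than estimated, and the recursion is initialized at the unconditional variance $\omega / (1 - \alpha - \beta)$, which requires $\alpha + \beta < 1$.

```python
def garch_variance_path(residuals, omega, alpha, beta):
    """Conditional variances from sigma2_t = omega + alpha * eps_{t-1}^2 + beta * sigma2_{t-1}.

    Returns len(residuals) + 1 values: the initial variance followed by the
    variance implied after each residual.
    """
    if alpha + beta >= 1.0:
        raise ValueError("alpha + beta must be below 1 for a finite unconditional variance")
    sigma2 = omega / (1.0 - alpha - beta)  # unconditional variance as starting point
    path = [sigma2]
    for eps in residuals:
        sigma2 = omega + alpha * eps ** 2 + beta * sigma2
        path.append(sigma2)
    return path

# Illustrative parameters and a residual series with one large shock.
OMEGA, ALPHA, BETA = 1e-6, 0.10, 0.85
shocks = [0.0, 0.05, 0.0]

variances = garch_variance_path(shocks, OMEGA, ALPHA, BETA)
```

After the large shock the conditional variance jumps and then decays only gradually, because the $\beta$ term carries most of the previous variance forward. This persistence is how the model reproduces volatility clustering.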

Programming Implementation

Key statistical computations in finance software:

Efficient calculation of rolling statistics
Matrix operations for portfolio optimization
Numerical methods for maximum likelihood estimation
Bootstrap methods for confidence intervals
Monte Carlo simulation for complex distributions
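Two of the items above, rolling statistics and bootstrap confidence intervals, can be sketched in a few lines. The rolling mean uses a running sum so each step costs constant time, and the bootstrap uses the simple percentile method. The return data, resample count, and function names are assumptions for illustration.

```python
import random

def rolling_mean(xs, window):
    """Rolling mean in O(n) using a running sum instead of re-summing each window."""
    out = []
    running = 0.0
    for i, x in enumerate(xs):
        running += x
        if i >= window:
            running -= xs[i - window]  # drop the value that left the window
        if i >= window - 1:
            out.append(running / window)
    return out

def bootstrap_mean_ci(xs, n_resamples=2000, level=0.95, seed=0):
    """Percentile bootstrap confidence interval for the mean."""
    rng = random.Random(seed)
    n = len(xs)
    # Resample with replacement and record the mean of each resample.
    means = sorted(sum(rng.choices(xs, k=n)) / n for _ in range(n_resamples))
    lo_idx = int((1.0 - level) / 2.0 * n_resamples)
    hi_idx = int((1.0 + level) / 2.0 * n_resamples) - 1
    return means[lo_idx], means[hi_idx]

# Hypothetical daily returns.
returns = [0.004, -0.002, 0.006, 0.001, 0.003, -0.001,
           0.005, 0.002, 0.000, 0.004, -0.003, 0.007]

smoothed = rolling_mean(returns, window=5)
ci_low, ci_high = bootstrap_mean_ci(returns)
```

For long series, running sums can accumulate floating-point error, so production code often recomputes the sum periodically or uses a numerically stable update. The percentile bootstrap also assumes independent observations, which autocorrelated returns may violate.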
