Probability & Statistics for Finance
Distributions, hypothesis testing, and regression analysis.
Quantitative finance is built on a foundation of probability and statistics. From modeling asset returns to estimating risk measures, statistical tools are indispensable for anyone building financial systems or analyzing market data.
Probability Fundamentals
Random Variables and Distributions
A random variable maps outcomes of a random process to numerical values. In finance, returns, prices, and trading volumes are all random variables.
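Expected value (mean), written $\mu = E[X]$:
$$E[X] = \sum_i x_i \, p(x_i) \ \text{(discrete)}, \qquad E[X] = \int_{-\infty}^{\infty} x \, f(x) \, dx \ \text{(continuous)}$$
Variance measures dispersion around the mean:
$$\mathrm{Var}(X) = E[(X - \mu)^2] = E[X^2] - (E[X])^2$$
Standard deviation is $\sigma = \sqrt{\mathrm{Var}(X)}$.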
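As a minimal sketch of these three quantities, the snippet below computes them for a small sample using only the Python standard library. The function name `summarize_returns` and the sample returns are made up for illustration, and the population formulas (divide by n) are an assumption chosen to match the definition of variance as an expectation.

```python
import math
import statistics


def summarize_returns(returns):
    """Return (mean, variance, standard deviation) of a sequence of returns.

    Uses population formulas (divide by n), i.e. Var(X) = E[(X - mu)^2].
    """
    mean = statistics.fmean(returns)
    variance = statistics.pvariance(returns, mu=mean)
    return mean, variance, math.sqrt(variance)


# Hypothetical daily returns, invented for this example.
daily_returns = [0.010, -0.020, 0.015, 0.003, -0.007]
mean, variance, std_dev = summarize_returns(daily_returns)
print(f"mean={mean:.5f} variance={variance:.6f} std={std_dev:.5f}")
```

When the data are a sample used to estimate the variance of a larger population, `statistics.variance` (which divides by n - 1) is usually the better choice.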
Key Distributions in Finance
Normal (Gaussian) Distribution:
$$f(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left(-\frac{(x - \mu)^2}{2\sigma^2}\right)$$
The normal distribution is central to finance due to the Central Limit Theorem. Log-returns are often assumed normally distributed.
Log-Normal Distribution: If $X$ is normal, then $Y = e^X$ is log-normal. Stock prices are often modeled as log-normal since they cannot be negative.
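Student's t-Distribution: Has heavier tails than the normal distribution, better capturing the fat tails observed in financial returns:
$$f(t) = \frac{\Gamma\left(\frac{\nu + 1}{2}\right)}{\sqrt{\nu\pi}\,\Gamma\left(\frac{\nu}{2}\right)} \left(1 + \frac{t^2}{\nu}\right)^{-\frac{\nu + 1}{2}}$$
where $\nu$ is the degrees of freedom.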
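To make the "heavier tails" claim concrete, the following sketch evaluates the standard normal density and the Student's t density at a few points and prints their ratio. The function names and the choice of 4 degrees of freedom are assumptions for illustration. Note that the unscaled t-distribution with 4 degrees of freedom has variance 2, not 1, so this compares density shapes rather than distributions matched on variance.

```python
import math


def normal_pdf(x, mu=0.0, sigma=1.0):
    """Density of the normal distribution with mean mu and std sigma."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def student_t_pdf(x, nu):
    """Density of the (unscaled) Student's t-distribution with nu degrees of freedom.

    Works in log space with lgamma so that large nu does not overflow.
    """
    log_coeff = (
        math.lgamma((nu + 1) / 2)
        - math.lgamma(nu / 2)
        - 0.5 * math.log(nu * math.pi)
    )
    return math.exp(log_coeff - (nu + 1) / 2 * math.log1p(x * x / nu))


# The ratio grows quickly as x moves into the tail.
for x in (0.0, 2.0, 4.0):
    ratio = student_t_pdf(x, nu=4) / normal_pdf(x)
    print(f"x={x:.1f}  t/normal density ratio = {ratio:.2f}")
```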
Statistical Moments and Financial Metrics
Skewness
Measures asymmetry of the distribution:
$$\text{Skewness} = E\left[\left(\frac{X - \mu}{\sigma}\right)^3\right]$$
- Negative skew: Long left tail (crash risk)
- Positive skew: Long right tail (upside potential)
Kurtosis
Measures tail heaviness:
$$\text{Kurtosis} = E\left[\left(\frac{X - \mu}{\sigma}\right)^4\right]$$
Excess kurtosis = Kurtosis - 3 (since normal distribution has kurtosis of 3).
Financial returns typically exhibit leptokurtosis (excess kurtosis > 0), meaning extreme events occur more often than a normal distribution predicts.
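The two moments above can be estimated from data as standardized central moments. This is a minimal sketch using population moments (divide by n) with no small-sample bias correction; the function name and the sample returns are assumptions for illustration.

```python
import statistics


def skewness_and_excess_kurtosis(data):
    """Return (skewness, excess kurtosis) from population central moments."""
    n = len(data)
    mean = statistics.fmean(data)
    m2 = sum((x - mean) ** 2 for x in data) / n  # variance
    m3 = sum((x - mean) ** 3 for x in data) / n
    m4 = sum((x - mean) ** 4 for x in data) / n
    skewness = m3 / m2 ** 1.5
    kurtosis = m4 / m2 ** 2
    return skewness, kurtosis - 3.0  # subtract 3 to get excess kurtosis


# Hypothetical returns with one crash-like observation.
returns = [0.004, -0.003, 0.006, 0.001, -0.002, 0.005, -0.080, 0.003]
skew, excess_kurt = skewness_and_excess_kurtosis(returns)
print(f"skewness={skew:.3f} excess kurtosis={excess_kurt:.3f}")
```

A single large loss in an otherwise quiet sample is enough to pull the skewness negative, which is the crash-risk pattern described above.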
Covariance and Correlation
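Covariance measures how two variables move together:
$$\mathrm{Cov}(X, Y) = E[(X - \mu_X)(Y - \mu_Y)]$$
Correlation normalizes covariance to [-1, 1]:
$$\rho_{XY} = \frac{\mathrm{Cov}(X, Y)}{\sigma_X \sigma_Y}$$
The correlation matrix for multiple assets is crucial for portfolio construction:
$$\mathbf{R} = \begin{pmatrix} 1 & \rho_{12} & \cdots & \rho_{1n} \\ \rho_{21} & 1 & \cdots & \rho_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ \rho_{n1} & \rho_{n2} & \cdots & 1 \end{pmatrix}$$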
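A small sketch of covariance, correlation, and a correlation matrix, written with the standard library only so each step is visible. The function names, asset labels, and return series are hypothetical, and population covariance (divide by n) is an assumption; the divisor cancels in the correlation either way.

```python
import math
import statistics


def covariance(x, y):
    """Population covariance of two equal-length sequences."""
    mean_x, mean_y = statistics.fmean(x), statistics.fmean(y)
    return sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / len(x)


def correlation(x, y):
    """Pearson correlation: covariance scaled by both standard deviations."""
    return covariance(x, y) / math.sqrt(covariance(x, x) * covariance(y, y))


def correlation_matrix(series_by_asset):
    """Return (names, matrix) where matrix[i][j] is the correlation of assets i and j."""
    names = list(series_by_asset)
    matrix = [
        [correlation(series_by_asset[a], series_by_asset[b]) for b in names]
        for a in names
    ]
    return names, matrix


# Hypothetical return series for three made-up assets.
returns = {
    "AAA": [0.010, -0.020, 0.015, 0.003, -0.007],
    "BBB": [0.008, -0.015, 0.010, 0.001, -0.004],
    "CCC": [-0.002, 0.004, -0.001, 0.003, 0.002],
}
names, matrix = correlation_matrix(returns)
for name, row in zip(names, matrix):
    print(name, [round(value, 3) for value in row])
```

In practice a numerical library would compute the whole matrix in one call; the explicit loops here are only meant to mirror the definitions.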
Hypothesis Testing
Framework
- State the null hypothesis $H_0$ and the alternative $H_1$
- Choose a significance level $\alpha$ (typically 0.05 or 0.01)
- Calculate the test statistic
- Compare it to the critical value or compute the p-value
- Reject or fail to reject $H_0$
Common Tests in Finance
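t-test for means:
$$t = \frac{\bar{x} - \mu_0}{s / \sqrt{n}}$$
where $\bar{x}$ is the sample mean, $s$ the sample standard deviation, $n$ the sample size, and $\mu_0$ the mean under $H_0$.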
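The following sketch walks through a one-sample t-test for the mean: it computes the t-statistic (sample mean minus hypothesized mean, divided by the standard error) and a two-sided p-value. The standard library has no Student's t CDF, so the p-value here uses a normal approximation, which is an assumption that is only reasonable for large samples. The function name and the return data are hypothetical.

```python
import math
import statistics
from statistics import NormalDist


def one_sample_t_test(sample, mu0=0.0):
    """Return (t_statistic, approximate two-sided p-value) for H0: mean == mu0.

    The p-value uses the standard normal in place of the t-distribution,
    so treat it as a large-sample approximation only.
    """
    n = len(sample)
    mean = statistics.fmean(sample)
    s = statistics.stdev(sample)  # sample standard deviation (divides by n - 1)
    t_stat = (mean - mu0) / (s / math.sqrt(n))
    p_approx = 2.0 * (1.0 - NormalDist().cdf(abs(t_stat)))
    return t_stat, p_approx


# Hypothetical daily strategy returns; H0 is that the true mean return is zero.
strategy_returns = [0.004, -0.002, 0.003, 0.001, -0.001, 0.005, 0.002, -0.003]
t_stat, p_value = one_sample_t_test(strategy_returns, mu0=0.0)
alpha = 0.05
decision = "reject H0" if p_value < alpha else "fail to reject H0"
print(f"t={t_stat:.3f} p~{p_value:.3f} -> {decision}")
```

With a sample this small, the exact t-distribution with n - 1 degrees of freedom would give a larger p-value than the normal approximation.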
F-test: Compares two variances, or tests the joint significance of coefficients in a regression.
Jarque-Bera test for normality:
$$JB = \frac{n}{6}\left(S^2 + \frac{(K - 3)^2}{4}\right)$$
where $S$ is skewness, $K$ is kurtosis, and $n$ is the sample size.
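A minimal sketch of the Jarque-Bera statistic, built from sample skewness and kurtosis. Under the null hypothesis of normality the statistic is asymptotically chi-square with 2 degrees of freedom, whose survival function is exp(-x/2), so the p-value can be computed without external libraries. The function name is an assumption, and the asymptotic p-value can be inaccurate for small samples.

```python
import math
import statistics


def jarque_bera(data):
    """Return (JB statistic, asymptotic p-value) for the normality test."""
    n = len(data)
    mean = statistics.fmean(data)
    m2 = sum((x - mean) ** 2 for x in data) / n
    m3 = sum((x - mean) ** 3 for x in data) / n
    m4 = sum((x - mean) ** 4 for x in data) / n
    skewness = m3 / m2 ** 1.5
    kurtosis = m4 / m2 ** 2
    jb = n / 6.0 * (skewness ** 2 + (kurtosis - 3.0) ** 2 / 4.0)
    # Chi-square with 2 degrees of freedom: P(X > x) = exp(-x / 2).
    p_value = math.exp(-jb / 2.0)
    return jb, p_value


# Hypothetical returns containing one extreme loss.
returns = [0.004, -0.003, 0.006, 0.001, -0.002, 0.005, -0.080, 0.003]
jb, p_value = jarque_bera(returns)
print(f"JB={jb:.2f} p={p_value:.4f}")
```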
Regression Analysis
Ordinary Least Squares (OLS)
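Linear regression models the relationship between variables:
$$y_i = \beta_0 + \beta_1 x_i + \varepsilon_i$$
OLS minimizes the sum of squared residuals:
$$\min_{\beta_0, \beta_1} \sum_{i=1}^{n} (y_i - \beta_0 - \beta_1 x_i)^2$$
In matrix form:
$$\hat{\boldsymbol{\beta}} = (X^\top X)^{-1} X^\top y$$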
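For a single predictor, the OLS solution reduces to a closed form: the slope is the covariance of x and y divided by the variance of x, and the intercept makes the line pass through the point of means. The sketch below implements that special case; the function name and data are illustrative assumptions, and a multi-predictor fit would normally use a linear algebra library instead.

```python
import statistics


def ols_simple(x, y):
    """Fit y = intercept + slope * x by ordinary least squares.

    Returns (intercept, slope) for the single-predictor case.
    """
    x_bar, y_bar = statistics.fmean(x), statistics.fmean(y)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    slope = s_xy / s_xx
    intercept = y_bar - slope * x_bar
    return intercept, slope


# Hypothetical data: market returns (x) and an asset's returns (y).
market = [0.010, -0.020, 0.015, 0.003, -0.007]
asset = [0.014, -0.025, 0.020, 0.002, -0.010]
intercept, slope = ols_simple(market, asset)
residuals = [yi - (intercept + slope * xi) for xi, yi in zip(market, asset)]
print(f"intercept={intercept:.5f} slope={slope:.3f}")
print(f"sum of squared residuals={sum(r * r for r in residuals):.8f}")
```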
Evaluation Metrics
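R-squared ($R^2$): Proportion of variance explained:
$$R^2 = 1 - \frac{SS_{\text{res}}}{SS_{\text{tot}}} = 1 - \frac{\sum_i (y_i - \hat{y}_i)^2}{\sum_i (y_i - \bar{y})^2}$$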
Adjusted R-squared: Penalizes additional predictors.
Standard errors: Measure precision of coefficient estimates.
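As a rough sketch, R-squared and adjusted R-squared can be computed from observed and fitted values as follows. The adjusted form used here, 1 - (1 - R^2)(n - 1)/(n - p - 1) with p predictors, is the common textbook definition and is stated as an assumption since the text does not spell it out. Function names and data are hypothetical.

```python
import statistics


def r_squared(y, y_hat):
    """Proportion of the variance of y explained by the fitted values y_hat."""
    y_bar = statistics.fmean(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))
    ss_tot = sum((yi - y_bar) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


def adjusted_r_squared(r2, n_obs, n_predictors):
    """Penalize R-squared for the number of predictors (intercept not counted)."""
    return 1.0 - (1.0 - r2) * (n_obs - 1) / (n_obs - n_predictors - 1)


# Hypothetical observed values and fitted values from a one-predictor model.
observed = [1.0, 2.0, 3.0, 4.0]
fitted = [1.1, 1.9, 3.2, 3.8]
r2 = r_squared(observed, fitted)
print(f"R^2={r2:.3f} adjusted R^2={adjusted_r_squared(r2, len(observed), 1):.3f}")
```

Adding a predictor never lowers R-squared, but it lowers the adjusted value unless the fit improves enough to offset the lost degree of freedom.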
Financial Applications
- CAPM regression: $R_i - R_f = \alpha + \beta (R_m - R_f) + \varepsilon$
- Factor models: Multiple factors explaining returns
- Pairs trading: Cointegration analysis
Time Series Analysis
Financial data is inherently sequential. Key concepts:
Autocorrelation: Correlation of a series with its lagged values:
$$\rho_k = \frac{\mathrm{Cov}(X_t, X_{t-k})}{\mathrm{Var}(X_t)}$$
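A minimal sketch of the sample autocorrelation at a given lag, using the usual estimator that divides the lagged cross-products by the total sum of squared deviations. The function name and the series are illustrative assumptions.

```python
import statistics


def autocorrelation(series, lag):
    """Sample autocorrelation of a series at the given non-negative lag."""
    n = len(series)
    if not 0 <= lag < n:
        raise ValueError("lag must satisfy 0 <= lag < len(series)")
    mean = statistics.fmean(series)
    deviations = [x - mean for x in series]
    denominator = sum(d * d for d in deviations)
    numerator = sum(deviations[t] * deviations[t - lag] for t in range(lag, n))
    return numerator / denominator


# Hypothetical return series; print the first few lags.
returns = [0.010, 0.008, -0.004, -0.006, 0.002, 0.007, 0.005, -0.003]
for lag in range(4):
    print(f"lag {lag}: {autocorrelation(returns, lag):+.3f}")
```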
Stationarity: Statistical properties (mean, variance) don't change over time. Most financial price series are non-stationary in levels, while their returns are usually much closer to stationary.
Autoregressive (AR) Model:
$$X_t = c + \sum_{i=1}^{p} \phi_i X_{t-i} + \varepsilon_t$$
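To illustrate the idea, here is a sketch that simulates the simplest autoregressive case, AR(1), where each value depends on the previous value plus a random shock. The parameter names `c` and `phi`, the starting value, and the Gaussian shocks are assumptions for illustration.

```python
import random


def simulate_ar1(c, phi, shocks, x0=0.0):
    """Generate an AR(1) path: x[t] = c + phi * x[t-1] + shock[t]."""
    path = []
    previous = x0
    for shock in shocks:
        previous = c + phi * previous + shock
        path.append(previous)
    return path


# Seeded generator so the example is reproducible.
rng = random.Random(42)
shocks = [rng.gauss(0.0, 0.01) for _ in range(250)]
path = simulate_ar1(c=0.0, phi=0.9, shocks=shocks)
print([round(value, 4) for value in path[:5]])
```

For the process to be stationary, the absolute value of `phi` must be below 1; at `phi = 1` the recursion becomes a random walk.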
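GARCH Models: Capture volatility clustering. The GARCH(1,1) conditional variance is:
$$\sigma_t^2 = \omega + \alpha \varepsilon_{t-1}^2 + \beta \sigma_{t-1}^2$$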
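The sketch below applies the GARCH(1,1) variance recursion to a given series of residuals: each period's variance is a constant plus a weight on the previous squared residual plus a weight on the previous variance. The parameter names `omega`, `alpha`, and `beta` follow the usual notation, and the parameter values are made up rather than estimated from data.

```python
def garch11_variances(residuals, omega, alpha, beta, initial_variance):
    """Return conditional variances from the GARCH(1,1) recursion.

    variances[0] is the supplied starting value; variances[t + 1] is
    omega + alpha * residuals[t]**2 + beta * variances[t].
    """
    variances = [initial_variance]
    for residual in residuals:
        variances.append(omega + alpha * residual ** 2 + beta * variances[-1])
    return variances


# Illustrative parameters with alpha + beta < 1 (needed for a finite long-run variance).
omega, alpha, beta = 0.1, 0.2, 0.7
long_run_variance = omega / (1.0 - alpha - beta)

# One large shock followed by quiet periods.
residuals = [3.0, 0.0, 0.0, 0.0, 0.0]
variances = garch11_variances(residuals, omega, alpha, beta, long_run_variance)
print([round(v, 3) for v in variances])
```

After the large shock the variance jumps and then decays only gradually, which is the volatility clustering the model is designed to capture.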
Programming Implementation
Key statistical computations in finance software:
- Efficient calculation of rolling statistics
- Matrix operations for portfolio optimization
- Numerical methods for maximum likelihood estimation
- Bootstrap methods for confidence intervals
- Monte Carlo simulation for complex distributions
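Two of the items above, rolling statistics and bootstrap confidence intervals, can be sketched briefly with the standard library. The rolling mean keeps a running sum so each step costs constant time, and the bootstrap uses the percentile method on resampled means. Function names, default arguments, and the data are assumptions for illustration, and resampling individual observations ignores any serial dependence in the returns.

```python
import random
import statistics
from collections import deque


def rolling_mean(values, window):
    """Rolling mean over a fixed window, updated in O(1) per observation."""
    if window <= 0:
        raise ValueError("window must be positive")
    buffer = deque()
    running_sum = 0.0
    means = []
    for value in values:
        buffer.append(value)
        running_sum += value
        if len(buffer) > window:
            running_sum -= buffer.popleft()  # drop the oldest observation
        if len(buffer) == window:
            means.append(running_sum / window)
    return means


def bootstrap_mean_ci(sample, n_resamples=2000, confidence=0.95, seed=0):
    """Percentile bootstrap confidence interval for the mean."""
    rng = random.Random(seed)
    n = len(sample)
    means = sorted(
        statistics.fmean(rng.choices(sample, k=n)) for _ in range(n_resamples)
    )
    tail = (1.0 - confidence) / 2.0
    lower = means[int(tail * n_resamples)]
    upper = means[min(n_resamples - 1, int((1.0 - tail) * n_resamples))]
    return lower, upper


# Hypothetical daily returns.
returns = [0.010, -0.020, 0.015, 0.003, -0.007, 0.004, 0.006, -0.001]
print([round(m, 4) for m in rolling_mean(returns, window=3)])
low, high = bootstrap_mean_ci(returns)
print(f"95% bootstrap CI for the mean: [{low:.4f}, {high:.4f}]")
```

For very long series the running sum can accumulate floating-point error, so production code often recomputes the sum periodically or uses a compensated update.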
Self-Examination
Why do financial returns typically exhibit fat tails (leptokurtosis)? What are the implications for risk management?
Explain the difference between correlation and covariance. Why is correlation preferred when comparing relationships between different assets?
What assumptions underlie OLS regression? How might violations of these assumptions affect financial models?
Describe the GARCH model and explain why it's useful for modeling financial volatility. What is volatility clustering?
How would you test whether a trading strategy generates statistically significant returns? What pitfalls should you be aware of (e.g., multiple testing, survivorship bias)?