Week 1
Clip 1
Part I – Classical linear regression model
After these clips on the classical linear regression model, you should
Understand what the classical linear regression model (CLRM) is and how it can be
used in empirical finance
Know key concepts: estimation, inference, estimator, estimate, parameters, dummy
variables, outliers
Understand what assumptions are needed for valid inference in the CLRM and why
Understand the t and F tests, and model adequacy measures such as the R² and the adjusted R² (𝑅̅²)
The following video clips belong to Pack 1:
Clip I: Motivation/Notation/Transforming variables Stata Clip I
Clip II: Dummies (Level/Slope dummies, notation, perfect multicollinearity) Stata Clip II
Clip III: Model Adequacy and outliers Stata Clip III
Clip IV: Parameter estimation and inference I (two parts)
Clip V: Inference II (t and F test) Stata Clip IV
Motivation: explaining income
Suppose you would like to explain wage
Which factors could possibly affect wage?
o Education? Experience?
o Other factors? These are collected in the error term 𝜖𝑖 (which also captures measurement error)
How do these factors affect wage? (Linear: going up from 1 to 2 years of education has the same effect as going from 20 to 21 years; non-linear: going up from 1 to 2 years means more than going from 20 to 21 years.)
The simplest relationship:
o 𝑊𝑎𝑔𝑒𝑖 = 𝛽0 + 𝛽1𝐸𝑑𝑢𝑐𝑖 + 𝛽2𝐸𝑥𝑝𝑖 + 𝜖𝑖
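As a preview of the Stata clips, a minimal sketch of how this model could be estimated by OLS; the dataset wagedata and the variable names wage, educ and exper are hypothetical:

    use wagedata, clear        // hypothetical dataset, one row per person i
    summarize wage educ exper  // inspect the variables first
    regress wage educ exper    // OLS estimates of beta0 (constant), beta1 and beta2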
What is random?
What is random and non-random in the linear regression model?
Our example: 𝑊𝑎𝑔𝑒𝑖 = 𝛽0 + 𝛽1𝐸𝑑𝑢𝑐𝑖 + 𝛽2𝐸𝑥𝑝𝑖 + 𝜖𝑖
o 𝜖𝑖 = unobserved and random: error term
o 𝑊𝑎𝑔𝑒𝑖 = observed, random: dependent variable
o 𝛽0, 𝛽1, 𝛽2 = fixed non-random, but not known: parameters
o 𝐸𝑑𝑢𝑐𝑖, 𝐸𝑥𝑝𝑖 = observed, possibly random/non-random: regressors
Therefore, anything that depends on the data will be a random variable and will have distributional properties (since we took a sample)
Vector notation (Sec. 4.1 – 4.2)
Consider again our example 𝑊𝑎𝑔𝑒𝑖 = 𝛽0 + 𝛽1𝐸𝑑𝑢𝑐𝑖 + 𝛽2𝐸𝑥𝑝𝑖 + 𝜖𝑖
Gather parameters into a parameter vector:
        ( 𝛽0 )
    𝛽 = ( 𝛽1 )
        ( 𝛽2 )
Vectors in my class are always column vectors, unless transposed, e.g.
𝛽′ = ( 𝛽0 𝛽1 𝛽2 )
Gather the individual characteristics into a vector:
         ( 1     )
    𝑥𝑖 = ( 𝐸𝑑𝑢𝑐𝑖 )
         ( 𝐸𝑥𝑝𝑖  )
Note: 𝑥𝑖′𝛽 = (1  𝐸𝑑𝑢𝑐𝑖  𝐸𝑥𝑝𝑖)(𝛽0, 𝛽1, 𝛽2)′ = 𝛽0 + 𝛽1𝐸𝑑𝑢𝑐𝑖 + 𝛽2𝐸𝑥𝑝𝑖
Therefore, 𝑊𝑎𝑔𝑒𝑖 = 𝑥𝑖′𝛽 + 𝜖𝑖
Defining 𝑦𝑖 = 𝑊𝑎𝑔𝑒𝑖, we obtain our first standard notation for a linear regression model
𝑦𝑖 = 𝑥𝑖′𝛽 + 𝜖𝑖, with 𝑦𝑖 ∈ ℝ, 𝛽 ∈ ℝ^(k×1), 𝑥𝑖 ∈ ℝ^(k×1), 𝜖𝑖 ∈ ℝ; in our example k = 3 regressors (counting the constant 𝛽0) (check the formats of the matrix multiplication)
Further notation: we stack all observations i = 1, …, 3010:
        ( 𝑦1    )        ( 1   𝐸𝑑𝑢𝑐1      𝐸𝑥𝑝1    )        ( 𝜖1    )
    𝑦 = ( ⋮     ) ,  𝑋 = ( ⋮      ⋮          ⋮     ) ,  𝜖 = ( ⋮     ) ,
        ( 𝑦3010 )        ( 1   𝐸𝑑𝑢𝑐3010   𝐸𝑥𝑝3010 )        ( 𝜖3010 )
We obtain our second standard notation for the linear regression model, the matrix
notation: 𝑦 = 𝑋𝛽 + 𝜖
(Never use only this form in your thesis; you want to show what X and y are.)
Thus, X is 3010×3 and 𝛽 is 3×1, so the product 𝑋𝛽 is defined and is 3010×1, just like 𝑦 and 𝜖. Check the following:
𝑦 = 𝑋𝛽 + 𝜖:

    ( 𝑦1    )   ( 1   𝐸𝑑𝑢𝑐1      𝐸𝑥𝑝1    ) ( 𝛽0 )   ( 𝜖1    )
    ( ⋮     ) = ( ⋮      ⋮          ⋮     ) ( 𝛽1 ) + ( ⋮     )
    ( 𝑦3010 )   ( 1   𝐸𝑑𝑢𝑐3010   𝐸𝑥𝑝3010 ) ( 𝛽2 )   ( 𝜖3010 )

With 𝑦𝑖 = 𝑊𝑎𝑔𝑒𝑖, row by row this reads:
    𝑊𝑎𝑔𝑒1 = 𝛽0 + 𝐸𝑑𝑢𝑐1𝛽1 + 𝐸𝑥𝑝1𝛽2 + 𝜖1
    𝑊𝑎𝑔𝑒2 = 𝛽0 + 𝐸𝑑𝑢𝑐2𝛽1 + 𝐸𝑥𝑝2𝛽2 + 𝜖2
    ⋮
    𝑊𝑎𝑔𝑒3010 = 𝛽0 + 𝐸𝑑𝑢𝑐3010𝛽1 + 𝐸𝑥𝑝3010𝛽2 + 𝜖3010
As an aside: the constant term is a column of ones in the X matrix
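To check the dimensions yourself, here is a small sketch using Stata's matrix commands, with made-up numbers and only n = 3 observations (so X is 3×3 here rather than 3010×3):

    matrix X = (1, 12, 5 \ 1, 16, 2 \ 1, 10, 20)  // rows (1, Educ_i, Exp_i); numbers are made up
    matrix b = (8 \ 1.5 \ 0.3)                    // an illustrative 3x1 parameter vector beta
    matrix eps = (0.2 \ -0.1 \ 0.4)               // an illustrative 3x1 error vector epsilon
    matrix y = X*b + eps                          // conformable: (3x3)(3x1) + (3x1) = (3x1)
    matrix list y                                 // each y_i equals beta0 + Educ_i*beta1 + Exp_i*beta2 + eps_i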
So, it is all the same, but written differently. Why? In some cases, one notation is more convenient than another! The most explicit is 𝑊𝑎𝑔𝑒𝑖 = 𝛽0 + 𝛽1𝐸𝑑𝑢𝑐𝑖 + 𝛽2𝐸𝑥𝑝𝑖 + 𝜖𝑖
When can we use linear regression?
If the model is linear in the parameters (𝛽1 etc.)
It might be non-linear in the variables (𝐸𝑑𝑢𝑐𝑖 etc.)
Examples:
o Linear in parameters, linear in variables:
linear regression
𝑊𝑎𝑔𝑒𝑖 = 𝛽0 + 𝛽1𝐸𝑑𝑢𝑐𝑖 + 𝜖𝑖
o Linear in parameters, non-linear in variables:
linear regression
𝑊𝑎𝑔𝑒𝑖 = 𝛽0 + 𝛽1𝐸𝑑𝑢𝑐𝑖 + 𝛽2𝐸𝑥𝑝𝑖² + 𝜖𝑖 (see the Stata sketch after this list)
o Non-linear in parameters, linear in variables:
non-linear
𝑊𝑎𝑔𝑒𝑖 = 𝛽0 + 𝛽1𝐸𝑑𝑢𝑐𝑖 + 𝛽2²𝐸𝑥𝑝𝑖 + 𝜖𝑖
o Non-linear in parameters, non-linear in variables:
non-linear
𝑊𝑎𝑔𝑒𝑖 = 𝛽0 + 𝛽1𝛽2⁻¹(𝐸𝑑𝑢𝑐𝑖 − 1)^𝛽2 + 𝜖𝑖
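The non-linear-in-variables case is still a job for linear regression: generate the transformed regressor first and then run OLS. A minimal Stata sketch, again with hypothetical variable names:

    generate exper2 = exper^2       // the non-linear term: experience squared
    regress wage educ exper exper2  // still OLS: the model stays linear in the parameters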
A model that is non-linear in the parameters cannot be estimated with the linear regression model.
Interpretation: in 𝑊𝑎𝑔𝑒𝑖 = 𝛽0 + 𝛽1𝐸𝑑𝑢𝑐𝑖 + 𝛽2𝐸𝑥𝑝𝑖 + 𝜖𝑖, if education goes up by one year, my wage goes up by 𝛽1,
given that the amount of experience stays the same.
Transforming variables
Sometimes, you can transform a non-linear model to make it linear: this changes the
interpretation of the coefficients.
Example:
o Transformation of a model: Cobb-Douglas production function
𝑌𝑖 = 𝛽0 𝐾𝑖^𝛽1 𝐿𝑖^𝛽2; take logs to obtain the linear (in the parameters) specification
𝑦𝑖 = 𝛽𝑜 + 𝛽1𝑘𝑖 + 𝛽2𝑙𝑖, with 𝛽𝑜 a constant (equal to ln 𝛽0), and 𝑘𝑖 = ln 𝐾𝑖 etc.
The coefficient 𝛽2 is now an elasticity 𝜕ln𝑦𝑖/𝜕ln𝑥𝑖 (a relative change, in %) rather than the usual marginal effect 𝜕𝑦𝑖/𝜕𝑥𝑖 (= 𝜕𝑊𝑎𝑔𝑒𝑖/𝜕𝐸𝑑𝑢𝑐𝑎𝑡𝑖𝑜𝑛𝑖 in the wage example, ceteris paribus!)
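In Stata, the Cobb-Douglas example would be estimated roughly as follows, assuming hypothetical variables output, capital and labor:

    generate ly = ln(output)   // y_i = ln Y_i
    generate lk = ln(capital)  // k_i = ln K_i
    generate ll = ln(labor)    // l_i = ln L_i
    regress ly lk ll           // the coefficients on lk and ll are the elasticities beta1 and beta2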
Transforming variables (logs)
1. Taking a logarithm can often help to rescale the data so that their variance is more constant, which mitigates a common statistical problem known as heteroskedasticity (more on this in week 3)
2. Logarithmic transformations can help to make a positively (right-) skewed distribution closer to a normal distribution, such as firm size (see the sketch below)
3. Taking logarithms can also be a way to make a non-linear, multiplicative relationship
between variables into a linear, additive one (see example above)
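As an illustration of point 2, a sketch with a hypothetical firm-size variable assets:

    histogram assets               // firm size is typically strongly right-skewed
    generate lassets = ln(assets)  // log-transform
    histogram lassets              // usually much closer to a normal shape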
Transforming variables – when to take the log?
It depends on the economic setting: variables may need to be transformed before being put into a regression (make plots to see!)
Some variables are non-intuitive when untransformed, e.g. FX rates: each exchange rate has its own scale, so it is necessary to transform them (an increase of 1 Japanese yen means something completely different from an increase of 1 dollar against the euro)
In finance: pay attention to the application (e.g. modelling log-returns ln(𝑝𝑡/𝑝𝑡−1), while portfolios need simple returns (𝑝𝑡 − 𝑝𝑡−1)/𝑝𝑡−1!)
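A sketch of both return definitions in Stata, assuming a hypothetical price series p and a time variable date:

    tsset date                       // declare the time dimension so the lag operator L. works
    generate simret = (p - L.p)/L.p  // simple return: (p_t - p_{t-1})/p_{t-1}
    generate logret = ln(p/L.p)      // log return: ln(p_t/p_{t-1})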