I. Linear Regression Model Diagnostics
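Linear regression is derived under a set of assumptions. These assumptions simplify the mathematics, but they also introduce risk. Linear regression model diagnostics test whether the model has been oversimplified and whether its assumptions need to be revised.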
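The Gauss–Markov conditions on the error term are:
- The (conditional) expected value of the error term is zero: $E(\varepsilon_i)=0$;
- All error terms have the same variance (homoscedasticity): $\operatorname{var}(\varepsilon_i)=\sigma^2<\infty$;
- The error terms are not autocorrelated (no autocorrelation): $\operatorname{cov}(\varepsilon_i,\varepsilon_j)=0,\ i\neq j$.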
[Note] On the zero-expected-value condition
The mean of the residuals will always be zero provided that there is a constant term in the regression. Without a constant term,
- $R^2$ (computed as $1-\mathrm{RSS}/\mathrm{TSS}$) could be negative.
- the slope coefficient estimates could be biased (whenever the true intercept is not zero).
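Below is a minimal sketch of the note, using simulated data. The true intercept is assumed to be 10 and the true slope −2, and the helpers `ols` and `r2_one_minus` are hypothetical names rather than library functions. The script fits the same data with and without a constant term. With the constant, the residual mean is zero up to rounding. Through the origin, the residual mean drifts away from zero, the slope estimate is badly biased, and $1-\mathrm{RSS}/\mathrm{TSS}$ turns negative.

```python
import numpy as np

def ols(X, y):
    """OLS of y on the columns of X (no constant is added automatically)."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta, y - X @ beta

def r2_one_minus(y, resid):
    """R^2 computed as 1 - RSS/TSS, with TSS taken around the sample mean of y."""
    return 1.0 - np.sum(resid ** 2) / np.sum((y - y.mean()) ** 2)

# Simulated (hypothetical) data: true intercept 10, true slope -2.
rng = np.random.default_rng(0)
n = 200
x = rng.uniform(0, 1, n)
y = 10.0 - 2.0 * x + rng.normal(0, 0.5, n)

beta_c, e_c = ols(np.column_stack([np.ones(n), x]), y)  # with a constant term
beta_0, e_0 = ols(x.reshape(-1, 1), y)                   # regression through the origin

print(f"with constant:    slope={beta_c[1]:.2f}, mean resid={e_c.mean():.1e}, R^2={r2_one_minus(y, e_c):.3f}")
print(f"without constant: slope={beta_0[0]:.2f}, mean resid={e_0.mean():.2f}, R^2={r2_one_minus(y, e_0):.3f}")
```

Without the constant, the fitted line is forced through the origin. It therefore steepens to absorb the missing intercept, and that is the source of both the bias and the negative $R^2$.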
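Since this condition holds automatically whenever the regression includes a constant term, the sections below discuss only heteroscedasticity and autocorrelation.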
II. Detection of Heteroscedasticity
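There are three tests for detecting heteroscedasticity:
- White test (the most commonly used): assumes the error variance is related to the independent variables or to quadratic combinations of them (their squares and cross-products).
- Breusch–Pagan (BP)/Godfrey test (the most general): assumes the error variance may be related to other variables.
- Goldfeld–Quandt test (more cumbersome, and applicable only to cross-sectional data): assumes the error variance is related to one particular variable.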
1. Breusch–Pagan (BP)/Godfrey heteroscedasticity test procedure
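- Estimate the regression $y=\beta_0+\beta_1x_1+\beta_2x_2+\dots+\beta_kx_k+u$ by OLS and compute the squared OLS residuals $\hat u^2$.
- Estimate the regression $\hat u^2=\delta_0+\delta_1x_1+\delta_2x_2+\dots+\delta_kx_k+\text{error}$ and obtain its R-squared, $R^2_{\hat u^2}$.
- Compute the F or LM statistic and the corresponding p-value (the former uses the $F_{k,\,n-k-1}$ distribution, the latter the $\chi^2_k$ distribution). If the p-value is sufficiently small, that is, below the chosen significance level, reject the null hypothesis of homoscedasticity.
- The F statistic can be written as $F=\dfrac{R^2_{\hat u^2}/k}{(1-R^2_{\hat u^2})/(n-k-1)}$.
- The LM statistic can be written as $LM=n\times R^2_{\hat u^2}$.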
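The BP steps above can be sketched in Python with NumPy and SciPy as follows. This is an illustration under stated assumptions, not a reference implementation. The helper names `ols_resid`, `centered_r2` and `breusch_pagan` are hypothetical. The data are simulated so that the error standard deviation grows with the first regressor. The function returns both the F and the LM versions of the statistic, with p-values taken from $F_{k,\,n-k-1}$ and $\chi^2_k$.

```python
import numpy as np
from scipy import stats

def ols_resid(X, y):
    """Residuals from the OLS regression of y on X (X already contains a constant column)."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return y - X @ beta

def centered_r2(X, y):
    """Usual (centered) R^2 of the OLS regression of y on X."""
    resid = ols_resid(X, y)
    return 1.0 - np.sum(resid ** 2) / np.sum((y - y.mean()) ** 2)

def breusch_pagan(X, y):
    """BP test following the steps above.

    X: (n, k) array of regressors WITHOUT the constant; a constant is added here.
    Returns (F, p-value of F, LM, p-value of LM).
    """
    n, k = X.shape
    Xc = np.column_stack([np.ones(n), X])
    u2 = ols_resid(Xc, y) ** 2              # step 1: squared OLS residuals
    r2 = centered_r2(Xc, u2)                # step 2: regress u_hat^2 on the same x's
    F = (r2 / k) / ((1.0 - r2) / (n - k - 1))
    LM = n * r2
    return F, stats.f.sf(F, k, n - k - 1), LM, stats.chi2.sf(LM, k)

# Simulated (hypothetical) data: the error s.d. grows with the first regressor.
rng = np.random.default_rng(42)
n = 500
X = rng.uniform(1, 5, size=(n, 2))
y = 1.0 + 2.0 * X[:, 0] - X[:, 1] + rng.normal(0, 1, n) * X[:, 0]

F, pF, LM, pLM = breusch_pagan(X, y)
print(f"F = {F:.2f} (p = {pF:.3g}), LM = {LM:.2f} (p = {pLM:.3g})")
```

Both statistics depend on the data only through $R^2_{\hat u^2}$. As a result, the F and LM versions almost always lead to the same decision in samples of this size.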
2. White heteroscedasticity test procedure
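- Estimate the model $y=\beta_0+\beta_1x_1+\beta_2x_2+\dots+\beta_kx_k+u$ by OLS. Obtain the OLS residuals and fitted values, and compute the squared residuals $\hat u^2$ and the squared fitted values $\hat y^2$.
- Run the regression $\hat u^2=\delta_0+\delta_1\hat y+\delta_2\hat y^2+\text{error}$ and obtain its R-squared, $R^2_{\hat u^2}$.
- Compute the F or LM statistic and its p-value (the former uses the $F_{2,\,n-3}$ distribution, the latter the $\chi^2_2$ distribution, because the auxiliary regression has two regressors).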
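The fitted-value form of the White test is sketched below in NumPy/SciPy. It regresses $\hat u^2$ on $\hat y$ and $\hat y^2$ and refers the statistics to $F_{2,\,n-3}$ and $\chi^2_2$. The helpers `ols_fit` and `special_white` are hypothetical names, and the data are simulated with an error standard deviation proportional to $x_1$, which also drives $\hat y$.

```python
import numpy as np
from scipy import stats

def ols_fit(X, y):
    """OLS of y on X (X already contains a constant). Returns (fitted values, residuals)."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    fitted = X @ beta
    return fitted, y - fitted

def special_white(X, y):
    """White test via u_hat^2 = d0 + d1*y_hat + d2*y_hat^2 + error.

    X: (n, k) regressors WITHOUT the constant. Returns (F, p_F, LM, p_LM)
    using F(2, n-3) and chi2(2).
    """
    n = len(y)
    y_hat, u_hat = ols_fit(np.column_stack([np.ones(n), X]), y)
    u2 = u_hat ** 2
    Z = np.column_stack([np.ones(n), y_hat, y_hat ** 2])  # auxiliary regressors
    _, v = ols_fit(Z, u2)
    r2 = 1.0 - np.sum(v ** 2) / np.sum((u2 - u2.mean()) ** 2)
    F = (r2 / 2) / ((1.0 - r2) / (n - 3))
    LM = n * r2
    return F, stats.f.sf(F, 2, n - 3), LM, stats.chi2.sf(LM, 2)

rng = np.random.default_rng(5)
n = 500
X = rng.uniform(1, 5, size=(n, 2))
# Hypothetical data: error s.d. proportional to x1, which also drives y_hat.
y = 1.0 + 2.0 * X[:, 0] - X[:, 1] + rng.normal(0, 1, n) * X[:, 0]
F, pF, LM, pLM = special_white(X, y)
print(f"F = {F:.2f} (p = {pF:.3g}), LM = {LM:.2f} (p = {pLM:.3g})")
```

However many regressors the original model has, the auxiliary regression always uses exactly two slope terms. This is why the degrees of freedom are fixed at 2.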
[Intuition] The White test regression with three independent variables
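A weakness of the full White test is that its auxiliary regression includes every independent variable, its square and every cross-product. It therefore uses up many degrees of freedom even when the model has only a few independent variables. With three independent variables, the full White test estimates
$$\hat u^2=\delta_0+\delta_1x_1+\delta_2x_2+\delta_3x_3+\delta_4x_1^2+\delta_5x_2^2+\delta_6x_3^2+\delta_7x_1x_2+\delta_8x_1x_3+\delta_9x_2x_3+\text{error},$$
which has nine slope coefficients. Regressing $\hat u^2$ on $\hat y$ and $\hat y^2$ tests for heteroscedasticity with only two. Because $\hat y$ is a linear combination of the $x_j$, expanding $\hat y^2$ produces their squares and cross-products. The two-regressor form is therefore a restricted version of the equation above.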
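The sketch below builds the full White auxiliary regression for three regressors and shows the degrees-of-freedom cost. It is written with NumPy/SciPy on simulated data, and the helpers `white_design` and `full_white` are hypothetical names. The columns are generated with `itertools.combinations_with_replacement`, so the nine slope terms match the expanded equation above but appear in a different order.

```python
import itertools
import numpy as np
from scipy import stats

def white_design(X):
    """Constant, levels, squares and cross-products of the columns of X.

    For three columns this gives 1 + 3 + 6 = 10 columns, i.e. the nine
    slope terms of the expanded equation (in a different order).
    """
    n, k = X.shape
    cols = [np.ones(n)] + [X[:, j] for j in range(k)]
    for i, j in itertools.combinations_with_replacement(range(k), 2):
        cols.append(X[:, i] * X[:, j])       # x_i^2 when i == j, else x_i * x_j
    return np.column_stack(cols)

def full_white(X, y):
    """Full White LM test. X excludes the constant. Returns (q, LM, p-value)."""
    n = len(y)
    Xc = np.column_stack([np.ones(n), X])
    beta, *_ = np.linalg.lstsq(Xc, y, rcond=None)
    u2 = (y - Xc @ beta) ** 2
    Z = white_design(X)
    q = Z.shape[1] - 1                      # slope coefficients = degrees of freedom
    delta, *_ = np.linalg.lstsq(Z, u2, rcond=None)
    v = u2 - Z @ delta
    r2 = 1.0 - np.sum(v ** 2) / np.sum((u2 - u2.mean()) ** 2)
    LM = n * r2
    return q, LM, stats.chi2.sf(LM, q)

rng = np.random.default_rng(1)
n = 400
X = rng.uniform(1, 5, size=(n, 3))
# Hypothetical data: error s.d. proportional to x2.
y = 0.5 + X @ np.array([1.0, -1.0, 2.0]) + rng.normal(0, 1, n) * X[:, 1]
q, LM, p = full_white(X, y)
print(f"auxiliary slope terms: {q}, LM = {LM:.2f}, p = {p:.3g}")  # q is 9 here
```

In general, $k$ regressors yield $2k + k(k-1)/2$ auxiliary slope terms. That count grows quadratically, which is the motivation for the two-term $\hat y$, $\hat y^2$ shortcut.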
3. Goldfeld–Quandt (GQ) heteroscedasticity test procedure
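- Split the total sample of length $T$ into two sub-samples of length $T_1$ and $T_2$. Estimate the regression model on each sub-sample to obtain the residual variances $s_1^2$ and $s_2^2$.
- The null hypothesis: $H_0:\sigma_1^2=\sigma_2^2$.
- GQ test statistic: $GQ=\dfrac{s_1^2}{s_2^2}$, with the larger residual variance placed in the numerator (relabel the sub-samples if necessary). Under $H_0$, $GQ$ follows an $F(T_1-k,\,T_2-k)$ distribution, where $k$ is the number of regression coefficients estimated in each sub-sample.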
[Intuition] A drawback of the GQ test
The choice of where to split the sample is usually arbitrary and may crucially affect the outcome of the test.
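Here is a minimal NumPy/SciPy sketch of the GQ procedure. It assumes, as is common practice, that observations are first sorted by the variable suspected of driving the variance. The helpers `residual_variance` and `goldfeld_quandt` are hypothetical names, the data are simulated, and the reported p-value is the upper-tail $F$ probability with the larger variance in the numerator. Running it at several split points illustrates the drawback above: the statistic moves as the split moves.

```python
import numpy as np
from scipy import stats

def residual_variance(X, y):
    """s^2 = RSS / (T_i - k) from an OLS fit of y on X (X includes the constant)."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return np.sum(resid ** 2) / (len(y) - X.shape[1])

def goldfeld_quandt(X, y, sort_col, split):
    """GQ test. Sorts by X[:, sort_col] and uses the first `split` rows as sub-sample 1.

    X excludes the constant. Returns (GQ, upper-tail p-value).
    """
    T = len(y)
    order = np.argsort(X[:, sort_col])
    Xc = np.column_stack([np.ones(T), X])[order]
    ys = y[order]
    k = Xc.shape[1]
    s1 = residual_variance(Xc[:split], ys[:split])
    s2 = residual_variance(Xc[split:], ys[split:])
    df1, df2 = split - k, (T - split) - k
    # Put the larger variance in the numerator, swapping degrees of freedom to match.
    if s1 >= s2:
        return s1 / s2, stats.f.sf(s1 / s2, df1, df2)
    return s2 / s1, stats.f.sf(s2 / s1, df2, df1)

rng = np.random.default_rng(7)
T = 300
X = rng.uniform(1, 10, size=(T, 1))
# Hypothetical data: error s.d. proportional to x.
y = 2.0 + 3.0 * X[:, 0] + rng.normal(0, 1, T) * X[:, 0]

results = {}
for split in (100, 150, 200):              # different (arbitrary) split points
    results[split] = goldfeld_quandt(X, y, sort_col=0, split=split)
    gq, p = results[split]
    print(f"split at {split}: GQ = {gq:.2f}, p = {p:.3g}")
```

With heteroscedasticity this strong, every split rejects $H_0$. In less clear-cut data, however, the same variation in $GQ$ across splits can flip the conclusion.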
III. Detection of Autocorrelation
1. Durbin–Watson autocorrelation test
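The DW test can only detect first-order autocorrelation in the error term. It cannot be used if the explanatory variables of the regression include a lag of the dependent variable (e.g., $Y_{t-1}$). Recommendation: treat the DW statistic as a reference only, and rely on the Breusch–Godfrey LM test below for formal testing.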
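If $e_t$ is the residual associated with the observation at time $t$, then the test statistic is
$$d=\frac{\sum_{t=2}^{T}(e_t-e_{t-1})^2}{\sum_{t=1}^{T}e_t^2},$$
where $T$ is the number of observations.
Since $d$ is approximately equal to $2(1-r)$, where $r$ is the sample autocorrelation of the residuals, $d=2$ indicates no autocorrelation. Because $d$ always lies between 0 and 4, values near 0 suggest positive autocorrelation and values near 4 suggest negative autocorrelation.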
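DW has two critical values, an upper critical value ($d_U$) and a lower critical value ($d_L$). Between them lies an intermediate region where we can neither reject nor not reject $H_0$. For positive autocorrelation, reject $H_0$ if $d<d_L$, do not reject if $d>d_U$, and treat $d_L\le d\le d_U$ as inconclusive.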
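A minimal NumPy sketch of the DW statistic follows. It computes $d$ directly from the formula and compares it with $2(1-r)$. The functions `durbin_watson` and `lag1_autocorr` are hypothetical helpers, and the data are simulated with AR(1) errors using an assumed $\rho=0.7$, so $d$ should land near $2(1-0.7)=0.6$. The two quantities differ only by the end terms $(e_1^2+e_T^2)/\sum_t e_t^2$, which become negligible for long series.

```python
import numpy as np

def durbin_watson(e):
    """d = sum_{t=2}^T (e_t - e_{t-1})^2 / sum_{t=1}^T e_t^2."""
    e = np.asarray(e, dtype=float)
    return np.sum(np.diff(e) ** 2) / np.sum(e ** 2)

def lag1_autocorr(e):
    """Sample first-order autocorrelation r of (mean-zero) residuals."""
    e = np.asarray(e, dtype=float)
    return np.sum(e[1:] * e[:-1]) / np.sum(e ** 2)

rng = np.random.default_rng(3)
T = 500
x = rng.normal(size=T)
eps = rng.normal(size=T)
u = np.empty(T)
u[0] = eps[0]
for t in range(1, T):
    u[t] = 0.7 * u[t - 1] + eps[t]        # AR(1) errors with rho = 0.7 (hypothetical)
y = 1.0 + 2.0 * x + u

X = np.column_stack([np.ones(T), x])      # regression with a constant term
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
e = y - X @ beta

d = durbin_watson(e)
r = lag1_autocorr(e)
print(f"d = {d:.3f}, 2(1 - r) = {2 * (1 - r):.3f}")
print(f"d for white noise: {durbin_watson(rng.normal(size=2000)):.3f}")
```

The resulting $d$ would still have to be compared with tabulated $d_L$ and $d_U$ values for the given $T$ and number of regressors, which this sketch does not include.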
2. Breusch–Godfrey LM test
The Breusch–Godfrey LM test (Serial Correlation LM Test) uses the residuals to test for autocorrelation up to lag $p$.
Consider a linear regression of any form, for example
$$Y_t=\alpha_0+\alpha_1X_{t,1}+\alpha_2X_{t,2}+u_t$$
where the errors $u_t$ might follow an AR($p$) autoregressive scheme, as follows:
$$u_t=\rho_1u_{t-1}+\rho_2u_{t-2}+\cdots+\rho_pu_{t-p}+\varepsilon_t.$$
The simple regression model is first fitted by ordinary least squares to obtain a set of sample residuals $\hat u_t$.
Breusch and Godfrey proved that, if the following auxiliary regression model is fitted
$$\hat u_t=\alpha_0+\alpha_1X_{t,1}+\alpha_2X_{t,2}+\rho_1\hat u_{t-1}+\rho_2\hat u_{t-2}+\cdots+\rho_p\hat u_{t-p}+\varepsilon_t$$
and if the usual $R^2$ statistic is calculated for this model, then the following asymptotic approximation can be used for the distribution of the test statistic
$$nR^2\sim\chi^2_p,$$
when the null hypothesis $H_0:\{\rho_i=0\text{ for all }i\}$ holds (that is, there is no serial correlation of any order up to $p$). Here $n$ is the number of data points available for the second regression, the one for $\hat u_t$:
$$n=T-p,$$
where $T$ is the number of observations in the basic series. Note that the value of $n$ depends on the number of lags of the error term ($p$).
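The procedure above can be sketched in NumPy/SciPy. Under the assumptions below, it fits the model by OLS, builds the auxiliary regression of $\hat u_t$ on the regressors and $p$ lagged residuals, and refers $nR^2$ to $\chi^2_p$. The helpers `ols_resid` and `breusch_godfrey` are hypothetical names. The data are simulated with AR(1) errors and an assumed $\rho=0.5$, tested with $p=2$. Following the text, the first $p$ observations are dropped so that $n=T-p$. Some implementations instead fill the missing initial lags with zeros and keep all $T$ observations.

```python
import numpy as np
from scipy import stats

def ols_resid(X, y):
    """Residuals from the OLS regression of y on X (X already contains a constant)."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return y - X @ beta

def breusch_godfrey(X, y, p):
    """BG LM test for serial correlation up to lag p.

    X: (T, m) regressors WITHOUT the constant. Returns (LM, p-value, n) with n = T - p.
    """
    T = len(y)
    Xc = np.column_stack([np.ones(T), X])
    u_hat = ols_resid(Xc, y)                                   # step 1: OLS residuals
    # Column i-1 holds u_hat[t - i] for t = p, ..., T-1 (0-based indexing).
    lags = np.column_stack([u_hat[p - i : T - i] for i in range(1, p + 1)])
    Z = np.column_stack([Xc[p:], lags])                        # auxiliary regressors
    target = u_hat[p:]
    v = ols_resid(Z, target)
    r2 = 1.0 - np.sum(v ** 2) / np.sum((target - target.mean()) ** 2)
    n = T - p
    LM = n * r2
    return LM, stats.chi2.sf(LM, p), n

rng = np.random.default_rng(11)
T = 400
X = rng.normal(size=(T, 2))
eps = rng.normal(size=T)
u = np.empty(T)
u[0] = eps[0]
for t in range(1, T):
    u[t] = 0.5 * u[t - 1] + eps[t]        # AR(1) errors (hypothetical rho = 0.5)
y = 1.0 + X @ np.array([2.0, -1.0]) + u

LM, pval, n_aux = breusch_godfrey(X, y, p=2)
print(f"LM = {LM:.2f}, p = {pval:.3g}, n = {n_aux}")  # n = T - p = 398
```

Unlike DW, nothing here prevents $X$ from containing a lagged dependent variable, and $p$ can exceed 1.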