We practice diagnostic tools and data transformations using data from the Canadian Survey of Labour and Income Dynamics (SLID). Our goal is to replicate the analysis in Chapter 12 of JF.
Let’s first load the data.
slid.data <- read.table("SLID-Ontario.txt", header = TRUE)
# show dimension
dim(slid.data)
## [1] 3997 4
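Before fitting anything, it is usually worth checking the column types and looking for missing values. The following is only a sketch on a small made-up data frame that borrows the four column names used in the model below; the values are invented and are not SLID records.

```r
# Toy data frame (invented values, NOT from SLID); the column names mirror
# the variables used in the regression below.
toy <- data.frame(
  compositeHourlyWages = c(10.5, 17.8, NA, 14.0, 22.3),
  sex                  = factor(c("Female", "Male", "Male", "Female", "Male")),
  age                  = c(40, 19, 49, 46, 71),
  yearsEducation       = c(15, 13, 16, NA, 12)
)

str(toy)                           # column types
n.missing <- colSums(is.na(toy))   # number of missing values per column
n.missing

# Rows with no missing values (lm() drops incomplete rows by default)
toy.complete <- toy[complete.cases(toy), ]
dim(toy.complete)
```

The same three calls (`str()`, `colSums(is.na())`, `complete.cases()`) can be applied to `slid.data` directly.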
We start with a linear model for hourly wages:
\[ Wage = \beta_0 + \beta_1 \cdot Sex + \beta_2 \cdot Age + \beta_3 \cdot Education + \epsilon \]
lmod <- lm(compositeHourlyWages ~ sex + age + yearsEducation, data = slid.data)
summary(lmod)
##
## Call:
## lm(formula = compositeHourlyWages ~ sex + age + yearsEducation,
##     data = slid.data)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -26.490  -4.302  -0.746   3.233  35.871
##
## Coefficients:
##                 Estimate Std. Error t value Pr(>|t|)
## (Intercept)    -8.124231   0.598977  -13.56   <2e-16 ***
## sexMale         3.473670   0.207009   16.78   <2e-16 ***
## age             0.261293   0.008664   30.16   <2e-16 ***
## yearsEducation  0.929649   0.034257   27.14   <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 6.542 on 3993 degrees of freedom
## Multiple R-squared: 0.3074, Adjusted R-squared: 0.3069
## F-statistic: 590.7 on 3 and 3993 DF, p-value: < 2.2e-16
library(car)
qqPlot(lmod, distribution = , line = "none", envelope = list(style = "lines")) # fill in the distribution type
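To see what `qqPlot()` is comparing, a QQ plot of studentized residuals can be sketched in base R. This is a sketch on simulated data (the variables `x` and `y` are invented), and it overlays both reference distributions that `qqPlot()` accepts for `lm` objects, `"t"` and `"norm"`, without the confidence envelope.

```r
# Simulated data (invented for illustration; not the SLID data)
set.seed(1)
n <- 200
x <- rnorm(n)
y <- 1 + 2 * x + (rexp(n) - 1)   # right-skewed errors with mean zero
fit <- lm(y ~ x)

rs    <- sort(rstudent(fit))     # ordered studentized residuals
probs <- ppoints(n)              # plotting positions in (0, 1)

# Reference quantiles: t with (residual df - 1) degrees of freedom, and normal
q.t    <- qt(probs, df = df.residual(fit) - 1)
q.norm <- qnorm(probs)

plot(q.t, rs, xlab = "Reference quantiles", ylab = "Studentized residuals")
points(q.norm, rs, pch = 3, col = "grey")
abline(0, 1, lty = 2)            # points near this line match the reference
legend("topleft", legend = c("t", "norm"), pch = c(1, 3),
       col = c("black", "grey"))
```

With a few hundred observations the two sets of reference quantiles are nearly identical, so the choice matters mostly in small samples.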
Next, we refit the model with log (base 2) wages as the response:
\[ \log_2(Wage) = \beta_0' + \beta_1' \cdot Sex + \beta_2' \cdot Age + \beta_3' \cdot Education + \epsilon' \]
lmod.2 <- lm()
summary(lmod.2)
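With a base-2 log response, a coefficient b means that a one-unit increase in the predictor multiplies the typical wage by about 2^b, so b = 1 would correspond to a doubling. A small sketch on simulated data (the names `educ` and `wage` and the true slope 0.08 are invented for illustration):

```r
# Simulated data (invented; not the SLID data)
set.seed(2)
n <- 500
educ <- runif(n, 8, 20)                           # hypothetical predictor
wage <- 2^(1 + 0.08 * educ + rnorm(n, sd = 0.3))  # multiplicative errors

fit.log2 <- lm(log2(wage) ~ educ)

b    <- coef(fit.log2)[["educ"]]  # slope on the log2 scale
mult <- 2^b                       # multiplicative change per one-unit increase
pct  <- 100 * (mult - 1)          # the same change expressed in percent

c(slope = b, multiplier = mult, percent = pct)
```

The back-transformed values describe relative rather than absolute changes, which is the main interpretive difference from the untransformed model.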
qqPlot(lmod.2, distribution = , line = "none",
       envelope = list(style = "lines", col = 1, alpha = 0.1))
residualPlot(lmod, ) # look at the documentation of this function and finish it
residualPlot(lmod.2, )
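`residualPlot()` draws residuals against fitted values (or against a predictor) so that curvature and nonconstant spread become visible. A rough base R equivalent on simulated data with multiplicative errors is sketched below; the data and the helper `spread_ratio()` are invented for illustration and are not part of `car`.

```r
# Simulated data (invented; not the SLID data)
set.seed(3)
n <- 300
x <- runif(n, 0, 10)
y <- 2^(1 + 0.2 * x + rnorm(n, sd = 0.4))   # multiplicative errors

fit.raw <- lm(y ~ x)          # response on the original scale
fit.log <- lm(log2(y) ~ x)    # response on the log2 scale

# Hypothetical helper: ratio of the residual SD above vs. below the median
# fitted value; values far from 1 suggest nonconstant spread.
spread_ratio <- function(fit) {
  hi <- fitted(fit) > median(fitted(fit))
  sd(residuals(fit)[hi]) / sd(residuals(fit)[!hi])
}

op <- par(mfrow = c(1, 2))
for (fit in list(fit.raw, fit.log)) {
  plot(fitted(fit), residuals(fit), col = "grey",
       xlab = "Fitted values", ylab = "Residuals")
  lines(lowess(fitted(fit), residuals(fit)), lwd = 2)  # smooth trend
  abline(h = 0, lty = 2)
}
par(op)

c(raw = spread_ratio(fit.raw), log2 = spread_ratio(fit.log))
```

A fan shape in the left panel and a roughly even band in the right panel are the patterns to compare against the SLID plots.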
Next, we draw component-plus-residual plots for age and education in the SLID regression of log wages on these variables and sex.
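A component-plus-residual (partial residual) plot for a predictor adds the model residuals to that predictor's fitted linear component and plots the sum against the predictor; a smooth that departs from the straight line suggests a nonlinear effect. A minimal base R sketch on simulated data (all variable names and coefficients below are invented):

```r
# Simulated data (invented; not the SLID data)
set.seed(4)
n <- 300
age  <- runif(n, 18, 65)
educ <- runif(n, 8, 20)
# The true relationship is quadratic in age, but the fitted model is linear
y <- 1 + 0.15 * age - 0.0015 * age^2 + 0.08 * educ + rnorm(n, sd = 0.3)
fit <- lm(y ~ age + educ)

# Partial residuals for age: residual + centered linear component of age
b.age  <- coef(fit)[["age"]]
pr.age <- residuals(fit) + b.age * (age - mean(age))

plot(age, pr.age, col = "grey", ylab = "Component + residual (age)")
abline(lm(pr.age ~ age), lty = 2)              # linear component
lines(lowess(age, pr.age, f = 0.4), lwd = 2)   # nonparametric smooth
```

The least-squares line through the partial residuals has a slope equal to the fitted coefficient for `age`, which is why the dashed line represents the model's linear fit. The same values are available from `residuals(fit, type = "partial")`.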
# plot all together
crPlots(lmod.2, col = "grey")
# plot one at a time
# line colors can be set through the col.lines argument (see ?crPlots)
crPlot(lmod.2, variable = "age", col = "grey",
       smooth = list(smoother = loessLine, span = 0.4, col = "black"))
crPlot(lmod.2, variable = "yearsEducation", col = "grey",
       smooth = list(smoother = loessLine, span = 0.4, col = "black"))
Comment on your findings from the last two plots.