# SAS Assignment 2

Question 1: Consider a data set (PS4_hprice.txt) of house price (Y = “sale price”) with four independent variables:

X1 = “lot size”

X2 = “Number of bedrooms”

X3 = “Number of bathrooms”

X4 = “Number of storeys”

1. Regress the house price on all of the above explanatory variables. Then, test the normality assumption using the externally Studentized residuals at a significance level of 0.03.
2. Use the Box-Cox transformation to transform the house price, where 𝜆 is chosen from -2 to 2 with a step size of 0.05. Find the estimate of 𝜆.
3. Use the Box-Cox transformed house price to answer the following questions.
    1. Fit the model of the transformed house price on all of the explanatory variables. Then construct 95% confidence intervals for the mean predicted value and for the individual predicted value of the house price at a newly added sample point x0 = (X1 = 4999, X2 = 4, X3 = 1, X4 = 3).
    2. Check the normality assumption using the externally Studentized residuals at a significance level of 0.03.
    3. Find the best 3-term model (the model involving only 3 explanatory variables) by R², adjusted R², AIC and BIC.
    4. Write down the best fitted regression line with 3 explanatory variables.
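
The grid search over 𝜆 in part 2 can be illustrated with a minimal sketch. It assumes the usual profile log-likelihood for the Box-Cox family, ℓ(𝜆) = −(n/2)·log(RSS(𝜆)/n) + (𝜆 − 1)·Σ log yᵢ, where RSS(𝜆) is the residual sum of squares from regressing the transformed response on the explanatory variables. The helper names (`boxcox`, `ols_rss`, `boxcox_profile`) and the synthetic data are hypothetical and are not taken from PS4_hprice.txt. In SAS the same search would normally be delegated to a procedure such as PROC TRANSREG rather than coded by hand.

```python
import math

def boxcox(y, lam):
    """Box-Cox transform of a positive value y (log y when lam is 0)."""
    if abs(lam) < 1e-12:
        return math.log(y)
    return (y ** lam - 1.0) / lam

def ols_rss(X, z):
    """Residual sum of squares from regressing z on the columns of X.
    X must already contain the intercept column. The normal equations
    are solved by Gauss-Jordan elimination with partial pivoting."""
    n, p = len(X), len(X[0])
    # Augmented matrix [X'X | X'z]
    A = [[sum(X[i][r] * X[i][c] for i in range(n)) for c in range(p)]
         + [sum(X[i][r] * z[i] for i in range(n))] for r in range(p)]
    for col in range(p):
        piv = max(range(col, p), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        for r in range(p):
            if r != col:
                f = A[r][col] / A[col][col]
                A[r] = [a - f * b for a, b in zip(A[r], A[col])]
    beta = [A[r][p] / A[r][r] for r in range(p)]
    return sum((z[i] - sum(b * x for b, x in zip(beta, X[i]))) ** 2
               for i in range(n))

def boxcox_profile(X, y, grid):
    """Return (lambda, profile log-likelihood) pairs over the grid."""
    n = len(y)
    sum_log_y = sum(math.log(v) for v in y)
    out = []
    for lam in grid:
        z = [boxcox(v, lam) for v in y]
        rss = ols_rss(X, z)
        out.append((lam, -0.5 * n * math.log(rss / n)
                    + (lam - 1.0) * sum_log_y))
    return out

# Grid from -2 to 2 in steps of 0.05 (81 candidate values).
grid = [k / 20.0 for k in range(-40, 41)]

# Synthetic illustration only: log(y) is linear in x plus small noise,
# so the estimate should land close to lambda = 0.
x = [i / 10.0 for i in range(40)]
noise = [0.05 if i % 2 == 0 else -0.05 for i in range(40)]
y = [math.exp(1.0 + 0.5 * xi + e) for xi, e in zip(x, noise)]
X = [[1.0, xi] for xi in x]

profile = boxcox_profile(X, y, grid)
lam_hat = max(profile, key=lambda pair: pair[1])[0]
print("estimated lambda:", lam_hat)
```

For the assignment data, `X` would hold an intercept column plus X1 to X4 and `y` the sale prices. The maximizing grid value plays the role of the estimate of 𝜆 requested in part 2.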

Question 2: Consider a data set (PS4_hprice.txt) of house price (Y = “sale price”) with two independent variables and two categorical variables:

X1 = “lot size”

X2 = “Number of bedrooms”

D2 = “recreation room” (1 if Yes, and 0 if No)

D5 = “air conditioning” (1 if Yes, and 0 if No)

1. Check the significance of all interaction terms at a significance level of 0.03.
2. Write down your fitted models for the houses:
    1. With recreation room and with air conditioning
    2. Without recreation room and with air conditioning
    3. With recreation room and without air conditioning
    4. Without recreation room and without air conditioning
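
As a sketch of how the four fitted models in part 2 follow from a single regression, suppose the model contains the main effects, the D2×D5 interaction, and the interaction of each dummy with each continuous variable. This particular set of terms and the coefficient symbols are assumptions made for illustration. Which terms are actually kept depends on the significance tests in part 1, and a specification with further terms (for example X1×X2 or three-way interactions) would change the expressions.

```latex
\begin{align*}
Y &= \beta_0 + \beta_1 X_1 + \beta_2 X_2
     + \gamma_2 D_2 + \gamma_5 D_5 + \gamma_{25} D_2 D_5 \\
  &\quad + \delta_{12} X_1 D_2 + \delta_{22} X_2 D_2
     + \delta_{15} X_1 D_5 + \delta_{25} X_2 D_5 + \varepsilon \\[6pt]
D_2 = 1,\ D_5 = 1:\quad
\hat{Y} &= (\hat\beta_0 + \hat\gamma_2 + \hat\gamma_5 + \hat\gamma_{25})
     + (\hat\beta_1 + \hat\delta_{12} + \hat\delta_{15}) X_1
     + (\hat\beta_2 + \hat\delta_{22} + \hat\delta_{25}) X_2 \\
D_2 = 0,\ D_5 = 1:\quad
\hat{Y} &= (\hat\beta_0 + \hat\gamma_5)
     + (\hat\beta_1 + \hat\delta_{15}) X_1
     + (\hat\beta_2 + \hat\delta_{25}) X_2 \\
D_2 = 1,\ D_5 = 0:\quad
\hat{Y} &= (\hat\beta_0 + \hat\gamma_2)
     + (\hat\beta_1 + \hat\delta_{12}) X_1
     + (\hat\beta_2 + \hat\delta_{22}) X_2 \\
D_2 = 0,\ D_5 = 0:\quad
\hat{Y} &= \hat\beta_0 + \hat\beta_1 X_1 + \hat\beta_2 X_2
\end{align*}
```

Each group-specific model is obtained by substituting the 0/1 values of D2 and D5 and collecting the intercept, X1 and X2 terms. Any interaction dropped as non-significant simply contributes a zero coefficient.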

Question 3: In a project to study age and growth characteristics of selected mussel species from Southwestern Virginia, data taken from locations below can be found in PS4_q3.txt.

1. Fit a regression with interaction term(s) of age and location, using weight as the response, age as the independent variable, and location as a categorical variable. Use dummy variable(s) to represent the categorical variable. Write down the fitted regression line and specify how the dummy variable(s) are defined. Estimate 𝜎² and find 𝑅².

2. Rewrite the regression line in part 1 as separate regression lines, that is, one regression line for each location.

3. Test whether the four slopes are equal at a significance level of 0.05. Specify the null hypothesis, test statistic, p-value, and your conclusion clearly.
4. Estimate the mean weights of locations 1 and 2 at the overall mean age, and then construct a 93% confidence interval for their difference.

5. Estimate the difference between the average of mean weights of locations 1 and 3 and that of locations 3 and 4 at age = 10, and then estimate its standard error. Test whether the difference is greater than -2.8. Specify the null hypothesis, test statistic, p-value, and your conclusion clearly.
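
The structure behind parts 1, 2, 3 and 5 can be sketched as follows, assuming four locations with location 4 as the reference level. The choice of reference level and the symbols are assumptions for illustration, and a different coding gives an equivalent model with different coefficients. Here W is weight, A is age, and D_j = 1 if the observation comes from location j and 0 otherwise.

```latex
\begin{align*}
W &= \beta_0 + \beta_1 A
     + \sum_{j=1}^{3} \left( \gamma_j D_j + \delta_j A D_j \right)
     + \varepsilon \\[6pt]
\text{Location } j \ (j = 1, 2, 3):\quad
\mu_j(A) &= (\beta_0 + \gamma_j) + (\beta_1 + \delta_j) A \\
\text{Location } 4:\quad
\mu_4(A) &= \beta_0 + \beta_1 A \\[6pt]
H_0 &: \delta_1 = \delta_2 = \delta_3 = 0
   \quad \text{(equal slopes)} \\
F &= \frac{(SSE_R - SSE_F)/3}{SSE_F/(n - 8)}
   \sim F_{3,\, n-8} \ \text{under } H_0,
   \qquad \hat\sigma^2 = \frac{SSE_F}{n - 8} \\[6pt]
\theta &= \frac{\mu_1(10) + \mu_3(10)}{2} - \frac{\mu_3(10) + \mu_4(10)}{2}
        = \frac{\mu_1(10) - \mu_4(10)}{2}
        = \frac{\gamma_1 + 10\,\delta_1}{2} \\
H_0 &: \theta \le -2.8 \quad \text{vs.} \quad H_1 : \theta > -2.8,
   \qquad t = \frac{\hat\theta + 2.8}{\operatorname{se}(\hat\theta)}
   \sim t_{n-8} \ \text{at } \theta = -2.8
\end{align*}
```

SSE_F is the error sum of squares of the full model with 8 parameters, and SSE_R is that of the reduced model with a common slope and location-specific intercepts. In the contrast for part 5 the location 3 terms cancel, so under this coding the estimate depends only on the location 1 coefficients.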