Regularization

Ridge and Lasso

Packages Needed

Install the glmnet package to run the code from the following slides.
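If glmnet (or any of the other packages used on these slides) is not installed yet, a one-time installation might look like this:

```r
# One-time installation (run once, not in every session)
install.packages(c("glmnet", "tidymodels", "rio", "janitor"))
```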

What Will You Learn/Review

  • The basic idea behind regularization

  • The difference between the penalty terms for Lasso and Ridge regression models

  • How the target function for Lasso regularized regression models differs from the \(MSE\) function of an unregularized model

  • How to create a workflow for a Lasso regularized regression using the R tidymodels framework

  • How the target function for a Ridge regularized regression model differs from the \(MSE\) function of an unregularized model

  • How to create a workflow for a Ridge regularized model using the R tidymodels framework

Loading the Libraries and Data, and Splitting into Training/Testing Data:

Code
library(tidymodels); library(rio); library(janitor)
DataHousing=
  import("https://ai.lange-analytics.com/data/HousingData.csv") |>
  clean_names("upper_camel") |>
  select(Price, Sqft=SqftLiving)

set.seed(777)
Split001=initial_split(DataHousing, prop=0.001, strata=Price, breaks=5)
DataTrain=training(Split001)
DataTest=testing(Split001)
print(DataTrain)
     Price Sqft
1   153503 1240
2   199500 1750
3   234950 1720
4   246000 2120
5   355000 1240
6   385000 2090
7   365000  910
8   349000 1690
9   474950 2030
10  450000 1540
11  465000 2020
12  445000 1630
13  568000 2110
14  660000 2470
15  530000 1260
16  600000 2090
17 1150000 3830
18  885000 4470
19  978000 2390
20  705000 2040

The Model

\[\begin{equation} \displaystyle\widehat{Price}_i=\beta_1 Sqft_i+\beta_2 Sqft_i^2+\beta_3 Sqft_i^3+\beta_4 Sqft_i^4+\beta_5 Sqft_i^5+\beta_6 \end{equation}\]

Unregularized Model Minimizes the MSE by Choosing the Optimal \(\beta s\)

\[ \displaystyle MSE=\frac{1}{20}\sum_{i=1}^{20} \left ( \widehat{Price}_i-Price_i\right)^2 \] with:

\[\begin{equation} \displaystyle\widehat{Price}_i=\beta_1 Sqft_i+\beta_2 Sqft_i^2+\beta_3 Sqft_i^3+\beta_4 Sqft_i^4+\beta_5 Sqft_i^5+\beta_6 \end{equation}\]
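The \(MSE\) above is just the mean of the squared prediction errors and can be computed directly in base R. A minimal sketch with three hypothetical observations (the numbers are illustrative, not from the housing data):

```r
# MSE on three hypothetical observations
price     <- c(200000, 350000, 500000)  # observed prices
price_hat <- c(210000, 340000, 480000)  # hypothetical model predictions
mse <- mean((price_hat - price)^2)      # average squared error
mse
```

An unregularized model chooses the \(\beta\)s that make exactly this quantity as small as possible on the training data.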

Running the Unregularized Model

Code
library(tidymodels)
ModelDesignBenchmark=linear_reg() |>
                     set_engine("lm") |> 
                     set_mode("regression")

RecipeHouses=recipe(Price~., data=DataTrain) |> 
             step_mutate(Sqft2=Sqft^2,Sqft3=Sqft^3,
                         Sqft4=Sqft^4,Sqft5=Sqft^5) |> 
             step_normalize(all_predictors())

WFModelBenchmark=workflow() |> 
                 add_model(ModelDesignBenchmark) |> 
                 add_recipe(RecipeHouses) |> 
                 fit(DataTrain)
tidy(WFModelBenchmark)
# A tibble: 6 × 5
  term           estimate  std.error statistic       p.value
  <chr>             <dbl>      <dbl>     <dbl>         <dbl>
1 (Intercept)     509945.     36463.    14.0   0.00000000128
2 Sqft           8853783.  10515448.     0.842 0.414        
3 Sqft2        -50947114.  54352075.    -0.937 0.364        
4 Sqft3        112589222. 111217647.     1.01  0.329        
5 Sqft4       -106894260. 101985738.    -1.05  0.312        
6 Sqft5         36592435.  34688741.     1.05  0.309        
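The recipe's `step_mutate()`/`step_normalize()` steps correspond to building the polynomial columns and standardizing them by hand. A minimal base-R sketch with a hypothetical `sqft` vector (standardizing matters later, because the penalty compares \(\beta\)s across predictors that must be on the same scale):

```r
sqft <- c(1240, 1750, 1720)           # hypothetical Sqft values
X <- sapply(1:5, function(p) sqft^p)  # columns Sqft, Sqft^2, ..., Sqft^5
colnames(X) <- paste0("Sqft", 1:5)
Xs <- scale(X)                        # standardize: mean 0, sd 1 per column

round(colMeans(Xs), 10)               # all (approximately) zero
apply(Xs, 2, sd)                      # all 1
```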

Assessing Prediction Quality (Training Data)

Code
DataTrainWithPredBenchmark=augment(WFModelBenchmark, DataTrain)
metrics(DataTrainWithPredBenchmark, truth=Price, estimate=.pred)
# A tibble: 3 × 3
  .metric .estimator  .estimate
  <chr>   <chr>           <dbl>
1 rmse    standard   136432.   
2 rsq     standard        0.715
3 mae     standard   104047.   

Assessing Prediction Quality (Testing Data)

Code
DataTestWithPredBenchmark=augment(WFModelBenchmark, DataTest)
metrics(DataTestWithPredBenchmark, truth=Price, estimate=.pred)
# A tibble: 3 × 3
  .metric .estimator     .estimate
  <chr>   <chr>              <dbl>
1 rmse    standard   99940240.    
2 rsq     standard          0.0215
3 mae     standard    1719470.    

Regularization

Ridge

\[\begin{eqnarray} T^{arget}&=&\frac{1}{20}\sum_{i=1}^{20} \left ( \widehat{Price}_i-Price_i\right)^2+\lambda P^{enalty} \\ \mbox{with:}&& \widehat{Price}_i=\beta_1 Sqft_i+\beta_2 Sqft_i^2+\beta_3 Sqft_i^3+\beta_4 Sqft_i^4+\beta_5 Sqft_i^5+\beta_6 \nonumber \\ \mbox{with:}&& P^{enalty}=\sum_{j=1}^{5} \beta_j^2 \nonumber \end{eqnarray}\]

Two Goals: Minimize \(MSE\) and Minimize Penalty (small or zero \(\beta s\))

Unlike the \(MSE\), the \(P^{enalty}\) term does not depend on the data at all; it depends only on the size of the \(\beta\) parameters.

Note that when a large and a small parameter are each reduced by the same amount, reducing the large parameter lowers the penalty far more than reducing the small one. This is why Ridge tends to shrink large parameters rather than small ones.
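A quick base-R check of this claim (the coefficient values are illustrative):

```r
# Ridge penalty: sum of squared slope coefficients
ridge_penalty <- function(betas) sum(betas^2)

betas <- c(10, 1)               # one large beta, one small beta
p0 <- ridge_penalty(betas)      # 101

# Reduce each coefficient by the same amount (0.5), one at a time:
p0 - ridge_penalty(c(9.5, 1))   # 9.75 (large beta reduced)
p0 - ridge_penalty(c(10, 0.5))  # 0.75 (small beta reduced)
```

The same 0.5 reduction lowers the penalty by 9.75 when applied to the large coefficient but only by 0.75 when applied to the small one.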

Running the Ridge Model

Code
library(glmnet)
set.seed(777)
ModelDesignRidge=linear_reg(penalty=1000000, mixture=0) |>
                 set_engine("glmnet") |> 
                 set_mode("regression")

WFModelRidge=workflow() |> 
             add_model(ModelDesignRidge) |> 
             add_recipe(RecipeHouses) |> 
             fit(DataTrain)
tidy(WFModelRidge)
# A tibble: 6 × 3
  term        estimate penalty
  <chr>          <dbl>   <dbl>
1 (Intercept)  509945. 1000000
2 Sqft          25790. 1000000
3 Sqft2         23133. 1000000
4 Sqft3         19885. 1000000
5 Sqft4         16968. 1000000
6 Sqft5         14570. 1000000
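The shrinkage pattern above (all slope coefficients smaller than in the benchmark model, but none exactly zero) matches Ridge's closed-form solution \(\hat\beta=(X^TX+\lambda I)^{-1}X^Ty\). A minimal base-R sketch on simulated data (the variables, \(\lambda\) value, and \(\lambda\) scaling here are illustrative and do not match glmnet's internal parameterization):

```r
set.seed(777)
n <- 20
X <- scale(matrix(rnorm(2 * n), ncol = 2))  # two standardized predictors
y <- X %*% c(3, -2) + rnorm(n, sd = 0.1)    # simulated response
lambda <- 5

beta_ols   <- solve(t(X) %*% X, t(X) %*% y)                    # unregularized
beta_ridge <- solve(t(X) %*% X + lambda * diag(2), t(X) %*% y) # ridge

sum(beta_ridge^2) < sum(beta_ols^2)  # ridge shrinks the coefficient norm
all(beta_ridge != 0)                 # but does not zero coefficients out
```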

Assessing Prediction Quality Ridge Model (Training Data)

Code
DataTrainWithPredRidge=augment(WFModelRidge, DataTrain)
metrics(DataTrainWithPredRidge, truth=Price, estimate=.pred)
# A tibble: 3 × 3
  .metric .estimator  .estimate
  <chr>   <chr>           <dbl>
1 rmse    standard   201534.   
2 rsq     standard        0.479
3 mae     standard   152902.   

Assessing Prediction Quality Ridge Model (Testing Data)

Code
DataTestWithPredRidge=augment(WFModelRidge, DataTest)
metrics(DataTestWithPredRidge, truth=Price, estimate=.pred)
# A tibble: 3 × 3
  .metric .estimator  .estimate
  <chr>   <chr>           <dbl>
1 rmse    standard   330485.   
2 rsq     standard        0.237
3 mae     standard   186431.   

Regularization

Lasso

\[\begin{eqnarray} T^{arget}&=&\frac{1}{20}\sum_{i=1}^{20} \left ( \widehat{Price}_i-Price_i\right)^2+\lambda P^{enalty} \\ \mbox{with:}&& \widehat{Price}_i=\beta_1 Sqft_i+\beta_2 Sqft_i^2+\beta_3 Sqft_i^3+\beta_4 Sqft_i^4+\beta_5 Sqft_i^5+\beta_6 \nonumber \\ \mbox{with:}&& P^{enalty}=\sum_{j=1}^{5} | \beta_j | \nonumber \end{eqnarray}\]

Two Goals: Minimize \(MSE\) and Minimize Penalty (small or zero \(\beta s\))

Unlike the \(MSE\), the \(P^{enalty}\) term does not depend on the data at all; it depends only on the size of the \(\beta\) parameters.

Note that reducing a large or a small \(\beta\) parameter by the same amount has the same impact on the penalty. This is why Lasso, unlike Ridge, can shrink small parameters all the way to exactly zero.
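A quick base-R check of this claim, mirroring the Ridge example (the coefficient values are illustrative):

```r
# Lasso penalty: sum of absolute slope coefficients
lasso_penalty <- function(betas) sum(abs(betas))

betas <- c(10, 1)               # one large beta, one small beta
p0 <- lasso_penalty(betas)      # 11

# Reduce each coefficient by the same amount (0.5), one at a time:
p0 - lasso_penalty(c(9.5, 1))   # 0.5 (large beta reduced)
p0 - lasso_penalty(c(10, 0.5))  # 0.5 (small beta reduced)
```

With the absolute-value penalty, both reductions lower the penalty by exactly 0.5.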

Running the Lasso Model

Code
library(glmnet)
set.seed(777)
ModelDesignLasso=linear_reg(penalty=500, mixture=1) |>
                 set_engine("glmnet") |> 
                 set_mode("regression")

WFModelLasso=workflow() |> 
             add_model(ModelDesignLasso) |> 
             add_recipe(RecipeHouses) |> 
             fit(DataTrain)
tidy(WFModelLasso)
# A tibble: 6 × 3
  term        estimate penalty
  <chr>          <dbl>   <dbl>
1 (Intercept)  509945.     500
2 Sqft        -460508.     500
3 Sqft2       1171967.     500
4 Sqft3             0      500
5 Sqft4             0      500
6 Sqft5       -560318.     500
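The exact zeros for Sqft3 and Sqft4 above are characteristic of the absolute-value penalty. Coordinate-descent Lasso solvers (such as the one used by glmnet) repeatedly apply a soft-thresholding operator \(S(z,\lambda)=\mathrm{sign}(z)\max(|z|-\lambda,0)\), which sets small coefficients exactly to zero. A minimal base-R sketch with hypothetical inputs:

```r
# Soft-thresholding: shrinks every input toward 0; small ones become exactly 0
soft_threshold <- function(z, lambda) sign(z) * pmax(abs(z) - lambda, 0)

z <- c(-3, -0.4, 0.2, 2.5)       # hypothetical coordinate-wise updates
soft_threshold(z, lambda = 0.5)  # -2.5  0.0  0.0  2.0
```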

Assessing Prediction Quality Lasso Model (Training Data)

Code
DataTrainWithPredLasso=augment(WFModelLasso, DataTrain)
metrics(DataTrainWithPredLasso, truth=Price, estimate=.pred)
# A tibble: 3 × 3
  .metric .estimator  .estimate
  <chr>   <chr>           <dbl>
1 rmse    standard   144976.   
2 rsq     standard        0.679
3 mae     standard   110007.   

Assessing Prediction Quality Lasso Model (Testing Data)

Code
DataTestWithPredLasso=augment(WFModelLasso, DataTest)
metrics(DataTestWithPredLasso, truth=Price, estimate=.pred)
# A tibble: 3 × 3
  .metric .estimator    .estimate
  <chr>   <chr>             <dbl>
1 rmse    standard   4723086.    
2 rsq     standard         0.0296
3 mae     standard    303118.