Data Person
Data Person

Reputation: 45

How to use a for loop to create and fill columns?

I have a simple time series dataset with 10 variables - I want to create a for loop (or a function) that creates a 'change from previous month' variable and a 'percent change from previous month variable' for each variable in the timeseries (except for the date). I know I can simply write code for each particular column but would like to optimize this since there are a lot of columns.

Here is what my data looks like, "Date", "Sales", "Price" are some column names:

+----+---+---+---+---+---+---+---+--
| Date       |   Sales   |  Price  | 
+----+---+---+---+---+---+---+---+--
| 01Aug2019  | 4         | 15      |
| 01Sept2019 | 6         | 30      |
| 01Oct2019  | 10        | 44      |
+----+---+---+---+---+---+---+---+--

Here is what I want it to look like with the use of the for loop (or any function)

+----+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+
| Date       |   Sales   |  chg_Sales  | pct_chg_Sales |   Price |  chg_Price  | pct_chg_Price| 
+----+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+
| 01Aug2019  | 4         | NA          |NA             |  15     | NA          |NA            |
| 01Sept2019 | 6         | 2           |50%            |  30     | 15          |100%          |
| 01Oct2019  | 10        | 4           |66%            |  44     | 14          |46%           |
+----+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+

I tried the code below but it did not work

add_column <- function (x, y){
  setDT (x)[,pct_chg_y:= (y - shift (y,1, type="lag")/shift (,1, type="lag")*100]

}

Upvotes: 3

Views: 313

Answers (2)

M--
M--

Reputation: 29238

library(dplyr)
library(scales)

df1 %>% 
  arrange(Date) %>% 
  mutate_at(.vars = c("Sales", "Price"), list(chg = ~(. - lag(.)),
                                              pct_chg = ~percent((. - lag(.))/lag(.))))

  #         Date Sales Price Sales_chg Price_chg Sales_pct_chg Price_pct_chg
  # 1 2019-08-01     4    15        NA        NA           NA%           NA%
  # 2 2019-09-01     6    30         2        15         50.0%        100.0%
  # 3 2019-10-01    10    44         4        14         66.7%         46.7%

Upvotes: 1

akrun
akrun

Reputation: 887891

Here is an option with data.table where we specify the columns of interest in .SDcols, create the 'chg_' column by subtracting the .SD (Subset of Data.table) from the lag i.e. shift of .SD, then in the second step, create the 'pct_chg, by dividing the shift with the 'chg_' columns using Map

nm1 <- c("Sales", "Price")
setDT(df1)[,  paste0("chg_", nm1)  :=  .SD - shift(.SD), .SDcols = nm1]
df1[, paste0("pct_chg_", nm1) :=   
      Map(function(x, y)  100 * (y/shift(x)), .SD, mget(paste0("chg_", nm1))),
               .SDcols = nm1]
df1
#         Date Sales Price chg_Sales chg_Price pct_chg_Sales pct_chg_Price
#1:  01Aug2019     4    15        NA        NA            NA            NA
#2: 01Sept2019     6    30         2        15      50.00000     100.00000
#3:  01Oct2019    10    44         4        14      66.66667      46.66667

data

df1 <- structure(list(Date = c("01Aug2019", "01Sept2019", "01Oct2019"
), Sales = c(4, 6, 10), Price = c(15, 30, 44)), 
        class = "data.frame", row.names = c(NA, 
-3L))

Upvotes: 2

Related Questions