Nasher
Nasher

Reputation: 7

how to create a new column where the rows are determined by the previous row (calculation)?

I have this table and I want to calculate inflation for every month in every city

I tried the lag and lead functions but the CPI was overlapping onto the next city for ex : Malmo's shifted CPI was Stockholm's 05-01-2005 CPI

CPI Date City
112 2005-01-01 Stockholm
113.5 2005-02-01 Stockholm
115 2005-03-01 Stockholm
115.6 2005-04-01 Stockholm
115.8 2005-05-01 Stockholm
106 2005-01-01 Malmo
107.5 2005--02-01 Malmo
110 2005-03-01 Malmo
113 2005-04-01 Malmo
117 2005-05-01 Malmo

Upvotes: 0

Views: 62

Answers (2)

B. Christian Kamgang
B. Christian Kamgang

Reputation: 6489

Another way to solve your problem:

Using data.table package

library(data.table)

setDT(df)[, CPI_change_perc := (CPI/shift(CPI) - 1) * 100, by=City]

      CPI        Date      City CPI_change_perc
 1: 112.0  2005-01-01 Stockholm              NA
 2: 113.5  2005-02-01 Stockholm       1.3392857
 3: 115.0  2005-03-01 Stockholm       1.3215859
 4: 115.6  2005-04-01 Stockholm       0.5217391
 5: 115.8  2005-05-01 Stockholm       0.1730104
 6: 106.0  2005-01-01     Malmo              NA
 7: 107.5 2005--02-01     Malmo       1.4150943
 8: 110.0  2005-03-01     Malmo       2.3255814
 9: 113.0  2005-04-01     Malmo       2.7272727
10: 117.0  2005-05-01     Malmo       3.5398230

Using dplyr package

library(dplyr)

df |> 
  group_by(City) |> 
  mutate(CPI_change_perc = (CPI/shift(CPI) - 1) * 100)

# A tibble: 10 × 4
# Groups:   City [2]
     CPI Date        City      CPI_change_perc
   <dbl> <chr>       <chr>               <dbl>
 1  112  2005-01-01  Stockholm          NA    
 2  114. 2005-02-01  Stockholm           1.34 
 3  115  2005-03-01  Stockholm           1.32 
 4  116. 2005-04-01  Stockholm           0.522
 5  116. 2005-05-01  Stockholm           0.173
 6  106  2005-01-01  Malmo              NA    
 7  108. 2005--02-01 Malmo               1.42 
 8  110  2005-03-01  Malmo               2.33 
 9  113  2005-04-01  Malmo               2.73 
10  117  2005-05-01  Malmo               3.54 

data

df = structure(list(CPI = c(112, 113.5, 115, 115.6, 115.8, 106, 107.5, 
                            110, 113, 117), Date = c("2005-01-01", "2005-02-01", "2005-03-01", 
                                                     "2005-04-01", "2005-05-01", "2005-01-01", "2005--02-01", "2005-03-01", 
                                                     "2005-04-01", "2005-05-01"), City = c("Stockholm", "Stockholm", 
                                                                                           "Stockholm", "Stockholm", "Stockholm", "Malmo", "Malmo", "Malmo", 
                                                                                           "Malmo", "Malmo")), row.names = c(NA, -10L), class = "data.frame")

Upvotes: 0

SamR
SamR

Reputation: 20405

You need to group_by to avoid this issue.

# Read in data
inflation  <- read.table(text = "CPI    Date    City
112 2005-01-01  Stockholm
113.5   2005-02-01  Stockholm
115 2005-03-01  Stockholm
115.6   2005-04-01  Stockholm
115.8   2005-05-01  Stockholm
106 2005-01-01  Malmo
107.5   2005--02-01 Malmo
110 2005-03-01  Malmo
113 2005-04-01  Malmo
117 2005-05-01  Malmo", h = T)

# Perform calculation
library(dplyr)

inflation  |>
    group_by(City)  |>
    mutate(
        cpi_change = lead(CPI) - CPI,
        cpi_change_percent = cpi_change / CPI * 100
    )

Output:

# A tibble: 10 x 5
# # Groups:   City [2]
#      CPI Date        City      cpi_change cpi_change_percent
#    <dbl> <chr>       <chr>          <dbl>              <dbl>
#  1  112  2005-01-01  Stockholm      1.5                1.34
#  2  114. 2005-02-01  Stockholm      1.5                1.32
#  3  115  2005-03-01  Stockholm      0.600              0.522
#  4  116. 2005-04-01  Stockholm      0.200              0.173
#  5  116. 2005-05-01  Stockholm     NA                 NA
#  6  106  2005-01-01  Malmo          1.5                1.42
#  7  108. 2005--02-01 Malmo          2.5                2.33
#  8  110  2005-03-01  Malmo          3                  2.73
#  9  113  2005-04-01  Malmo          4                  3.54
# 10  117  2005-05-01  Malmo         NA                 NA

You will get NAs for the last month as we do not know the rate in the following year. Alternatively you can do it with lag instead of lead if you want to work out change from previous, but then you'll get NAs for the first month.

Upvotes: 1

Related Questions