antonina
antonina

Reputation: 109

substracting value from value above automatically - no loop

I am trying to get a balance value from a df that looks like this

df1
Name   Year    Ch1    Origin
A      1995    x1      a
A      1996    x2      b
A      1997    x3      a
A      2000    x4      a
B      1997    y1      c
B      1998    y2      c
.....

while Ch1 is numerical. and I want to add an extra col to have this value:

Name   Year   Ch1    Bil
A      1995    x1    
A      1996    x2    %of year before (x2-x1/x1*100)%
A      1997    x3    %of year before (x3-x2/x2*100)%
A      2000    x4    %of year before (x4-x3/x3*100)%
B      1997    y1  
B      1998    y2    %of year before (y2-y1/x1*100)%
.....

now I know I could create a loop looking something like this:

for (i in nrow(df1))
  if (df[i,1]==df[i-1,1]) {
    df$Bil<-(df[i,3]-df[i-1,3])/df[i-1,3]*100
  } else ...

Is there a more elegant or quicker way to calculate this? This way I really need to make sure that the dataset is in the right order (going from older to recent years). Lets say also dependant on an extra detail such as origin so that the calculation only happens if name and origin are the same?

Thank you!

Upvotes: 1

Views: 46

Answers (1)

r2evans
r2evans

Reputation: 160447

All three solutions require the data be in the correct order within each Name. For instance, you can interleave names of "A" and "B" (all groupings below will handle it), but the years should likely be non-decreasing.

Prepping by putting in real Ch1:

set.seed(42)
df1$Ch1 <- c(sort(sample(20, size=4)), sort(sample(20, size=2)))

Base R

df1 <- df1[order(df1$Name, df1$Year),]
df1$Bil <- ave(df1$Ch1, df1$Name, df1$Origin,
               FUN=function(z) 100 * c(0, diff(z) / head(z, n = -1)))
df1
#   Name Year Ch1 Origin        Bil
# 1    A 1995   6      a   0.000000
# 2    A 1996  15      b   0.000000
# 3    A 1997  18      a 200.000000
# 4    A 2000  19      a   5.555556
# 5    B 1997  10      c   0.000000
# 6    B 1998  13      c  30.000000

dplyr

library(dplyr)
df1 %>%
  arrange(Name, Year) %>%
  group_by(Name, Origin) %>%
  mutate(
    Bil = 100 * c(0, diff(Ch1) / head(Ch1, n = -1))
  ) %>%
  ungroup()

data.table

library(data.table)
library(magrittr)
df1DT <- as.data.table(df1)
setorder(df1DT, Name, Year)
df1DT[, Bil := 100 * c(0, diff(Ch1) / head(Ch1, n = -1)), by = .(Name, Origin)]

Upvotes: 2

Related Questions