Esme_

Reputation: 1520

Apply function within each subset of a dataframe

I have a dataframe and need to calculate the difference between successive entries within each ID. I would like to do this without creating a separate dataframe for each ID and then joining them back together (my current solution). Here is an example with a structure similar to my real data:

df <- as.data.frame(matrix(nrow = 20, ncol = 2))
names(df) <- c("ID", "number")
df$ID <- sample(c("A", "B", "C"), 20, replace = TRUE)
df$number <- rnorm(20, mean = 5)

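I can easily calculate the difference between successive rows using this function, which relies on rollapply from the zoo package:

library(zoo)  # provides rollapply

# Difference between each value and the one before it; the first entry is NA
roll.dif <- function(x) {
  rollapply(x, width = 2, FUN = diff, fill = NA, align = "right")
}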

df$dif = roll.dif(df$number)

However, I would like to do this within each ID. I have tried using with() and tapply(), based on the answer to "Apply function conditionally":

with(df, tapply(number, ID, FUN = roll.dif))

I have also tried using by():

by(df$number,df$ID,FUN = roll.dif)

Both give me the values I am looking for, but I cannot figure out how to get them back into the dataframe. I would like the output to look like this:

    ID  number       dif
 1  A   3.967251     NA
 2  B   3.771882     NA
 3  A   5.920705     1.953454
 4  A   7.517528     1.596823
 5  B   5.252357     1.480475
 6  B   4.811998    -0.440359
 7  B   3.388951    -1.423047
 8  A   5.284527    -2.233001
 9  C   6.070546     NA
 10 A   5.319934     0.035407
 11 A   5.517615     0.197681
 12 B   5.454738     2.065787
 13 C   6.402359     0.331813
 14 C   5.617123    -0.785236
 15 A   5.692807     0.175192
 16 C   4.902007    -0.715116
 17 B   4.975184    -0.479554
 18 A   6.05282      0.360013
 19 C   3.677114    -1.224893
 20 C   4.883414     1.2063

Upvotes: 0

Views: 78

Answers (2)

akrun

Reputation: 886968

We can use data.table, which adds the new column to df by reference:

library(data.table)
setDT(df)[, dif := roll.dif(number), by = ID]

Or, as a base R option, use ave:

df$dif <- with(df, ave(number, ID, FUN = roll.dif))
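To illustrate why ave fits here, below is a minimal sketch on a small hand-made dataframe (hypothetical data, not the random df from the question). It assumes a zoo-free helper, lag_dif, as a stand-in for roll.dif; that name is introduced only for this example. ave applies the function within each ID and returns the results in the original row order, so they can be assigned straight to a new column.

```r
# Small hand-made example (hypothetical data, not the question's random df)
df <- data.frame(
  ID     = c("A", "B", "A", "A", "B"),
  number = c(1, 10, 4, 9, 15)
)

# Hypothetical zoo-free stand-in for roll.dif:
# NA for the first entry of a group, then successive differences
lag_dif <- function(x) c(NA, diff(x))

# ave() applies lag_dif within each ID and keeps the original row order
df$dif <- ave(df$number, df$ID, FUN = lag_dif)

print(df)
```

Here the A rows (1, 4, 9) get NA, 3, 5 and the B rows (10, 15) get NA, 5, each placed back on the row it came from.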

Upvotes: 1

user2100721

Reputation: 3587

You can use the dplyr package like this:

library(dplyr)
df %>% group_by(ID) %>% mutate(dif = roll.dif(number))

Upvotes: 2
