Megan
Megan

Reputation: 31

Finding difference between specific rows by group

Within a group, I want to find the difference between that row and the first time that user appeared in the data. For example, I need to create the diff variable below. Users have different number of rows each as in the following data:

df <- structure(list(ID = c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 3L, 4L, 4L), 
    money = c(9L, 12L, 13L, 15L, 5L, 7L, 8L, 5L, 2L, 10L), occurence = c(1L, 
    2L, 3L, 4L, 1L, 2L, 3L, 1L, 1L, 2L), diff = c(NA, 3L, 4L, 
    6L, NA, 2L, 3L, NA, NA, 8L)), .Names = c("ID", "money", "occurence", 
"diff"), class = "data.frame", row.names = c(NA, -10L))

   ID money occurence diff
1   1     9         1   NA
2   1    12         2    3
3   1    13         3    4
4   1    15         4    6
5   2     5         1   NA
6   2     7         2    2
7   2     8         3    3
8   3     5         1   NA
9   4     2         1   NA
10  4    10         2    8

Upvotes: 0

Views: 68

Answers (2)

www
www

Reputation: 39154

A solution, which uses the first function to get the first value and calculate the difference.

library(dplyr)

df2 <- df %>%
  group_by(ID) %>%
  mutate(diff = money - first(money)) %>%
  mutate(diff = replace(diff, diff == 0, NA)) %>%
  ungroup()
df2
# # A tibble: 10 x 4
#       ID money occurence  diff
#    <int> <int>     <int> <int>
#  1     1     9         1    NA
#  2     1    12         2     3
#  3     1    13         3     4
#  4     1    15         4     6
#  5     2     5         1    NA
#  6     2     7         2     2
#  7     2     8         3     3
#  8     3     5         1    NA
#  9     4     2         1    NA
# 10     4    10         2     8

Update

Here is a solution provided by Sotos. Notice that no need to replace 0 with NA.

library(data.table)

setDT(df)[, money := money - first(money), by = ID][]
#     ID money occurence diff
#  1:  1     0         1   NA
#  2:  1     3         2    3
#  3:  1     4         3    4
#  4:  1     6         4    6
#  5:  2     0         1   NA
#  6:  2     2         2    2
#  7:  2     3         3    3
#  8:  3     0         1   NA
#  9:  4     0         1   NA
# 10:  4     8         2    8

DATA

dput(df)
structure(list(ID = c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 3L, 4L, 4L), 
    money = c(9L, 12L, 13L, 15L, 5L, 7L, 8L, 5L, 2L, 10L), occurence = c(1L, 
    2L, 3L, 4L, 1L, 2L, 3L, 1L, 1L, 2L)), .Names = c("ID", "money", 
"occurence"), row.names = c(NA, -10L), class = "data.frame")

Upvotes: 1

Rich Scriven
Rich Scriven

Reputation: 99321

You can use ave(). We just remove the first value per group and replace it with NA, and subtract the first value from the rest of the values.

with(df, ave(money, ID, FUN = function(x) c(NA, x[-1] - x[1])))
# [1] NA  3  4  6 NA  2  3 NA NA  8

Upvotes: 3

Related Questions