Jason
Jason

Reputation: 59

Replacing the last value within groups with different values

My question is similar to this post, but the difference is instead of replacing the last value within each group/id with all 0's, different values are used to replace the last value within each group/id.

Here is an example (I borrowed it from the above link):

          id  Time
1         1    3
2         1    10
3         1    1
4         1    0
5         1    9999
6         2    0
7         2    9
8         2    500
9         3    0
10        3    1

In the above link, the last value within each group/id was replaced by a zero, using something like:

df %>%
  group_by(id) %>%
  mutate(Time = c(Time[-n()], 0))

And the output was

          id  Time
1         1    3
2         1    10
3         1    1
4         1    0
5         1    0
6         2    0
7         2    9
8         2    0
9         3    0
10        3    0

In my case, I would like the last value within each group/id to be replaced by a different value. Originally, the last value within each group/id was 9999, 500, and 1. Now I would like: 9999 is replaced by 5, 500 is replaced by 12, and 1 is replaced by 92. The desired output is:

          id  Time
1         1    3
2         1    10
3         1    1
4         1    0
5         1    5
6         2    0
7         2    9
8         2    12
9         3    0
10        3    92

I tried this one:

df %>%
  group_by(id) %>%
  mutate(Time = replace(Time, n(), c(5,12,92))),

but it did not work.

Upvotes: 0

Views: 1652

Answers (2)

Arun
Arun

Reputation: 118779

Another way using data.table would be to create another data.table which contains the values to be replaced with for a given id, and then join and update by reference (simultaneously).

require(data.table) # v1.9.5+ (for 'on = ' feature)
replace = data.table(id = 1:3, val = c(5L, 12L, 9L)) # from @David
setDT(df)[replace, Time := val, on = "id", mult = "last"]

#     id Time
#  1:  1    3
#  2:  1   10
#  3:  1    1
#  4:  1    0
#  5:  1    5
#  6:  2    0
#  7:  2    9
#  8:  2   12
#  9:  3    0
# 10:  3    9

In data.table, joins are considered as an extension of subsets. It's natural to think of doing whatever operation we do on subsets also on joins. Both operations do something on some rows.

For each replace$id, we find the last matching row (mult = "last") in df$id, and update that row with the corresponding val.

Installation instructions for v1.9.5 here. Hope this helps.

Upvotes: 3

David Arenburg
David Arenburg

Reputation: 92282

This could be solved using almost identical solution as I posted in the linked question. e.g., just replace 0L with the desired values

library(data.table)
indx <- setDT(df)[, .I[.N], by = id]$V1
df[indx, Time := c(5L, 12L, 92L)]
df
#     id Time
#  1:  1    3
#  2:  1   10
#  3:  1    1
#  4:  1    0
#  5:  1    5
#  6:  2    0
#  7:  2    9
#  8:  2   12
#  9:  3    0
# 10:  3   92

So to add some explanations:

  1. .I is identical to row_number() or 1:n() in dplyr for an ungrouped data, e.g. 1:nrow(df) in base R
  2. .N is like n() in dplyr, e.g., the size of a certain group (or the whole data set). So basically when I run .I[.N] by group, I'm retrieving the global index of the last row of each group
  3. The next step is just use this index as a row index within df while assigning the desired values to Time by reference using the := operator.

Edit

Per OPs request, here's a possible dplyr solution. Your original solution doesn't work because you are working per group and thus you were trying to pass all three values to each group.

The only way I can think of is to first calculate group sizes, then ungroup and then mutate on the cumulative sum of these locations, something among these lines

library(dplyr)
df %>%
  group_by(id) %>%
  mutate(indx = n()) %>%
  ungroup() %>%
  mutate(Time = replace(Time, cumsum(unique(indx)), c(5, 12, 92))) %>%
  select(-indx)

# Source: local data frame [10 x 2]
# 
#    id Time
# 1   1    3
# 2   1   10
# 3   1    1
# 4   1    0
# 5   1    5
# 6   2    0
# 7   2    9
# 8   2   12
# 9   3    0
# 10  3   92

Upvotes: 3

Related Questions