JerryWho

Reputation: 3105

Custom function after grouping a data.frame

Given the following data.frame

d <- rep(c("a", "b"), each=5)
l <- rep(1:5, 2) 
v <- 1:10

df       <- data.frame(d=d, l=l, v=v*v)
df
   d l   v
1  a 1   1
2  a 2   4
3  a 3   9
4  a 4  16
5  a 5  25
6  b 1  36
7  b 2  49
8  b 3  64
9  b 4  81
10 b 5 100

Now I want to add another column after grouping by l. The extra column e should contain v_b - v_a, i.e. the value of v where d is "b" minus the value of v where d is "a" within each group:

   d l   v    e
1  a 1   1    35 (36-1)
2  a 2   4    45 (49-4)
3  a 3   9    55 (64-9)
4  a 4  16    65 (81-16)
5  a 5  25    75 (100-25)
6  b 1  36    35 (36-1)
7  b 2  49    45 (49-4)
8  b 3  64    55 (64-9)
9  b 4  81    65 (81-16)
10 b 5 100    75 (100-25)

The parentheses show how each value is calculated.

I'm looking for a way to do this with dplyr, so I started with something like this:

df %>%
  group_by(l) %>%
  mutate(e = myCustomFunction)

But how should I define myCustomFunction? I assumed that grouping the data.frame would produce a (sub-)data.frame for each group, which is then passed to this function as its argument. But that is not the case.

Upvotes: 6

Views: 9813

Answers (4)

talat

Reputation: 70256

I guess this is the dplyr equivalent to @jlhoward's data.table solution:

library(dplyr)

df %>%
  group_by(l) %>%
  mutate(e = v[d == "b"] - v[d == "a"])

Edit after comment by OP:

If you want to use a custom function, here's a possible way:

myfunc <- function(x) {
  with(x, v[d == "b"] - v[d == "a"])
}

df %>%
  group_by(l) %>%
  do(data.frame(. , e = myfunc(.))) %>%
  arrange(d, l)                   # <- just to get it back in the original order
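The do() call above hands each group's rows to myfunc as a data frame. In more recent dplyr releases, group_modify() can play a similar role. The following is only a sketch under that assumption (a dplyr version that provides group_modify()), with the example data rebuilt so that it runs on its own:

```r
library(dplyr)

# Example data from the question
df <- data.frame(d = rep(c("a", "b"), each = 5), l = rep(1:5, 2), v = (1:10)^2)

# Same helper as above: receives the rows of one group as a data frame
myfunc <- function(x) {
  with(x, v[d == "b"] - v[d == "a"])
}

res <- df %>%
  group_by(l) %>%
  # g holds one group's rows (without the grouping column l), key holds the value of l
  group_modify(function(g, key) data.frame(g, e = myfunc(g))) %>%
  ungroup()
```

The result comes back ordered by l, so an arrange(d, l) at the end restores the original row order if that matters.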

Edit after comment by @hadley:

As hadley commented below, it would be better in this case to define the function as

f <- function(v, d) v[d == "b"] - v[d == "a"]

and then use the custom function f inside a mutate:

df %>%
  group_by(l) %>%
  mutate(e = f(v, d))  

Thanks @hadley for the comment.
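The function f assumes that every group contains exactly one "a" row and one "b" row. If that might not hold in your data, a defensive variant can fail with a clearer message. This is only a sketch, and the name f_checked is made up for illustration:

```r
library(dplyr)

# Hypothetical variant of f: stops unless the group has exactly
# one "a" row and one "b" row
f_checked <- function(v, d) {
  stopifnot(sum(d == "a") == 1, sum(d == "b") == 1)
  v[d == "b"] - v[d == "a"]
}

# Example data from the question
df <- data.frame(d = rep(c("a", "b"), each = 5), l = rep(1:5, 2), v = (1:10)^2)

res <- df %>%
  group_by(l) %>%
  mutate(e = f_checked(v, d)) %>%
  ungroup()
```

Because f_checked returns a single value per group, mutate repeats it for every row of that group, just as with f.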

Upvotes: 15

jlhoward

Reputation: 59345

Here's an approach using data tables.

library(data.table)
DT <- as.data.table(df)
DT[,e := diff(v), by=l]

These approaches using diff(...) assume that, within each value of l, the "a" row comes before the "b" row, as in your example. If that is not guaranteed, this is a more reliable way to do the same thing:

DT[, e := .SD[d == "b", v] - .SD[d == "a", v], by=l]

or, even more directly:

DT[, e := v[d == "b"] - v[d == "a"], by=l]

But if you want to access the entire subset of data for each group and pass it to your custom function, you can use .SD. Also make sure you read about .SDcols in ?data.table.
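As a rough sketch of the .SD route, assuming data.table is installed: the helper name my_custom is hypothetical, and .SDcols is used here only to restrict .SD to the two columns the helper needs.

```r
library(data.table)

# Example data from the question
DT <- data.table(d = rep(c("a", "b"), each = 5), l = rep(1:5, 2), v = (1:10)^2)

# Hypothetical custom function: receives one group's subset (.SD)
my_custom <- function(sub) {
  with(sub, v[d == "b"] - v[d == "a"])
}

# .SD is the subset of DT for the current value of l, limited to d and v
DT[, e := my_custom(.SD), by = l, .SDcols = c("d", "v")]
```

The function returns one value per group, which := recycles across the rows of that group.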

Upvotes: 4

agstudy

Reputation: 121568

Using dplyr:

library(dplyr)

# %.% was dplyr's original chaining operator; current versions use %>%
df %>%
  group_by(l) %>%
  mutate(e = diff(v))

#    d l   v  e
# 1  a 1   1 35
# 2  a 2   4 45
# 3  a 3   9 55
# 4  a 4  16 65
# 5  a 5  25 75
# 6  b 1  36 35
# 7  b 2  49 45
# 8  b 3  64 55
# 9  b 4  81 65
# 10 b 5 100 75

Upvotes: 4

MrFlick

Reputation: 206187

If you want to consider a non-dplyr option

df$e <- with(df, ave(v, l, FUN=function(x) diff(x)))

will do the trick. The ave function is useful for calculating values for groups of observations.
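Like any diff()-based solution, this relies on the "a" row coming before the "b" row within each value of l. The sketch below, in base R only, reverses the rows to show the sign flip and then selects by label instead. The names rev_df, e_diff and e_label are made up for illustration:

```r
# Example data from the question, then reversed so "b" rows precede "a" rows
df <- data.frame(d = rep(c("a", "b"), each = 5), l = rep(1:5, 2), v = (1:10)^2)
rev_df <- df[nrow(df):1, ]

# diff() depends on row order: here it computes v_a - v_b
e_diff <- with(rev_df, ave(v, l, FUN = diff))

# Grouping row indices and selecting by label does not depend on row order
e_label <- with(rev_df, ave(seq_along(v), l,
  FUN = function(i) v[i][d[i] == "b"] - v[i][d[i] == "a"]))

rev_df$e <- e_label
```

With the reversed rows, e_diff has the opposite sign of e_label, which is why the label-based version is the safer choice when the row order is not guaranteed.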

Upvotes: 1
