cousin_pete

Reputation: 578

Using R dplyr throws error

I am learning to use the dplyr package.

library(dplyr)

A toy dataset:

d <- expand.grid("id"=1:3,"x1"=10:12,"x2"=(20:22))

Later I may need to loop over the columns; my real data has 30K rows and 70 columns.

i <- 2

Here I am hoping to use a generic variable name:

my.variable <- names(d[i])
my.variable

A function to normalize each group to the range 0-1

norm <- function(x) (x - min(x,na.rm = TRUE))/(max(x,na.rm = TRUE)-min(x,na.rm = TRUE))

df.out <- d %>% group_by(id) %>% mutate(x.norm = norm(get(my.variable, envir = as.environment(d))))

The last line throws an error:

Error: incompatible size (%d), expecting %d (the group size) or 1

Any help appreciated as to the reason for the error. Also, is this a viable way of doing this normalizing task?

Upvotes: 0

Views: 603

Answers (2)

James

Reputation: 66874

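The problem comes from the use of get, which I'm sure is a breach of the @hadley license agreement ;) With envir = as.environment(d), get returns the whole ungrouped column (27 values here), while mutate expects one value per row of the current group (9 rows) or a single value, hence the size error.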
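To make the size mismatch concrete, here is a minimal base R sketch using the toy data from the question. It only illustrates the lengths involved; the ave call at the end is a swapped-in base R alternative for the per-group normalization, not part of dplyr, and full.column is a name introduced here for illustration.

```r
# Toy data and helper from the question
d <- expand.grid(id = 1:3, x1 = 10:12, x2 = 20:22)
my.variable <- names(d)[2]   # "x1"
norm <- function(x) (x - min(x, na.rm = TRUE)) / (max(x, na.rm = TRUE) - min(x, na.rm = TRUE))

# get() looks the name up in the full data frame, ignoring any grouping
full.column <- get(my.variable, envir = as.environment(d))
length(full.column)   # length of the whole column
table(d$id)           # rows per group, which is what mutate expects

# Base R alternative (illustrative): normalize within each id using ave()
d$x.norm <- ave(as.numeric(d[[my.variable]]), d$id, FUN = norm)
```

Because get never sees the grouping, it hands back more values than any single group holds, which is what the error message is complaining about.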

To evaluate character arguments, you can use mutate_each_q. However, when using a single function, it will overwrite the variable, so you must use two functions and drop the second variable afterwards:

d %>% group_by(id) %>% mutate_each_q(funs(x.norm=norm, identity),my.variable) %>%
      select(-identity)
Source: local data frame [6 x 4]
Groups: id

  id x1 x2 x.norm
1  1 10 20    0.0
2  2 10 20    0.0
3  3 10 20    0.0
4  1 11 20    0.5
5  2 11 20    0.5
6  3 11 20    0.5
...
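For readers on a later dplyr release, where the mutate_each family has been superseded, a rough equivalent is sketched below. It assumes dplyr 0.7 or newer, which provides the .data pronoun for looking up a column by a name held in a character variable; the range-based rewrite of norm is only a stylistic variation on the function from the question.

```r
library(dplyr)

d <- expand.grid(id = 1:3, x1 = 10:12, x2 = 20:22)
my.variable <- names(d)[2]   # "x1"

# Same normalization as in the question, written with range()
norm <- function(x) {
  rng <- range(x, na.rm = TRUE)
  (x - rng[1]) / (rng[2] - rng[1])
}

# .data[[name]] refers to the column within the current group,
# so norm() sees only that group's values
df.out <- d %>%
  group_by(id) %>%
  mutate(x.norm = norm(.data[[my.variable]])) %>%
  ungroup()
```

This keeps the original column and adds x.norm in one step, so no helper column has to be dropped afterwards.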

Upvotes: 2

Dieter Menne

Reputation: 10215

I don't know if you really want the columns as in @James' answer. Here is how I understand your question:

d %>% group_by(id) %>% mutate_each(funs(norm(.)))

Groups: id

   id  x1  x2
1   1 0.0 0.0
2   2 0.0 0.0
3   3 0.0 0.0

...
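In newer dplyr versions, normalizing every non-grouping column can also be written with across. This is a sketch that assumes dplyr 1.0 or newer; df.all is a name introduced here for illustration.

```r
library(dplyr)

d <- expand.grid(id = 1:3, x1 = 10:12, x2 = 20:22)
norm <- function(x) (x - min(x, na.rm = TRUE)) / (max(x, na.rm = TRUE) - min(x, na.rm = TRUE))

# across(everything(), ...) skips the grouping column id,
# so only x1 and x2 are overwritten with their per-group normalized values
df.all <- d %>%
  group_by(id) %>%
  mutate(across(everything(), norm)) %>%
  ungroup()
```

Note that a column which is constant within a group gives 0/0, so norm returns NaN for that group; with 70 real columns it may be worth checking for that case.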

Upvotes: 2
