MeC

Reputation: 463

How to interpret column length error from dplyr::mutate?

I'm trying to apply a function (more complex than the one used below, which I simplified for this question) across two vectors. However, I receive the following error:

    mutate_impl(.data, dots) : 
       Column `diff` must be length 2 (the group size) or one, not 777

I think I may be getting this error because taking the difference between consecutive rows produces one row fewer than the original data frame, based on some posts I read. However, when I followed that advice and tried to pad the final row with 0/NA, I received a different error. Did I at least identify the source of the error correctly? Any ideas? Thank you.

Original code:

     diff_df <- DF %>%
       group_by(DF$var1, DF$var2) %>%
       mutate(diff = map2(DF$duration, lead(DF$duration), `-`)) %>%
       as.data.frame()

Upvotes: 1

Views: 449

Answers (1)

akrun

Reputation: 887128

We don't need map2 to get the difference between 'duration' and the lead of 'duration', because subtraction is already vectorized. map2 loops through each element of 'duration' together with the corresponding element of lead(duration), which is unnecessary (and it returns a list rather than a numeric vector).

    DF %>% 
        group_by(var1, var2) %>% 
        mutate(diff = duration - lead(duration))
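To make the answer's code reproducible, here is a minimal sketch on a hypothetical toy DF (the question's real data isn't shown, so the values of var1, var2 and duration below are made up). It also shows that lead() returns a vector of the same length as its input, padding the last row of each group with NA. That means no manual padding with 0/NA is needed.

```r
library(dplyr)

# Hypothetical toy data standing in for the question's DF
DF <- data.frame(
  var1 = c("a", "a", "b", "b", "b"),
  var2 = c("x", "x", "y", "y", "y"),
  duration = c(10, 4, 7, 2, 1)
)

diff_df <- DF %>%
  group_by(var1, var2) %>%
  # inside the grouped mutate, 'duration' refers to the current group's rows only;
  # lead() shifts within the group and fills the group's last row with NA
  mutate(diff = duration - lead(duration)) %>%
  ungroup() %>%
  as.data.frame()

print(diff_df)
```

With this data the diff column is 6, NA, 5, 1, NA: each group keeps its full length, and only its last row is NA.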

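NOTE: When we extract the column with DF$duration after the group_by, it breaks the grouping and returns the full column of the dataset instead of the rows of the current group. That is why the error reports a length of 777 (the full column) when the group size is only 2. Also, inside the pipe there is no need for dataset$columnname (the same applies to group_by(DF$var1, DF$var2)); it should just be columnname. (However, in certain situations, e.g. when we want the full column for some comparison, dataset$columnname can be used.)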
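The following sketch reproduces the error described in the note, again on a hypothetical toy DF with 5 rows. Because DF$duration is the whole 5-row column, mutate() receives 5 values for a group that has only 2 (or 3) rows and stops with an error. The exact wording of the message depends on the installed dplyr version, so the sketch only captures and prints it.

```r
library(dplyr)

# Hypothetical toy data: 5 rows, groups of size 2 and 3
DF <- data.frame(
  var1 = c("a", "a", "b", "b", "b"),
  var2 = c("x", "x", "y", "y", "y"),
  duration = c(10, 4, 7, 2, 1)
)

# DF$duration bypasses the grouping: it is the full column (length 5),
# not the current group's slice, so mutate() cannot fit it into the group
res <- tryCatch(
  DF %>%
    group_by(var1, var2) %>%
    mutate(diff = DF$duration - lead(DF$duration)),
  error = function(e) e
)

print(conditionMessage(res))
```

Replacing DF$duration with duration, as in the answer, makes the same pipeline succeed.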

Upvotes: 1
