user2352714

Reputation: 356

Error in trying to use the group_by and summarise functions with dplyr in R

I have been trying to summarise a set of data using dplyr in R. This is the code I have been using and it had been working fine up until recently.

library(tidyverse); library(curl)
data<-read.csv(curl("https://raw.githubusercontent.com/megaraptor1/mydata/main/data.csv"))
data2<-data %>%
  group_by(e.taxon) %>%
  summarise(across(c(e.hbl,e.bm), weighted.mean, e.N), 
            N = sum(e.N))

Error: Problem with summarise() input ..1.
x 'x' and 'w' must have the same length
i Input ..1 is (function (.cols = everything(), .fns = NULL, ..., .names = NULL) ....
i The error occurred in group 2: e.taxon = "Abrocoma_bennettii".

Now I know the purported reason for this error: two of the columns don't have the same length or have missing values. However, when I check to see which of the columns is producing the error, it says that all of the variables have the same number of entries (i.e., no missing data).

length(data$e.taxon)
length(data$e.hbl)
length(data$e.bm)
length(data$e.N)
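
Note that every column of a data frame has the same length by construction, so the length() calls above will always agree and cannot reveal missing values. A minimal sketch of a check that can, run on a made-up toy data frame (the name `toy` and its values are hypothetical, not taken from the dataset):

```r
# Toy data frame standing in for the real dataset (values are made up)
toy <- data.frame(
  e.taxon = c("A", "A", "B"),
  e.hbl   = c(10, NA, 30),
  e.bm    = c(1, 2, 3),
  e.N     = c(1L, 3L, 2L)
)

# length() is the same for every column of a data frame, NA or not
col_lengths <- sapply(toy, length)

# Count missing values per column instead
na_counts <- colSums(is.na(toy))

# Indices of rows with at least one missing value
incomplete_rows <- which(!complete.cases(toy))

print(col_lengths)
print(na_counts)
print(incomplete_rows)
```

On the real data, the same calls would be applied to `data` in place of `toy`.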

I tried searching for this error message to see if there is more information behind it that I could use, but I could not find anything. What's really strange is this code was working fine until some unknown change, and due to the way the file is set up I cannot easily identify where the new changes are that might have produced this (the example is part of a larger shared dataset). I am trying to figure out why R is returning this error when all of the data have complete cases.

Upvotes: 0

Views: 1787

Answers (1)

akrun

Reputation: 887118

It works with a recent version of dplyr (tested with 1.0.6 on R 4.1.0):

library(dplyr)
data %>%
   group_by(e.taxon) %>%
   summarise(across(c(e.hbl,e.bm), weighted.mean, e.N), N = sum(e.N))
# A tibble: 2,004 x 4
   e.taxon               e.hbl    e.bm     N
   <chr>                 <dbl>   <dbl> <int>
 1 Abrawayomys_ruschii   126.     54.7     3
 2 Abrocoma_bennettii    190.    200       9
 3 Abrocoma_cinerea      149.     86.3     5
 4 Abrothrix_andinus      83.7    16.7    34
 5 Abrothrix_illuteus    121.     42      11
 6 Abrothrix_longipilis  105.     32.3    62
 7 Abrothrix_olivaceus    87.0    19.4    45
 8 Acinonyx_jubatus     1278.  52163.      7
 9 Acomys_cahirinus      105      41.1     2
10 Acomys_sp.             98.5    67       2
# … with 1,994 more rows

Since we are passing extra arguments to across() rather than using a lambda function, it may be better to name the argument, i.e. w = e.N (though it doesn't matter here, because w is the second argument of weighted.mean).
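
If the installed dplyr version cannot be updated, writing the call as a lambda may be a more robust alternative, since e.N is then looked up inside each group rather than forwarded as an extra argument. This is only a sketch on a made-up toy data frame (the name `toy` and its values are assumptions for illustration), and it has not been tested against the older dplyr version that produced the error:

```r
library(dplyr)

# Toy data standing in for the question's dataset (values are made up)
toy <- data.frame(
  e.taxon = c("A", "A", "B", "B", "B"),
  e.hbl   = c(10, 20, 30, 40, 50),
  e.bm    = c(1, 2, 3, 4, 5),
  e.N     = c(1L, 3L, 1L, 1L, 2L)
)

# Lambda form: .x is the current column, e.N is the weight column
# evaluated within the current group
res <- toy %>%
  group_by(e.taxon) %>%
  summarise(across(c(e.hbl, e.bm), ~ weighted.mean(.x, w = e.N)),
            N = sum(e.N))

print(res)
```

The lambda also makes the role of each argument explicit: `.x` is the column being summarised and `w` is the weight.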

Upvotes: 2
