R Aggregate multiple rows

Question

My question seems to be a very common question, but the solutions I found on internet don't work...

I would like to aggregate rows in a data frame in R. Here is the structure of my data frame (df), it is a table of citations :

Autors      Lannoy_2016  Ramadier_2014  Lord_2009  Ortar_2008
Burgess E             1             NA         NA          NA
Burgess E             1             NA         NA          NA
Burgess E             1             NA         NA          NA
Burgess E             1             NA         NA          NA
Kaufmann V           NA              1         NA          NA
Kaufmann V           NA             NA          1          NA
Kaufmann V           NA             NA         NA           1
Orfeuil P             1             NA         NA          NA
Orfeuil P            NA              1         NA          NA
Sorokin P            NA             NA         NA           1

That is I would like to have :

Autors      Lannoy_2016  Ramadier_2014  Lord_2009  Ortar_2008
Burgess E             4             NA         NA          NA
Kaufmann V           NA              1          1           1
Orfeuil P             1              1         NA          NA
Sorokin P            NA             NA         NA           1

I have tried those solutions, but it doesn't work :

ddply(df,"Autors", numcolwise(sum))

and

df %>% group_by(Autors) %>% summarize_all(sum)

It aggregates well the rows, but the values (sum of the 1 values) are absolutely not correct ! And I don't understand why...

Do you have an idea ?

Thank you very much !

Joël

akrun · Accepted Answer

It could be because the na.rm is not used

library(dplyr)
df %>%
   group_by(Autors) %>% 
   summarize_all(sum, na.rm = TRUE)

if both plyr and dplyr are loaded, summarise would get masked, but doubt about summarise_all as it is a dplyr function

Based on the expected output, with na.rm = TRUE, it removes all NAs and if there are cases having only NAs it returns 0. To avoid that, we can have a condition

df %>% 
   group_by(Autors) %>% 
   summarize_all(funs(if(all(is.na(.))) NA else sum(., na.rm = TRUE)))
# A tibble: 4 x 5
#  Autors     Lannoy_2016 Ramadier_2014 Lord_2009 Ortar_2008
#                                  
#1 Burgess E            4            NA        NA         NA
#2 Kaufmann V          NA             1         1          1
#3 Orfeuil P            1             1        NA         NA
#4 Sorokin P           NA            NA        NA          1

data

df <- structure(list(Autors = c("Burgess E", "Burgess E", "Burgess E", 
"Burgess E", "Kaufmann V", "Kaufmann V", "Kaufmann V", "Orfeuil P", 
"Orfeuil P", "Sorokin P"), Lannoy_2016 = c(1L, 1L, 1L, 1L, NA, 
 NA, NA, 1L, NA, NA), Ramadier_2014 = c(NA, NA, NA, NA, 1L, NA, 
 NA, NA, 1L, NA), Lord_2009 = c(NA, NA, NA, NA, NA, 1L, NA, NA, 
 NA, NA), Ortar_2008 = c(NA, NA, NA, NA, NA, NA, 1L, NA, NA, 1L
 )), .Names = c("Autors", "Lannoy_2016", "Ramadier_2014", "Lord_2009", 
 "Ortar_2008"), class = "data.frame", row.names = c(NA, -10L))

R Aggregate multiple rows

Answers (2)

data

Related Questions