tjr

Reputation: 691

dplyr group_by only some values

I have a data frame, df, like this:

df = data.frame(w = c('CT','CT','CT','CT','CT','CT'), x = c('PF','PF','MF','MF','AF','AF'), y = sample(letters, 6), z = seq(1:6))

It is already grouped by w and y. I want to regroup by x, but only where x is PF or MF. Where x is AF, I need to keep y; for the other rows, NA or some other unique value in y would be fine. The summary function is the sum of z, so the final data frame would be:

w  x  y  z 
CT PF NA 3
CT MF NA 7
CT AF s 5
CT AF h 6

I am using dplyr and tried group_by(Flyway %in% c('MF','PF')), but that only creates a new TRUE/FALSE column. Maybe I should be looking outside dplyr? Thanks.

Upvotes: 1

Views: 1353

Answers (2)

akrun

Reputation: 887851

We could also use data.table. Convert the 'data.frame' to a 'data.table' with setDT(df). For rows where 'x' is not 'AF', assign (:=) NA to 'y'. Then, grouped by 'w', 'x', and 'y', get the sum of 'z'.

library(data.table)
setDT(df)[x!='AF', y:=NA_character_][,list(z=sum(z)) ,.(w,x,y)]
#    w  x  y z
#1: CT PF NA 3
#2: CT MF NA 7
#3: CT AF  b 5
#4: CT AF  o 6

NOTE: The values in the 'y' column differ from those in the question because no seed was set when constructing the dataset.
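To make the example reproducible, one option is to fix the seed before building the data. The following is only a sketch: the seed value 42 is arbitrary, stringsAsFactors = FALSE is an assumption so that 'y' is a character column, and as.data.table() is used instead of setDT() so the original df is left unmodified.

```r
library(data.table)

set.seed(42)  # arbitrary seed (an assumption); any fixed value makes 'y' repeatable
df <- data.frame(
  w = rep("CT", 6),
  x = c("PF", "PF", "MF", "MF", "AF", "AF"),
  y = sample(letters, 6),
  z = 1:6,
  stringsAsFactors = FALSE  # keep 'y' as character so NA_character_ matches its type
)

dt <- as.data.table(df)              # works on a copy; df itself is not changed
dt[x != "AF", y := NA_character_]    # blank out 'y' everywhere except the AF rows
res <- dt[, .(z = sum(z)), by = .(w, x, y)]  # sum 'z' within each w/x/y group
print(res)
```

The letters in 'y' depend on the seed and the R version, but the sums of 'z' for PF and MF (3 and 7) do not.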

Upvotes: 1

talat

Reputation: 70336

You could change y first, then group the data and compute the sum of z:

df %>% 
  ungroup %>% 
  mutate(y = replace(y, x != "AF", NA)) %>% 
  group_by(w, x, y) %>% 
  summarise(z = sum(z)) %>% 
  ungroup()
#Source: local data frame [4 x 4]
#
#       w      x      y     z
#  (fctr) (fctr) (fctr) (int)
#1     CT     AF      h     5
#2     CT     AF      l     6
#3     CT     MF     NA     7
#4     CT     PF     NA     3

Or, a little shorter:

df %>% 
  group_by(w, x, y = replace(y, x != "AF", NA)) %>% 
  summarise(z = sum(z)) %>% 
  ungroup()
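With dplyr 1.0.0 or later, the trailing ungroup() can probably be folded into summarise() via its .groups argument. This is a sketch under assumptions rather than part of the original answer: the letters in 'y' are fixed by hand instead of sampled, and stringsAsFactors = FALSE is set so 'y' is a character column.

```r
library(dplyr)

df <- data.frame(
  w = rep("CT", 6),
  x = c("PF", "PF", "MF", "MF", "AF", "AF"),
  y = c("a", "b", "c", "d", "e", "f"),  # hand-picked letters (assumption) instead of sample()
  z = 1:6,
  stringsAsFactors = FALSE
)

res <- df %>%
  # group on w, x, and a version of y that is NA unless x is "AF"
  group_by(w, x, y = replace(y, x != "AF", NA)) %>%
  # .groups = "drop" returns an ungrouped result, replacing the final ungroup()
  summarise(z = sum(z), .groups = "drop")

print(res)
```

The result has one row per AF value of 'y' and a single row each for MF and PF, with 'y' set to NA in those two rows.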

Upvotes: 3
