Tom
Tom

Reputation: 2341

Taking the mean conditional on the value in another column for all rows

I have a data.table as follows:

library(data.table)  
DT <- fread(
    "A   B  C  D  E  F  iso   year   
     0   A   1   1  NA  NA  NLD   2009   
     1   Y   0   2  NA  NA  NLD   2009   
     0   Q   1   3  NA  NA  AUS   2011   
     1   NA  0   4  NA  NA  AUS   2011   
     0   0   1   7  NA  NA  NLD   2008   
     1   1   1   1  NA  NA  NLD   2008   
     0   1   1   3  NA  NA  AUS   2012   
     0   NA  1   NA  1  NA  ECU   2009   
     1   NA  0   NA  2  0   ECU   2009   
     0   NA  0   NA  3  0   BRA   2011   
     1   NA  0   4   4  0   BRA   2011   
     0   NA  1   NA  7  NA  ECU   2008   
     1   NA  0   5   1  0   ECU   2008   
     0   NA  0   NA  3  2   BRA   2012   
     1   NA  0   NA  4  NA  BRA   2012",
   header = TRUE
)

Now the following syntax:

setDT(DT)[ C != 1 , mean:= mean(D, na.rm=TRUE),  by=iso]

Takes the mean of D per iso for the rows in which C is not 1.

What I would like however, is the for mean calculation to be the same, but also filled in for the rows (by iso) for which C is 1.

Desired result:

     A  B   C   D   E   F   iso year mean
     0  A   1   1   NA  NA  NLD 2009 2
     1  Y   0   2   NA  NA  NLD 2009 2
     0  Q   1   3   NA  NA  AUS 2011 4
     1  NA  0   4   NA  NA  AUS 2011 4 
     0  0   1   7   NA  NA  NLD 2008 2
     1  1   1   1   NA  NA  NLD 2008 2
     0  1   1   3   NA  NA  AUS 2012 4
     0  NA  1   NA  1   NA  ECU 2009 5
     1  NA  0   NA  2   0   ECU 2009 5
     0  NA  0   NA  3   0   BRA 2011 4
     1  NA  0   4   4   0   BRA 2011 4
     0  NA  1   NA  7   NA  ECU 2008 5
     1  NA  0   5   1   0   ECU 2008 5
     0  NA  0   NA  3   2   BRA 2012 4
     1  NA  0   NA  4   NA  BRA 2012 4",

Upvotes: 0

Views: 30

Answers (1)

Ronak Shah
Ronak Shah

Reputation: 388797

We can subset D for values where C != 1 instead of filtering the rows.

library(data.table)
DT[ , mean := mean(D[C != 1], na.rm=TRUE), by=iso]
DT

#    A    B C  D  E  F iso year mean
# 1: 0    A 1  1 NA NA NLD 2009    2
# 2: 1    Y 0  2 NA NA NLD 2009    2
# 3: 0    Q 1  3 NA NA AUS 2011    4
# 4: 1 <NA> 0  4 NA NA AUS 2011    4
# 5: 0    0 1  7 NA NA NLD 2008    2
# 6: 1    1 1  1 NA NA NLD 2008    2
# 7: 0    1 1  3 NA NA AUS 2012    4
# 8: 0 <NA> 1 NA  1 NA ECU 2009    5
# 9: 1 <NA> 0 NA  2  0 ECU 2009    5
#10: 0 <NA> 0 NA  3  0 BRA 2011    4
#11: 1 <NA> 0  4  4  0 BRA 2011    4
#12: 0 <NA> 1 NA  7 NA ECU 2008    5
#13: 1 <NA> 0  5  1  0 ECU 2008    5
#14: 0 <NA> 0 NA  3  2 BRA 2012    4
#15: 1 <NA> 0 NA  4 NA BRA 2012    4

Using dplyr, that would be

library(dplyr)
DT %>% group_by(iso) %>% mutate(mean = mean(D[C!= 1], na.rm = TRUE))

Upvotes: 1

Related Questions