pdubois
pdubois

Reputation: 7790

Conditionally count rows based on comparing values between column

I have the following data frame:


df <- structure(list(x = c(0.389794300700167, -1.20807617542949, -0.363676017470862, 
-1.62667268170309, -0.256478394123992, 1.10177950308713, 0.755781508027337, 
-0.238233556018718, 0.98744470341339, 0.741390128383824), y = c(0.0893472664958216, 
-0.954943856152377, -0.195150384667239, 0.92552126209408, 0.482978524836611, 
-0.596310636720207, -2.18528683816953, -0.674865937875116, -2.11906119191017, 
-1.2651980215309), fac = structure(c(2L, 1L, 2L, 3L, 1L, 1L, 
1L, 1L, 2L, 2L), .Label = c("A", "B", "C"), class = "factor")), .Names = c("x", 
"y", "fac"), row.names = c(NA, -10L), class = "data.frame")


df
#>             x           y fac     manual_assignment
#> 1   0.3897943  0.08934727   B     b.x
#> 2  -1.2080762 -0.95494386   A     a.y
#> 3  -0.3636760 -0.19515038   B     b.y
#> 4  -1.6266727  0.92552126   C     c.y
#> 5  -0.2564784  0.48297852   A     a.y
#> 6   1.1017795 -0.59631064   A     a.x
#> 7   0.7557815 -2.18528684   A     a.x
#> 8  -0.2382336 -0.67486594   A     a.x
#> 9   0.9874447 -2.11906119   B     b.x
#> 10  0.7413901 -1.26519802   B     b.x

What I want to do is to count rows based on comparing values in x and y. For every row, if the value x is larger than y, we increase the count of fac member by 1. So the end result is this:

      x.count   y.count
A        3         2 (# a.y)
B        3         1
C        0         1

How can I achieve that? Possible with dplyr?

Upvotes: 2

Views: 84

Answers (3)

akrun
akrun

Reputation: 886938

With data.table, we can use

library(data.table)
dcast(setDT(df)[, .N, .(fac, measure = c('y.count', 'x.count')[(x > y) + 1])], 
           fac ~measure, fill = 0)
#    fac x.count y.count
#1:   A       3       2
#2:   B       3       1
#3:   C       0       1

Upvotes: 1

Ronak Shah
Ronak Shah

Reputation: 388797

With dplyr we can group by fac and count the length of values where x and y values are greater.

library(dplyr)
df %>%
   group_by(fac) %>%
   summarise(x.count = length(which(x > y)), 
             y.count = length(which(x < y)))


#     fac x.count y.count
#  <fctr>   <int>   <int>
#1      A       3       2
#2      B       3       1
#3      C       0       1

Upvotes: 1

akuiper
akuiper

Reputation: 214927

This is more straight forward with table:

with(df, table(fac, ifelse(x > y, "x.count", "y.count")))       

#fac x.count y.count
#  A       3       2
#  B       3       1
#  C       0       1

With dplyr/tidyr, you need a few more lines of code:

library(tidyverse)
df %>% 
    group_by(fac, measure = if_else(x > y, "x.count", "y.count")) %>% 
    tally() %>% 
    spread(measure, n, fill = 0)

#Source: local data frame [3 x 3]
#Groups: fac [3]

#     fac x.count y.count
#* <fctr>   <dbl>   <dbl>
#1      A       3       2
#2      B       3       1
#3      C       0       1

Upvotes: 2

Related Questions