Reputation: 2003
I've got a dataframe consisting of a group and 2 value columns, as such:
group val1 val2
A 5 3
A 2 4
A 3 1
B 3 6
B 2 1
B 0 2
I want to work out the number of rows where val1 > val2, split by subset. Initially I hardcoded this per subgroup with:
number_a <- nrow(subset(df, group=="A" & val1 > val2))
number_b <- nrow(subset(df, group=="B" & val1 > val2))
What's the proper way of automating this? I tried using the split()
function but I couldn't work out how to pass in both val1
and val2
column.
Upvotes: 2
Views: 334
Reputation: 417
You can try this
data <- data.frame(group,val1,val2)
attach(data)
aggregate(val1~group,data[which(val1 > val2),],length)
Upvotes: 1
Reputation: 92282
Pretty straight forward using data.table
If you want the number of rows
library(data.table)
setDT(df)[, .(RowsNum = sum(val1 > val2)), by = group]
# group RowsNum
# 1: A 2
# 2: B 1
If you looking for split
, apply
combinations in base R, could also try
sapply(split(df[-1], df[1]), function(x) sum(x[1] > x[2]))
# A B
# 2 1
Or using tapply
(also from base R)
tapply(with(df, val1 > val2), df[1], sum)
# group
# A B
# 2 1
If you want the rows themselves
setDT(df)[, .SD[val1 > val2]]
# group val1 val2
# 1: A 5 3
# 2: A 3 1
# 3: B 2 1
Or very simple with base R too
df[with(df, val1 > val2), ]
# group val1 val2
# 1 A 5 3
# 3 A 3 1
# 5 B 2 1
Or
subset(df, val1 > val2)
# group val1 val2
# 1 A 5 3
# 3 A 3 1
# 5 B 2 1
Upvotes: 3
Reputation: 887078
Another option using dplyr
library(dplyr)
filter(df, val1 >val2)
# group val1 val2
#1 A 5 3
#2 A 3 1
#3 B 2 1
If you need the nrows
df %>%
group_by(group) %>%
filter(val1 >val2) %>%
summarise(RowsNum=n())
# group RowsNum
#1 A 2
#2 B 1
Or using aggregate
from base R
aggregate(cbind(RowsNum = val1 > val2) ~ group, df, sum)
# group RowsNum
#1 A 2
#2 B 1
Upvotes: 2