Reputation: 13
I'm trying to take a data.frame and aggregate the values of one column, sorted by values in other columns, and where the values in a final column are between certain criteria. In SQL I'd do a simple Group By and write a loop, but I'm just getting started in R and am having a hard time figuring out syntax. Basically I have a dataset that looks like this:
Type Type2 Bucket Value
A 1 1 1
A 2 1 2
A 3 1 1
A 4 1 3
A 5 1 1
A 1 2 1
A 2 2 2
A 3 2 1
A 4 2 3
I want the output to be something like this:
Type Type2 Bucket Value
A <4 1 4
A >=4 1 4
A <4 2 5
A >=4 2 3
In my head, it's easy, but I'm coming from a SQL background, and trying to do it in R. I've messed with some functions like split and ddply with bits of success, but can't quite put it all together. Thanks.
Upvotes: 1
Views: 1518
Reputation: 18681
You can do this with dplyr
. Assuming you have more than one Type
:
library(dplyr)
df %>%
group_by(Type, Bucket, Type2 = ifelse(Type2 < 4, "<4", ">=4")) %>%
summarize(Value = sum(Value)) %>%
select(Type, Type2, Bucket, Value)
Result:
# A tibble: 4 x 4
# Groups: Type, Bucket [2]
Type Type2 Bucket Value
<fctr> <chr> <int> <int>
1 A <4 1 4
2 A >=4 1 4
3 A <4 2 4
4 A >=4 2 3
Since you mentioned you have SQL background, here is a sqldf
solution for you:
library(sqldf)
sqldf("select Type,
case when Type2 < 4 then '<4' else '>=4' end as Type_2,
Bucket,
sum(Value) as Value
from df
group by Type, Bucket, Type_2")
Result:
Type Type_2 Bucket Value
1 A <4 1 4
2 A >=4 1 4
3 A <4 2 4
4 A >=4 2 3
Data:
df = structure(list(Type = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L), .Label = "A", class = "factor"), Type2 = c(1L, 2L, 3L,
4L, 5L, 1L, 2L, 3L, 4L), Bucket = c(1L, 1L, 1L, 1L, 1L, 2L, 2L,
2L, 2L), Value = c(1L, 2L, 1L, 3L, 1L, 1L, 2L, 1L, 3L)), .Names = c("Type",
"Type2", "Bucket", "Value"), class = "data.frame", row.names = c(NA,
-9L))
Upvotes: 1