user3206440

Reputation: 5069

R - aggregate with formula

Given a data frame like the one below:

set.seed(100)
dfm <- data.frame(
  id = sample(1:100, 6, replace = TRUE),
  val1 = rep(c("true", "false"), 3),
  val2 = sample(c("true", "false"), 6, replace = TRUE))

  id  val1  val2
1 31  true false
2 26 false  true
3 56  true false
4  6 false  true
5 47  true false
6 49 false false

I need to aggregate by id so that the result holds the number of "true" occurrences per id in each column. So I tried the following:

> aggregate(. ~ id, dfm, function(x) { length(x[x == "true"])})

  id val1 val2
1  6    0    0
2 26    0    0
3 31    0    0
4 47    0    0
5 49    0    0
6 56    0    0

However, this does not return the count of "true" for each column; every count comes back as 0.

Upvotes: 1

Views: 681

Answers (1)

akrun

Reputation: 887981

We can use rowsum, which sums the rows of a matrix by group:

rowsum(+(dfm[-1]=="true"), dfm$id)
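To make the one-liner easier to follow, here is a minimal sketch that splits it into steps. It assumes the data is entered by hand from the table printed in the question, and the intermediate names is_true and counts are only illustrative.

```r
# Data typed in from the table printed in the question (not re-sampled)
dfm <- data.frame(
  id   = c(31, 26, 56, 6, 47, 49),
  val1 = c("true", "false", "true", "false", "true", "false"),
  val2 = c("false", "true", "false", "true", "false", "false"),
  stringsAsFactors = FALSE
)

# Comparing the non-id columns with "true" gives a logical matrix
is_true <- dfm[-1] == "true"

# Unary + turns TRUE/FALSE into 1/0; rowsum then adds the rows within each id
counts <- rowsum(+is_true, dfm$id)
counts
#    val1 val2
# 6     0    1
# 26    0    1
# 31    1    0
# 47    1    0
# 49    0    0
# 56    1    0
```

The result is a matrix whose row names are the ids, sorted, rather than a data frame with an id column.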

As for why the OP's code does not work: the 'val' columns are factors. With a `. ~ id` formula, aggregate combines the left-hand-side columns with cbind, which reduces factors to their integer codes, so x == "true" never matches and every count is 0. Create 'dfm' with stringsAsFactors=FALSE (the default since R 4.0.0) and the OP's code should work.

dfm <- data.frame(
  id=sample(1:100, 6, replace = TRUE),
  val1 = rep(c("true", "false"), 3), 
  val2=sample(c("true", "false"), 6, replace = TRUE), stringsAsFactors=FALSE)

aggregate(. ~ id, dfm, function(x) { length(x[x == "true"])})
#  id val1 val2
#1 21    1    0
#2 29    1    1
#3 36    0    0
#4 40    0    0
#5 67    0    0
#6 77    1    0
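The factor issue can be reproduced in isolation. The sketch below uses two made-up factor vectors (f1 and f2 are hypothetical, not the OP's columns) and shows that cbind keeps only the integer codes, so a comparison against "true" cannot succeed.

```r
# Hypothetical toy factors, used only to illustrate the coercion
f1 <- factor(c("true", "false", "true"))
f2 <- factor(c("false", "false", "true"))

# cbind drops the factor levels and keeps the underlying integer codes
m <- cbind(f1, f2)
storage.mode(m)   # "integer"

# The codes are 1 and 2, so nothing equals the string "true"
any(m == "true")  # FALSE

# Comparing the factor itself still works, because == uses the labels
sum(f1 == "true") # 2
```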

Upvotes: 2
