Adam Adams

Reputation: 57

Aggregating data in R

I have a data set (test):

v1 v2  v3  v4  v5  v6
1   1   1   0   0   0 
2   2   1   1   0   0 
3   2   1   0   0   0 
4   3   1   0   0   0 
5   3   1   1   0   1 
6   3   1   0   1   1 

structure(list(V1 = 1:6, V2 = c(1L, 2L, 2L, 3L, 3L, 3L), V3 = c(1L, 
1L, 1L, 1L, 1L, 1L), V4 = c(0L, 1L, 0L, 0L, 1L, 0L), V5 = c(0L, 
0L, 0L, 0L, 0L, 1L), V6 = c(0L, 0L, 0L, 0L, 1L, 1L)), .Names = c("V1", 
"V2", "V3", "V4", "V5", "V6"), class = "data.frame", row.names = c(NA, 
-6L))

and I want to get this result:

v1  v2  v3  v4  v5  v6
 1   1   1   0   0   0  
 5   2   2   1   0   0  
15   3   3   1   1   2  

I have tried this:

aggregate(test[c('v3', 'v4', 'v5','v6')], list('v2'), FUN=sum, na.rm=TRUE)

This does not work. I want to aggregate the data in test by V2 and sum the other variables within each group.

Upvotes: 1

Views: 308

Answers (2)

CHP

Reputation: 17189

Your initial attempt was almost correct. With a minor correction you can get what you want. This assumes you want to sum the rows grouped by V2:

result <- aggregate(test[, c('V1', 'V3', 'V4', 'V5', 'V6')], list(test[, 'V2']), FUN = sum, na.rm = TRUE)
names(result) <- gsub("Group.1", "V2", names(result))
result
#   V2 V1 V3 V4 V5 V6
# 1  1  1  1  0  0  0
# 2  2  5  2  1  0  0
# 3  3 15  3  1  1  2
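
As a possible shortcut, naming the element of the grouping list should make the gsub() renaming step unnecessary, because aggregate() uses that name for the grouping column. A minimal sketch, assuming test is the data frame from the question (it is rebuilt here so the block runs on its own); the name result2 is only illustrative:

```r
# Rebuild the question's sample data so the example is self-contained
test <- data.frame(
  V1 = 1:6,
  V2 = c(1L, 2L, 2L, 3L, 3L, 3L),
  V3 = rep(1L, 6),
  V4 = c(0L, 1L, 0L, 0L, 1L, 0L),
  V5 = c(0L, 0L, 0L, 0L, 0L, 1L),
  V6 = c(0L, 0L, 0L, 0L, 1L, 1L)
)

# A named list element ("V2 = ...") becomes the name of the grouping column,
# so no renaming of "Group.1" is needed afterwards
result2 <- aggregate(test[, c("V1", "V3", "V4", "V5", "V6")],
                     by = list(V2 = test$V2), FUN = sum, na.rm = TRUE)
result2
```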

Upvotes: 1

A5C1D2H2I1M1N2O1R2T1

Reputation: 193667

Change your aggregate command to:

aggregate(. ~ V2, test, sum)
#   V2 V1 V3 V4 V5 V6
# 1  1  1  1  0  0  0
# 2  2  5  2  1  0  0
# 3  3 15  3  1  1  2

Some things to note:

  1. R is case sensitive. The sample data you provided has variables named with upper-case "V"s, but the sample code you've tried has lower-case "v"s.
  2. You're trying to refer to the variable names directly. For that, you either need to be using the formula notation for aggregate() or you need to be using with() or (not recommended) attach().
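
Both notes can be illustrated with a short sketch. It assumes test is the data frame from the question (rebuilt here so the block is self-contained); res_with and res_formula are illustrative names only:

```r
# Rebuild the question's sample data
test <- data.frame(
  V1 = 1:6,
  V2 = c(1L, 2L, 2L, 3L, 3L, 3L),
  V3 = rep(1L, 6),
  V4 = c(0L, 1L, 0L, 0L, 1L, 0L),
  V5 = c(0L, 0L, 0L, 0L, 0L, 1L),
  V6 = c(0L, 0L, 0L, 0L, 1L, 1L)
)

# Note 1: names are case sensitive, so "v2" is not a column of test
"v2" %in% names(test)   # FALSE
"V2" %in% names(test)   # TRUE

# Note 2a: with() evaluates the call inside test, so bare column names resolve
res_with <- with(test, aggregate(cbind(V1, V3, V4, V5, V6),
                                 by = list(V2 = V2), FUN = sum))

# Note 2b: the formula interface looks the names up in the data argument
res_formula <- aggregate(. ~ V2, data = test, FUN = sum)
```

Both calls should give the same grouped sums; the formula version is shorter because the dot stands for every column other than V2.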

Upvotes: 6
