steeles

aggregate sum of counts over factor in R

I am very new to R and this is my first Stack Overflow question, so I expect this may be a little rough. I have a data frame (read from a .csv) with the following structure:

FeatureName     Uuid     Count
ClickHeadline   ABC1     17
ChangeSetting   ABC1     3
ClickHeadline   CBA2     5
ChangeSetting   CBA2     7
SomethingElse   CBA2     5

I am trying to figure out how to make a new data frame in which the unique values of FeatureName (the factor levels ClickHeadline, ChangeSetting, and SomethingElse) become columns, each holding the sum of Count for each Uuid. So the new data frame I want would be:

Uuid    ClickHeadline    ChangeSetting    SomethingElse
ABC1    17               3                0
CBA2    5                7                5

I feel like I should be able to do this with the aggregate function, but I can't figure out how to tell it to sum the counts by a variable. I know I'm in way over my head, but can anybody help me figure this out?

Answers (1)

cdeterman

There are many possibilities.

If you need a sum, you can use the dcast function from the reshape2 package:

df <- read.table(header = TRUE, text = '
FeatureName     Uuid     Count
ClickHeadline   ABC1     17
ChangeSetting   ABC1     3
ClickHeadline   CBA2     5
ChangeSetting   CBA2     7
SomethingElse   CBA2     5
')

library(reshape2)
dcast(df, Uuid ~ FeatureName, value.var="Count", sum)

  Uuid ChangeSetting ClickHeadline SomethingElse
1 ABC1             3            17             0
2 CBA2             7             5             5
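For comparison, the same sum-and-spread can be sketched in base R with tapply. This assumes R >= 3.4.0, where tapply gained the default argument used here to fill empty cells with 0; note that the result is a matrix rather than a data frame.

```r
# Example data from the question
df <- read.table(header = TRUE, text = "
FeatureName   Uuid Count
ClickHeadline ABC1 17
ChangeSetting ABC1 3
ClickHeadline CBA2 5
ChangeSetting CBA2 7
SomethingElse CBA2 5
")

# Sum Count within every Uuid x FeatureName cell; cells with no rows get 0
# (the `default` argument of tapply assumes R >= 3.4.0)
wide <- tapply(df$Count, list(df$Uuid, df$FeatureName), sum, default = 0)
wide
```

Wrap the result in as.data.frame() if you need a data frame instead of a matrix.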

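If your dataset has at most one row per Uuid/FeatureName combination, as in your example, you can just use the base reshape function. Note that reshape does not sum: if a combination is repeated, it keeps only the first value and issues a warning.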

out <- reshape(df, idvar="Uuid", timevar="FeatureName", v.names="Count", direction="wide")
out[is.na(out)] <- 0  # combinations that never occur become 0
out
  Uuid Count.ClickHeadline Count.ChangeSetting Count.SomethingElse
1 ABC1                  17                   3                   0
3 CBA2                   5                   7                   5
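Since the question mentions aggregate: if the data can contain repeated Uuid/FeatureName rows, one sketch is to sum them with aggregate first and then reshape the result. The last data row below (ClickHeadline ABC1 2) is a hypothetical duplicate added only to show the summing.

```r
# Question data plus one hypothetical duplicate row (ClickHeadline ABC1 2)
df <- read.table(header = TRUE, text = "
FeatureName   Uuid Count
ClickHeadline ABC1 17
ChangeSetting ABC1 3
ClickHeadline CBA2 5
ChangeSetting CBA2 7
SomethingElse CBA2 5
ClickHeadline ABC1 2
")

# Step 1: aggregate() collapses duplicate Uuid/FeatureName pairs by summing Count
agg <- aggregate(Count ~ Uuid + FeatureName, data = df, FUN = sum)

# Step 2: reshape() spreads FeatureName into one Count.<feature> column per level
out <- reshape(agg, idvar = "Uuid", timevar = "FeatureName",
               v.names = "Count", direction = "wide")

# Combinations that never occur come back as NA; treat them as zero
out[is.na(out)] <- 0
out
```

Because the duplicates are already summed in step 1, reshape never sees a repeated combination, so nothing is silently dropped.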

Another base R alternative is xtabs, which sums Count for each Uuid/FeatureName cell and fills empty cells with 0, so there are no NA values to replace:

xtabs(Count ~ Uuid+FeatureName, df)
      FeatureName
Uuid   ChangeSetting ClickHeadline SomethingElse
  ABC1             3            17             0
  CBA2             7             5             5
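xtabs returns a contingency table rather than a data frame. A minimal sketch for turning it into the requested layout, assuming you want Uuid back as an ordinary column:

```r
# Example data from the question
df <- read.table(header = TRUE, text = "
FeatureName   Uuid Count
ClickHeadline ABC1 17
ChangeSetting ABC1 3
ClickHeadline CBA2 5
ChangeSetting CBA2 7
SomethingElse CBA2 5
")

tab <- xtabs(Count ~ Uuid + FeatureName, data = df)

# as.data.frame.matrix() keeps the wide layout
# (plain as.data.frame() on a table would return long format instead)
wide <- as.data.frame.matrix(tab)

# Move the row names (the Uuid values) into a regular column
wide <- data.frame(Uuid = rownames(wide), wide, row.names = NULL)
wide
```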

A tidyr solution uses spread (like reshape, it does not sum, and it stops with an error if a Uuid/FeatureName pair is repeated):

library(tidyr)
spread(df, key=FeatureName, value=Count, fill=0)
  Uuid ChangeSetting ClickHeadline SomethingElse
1 ABC1             3            17             0
2 CBA2             7             5             5
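In tidyr 1.0.0 and later, spread is superseded by pivot_wider, which can also sum duplicates through values_fn. A sketch, assuming tidyr >= 1.1.0 (earlier versions expect values_fn and values_fill as named lists, such as list(Count = sum)):

```r
library(tidyr)

# Example data from the question
df <- read.table(header = TRUE, text = "
FeatureName   Uuid Count
ClickHeadline ABC1 17
ChangeSetting ABC1 3
ClickHeadline CBA2 5
ChangeSetting CBA2 7
SomethingElse CBA2 5
")

# values_fn = sum adds up repeated Uuid/FeatureName pairs;
# values_fill = 0 fills combinations that never occur.
# Passing a bare function/value here assumes tidyr >= 1.1.0.
out <- pivot_wider(df, names_from = FeatureName, values_from = Count,
                   values_fn = sum, values_fill = 0)
out
```

The result is a tibble whose feature columns appear in the order the features first occur in the data.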
