corrado

Reputation: 135

Ddply and summary of categorical variables

I have a dataframe x like this

Id   Group   Var1
001    A     yes
002    A     no
003    A     yes
004    B     no
005    B     yes
006    C     no

I want to create a data frame like this

Group    yes    no
A        2      1
B        1      1
C        0      1

The function aggregate works well:

aggregate(x$Var1 ~ x$Group,FUN=summary)

but I am not able to create a dataframe with the results.

If I try using ddply:

ddply(x,"Group",function(x) summary(x$Var1))

I obtain the error: Results do not have equal lengths.

What am I doing wrong?

Thanks.

Upvotes: 1

Views: 4207

Answers (2)

agstudy

Reputation: 121608

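I introduce an NA into your data:

# stringsAsFactors = TRUE keeps Var1 a factor on R >= 4.0, where the default changed
dat <- read.table(text = 'Id   Group   Var1
001    A     yes
002    A     no
003    A     NA     ## here!
004    B     no
005    B     yes
006    C     no', header = TRUE, stringsAsFactors = TRUE)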

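You need to remove the NAs before calling summary. When a group contains an NA, summary adds an extra "NA's" entry for that group, so the per-group results no longer have equal lengths. The formula method of aggregate has a default setting of na.action = na.omit, which drops the NA rows and so never produces the extra "NA's" column. As a workaround, I remove the NAs before the summary: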

library(plyr)
ddply(dat, "Group", function(x) {
  # drop NAs so that every group returns the same set of counts
  summary(na.omit(x$Var1))
})
  Group no yes
1     A  1   1
2     B  1   1
3     C  1   0

which is equivalent to

x <- dat
aggregate(x$Var1 ~ x$Group,FUN=summary)
  x$Group x$Var1.no x$Var1.yes
1       A         1          1
2       B         1          1
3       C         1          0
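
To see where the unequal lengths come from, here is a minimal sketch using base R only. It rebuilds the example data by hand (the data.frame call and the names per_group, lens, clean and clean_lens are my own, not from the question) and assumes Var1 is a factor, as it was by default in R versions before 4.0:

```r
# Rebuild the example data; stringsAsFactors = TRUE is stated explicitly
# because R >= 4.0 no longer converts strings to factors by default.
dat <- data.frame(
  Id    = 1:6,
  Group = c("A", "A", "A", "B", "B", "C"),
  Var1  = c("yes", "no", NA, "no", "yes", "no"),
  stringsAsFactors = TRUE
)

# summary() on a factor appends an "NA's" entry only when NAs are present,
# so the group with the NA returns one more value than the other groups.
per_group <- lapply(split(dat$Var1, dat$Group), summary)
lens <- sapply(per_group, length)
print(lens)

# Dropping the NAs first gives every group the same number of counts.
clean <- lapply(split(dat$Var1, dat$Group), function(v) summary(na.omit(v)))
clean_lens <- sapply(clean, length)
print(clean_lens)
```

If Var1 is a character column instead of a factor, summary returns Length/Class/Mode rather than counts, so this sketch only applies to the factor case.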

Upvotes: 3

A5C1D2H2I1M1N2O1R2T1

Reputation: 193667

This doesn't answer your question about ddply, but it should help you with your aggregate output. The second column in the output of the aggregate command that you used is a matrix, but you can wrap the whole output in a do.call(data.frame, ...) statement to get a data frame instead. Assuming your data.frame is called "mydf":

temp <- do.call(data.frame, aggregate(Var1 ~ Group, mydf, summary))
temp
#   Group Var1.no Var1.yes
# 1     A       1        2
# 2     B       1        1
# 3     C       1        0
str(temp)
# 'data.frame':  3 obs. of  3 variables:
#  $ Group   : Factor w/ 3 levels "A","B","C": 1 2 3
#  $ Var1.no : int  1 1 1
#  $ Var1.yes: int  2 1 0
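
A quick way to check the matrix-column point is the sketch below. It assumes mydf is rebuilt from the table in the question with Var1 as a factor; the names agg and flat are only for illustration:

```r
# Example data reconstructed from the question (assumed layout).
mydf <- data.frame(
  Group = c("A", "A", "A", "B", "B", "C"),
  Var1  = c("yes", "no", "yes", "no", "yes", "no"),
  stringsAsFactors = TRUE
)

agg <- aggregate(Var1 ~ Group, mydf, summary)

# The result has only two columns; the second one holds a whole matrix.
print(ncol(agg))
print(is.matrix(agg$Var1))

# do.call(data.frame, ...) splits the matrix into ordinary columns.
flat <- do.call(data.frame, agg)
print(names(flat))
```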

Alternatively, you might look at table:

table(mydf$Group, mydf$Var1)
#    
#     no yes
#   A  1   2
#   B  1   1
#   C  1   0
as.data.frame.matrix(table(mydf$Group, mydf$Var1))
#   no yes
# A  1   2
# B  1   1
# C  1   0
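
If you want the exact layout from the question (a Group column, then yes, then no), one possible sketch building on table is shown here. The names wide and out are hypothetical, and mydf is again assumed to match the table in the question:

```r
# Example data reconstructed from the question (assumed layout).
mydf <- data.frame(
  Group = c("A", "A", "A", "B", "B", "C"),
  Var1  = c("yes", "no", "yes", "no", "yes", "no")
)

# Cross-tabulate, then turn the table into a plain data frame.
wide <- as.data.frame.matrix(table(mydf$Group, mydf$Var1))

# Move the row names into a Group column and reorder the count columns.
out <- data.frame(Group = rownames(wide), wide[, c("yes", "no")])
rownames(out) <- NULL
print(out)
```

Because table works on the whole column at once, groups with no "yes" answers still get a 0 instead of a missing column.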

Upvotes: 4
