Rain Man
Rain Man

Reputation: 1273

Error in frequency table in R

I have a dataframe which is as follow:

   Name   Condition  NumMessage
Table 1        NULL          80 
Table 1        Fair          20
Table 1        Good          60
Table 1       Ideal          50
Table 1       Great          80
Table 2        NULL          80 
Table 2        Fair         100
Table 2        Good          90
Table 2       Ideal          50
Table 2       Great          40

and so on. I tried to create a frequency table for the number of message for each table.

data = as.data.frame(prop.table(table(dataframe$Name)))
colnames(data) = c('Table Name', 'Frequency')
data

but this returns same frequency for all tables. For example, Table 1 contains total of 290 messages where Table 2 contains 360 messages. But the above code gives same frequency for both tables.

Also when I tried to get frequency of each condition for each table, I also got same numbers across tables.

prop.table(table(dataframe$Condition, dataframe$Name))

NULL   | some value
Fair   | some value  
Good   | some value
Ideal  | some value
Great  | some value

is this the correct way to get the frequency of total number of messages for each table and frequency of conditions for each table?

Upvotes: 0

Views: 607

Answers (4)

CuriousBeing
CuriousBeing

Reputation: 1632

You can always tackle this with the package sqldf.

library(sqldf)
Name<-c('Table1','Table1','Table1','Table1','Table1','Table2','Table2','Table2','Table2','Table2')
Cond<-c(NA,'Fair','Good','Ideal','Great',NA,'Fair','Good','Ideal','Great')
Msg<-c(80,20,60,50,80,80,100,90,50,40)
df<-data.frame(Name,Cond,Msg)

Your dataframe:

Name  Cond Msg
1  Table1  <NA>  80
2  Table1  Fair  20
3  Table1  Good  60
4  Table1 Ideal  50
5  Table1 Great  80
6  Table2  <NA>  80
7  Table2  Fair 100
8  Table2  Good  90
9  Table2 Ideal  50
10 Table2 Great  40

Now simply use this statement for sum of messages for each table:

sqldf("select Name, sum(Msg) from df group by Name ") 

Name sum(Msg)
1 Table1      290
2 Table2      360

If you want sum of messages for each condition then use:

sqldf("select Cond, sum(Msg) from df group by Cond ")
Cond sum(Msg)
1  <NA>      160
2  Fair      120
3  Good      150
4 Great      120
5 Ideal      100

Hope that helps.

Upvotes: 0

akrun
akrun

Reputation: 887223

We could try with acast

library(reshape2)
prop.table(acast(df1, Name~Condition, value.var='NumMessage', sum),1)
#                 Fair      Good     Great     Ideal      NULL
#Table 1 0.06896552 0.2068966 0.2758621 0.1724138 0.2758621
#Table 2 0.27777778 0.2500000 0.1111111 0.1388889 0.2222222

Upvotes: 3

thelatemail
thelatemail

Reputation: 93843

xtabs is the base R way to get a summed contingency table.

prop.table(xtabs(NumMessage ~ ., data=df), 1)
#        Condition
#Name           Fair       Good      Great      Ideal       NULL
#  Table1 0.06896552 0.20689655 0.27586207 0.17241379 0.27586207
#  Table2 0.27777778 0.25000000 0.11111111 0.13888889 0.22222222

Upvotes: 3

Charles Stangor
Charles Stangor

Reputation: 304

If we call your dataset df, then perhaps this is what you are looking for?

df1 = subset(df, Name=='Table1')
df2 = subset(df, Name=='Table2')
prop.table(df1[,3])
prop.table(df2[,3])

aggregate(df1$NumMessage, list(df1$Name), sum)
aggregate(df1$NumMessage, list(df2$Name), sum)

Upvotes: 0

Related Questions