Reputation: 307
I want a simple count of the number of subjects in each condition of a study. The data look something like this:
subjectid cond obser variable
1234 1 1 12
1234 1 2 14
2143 2 1 19
3456 1 1 12
3456 1 2 14
3456 1 3 13
etc etc etc etc
This is a large dataset and it is not always obvious how many unique subjects contribute to each condition, etc.
I have this in a data.frame.
What I want is something like
cond ofSs
1 122
2 98
Where for each "condition" I get a count of the number of unique Ss contributing data to that condition. Seems like this should be painfully simple.
Upvotes: 8
Views: 17681
Reputation: 1723
Or, if you like SQL and don't mind installing a package:
library(sqldf)
sqldf("select cond, count(distinct subjectid) from dat group by cond")
Upvotes: 4
Reputation: 60462
Just to give you even more choice, you could also use tapply (here the data frame is called a):
tapply(a$subjectid, a$cond, function(x) length(unique(x)))
1 2
2 1
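If you want the two-column layout from the question rather than a named vector, the tapply result can be reshaped into a data frame. This is only a minimal sketch: it assumes the sample rows from the question are stored in a data frame called a, and the column name n_subjects is a made-up label.

```r
# Sample rows from the question (assumed to live in a data frame called `a`)
a <- data.frame(
  subjectid = c(1234, 1234, 2143, 3456, 3456, 3456),
  cond      = c(1, 1, 2, 1, 1, 1),
  obser     = c(1, 2, 1, 1, 2, 3),
  variable  = c(12, 14, 19, 12, 14, 13)
)

# Count unique subjects per condition; the result is a named 1-d array
counts <- tapply(a$subjectid, a$cond, function(x) length(unique(x)))

# Reshape into a data frame with one row per condition
# (`n_subjects` is a hypothetical column name)
result <- data.frame(
  cond       = names(counts),
  n_subjects = as.vector(counts)
)
print(result)
```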
Upvotes: 3
Reputation: 174813
Using your snippet of data, which I loaded into the object dat:
> dat
subjectid cond obser variable
1 1234 1 1 12
2 1234 1 2 14
3 2143 2 1 19
4 3456 1 1 12
5 3456 1 2 14
6 3456 1 3 13
Then one way to do this is to use aggregate to count the unique subjectid values within each cond (assuming that is what you meant by "Ss"):
> aggregate(subjectid ~ cond, data = dat, FUN = function(x) length(unique(x)))
cond subjectid
1 1 2
2 2 1
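The same count can also be reached in two explicit base-R steps: reduce the data to one row per subject/condition pair, then tabulate the conditions. This is an alternative sketch rather than what aggregate does internally, and the object names pairs and counts are my own.

```r
# Sample rows from the question, loaded into `dat`
dat <- data.frame(
  subjectid = c(1234, 1234, 2143, 3456, 3456, 3456),
  cond      = c(1, 1, 2, 1, 1, 1),
  obser     = c(1, 2, 1, 1, 2, 3),
  variable  = c(12, 14, 19, 12, 14, 13)
)

# Keep one row per (subjectid, cond) pair, dropping repeated observations
pairs <- unique(dat[, c("subjectid", "cond")])

# Each remaining row is one subject in one condition, so tabulating
# cond gives the number of unique subjects per condition
counts <- table(pairs$cond)
print(counts)
```

Note that a subject who appears in more than one condition is counted once in each of them, which matches the aggregate result.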
Upvotes: 5
Reputation: 20282
Use the ddply function from the plyr package:
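library(plyr)
# Random example data, so the counts you get will differ from those shown below
df <- data.frame(subjectid = sample(1:3, 7, replace = TRUE),
cond = sample(1:2, 7, replace = TRUE), obser = sample(1:7))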
> ddply(df, .(cond), summarize, NumSubs = length(unique(subjectid)))
cond NumSubs
1 1 1
2 2 2
The ddply function "splits" the data frame by the cond variable and produces a summary column, NumSubs, for each sub-data-frame.
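To make the "split" idea concrete, here is a rough base-R sketch of the same split-apply-combine steps. It is an illustration of the concept, not plyr's actual implementation, and the data values and the names pieces, num_subs, and summary_df are made up for the example.

```r
# Small fixed example (made-up values) so the result is reproducible
df <- data.frame(
  subjectid = c(1, 1, 2, 3, 3, 2, 1),
  cond      = c(1, 1, 1, 2, 2, 2, 2)
)

# Split: one sub-data-frame per value of cond
pieces <- split(df, df$cond)

# Apply: count the unique subjects within each piece
num_subs <- sapply(pieces, function(piece) length(unique(piece$subjectid)))

# Combine: one summary row per condition
summary_df <- data.frame(cond = names(num_subs), NumSubs = unname(num_subs))
print(summary_df)
```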
Upvotes: 13