WGray

Reputation: 307

Counting unique items in data frame

I want a simple count of the number of subjects in each condition of a study. The data look something like this:

subjectid  cond  obser  variable
1234       1     1      12
1234       1     2      14
2143       2     1      19
3456       1     1      12
3456       1     2      14
3456       1     3      13
etc        etc   etc    etc

This is a large dataset, and it is not always obvious how many unique subjects contribute to each condition.

I have this in a data.frame.

What I want is something like

cond   # of Ss
1      122
2      98

where, for each condition, I get a count of the unique subjects (Ss) contributing data to that condition. It seems like this should be painfully simple.

Upvotes: 8

Views: 17681

Answers (4)

malcook

Reputation: 1723

Or, if you like SQL and don't mind installing a package:

library(sqldf)
# group by cond so that there is one count per condition
sqldf("select cond, count(distinct subjectid) from dat group by cond")

Upvotes: 4

csgillespie

Reputation: 60462

Just to give you even more choice, you could also use tapply (here a is the data frame):

tapply(a$subjectid, a$cond, function(x) length(unique(x)))
1 2 
2 1 
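
For a reproducible version, here is a minimal sketch that rebuilds the six example rows from the question and runs the same call. The data frame construction and the column name n_subjects are my assumptions, not something given in the question:

```r
# Rebuild the question's example rows; `a` is assumed to hold this data frame.
a <- data.frame(
  subjectid = c(1234, 1234, 2143, 3456, 3456, 3456),
  cond      = c(1, 1, 2, 1, 1, 1),
  obser     = c(1, 2, 1, 1, 2, 3),
  variable  = c(12, 14, 19, 12, 14, 13)
)

# For each condition, count the distinct subject IDs.
counts <- tapply(a$subjectid, a$cond, function(x) length(unique(x)))

# Turn the named array into a two-column data frame like the one requested.
# The column name n_subjects is hypothetical.
result <- data.frame(cond = names(counts), n_subjects = as.vector(counts))
print(result)
```

tapply returns a named array, so the last step is only needed if you want the counts back as a data frame.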

Upvotes: 3

Gavin Simpson

Reputation: 174813

Using your snippet of data that I loaded into object dat:

> dat
  subjectid cond obser variable
1      1234    1     1       12
2      1234    1     2       14
3      2143    2     1       19
4      3456    1     1       12
5      3456    1     2       14
6      3456    1     3       13

Then one way to do this is to use aggregate to count the unique subjectid values (assuming that is what you meant by "Ss"):

> aggregate(subjectid ~ cond, data = dat, FUN = function(x) length(unique(x)))
  cond subjectid
1    1         2
2    2         1

Upvotes: 5

Prasad Chalasani

Reputation: 20282

Use the ddply function from the plyr package:

library(plyr)
# Random example data, so the counts will vary from run to run
df <- data.frame(subjectid = sample(1:3, 7, TRUE),
                 cond = sample(1:2, 7, TRUE), obser = sample(1:7))

> ddply(df, .(cond), summarize, NumSubs = length(unique(subjectid)))
  cond NumSubs
1    1       1
2    2       2

The ddply function splits the data frame by the cond variable, computes the summary column NumSubs for each piece, and combines the results into a single data frame.
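
To make the split/apply/combine idea concrete, here is a rough base R sketch of the same three steps, run on the example rows from the question. It illustrates the idea and is not how plyr is implemented; the names dat, ids_by_cond, n_per_cond and out are only for illustration:

```r
# Example rows from the question (assumed to be stored in `dat`).
dat <- data.frame(
  subjectid = c(1234, 1234, 2143, 3456, 3456, 3456),
  cond      = c(1, 1, 2, 1, 1, 1)
)

# Split: one vector of subject IDs per condition.
ids_by_cond <- split(dat$subjectid, dat$cond)

# Apply: count the distinct IDs within each piece.
n_per_cond <- vapply(ids_by_cond, function(x) length(unique(x)), integer(1))

# Combine: assemble the per-condition counts into one data frame.
out <- data.frame(cond = names(n_per_cond), NumSubs = unname(n_per_cond))
print(out)
```

With these six rows, condition 1 has two distinct subjects and condition 2 has one.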

Upvotes: 13
