Marco
Marco

Reputation: 3

Count once entries of 3 categories specified in another column in R

This might be very trivial but I could not find an easy solution anywhere. I am trying to create a script in R to count entries in one column that belong to a one of 3 categories specified another column. I have a list of clinical patients with ID numbers (more than one entry for the same ID) that have been seen by two services (a or b). I need to know how many ID have been seen by service a and by service b and service c, but counting repeated visits by a service only once (so basically the number of patients that have used each service at least once) - hope this makes sense, here is an example to explain.

Example:

     ID    Category
     A001  a
     A002  a 
     A002  a
     A002  b
     A003  b
     A003  b
     A005  c
     A001  a
     A004  b
     A004  b
     A006  c
     A006  a

Output should be something like:

     a=3
     b=3
     c=2

This is what I have done, but I am quite stuck, and this might not be good at all!

 DataString<- matrix(nrow=dim(refnum)[1], ncol=1)
 for (i in 1:dim(refnum)[1]){
   DataString[i,1]<- paste(refnum[i,], collapse = '')
 }

 #generate vector of unique strings
 uniqueID<- unique(DataString)

 #create new matrix to store new IDs
 newID<- matrix(nrow=dim(data)[1], ncol=1)

 #initiate index n
 n<-0
 #loop through unique strings
 for (i in 1:dim(refnum)[1]){
   #increment n by 1 for each increment through unique strings
   n<- n+1
   #loop through data rows
   for (j in 1:dim(data)[1]){    
     #find matches with string i
     index<- which(DataString == uniqueID[i,1])
     #assign new ID to matching rows
     newID[index,1]<- n
   }
 }

Upvotes: 0

Views: 323

Answers (2)

A5C1D2H2I1M1N2O1R2T1
A5C1D2H2I1M1N2O1R2T1

Reputation: 193517

If my interpretation of the question so far is correct, you might be able to use the following:

table(unique(mydf)$Category)
# 
# a b c 
# 3 3 2 

I'm a little bit cautious, though, because of your sentence "so basically the number of patients that have used each service at least once", which sounds like you want patients who have availed all three services, in which case the answer is none!

As such, aggregate might also be of interest, to at least see more easily what you're dealing with:

temp <- aggregate(Category ~ ID, mydf, function(x) sort(unique(x)))
temp
#     ID Category
# 1 A001        a
# 2 A002     a, b
# 3 A003        b
# 4 A004        b
# 5 A005        c
# 6 A006     a, c

A possible advantage here is that the earlier tabulation (if it is what you need) is also possible from the output of aggregate by using table(unlist(temp$Category)), so you get to see both utilization of services by ID and whatever summary you need.

Upvotes: 3

Arun
Arun

Reputation: 118799

One of the many solutions:

table(df[!duplicated(df), "Category"])

# a b c 
# 3 3 2 

Upvotes: 4

Related Questions