Reputation: 3
This might be very trivial but I could not find an easy solution anywhere. I am trying to create a script in R to count entries in one column that belong to a one of 3 categories specified another column. I have a list of clinical patients with ID numbers (more than one entry for the same ID) that have been seen by two services (a or b). I need to know how many ID have been seen by service a and by service b and service c, but counting repeated visits by a service only once (so basically the number of patients that have used each service at least once) - hope this makes sense, here is an example to explain.
Example:
ID Category A001 a A002 a A002 a A002 b A003 b A003 b A005 c A001 a A004 b A004 b A006 c A006 a
Output should be something like:
a=3 b=3 c=2
This is what I have done, but I am quite stuck, and this might not be good at all!
DataString<- matrix(nrow=dim(refnum)[1], ncol=1)
for (i in 1:dim(refnum)[1]){
DataString[i,1]<- paste(refnum[i,], collapse = '')
}
#generate vector of unique strings
uniqueID<- unique(DataString)
#create new matrix to store new IDs
newID<- matrix(nrow=dim(data)[1], ncol=1)
#initiate index n
n<-0
#loop through unique strings
for (i in 1:dim(refnum)[1]){
#increment n by 1 for each increment through unique strings
n<- n+1
#loop through data rows
for (j in 1:dim(data)[1]){
#find matches with string i
index<- which(DataString == uniqueID[i,1])
#assign new ID to matching rows
newID[index,1]<- n
}
}
Upvotes: 0
Views: 323
Reputation: 193517
If my interpretation of the question so far is correct, you might be able to use the following:
table(unique(mydf)$Category)
#
# a b c
# 3 3 2
I'm a little bit cautious, though, because of your sentence "so basically the number of patients that have used each service at least once", which sounds like you want patients who have availed all three services, in which case the answer is none!
As such, aggregate
might also be of interest, to at least see more easily what you're dealing with:
temp <- aggregate(Category ~ ID, mydf, function(x) sort(unique(x)))
temp
# ID Category
# 1 A001 a
# 2 A002 a, b
# 3 A003 b
# 4 A004 b
# 5 A005 c
# 6 A006 a, c
A possible advantage here is that the earlier tabulation (if it is what you need) is also possible from the output of aggregate
by using table(unlist(temp$Category))
, so you get to see both utilization of services by ID and whatever summary you need.
Upvotes: 3
Reputation: 118799
One of the many solutions:
table(df[!duplicated(df), "Category"])
# a b c
# 3 3 2
Upvotes: 4