Reputation: 8474
I am working on a large dataset showing how people travel, and I need to calculate the number of unique days on which people travel. The table below has an ID that is unique to each person. Associated with each ID are the dates they travelled on: for some people this is one trip per day, for others there may be multiple trips on a day (e.g. person "1" took two trips on the 4th). What I need R to do is work out the total number of unique days for all people in the dataset (e.g. person 1 = 2, person 2 = 3, person 3 = 1, person 4 = 2), so the total for the mini-dataset below should be 8.
ID = c(1,1,1,2,2,2,2,3,4,4,4,4)
date = c("4th Nov","4th Nov","5th Nov","5th Nov","6th Nov","7th Nov","7th Nov","8th Nov","6th Nov","6th Nov","7th Nov","7th Nov")
data <- data.frame(ID, date)
Any suggestions on the coding for R would be gratefully received.
Many thanks.
Upvotes: 2
Views: 2044
Reputation: 121057
Also possible with tapply from base R.
with(data, tapply(date, ID, function(x) length(unique(x))))
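As an alternative to length(unique(x)), you can use the fact that date is a factor and count the levels after dropping the unused ones. Note that this relies on date being a factor: from R 4.0.0 onwards data.frame no longer converts strings to factors by default, so there you would need to pass stringsAsFactors = TRUE when building data.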
with(data, tapply(date, ID, function(x) nlevels(x[, drop = TRUE])))
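Since the question asks for a single total across all people, the per-person counts can simply be summed. A minimal sketch, rebuilding the example data so it runs on its own; the names per_person, total_days and total_days_alt are only illustrative:

```r
# Rebuild the example data from the question
data <- data.frame(
  ID = c(1,1,1,2,2,2,2,3,4,4,4,4),
  date = c("4th Nov","4th Nov","5th Nov","5th Nov","6th Nov","7th Nov",
           "7th Nov","8th Nov","6th Nov","6th Nov","7th Nov","7th Nov")
)

# Unique travel days per person (works whether date is character or factor)
per_person <- with(data, tapply(date, ID, function(x) length(unique(x))))

# Overall total across everyone
total_days <- sum(per_person)

# Equivalent shortcut: count the distinct (ID, date) rows
total_days_alt <- nrow(unique(data))
```

Both routes should give 8 for the mini-dataset; the nrow(unique(data)) shortcut only works if the data frame holds no columns other than ID and date.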
Bonus thoughts:
To solve your problem of defining a variable called "date", note that you can include vectors in your call to data.frame, like so.
data <- data.frame(
  ID = c(1,1,1,2,2,2,2,3,4,4,4,4),
  date = c("4th Nov","4th Nov","5th Nov","5th Nov","6th Nov","7th Nov","7th Nov","8th Nov","6th Nov","6th Nov","7th Nov","7th Nov")
)
When you have strings that have a lot of repeated content, it is often better to write them using paste. Your date string can be created more concisely using
paste(c(4, 4, 5, 5, 6, 7, 7, 8, 6, 6, 7, 7), "th Nov", sep = "")
Finally, if you want to do any kind of analysis with dates, you'll want to store them in one of the many date formats. For this, you're best not bothering with the "th", but keep the dates in a form that's easy for computers to parse, like "dd/mm/yyyy". Then call strptime.
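As a rough illustration of that last point, here is a small sketch of parsing dd/mm/yyyy strings. The year 2011 and the names date_strings, travel_dates, n_unique and span are assumptions made up for the example, since the question does not give a year:

```r
# Hypothetical dates in dd/mm/yyyy form; the year is an assumption
date_strings <- c("04/11/2011", "04/11/2011", "05/11/2011")

# strptime() returns a POSIXlt object; as.Date() keeps just the calendar day
travel_dates <- as.Date(strptime(date_strings, format = "%d/%m/%Y"))

# Real dates can be de-duplicated, sorted and subtracted
n_unique <- length(unique(travel_dates))
span <- diff(range(travel_dates))
```

Once the column holds proper dates, the same length(unique(x)) counting works unchanged, and things like the number of days between first and last trip come for free.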
Upvotes: 5
Reputation: 60924
Again a task for ddply (from the plyr package):
ddply(data, .(ID), summarise, noDays = length(unique(date)))
  ID noDays
1  1      2
2  2      3
3  3      1
4  4      2
Upvotes: 4
Reputation: 179408
You should make friends with the plyr package. The ddply function makes this bit of analysis very straightforward. It takes a data.frame, splits it according to some criterion (in this case ID), applies a function to each piece and combines the results into a data.frame:
library(plyr)
ddply(data, .(ID), summarise, days=length(unique(date)))
  ID days
1  1    2
2  2    3
3  3    1
4  4    2
Or with base R, use split and sapply to get a vector with your desired results:
sapply(with(data, split(date, ID)), function(x)length(unique(x)))
1 2 3 4
2 3 1 2
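To make the split, apply and combine steps described above concrete, here is a hand-rolled base R sketch of the same idea. It is not how plyr is implemented, just an illustration; the names pieces, summaries and result are made up for the example:

```r
# Rebuild the example data from the question
data <- data.frame(
  ID = c(1,1,1,2,2,2,2,3,4,4,4,4),
  date = c("4th Nov","4th Nov","5th Nov","5th Nov","6th Nov","7th Nov",
           "7th Nov","8th Nov","6th Nov","6th Nov","7th Nov","7th Nov")
)

# Split: one small data.frame per person
pieces <- split(data, data$ID)

# Apply: summarise each piece as a one-row data.frame
summaries <- lapply(pieces, function(piece) {
  data.frame(ID = piece$ID[1], days = length(unique(piece$date)))
})

# Combine: stack the one-row pieces back into a single data.frame
result <- do.call(rbind, summaries)
```

Summing result$days then gives the overall total asked for in the question.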
Upvotes: 5