KT_1

Reputation: 8474

Calculating a total number of unique days of data

I am working on a large dataset showing how people travel, and I need to calculate the number of unique days on which people travel. In the data below, ID is unique to each person, and each ID is associated with the dates that person travelled on. Some people may make one trip per day; others may make several trips on the same day (e.g. person 1 took two trips on the 4th). What I need R to do is count the unique days per person and add them up across everyone in the dataset (e.g. person 1 = 2, person 2 = 3, person 3 = 1, person 4 = 2), so the total for the mini-dataset below should be 8.

ID = c(1,1,1,2,2,2,2,3,4,4,4,4)
date = c("4th Nov","4th Nov","5th Nov","5th Nov","6th Nov","7th Nov","7th Nov","8th Nov","6th Nov","6th Nov","7th Nov","7th Nov")
data <- data.frame(ID, date)

Any suggestions for the R code would be gratefully received.

Many thanks.

Upvotes: 2

Views: 2044

Answers (3)

Richie Cotton

Reputation: 121057

Also possible with tapply from base R.

with(data, tapply(date, ID, function(x) length(unique(x))))

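As an alternative to length(unique(x)), you can utilise the fact that date is a factor and count the levels that actually occur (x[, drop = TRUE] drops the levels that a given person's subset doesn't use). This assumes date really is a factor: since R 4.0.0, data.frame() no longer converts strings to factors by default, so create the column with stringsAsFactors = TRUE or wrap it in factor().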

with(data, tapply(date, ID, function(x) nlevels(x[, drop = TRUE])))
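A minimal sketch comparing both variants, assuming the question's ID and date vectors. It converts date with factor() explicitly so nlevels() works under any stringsAsFactors default, and it uses droplevels() (base R) in place of x[, drop = TRUE]; the two do the same job here. It also sums the per-person counts to get the overall total of 8 that the question asks for.

```r
# Rebuild the question's data, forcing date to be a factor so that
# nlevels() works regardless of the stringsAsFactors default
data <- data.frame(
  ID = c(1,1,1,2,2,2,2,3,4,4,4,4),
  date = factor(c("4th Nov","4th Nov","5th Nov","5th Nov","6th Nov","7th Nov",
                  "7th Nov","8th Nov","6th Nov","6th Nov","7th Nov","7th Nov"))
)

# Variant 1: count distinct values per ID (works for character or factor)
by_unique <- with(data, tapply(date, ID, function(x) length(unique(x))))

# Variant 2: count levels that actually occur; each ID's subset still carries
# every level of the full factor, so unused levels are dropped first
by_levels <- with(data, tapply(date, ID, function(x) nlevels(droplevels(x))))

by_unique       # per-person counts for IDs 1 to 4: 2, 3, 1, 2
sum(by_unique)  # total unique travel days across everyone: 8
```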

Bonus thoughts:

Rather than first defining separate variables called ID and "date" (date is also the name of a base R function), note that you can create the vectors directly inside your call to data.frame, like so.

data <- data.frame(
  ID = c(1,1,1,2,2,2,2,3,4,4,4,4),
  date = c("4th Nov","4th Nov","5th Nov","5th Nov","6th Nov","7th Nov","7th Nov","8th Nov","6th Nov","6th Nov","7th Nov","7th Nov")
)

When you have strings with a lot of repeated content, it is often better to build them using paste. Your date strings can be created more concisely using

paste(c(4, 4, 5, 5, 6, 7, 7, 8, 6, 6, 7, 7), "th Nov", sep = "")
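A quick check that the paste() call reproduces the question's literal vector. The paste0() line is an extra illustration: paste0() (available since R 2.15.0) is shorthand for paste(..., sep = "").

```r
# The literal vector from the question
literal <- c("4th Nov","4th Nov","5th Nov","5th Nov","6th Nov","7th Nov",
             "7th Nov","8th Nov","6th Nov","6th Nov","7th Nov","7th Nov")

# Build the same strings from the day numbers plus a shared suffix
built <- paste(c(4, 4, 5, 5, 6, 7, 7, 8, 6, 6, 7, 7), "th Nov", sep = "")

# paste0() is equivalent to paste(..., sep = "")
built0 <- paste0(c(4, 4, 5, 5, 6, 7, 7, 8, 6, 6, 7, 7), "th Nov")

identical(built, literal)   # TRUE
identical(built0, literal)  # TRUE
```

The shared "th" suffix only works because every day in the sample is between 4 and 8; days such as 1, 2, 3 or 21 would need "st", "nd" or "rd".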

Finally, if you want to do any kind of analysis with dates, you'll want to store them in one of R's date classes. For this, it's best not to bother with the "th"; instead, keep the dates in a form that's easy for computers to parse, like "dd/mm/yyyy", and then call strptime.
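A minimal sketch of that advice, assuming the dates have been re-entered as dd/mm/yyyy strings. The year 2011 is a made-up placeholder, since the question gives only day and month. strptime() returns POSIXlt date-times; as.Date() with the same format string returns plain Date values, which are often simpler when all you need is to count days.

```r
# Hypothetical dd/mm/yyyy versions of the question's dates (year 2011 assumed)
date_strings <- c("04/11/2011", "04/11/2011", "05/11/2011")

# strptime() parses into POSIXlt date-times
parsed_lt <- strptime(date_strings, format = "%d/%m/%Y")

# as.Date() with the same format yields Date objects
parsed <- as.Date(date_strings, format = "%d/%m/%Y")

length(unique(parsed))  # 2 distinct days
diff(range(parsed))     # span from first to last day: 1 day
```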

Upvotes: 5

Paul Hiemstra

Reputation: 60924

Again a task for ddply:

library(plyr)
ddply(data, .(ID), summarise, noDays = length(unique(date)))

  ID noDays
1  1      2
2  2      3
3  3      1
4  4      2

Upvotes: 4

Andrie

Reputation: 179408

You should make friends with the plyr package. Its ddply function makes this bit of analysis very straightforward: it takes a data.frame, splits it according to some criterion (in this case ID), applies a function to each piece and combines the pieces into a data.frame:

library(plyr)
ddply(data, .(ID), summarise, days=length(unique(date)))
  ID days
1  1    2
2  2    3
3  3    1
4  4    2
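To make the three steps that ddply performs visible, here is a rough base-R equivalent. It is a sketch of the same split-apply-combine idea, not plyr's actual implementation: split() the data frame by ID, lapply() a summary function to each piece, and do.call(rbind, ...) to stack the results.

```r
data <- data.frame(
  ID = c(1,1,1,2,2,2,2,3,4,4,4,4),
  date = c("4th Nov","4th Nov","5th Nov","5th Nov","6th Nov","7th Nov",
           "7th Nov","8th Nov","6th Nov","6th Nov","7th Nov","7th Nov")
)

# Split: one data frame per ID
pieces <- split(data, data$ID)

# Apply: summarise each piece as a one-row data frame
summaries <- lapply(pieces, function(d) {
  data.frame(ID = d$ID[1], days = length(unique(d$date)))
})

# Combine: stack the one-row data frames back together
result <- do.call(rbind, summaries)
rownames(result) <- NULL
result  # same ID/days table as the ddply output above
```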

Or with base R, use split and sapply to get a vector with your desired results:

sapply(with(data, split(date, ID)), function(x) length(unique(x)))
1 2 3 4 
2 3 1 2 
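To get the single overall figure the question asks for (8 for the sample data), a sketch assuming the question's data: either sum the per-person counts, or count the distinct (ID, date) rows directly with unique(), since each distinct row is one person-day.

```r
data <- data.frame(
  ID = c(1,1,1,2,2,2,2,3,4,4,4,4),
  date = c("4th Nov","4th Nov","5th Nov","5th Nov","6th Nov","7th Nov",
           "7th Nov","8th Nov","6th Nov","6th Nov","7th Nov","7th Nov")
)

# Per-person counts, as in the sapply answer above
per_person <- sapply(with(data, split(date, ID)), function(x) length(unique(x)))

# Total person-days: sum of the per-person counts
total_from_sum <- sum(per_person)

# Same total: drop duplicate (ID, date) rows and count what is left
total_from_rows <- nrow(unique(data[, c("ID", "date")]))

c(total_from_sum, total_from_rows)  # both 8
```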

Upvotes: 5
