count the times a year between multiple periods

Question

My data looks more or less like this (with in the column car being car ID's, so each line represents one car, and the period during it was active):

structure(list(car = c(100987, 100995, 104641, 120350, 123444, 128248, 128253, 129537, 129612, 130189), 
  location = structure(c(1L, 1L, 1L, 2L, 3L, 4L, 3L, 3L, 3L, 3L), .Label = c("A", "A", "B", "D", "C", "C"), class = "factor"), 
  start = structure(c(12784, 12784, 365, 15027, 15951, -1096, 7305, 365, 365, -731), class = "Date"), 
  end = structure(c(16070, 16070, 16070, 16070, 16070, 13725, 16070, 16070, 16070, 16070), class = "Date")), 
 .Names = c("car", "location", "start", "end"), row.names = c(NA, 10L), class = "data.frame")

I would like to obtain a dataframe as follows:

output=data.frame(location=rep(c("A","B","C"),each=2),year=rep(2000:2001,3),n_cars=10:15)

So I would like to count, per location, how many cars there are over a range of years (say 2000:2013). A car only counts if it was there on the 31st of december of that year. I thought about making for each entry a list of years during which that car was active and then make the count of list items that contain that year. But my dataframe is too big to make that an efficient approach. How would I do this best? It seems an easy thing to do but I can't figure it out.

maloneypatr · Accepted Answer

Something like this?

library(lubridate)
library(plyr)
library(gdata)

# dat is the name of your original data set structure

yeardf <- data.frame()

for(i in 1:nrow(dat)){
    if(yday(dat$end[i]) > 364)
       years_active <- paste(seq(year(dat$start)[i], year(dat$end)[i]))
    else
       years_active <- paste(seq(year(dat$start)[i], year(dat$end)[i]-1))
    car <- paste(rep(dat$car[i], length(years_active)))
    location <- as.character(rep(dat$location[i], length(years_active)))
    df <- data.frame(car = car, location = location, year = years_active)
    yeardf <- rbind(yeardf, df)
}

output <- ddply(yeardf, .(location, year), 'nrow')
output <- rename.vars(output, 'nrow', 'n_cars')

count the times a year between multiple periods

Answers (1)

Related Questions