Reputation:
I am a newbie to R and I have a data frame which contains the following fields:
day place hour time_spent count
1 1 1 1 120
1 1 1 2 100
1 1 1 3 90
1 1 1 4 80
So my aim is to calculate the time spent in each place where 75% of the vehicles to cross the place.So from this data frame I generate the below data frame by
day place hour time_spent count cum_count percentage
1 1 1 1 120 120 30.7%
1 1 1 2 100 220 56.4%
1 1 1 3 90 310 79%
1 1 1 4 80 390 100%
df$cum_count=cumsum(df$count)
df$percentage=cumsum(df$percentage)
for(i in 1:length(df$percentage)){
if(df$percentage[i]>75%){
low time=df$time_spent[i-1]
high_time=df$time_spent[i]
}
}
So which means that 75% of vehicles are spending 2-3 minutes in the place 1.But now I have a data frame like this which is for all the places and for all the days.
day place hour time_spent count
1 1 1 1 120
1 1 1 2 100
1 1 1 3 90
1 1 1 4 80
1 2 1 1 220
1 2 1 2 100
1 2 1 3 90
1 2 1 4 80
1 3 1 1 100
1 3 1 2 80
1 3 1 3 90
1 3 1 4 100
2 1 1 1 120
2 1 1 2 100
2 1 1 3 90
2 1 1 4 80
2 2 1 1 220
2 2 1 2 100
2 2 1 3 90
2 2 1 4 80
2 3 1 1 100
2 3 1 2 80
2 3 1 3 90
2 3 1 4 100
How is it possible to calculate the high time and low time for each place?Any help is appreciated.
Upvotes: 0
Views: 18968
Reputation: 1598
The max
and min
functions ought to do the trick here. Although you could also do summary
to get median, mean, etc in one go. I'd also recommend the quantile
function for these percentages. As usually the case with R the tricky part if getting the data in the correct format.
Say you want the total time spent at each place:
index <- sort(unique(df$place))
times <- as.list(rep(NA, length(index)))
names(times) <- index
for(ii in index){
counter <- c()
for(jj in df[df$place==ii,]$time_spent){
counter <- c(counter, rep(jj, df[df$place==ii,]$count[jj]))
}
times[[ii]] <- counter
}
Now for each place you can compute the max and min with:
lapply(times, max)
lapply(times, min)
Similarly you can compute the mean:
lapply(times, function(x) sum(x)/length(x))
lapply(times, mean)
I think what you want are the quantiles:
lapply(times, quantile, 0.75)
This would be time by which at least 75% of vehicles had passed though a place, i.e., 75% of vehicles had took this time or less to pass through.
Upvotes: 1
Reputation: 1628
if i understood your question correctly (you want min and max value of time_spent in a place):
df %>%
group_by(place) %>%
summarise(min(time_spent),
max(time_spent))
will give you this:
place min(time-spent) max(time_spent)
1 1 4
2 1 4
3 1 4
Upvotes: 0
Reputation: 886928
We can use a group by operation
library(dplyr)
dfN %>%
group_by(day, place) %>%
mutate(cum_count = cumsum(count),
percentage = 100*cum_count/sum(count),
low_time = time_spent[which.max(percentage > 75)-1],
high_time = time_spent[low_time+1])
Upvotes: 0