Reputation: 23
I have a string variable tours in my dataframe df that represents the different stops an individual made during a journey.
For example:
1. home_work_leisure_home
2. home_work_shopping_work_home
3. home_work_leisure_errand_home
In transport planning we group activities into primary activities (work and education) and secondary activities (everything else). For each tour, I want to count the number of secondary activities before the first primary activity, in between two primary activities, and after the last primary activity.
This means I am looking for a function in R that:
a. identifies the first work in the string variable,
b. then counts the number of activities before this first work activity
c. then identifies the last work in the string if there is more than one,
d. if there is, then counts the number of activities between the two work activities,
e. then counts the number of activities after the last work activity.
The result for the three example tours then would be:
I would be super thankful if someone could give me a hand with this issue - even if it is a link to a similar question.
Thank you. Kind regards, Nathalie
Upvotes: 3
Views: 51
Reputation: 290
This should get you started; you can replace "work" and "education" with anything you want:
> x
[1] "home_work_leisure_home" "home_work_shopping_work_home" "home_work_leisure_errand_home"
> x <- strsplit(x,"_") ### Split each tour into its stops and keep the result; the loop below indexes this list.
> x
[[1]]
[1] "home" "work" "leisure" "home"
[[2]]
[1] "home" "work" "shopping" "work" "home"
[[3]]
[1] "home" "work" "leisure" "errand" "home"
af_last_p<-bet_f_l_p<-be_first_p<-prim_n<-numeric()
for(i in seq_along(x)){
y<-which(x[[i]] %in% c("work","education")) ### Positions of the primary activities in tour i (already in ascending order).
prim_n[i]<-length(y) ### Number of primary activities
be_first_p[i]<-ifelse(y[1]>1,y[1]-1,0) ### Number of activities before the first primary
bet_f_l_p[i]<-ifelse(length(y)>1,sum(diff(y))-length(y)+1,0) ### Number of non-primary activities between the first and the last primary
af_last_p[i]<-length(x[[i]])-y[length(y)] ### Number of activities after the last primary
}
> z<-cbind(be_first_p,bet_f_l_p,af_last_p,prim_n)
> z
     be_first_p bet_f_l_p af_last_p prim_n
[1,]          1         0         2      1
[2,]          1         1         1      2
[3,]          1         0         3      1
Hopefully you wanted something simple like this? :) Please let me know if you want any clarifications!
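For reference, the same counting logic can also be wrapped in a small function and applied to every tour at once. This is only a sketch: the function name count_secondary, its primary argument and the choice to return NA counts for tours without any primary activity are my own assumptions, not something from the question.

```r
# Hypothetical helper: counts the activities around the primary ones in ONE tour.
# `stops` is a character vector such as c("home", "work", "leisure", "home").
count_secondary <- function(stops, primary = c("work", "education")) {
  y <- which(stops %in% primary)               # positions of the primary activities
  if (length(y) == 0) {                        # assumption: report NA if there is no primary
    return(c(be_first_p = NA, bet_f_l_p = NA, af_last_p = NA, prim_n = 0))
  }
  first <- y[1]
  last  <- y[length(y)]
  c(be_first_p = first - 1,                       # activities before the first primary
    bet_f_l_p  = (last - first + 1) - length(y),  # non-primary activities between first and last primary
    af_last_p  = length(stops) - last,            # activities after the last primary
    prim_n     = length(y))                       # number of primary activities
}

tours <- c("home_work_leisure_home",
           "home_work_shopping_work_home",
           "home_work_leisure_errand_home")

# One row per tour, one column per count.
template <- c(be_first_p = 0, bet_f_l_p = 0, af_last_p = 0, prim_n = 0)
z <- t(vapply(strsplit(tours, "_", fixed = TRUE), count_secondary, template))
z
```

Because the set of primary activities is an argument, something like count_secondary(stops, primary = "work") would restrict the count to work only.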
######## EDIT 1 ########
I tried it with a list of 10,000 examples and it took about 0.5 seconds, which seems okay; the loop is O(n) at worst. If a tour contains neither work nor education, you can add this as the second line of the loop:
if(length(y)==0){next}
This ensures the code still runs when no primary activity is recorded, and the output for those tours will be NA (as long as a later tour fills the result vectors up to that index). You can get rid of those NA rows by using:
z<-na.omit(z) ### Base R. z<-z%>%na.omit does the same once library(magrittr) is loaded.
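A possible way to apply this directly to the data frame from the question is sketched below. The toy df, the fourth tour without a primary activity and the name df_complete are assumptions made for illustration. The result matrix is pre-filled with NA, so a skipped tour keeps NA even when it is the last row, which vectors grown inside the loop would not guarantee.

```r
# Toy data frame; the fourth tour has no primary activity on purpose.
df <- data.frame(
  tours = c("home_work_leisure_home",
            "home_work_shopping_work_home",
            "home_work_leisure_errand_home",
            "home_shopping_home"),
  stringsAsFactors = FALSE
)

primary <- c("work", "education")              # assumed set of primary activities
stops   <- strsplit(df$tours, "_", fixed = TRUE)

# Pre-filled with NA: tours skipped by `next` keep NA in every column.
res <- matrix(NA_real_, nrow = nrow(df), ncol = 3,
              dimnames = list(NULL, c("be_first_p", "bet_f_l_p", "af_last_p")))

for (i in seq_along(stops)) {
  y <- which(stops[[i]] %in% primary)          # positions of the primary activities
  if (length(y) == 0) next                     # no primary activity: leave the NA row
  res[i, ] <- c(y[1] - 1,                                # before the first primary
                y[length(y)] - y[1] + 1 - length(y),     # between first and last primary
                length(stops[[i]]) - y[length(y)])       # after the last primary
}

df <- cbind(df, res)           # counts become new columns next to tours
df_complete <- na.omit(df)     # drops the tours without a primary activity
```

Keeping the NA rows in df and filtering into a separate object means the tours without a primary activity remain available for inspection.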
Upvotes: 1