Reputation: 959
I have following dataframe.
As suggested by many user I am sharing this with dput.
> dput(time_interval_df)
structure(list(timeinterval = structure(1:24, .Label = c("00:00:00 00:59:59",
"01:00:00 01:59:59", "02:00:00 02:59:59", "03:00:00 03:59:59",
"04:00:00 04:59:59", "05:00:00 05:59:59", "06:00:00 06:59:59",
"07:00:00 07:59:59", "08:00:00 08:59:59", "09:00:00 09:59:59",
"10:00:00 10:59:59", "11:00:00 11:59:59", "12:00:00 12:59:59",
"13:00:00 13:59:59", "14:00:00 14:59:59", "15:00:00 15:59:59",
"16:00:00 16:59:59", "17:00:00 17:59:59", "18:00:00 18:59:59",
"19:00:00 19:59:59", "20:00:00 20:59:59", "21:00:00 21:59:59",
"22:00:00 22:59:59", "23:00:00 23:59:59"), class = "factor"),
timecount = c(1, 1, 1, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 1,
0, 0, 0, 0, 1, 0, 1, 1, 1)), .Names = c("timeinterval", "timecount"
), row.names = c(NA, -24L), class = "data.frame")
Structure of my dataframe is
str(time_interval_df)
'data.frame': 24 obs. of 2 variables:
$ timeinterval: Factor w/ 24 levels "00:00:00 00:59:59",..: 1 2 3 4 5 6 7 8 9 10 ...
$ timecount : num 1 1 1 0 0 0 0 0 1 1 ...
If the difference between the timeinterval is 1 and there is some time count then I want to group them in a single group .For example the difference b/w row no 1 and row no 2 is 1 and also the difference b/w row2 and row3 (consecutively) is 1 so I want to group all into one and display the sum of timecount in front of new row i.e. "00:00:00 02:59:59" Below is my desired output
After getting the second output my second desired result will be
Please help me on this. I have spent a lot of time in this.
Upvotes: 0
Views: 378
Reputation: 887118
We can use data.table
. Convert the 'data.frame' to 'data.table' (setDT(df1
). Create two columns ('time1', 'time2') by splitting the 'timeinterval (tstrsplit()
) at the whitespace, grouped by the run-length-id of 'timecount', if
all
the 'timecount' values are not equal to 0, then paste
the first (1
) and last (.N
) elements of 'time1' and 'time2' respectively, and also get the number or rows (.N
). If needed, assign the 'grp' column to 'NULL'
library(data.table)
res1 <- setDT(df1)[, c('time1', 'time2') := tstrsplit(timeinterval, " ")
][, if(all(timecount!=0)) .(timeinterval= paste(time1[1], time2[.N]), timecount = .N),
.(grp = rleid(timecount))][, grp := NULL][]
res1
# timeinterval timecount
#1: 00:00:00 02:59:59 3
#2: 08:00:00 09:59:59 2
#3: 14:00:00 14:59:59 1
#4: 19:00:00 19:59:59 1
#5: 21:00:00 23:59:59 3
From 'res1', we can summarise to get the 'res2'
rbind(res1[c(1, .N)][, .(timeinterval = paste(substr(timeinterval[.N],
1, 8), substring(timeinterval[1], 10)), timecount= sum(timecount))], res1[-c(1, .N)])
# timeinterval timecount
#1: 21:00:00 02:59:59 6
#2: 08:00:00 09:59:59 2
#3: 14:00:00 14:59:59 1
#4: 19:00:00 19:59:59 1
Upvotes: 2