Reputation: 716
I want to split my data frame by looping through its rows and subsetting:
# TRUE wherever the gap to the previous row exceeds 1800 seconds
indices <- diff(Data$Time) > 1800
for (i in seq_along(indices)) {
  if (indices[i]) {
    ##### I need a function to split data by row index
  }
}
I tried
lst <- split(Data, as.factor(diff(Data$Time>1800)==TRUE))
But this only separates the TRUE rows from the FALSE rows. What I want is to start a new piece every time the condition is TRUE.
This is what I have
Time temp
7/1/17 13:45:34 56
7/1/17 13:45:37 68
7/1/17 13:45:39 98
7/1/17 13:45:40 99
7/1/17 13:45:46 97
7/1/17 14:16:29 48
7/1/17 14:16:30 78
7/1/17 14:16:31 66
7/1/17 14:17:34 93
7/1/17 14:17:39 98
7/1/17 14:17:40 98
7/1/17 14:17:44 93
7/1/17 14:47:10 54
7/1/17 14:47:12 67
7/1/17 14:47:16 69
7/1/17 14:47:18 95
7/1/17 14:47:19 95
7/1/17 14:47:23 96
7/1/17 14:47:28 96
7/1/17 14:47:30 99
This is what I want
Time temp
7/1/17 13:45:34 56
7/1/17 13:45:37 68
7/1/17 13:45:39 98
7/1/17 13:45:40 99
7/1/17 13:45:46 97
Time temp
7/1/17 14:16:29 48
7/1/17 14:16:30 78
7/1/17 14:16:31 66
7/1/17 14:17:34 93
7/1/17 14:17:39 98
7/1/17 14:17:40 98
7/1/17 14:17:44 93
Time temp
7/1/17 14:47:10 54
7/1/17 14:47:12 67
7/1/17 14:47:16 69
7/1/17 14:47:18 95
7/1/17 14:47:19 95
7/1/17 14:47:23 96
7/1/17 14:47:28 96
7/1/17 14:47:30 99
Is it possible to store these indices in a vector and then split the data frame based on that vector? That is, whenever the row number equals a stored value "i", split the data frame at that row, so that I end up with multiple data frames.
Upvotes: 2
Views: 2100
Reputation: 887118
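With the new dataset, it seems the threshold should be 1700 instead of 1800. The second gap (14:17:44 to 14:47:10) is only 1766 seconds, so a cutoff of 1800 would miss it.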
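As a quick check on that threshold, the gaps across the two breaks can be computed directly. This is only a sketch: it uses the four timestamps on either side of the breaks in the sample data, and it assumes the strings are in day/month/year order (as the parsed output further down suggests). The names `boundary` and `gaps` are made up for illustration.

```r
# Timestamps on either side of the two breaks in the sample data
boundary <- as.POSIXct(
  c("7/1/17 13:45:46", "7/1/17 14:16:29",
    "7/1/17 14:17:44", "7/1/17 14:47:10"),
  format = "%d/%m/%y %H:%M:%S", tz = "GMT"
)

# Gap in seconds across each break; units are fixed explicitly
# so the result does not depend on difftime's automatic unit choice
gaps <- as.numeric(difftime(boundary[c(2, 4)], boundary[c(1, 3)],
                            units = "secs"))
gaps
# [1] 1843 1766
```

Both gaps exceed 1700, but only the first exceeds 1800. With the threshold settled, the grouping itself can be done with dplyr, purrr and lubridate: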
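library(dplyr)
library(purrr)
library(lubridate)
Data %>%
  # parse the timestamps, then bump the group id whenever the gap
  # to the previous row exceeds 1700 seconds
  mutate(Time = dmy_hms(Time),
         grp = cumsum(Time - lag(Time, default = Time[1L]) > 1700)) %>%
  split(.$grp) %>%
  # drop the helper column from each piece
  map(~ .x %>%
        select(-grp))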
#$`0`
# Time temp
#1 2017-01-07 13:45:34 56
#2 2017-01-07 13:45:37 68
#3 2017-01-07 13:45:39 98
#4 2017-01-07 13:45:40 99
#5 2017-01-07 13:45:46 97
#$`1`
# Time temp
#6 2017-01-07 14:16:29 48
#7 2017-01-07 14:16:30 78
#8 2017-01-07 14:16:31 66
#9 2017-01-07 14:17:34 93
#10 2017-01-07 14:17:39 98
#11 2017-01-07 14:17:40 98
#12 2017-01-07 14:17:44 93
#$`2`
# Time temp
#13 2017-01-07 14:47:10 54
#14 2017-01-07 14:47:12 67
#15 2017-01-07 14:47:16 69
#16 2017-01-07 14:47:18 95
#17 2017-01-07 14:47:19 95
#18 2017-01-07 14:47:23 96
#19 2017-01-07 14:47:28 96
#20 2017-01-07 14:47:30 99
A similar option with base R would be:
split(Data, cumsum(c(0, diff(as.POSIXct(Data$Time,
    format = "%d/%m/%y %H:%M:%S", tz = 'GMT'))) > 1700))
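On the question of storing the split positions in a vector first and then splitting at those row numbers, one possible sketch is below. It is not part of the approaches above, just an illustration: it uses six rows taken from the sample data, and the names `secs`, `starts`, `grp` and `pieces` are made up for the example. `findInterval()` counts, for each row number, how many stored start positions lie at or before it, which gives a group id per row.

```r
# Six rows taken from the sample data, spanning both breaks
Data <- data.frame(
  Time = c("7/1/17 13:45:40", "7/1/17 13:45:46",
           "7/1/17 14:16:29", "7/1/17 14:16:30",
           "7/1/17 14:47:10", "7/1/17 14:47:12"),
  temp = c(99, 97, 48, 78, 54, 67),
  stringsAsFactors = FALSE
)

# Timestamps as plain seconds, so diff() is always in seconds
secs <- as.numeric(as.POSIXct(Data$Time,
                              format = "%d/%m/%y %H:%M:%S", tz = "GMT"))

# Row numbers at which a new piece starts (gap to previous row > 1700 s)
starts <- which(diff(secs) > 1700) + 1

# Group id per row = number of start positions at or before that row
grp <- findInterval(seq_len(nrow(Data)), starts)

# One data frame per group, named "0", "1", "2"
pieces <- split(Data, grp)
```

Here `starts` holds the row numbers 3 and 5, and `pieces` is a list of three data frames with two rows each. Any other vector of row numbers could be passed in place of `starts`, as long as it is sorted in increasing order.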
Upvotes: 1