Reputation: 73
I have a dataframe with a list of consecutive measurements of temperature over a period of time. I would like to split these up into separate dataframes based on when the temperature is consecutively above zero. For example:
Date | Time | Temp |
---|---|---|
2022-05-20 | 4:16 | 32 |
2022-05-20 | 4:17 | 16 |
2022-05-20 | 4:18 | 0 |
2022-05-20 | 4:19 | 7 |
2022-05-20 | 4:34 | 75 |
2022-05-20 | 4:39 | 5 |
2022-05-20 | 4:48 | 0 |
2022-05-20 | 4:49 | 0 |
2022-05-20 | 4:59 | 83 |
The desired output would be:
Date | Time | Temp |
---|---|---|
2022-05-20 | 4:16 | 32 |
2022-05-20 | 4:17 | 16 |
and
Date | Time | Temp |
---|---|---|
2022-05-20 | 4:19 | 7 |
2022-05-20 | 4:34 | 75 |
2022-05-20 | 4:39 | 5 |
and
Date | Time | Temp |
---|---|---|
2022-05-20 | 4:59 | 83 |
Upvotes: 4
Views: 71
Reputation: 17646
A base R approach would be to use split
with cumsum
:
split(df[df$Temp != 0, ],
cumsum(df$Temp == 0)[df$Temp != 0])
Or a much more readable format suggested by @thelatemail
sel <- df$Temp == 0
split(df[!sel,], cumsum(sel)[!sel])
Output for both:
# $`0`
# Date Time Temp
# 1 2022-05-20 4:16 32
# 2 2022-05-20 4:17 16
#
# $`1`
# Date Time Temp
# 4 2022-05-20 4:19 7
# 5 2022-05-20 4:34 75
# 6 2022-05-20 4:39 5
#
# $`3`
# Date Time Temp
# 9 2022-05-20 4:59 83
(Using these data)
df <- read.table(text = "Date Time Temp
2022-05-20 4:16 32
2022-05-20 4:17 16
2022-05-20 4:18 0
2022-05-20 4:19 7
2022-05-20 4:34 75
2022-05-20 4:39 5
2022-05-20 4:48 0
2022-05-20 4:49 0
2022-05-20 4:59 83", header = TRUE)
Upvotes: 3
Reputation: 4122
You can do this in dplyr
where a group
column remains
x = data.frame(num = 1:20, t = rep(c(23,16,5,0),5))
library(dplyr)
x <- x %>%
mutate(group = cumsum(lag(t, default = 0) == 0 & t > 0))%>%
filter(t > 0)
split_list <- split(x, x$group)
or in base R
sl <- split(x[x$t > 0, ] , cumsum(c(1, diff(x$t == 0) > 0))[x$t > 0])
giving
$`1`
num t
1 1 23
2 2 16
3 3 5
$`2`
num t
5 5 23
6 6 16
7 7 5
$`3`
num t
9 9 23
10 10 16
11 11 5
$`4`
num t
13 13 23
14 14 16
15 15 5
$`5`
num t
17 17 23
18 18 16
19 19 5
Upvotes: 2
Reputation: 102519
rle
split(
subset(
transform(
df,
grp = with(
rle(Temp > 0),
rep(cumsum(values), lengths)
)
), Temp > 0
), ~grp
)
which gives
$`1`
Date Time Temp grp
1 2022-05-20 4:16 32 1
2 2022-05-20 4:17 16 1
$`2`
Date Time Temp grp
4 2022-05-20 4:19 7 2
5 2022-05-20 4:34 75 2
6 2022-05-20 4:39 5 2
$`3`
Date Time Temp grp
9 2022-05-20 4:59 83 3
dplyr
, you can try consecutive_id
library(dplyr)
df %>%
mutate(grp = consecutive_id(Temp > 0)) %>%
filter(Temp > 0) %>%
group_split(grp, .keep = FALSE)
which gives
[[1]]
# A tibble: 2 × 3
Date Time Temp
<chr> <chr> <int>
1 2022-05-20 4:16 32
2 2022-05-20 4:17 16
[[2]]
# A tibble: 3 × 3
Date Time Temp
<chr> <chr> <int>
1 2022-05-20 4:19 7
2 2022-05-20 4:34 75
3 2022-05-20 4:39 5
[[3]]
# A tibble: 1 × 3
Date Time Temp
<chr> <chr> <int>
1 2022-05-20 4:59 83
> dput(df)
structure(list(Date = c("2022-05-20", "2022-05-20", "2022-05-20",
"2022-05-20", "2022-05-20", "2022-05-20", "2022-05-20", "2022-05-20",
"2022-05-20"), Time = c("4:16", "4:17", "4:18", "4:19", "4:34",
"4:39", "4:48", "4:49", "4:59"), Temp = c(32L, 16L, 0L, 7L, 75L,
5L, 0L, 0L, 83L)), class = "data.frame", row.names = c(NA, -9L
))
Upvotes: 2