Nadka
Nadka

Reputation: 73

Split up dataframe into list of dataframes based on breaks in one column

I have a dataframe with a list of consecutive measurements of temperature over a period of time. I would like to split these up into separate dataframes based on when the temperature is consecutively above zero. For example:

Date Time Temp
2022-05-20 4:16 32
2022-05-20 4:17 16
2022-05-20 4:18 0
2022-05-20 4:19 7
2022-05-20 4:34 75
2022-05-20 4:39 5
2022-05-20 4:48 0
2022-05-20 4:49 0
2022-05-20 4:59 83

The desired output would be:

Date Time Temp
2022-05-20 4:16 32
2022-05-20 4:17 16

and

Date Time Temp
2022-05-20 4:19 7
2022-05-20 4:34 75
2022-05-20 4:39 5

and

Date Time Temp
2022-05-20 4:59 83

Upvotes: 4

Views: 71

Answers (3)

jpsmith
jpsmith

Reputation: 17646

A base R approach would be to use split with cumsum:

split(df[df$Temp != 0, ], 
      cumsum(df$Temp == 0)[df$Temp != 0])

Or a much more readable format suggested by @thelatemail

sel <- df$Temp == 0

split(df[!sel,], cumsum(sel)[!sel])

Output for both:

# $`0`
#   Date Time Temp
# 1 2022-05-20 4:16   32
# 2 2022-05-20 4:17   16
# 
# $`1`
#   Date Time Temp
# 4 2022-05-20 4:19    7
# 5 2022-05-20 4:34   75
# 6 2022-05-20 4:39    5
# 
# $`3`
#   Date Time Temp
# 9 2022-05-20 4:59   83

(Using these data)

df <- read.table(text = "Date   Time    Temp
2022-05-20  4:16    32
2022-05-20  4:17    16
2022-05-20  4:18    0
2022-05-20  4:19    7
2022-05-20  4:34    75
2022-05-20  4:39    5
2022-05-20  4:48    0
2022-05-20  4:49    0
2022-05-20  4:59    83", header = TRUE)

Upvotes: 3

Tim G
Tim G

Reputation: 4122

You can do this in dplyr where a group column remains

x = data.frame(num = 1:20, t = rep(c(23,16,5,0),5))

library(dplyr)    
x <- x %>%
  mutate(group = cumsum(lag(t, default = 0) == 0 & t > 0))%>% 
  filter(t > 0)    
split_list <- split(x, x$group)

or in base R

sl <- split(x[x$t > 0, ] , cumsum(c(1, diff(x$t == 0) > 0))[x$t > 0])

giving

$`1`
  num  t
1   1 23
2   2 16
3   3  5

$`2`
  num  t
5   5 23
6   6 16
7   7  5

$`3`
   num  t
9    9 23
10  10 16
11  11  5

$`4`
   num  t
13  13 23
14  14 16
15  15  5

$`5`
   num  t
17  17 23
18  18 16
19  19  5

Upvotes: 2

ThomasIsCoding
ThomasIsCoding

Reputation: 102519

  • With base R, you can try rle
split(
    subset(
        transform(
            df,
            grp = with(
                rle(Temp > 0),
                rep(cumsum(values), lengths)
            )
        ), Temp > 0
    ), ~grp
)

which gives

$`1`
        Date Time Temp grp
1 2022-05-20 4:16   32   1
2 2022-05-20 4:17   16   1

$`2`
        Date Time Temp grp
4 2022-05-20 4:19    7   2
5 2022-05-20 4:34   75   2
6 2022-05-20 4:39    5   2

$`3`
        Date Time Temp grp
9 2022-05-20 4:59   83   3
  • With dplyr, you can try consecutive_id
library(dplyr)
df %>%
    mutate(grp = consecutive_id(Temp > 0)) %>%
    filter(Temp > 0) %>%
    group_split(grp, .keep = FALSE)

which gives

[[1]]
# A tibble: 2 × 3
  Date       Time   Temp
  <chr>      <chr> <int>
1 2022-05-20 4:16     32
2 2022-05-20 4:17     16

[[2]]
# A tibble: 3 × 3
  Date       Time   Temp
  <chr>      <chr> <int>
1 2022-05-20 4:19      7
2 2022-05-20 4:34     75
3 2022-05-20 4:39      5

[[3]]
# A tibble: 1 × 3
  Date       Time   Temp
  <chr>      <chr> <int>
1 2022-05-20 4:59     83

data

> dput(df)
structure(list(Date = c("2022-05-20", "2022-05-20", "2022-05-20",
"2022-05-20", "2022-05-20", "2022-05-20", "2022-05-20", "2022-05-20",
"2022-05-20"), Time = c("4:16", "4:17", "4:18", "4:19", "4:34",
"4:39", "4:48", "4:49", "4:59"), Temp = c(32L, 16L, 0L, 7L, 75L,
5L, 0L, 0L, 83L)), class = "data.frame", row.names = c(NA, -9L
))

Upvotes: 2

Related Questions