Reputation: 73

Split up dataframe into list of dataframes based on breaks in one column

I have a dataframe with a list of consecutive measurements of temperature over a period of time. I would like to split these up into separate dataframes based on when the temperature is consecutively above zero. For example:

Date	Time	Temp
2022-05-20	4:16	32
2022-05-20	4:17	16
2022-05-20	4:18	0
2022-05-20	4:19	7
2022-05-20	4:34	75
2022-05-20	4:39	5
2022-05-20	4:48	0
2022-05-20	4:49	0
2022-05-20	4:59	83

The desired output would be:

Date	Time	Temp
2022-05-20	4:16	32
2022-05-20	4:17	16

and

Date	Time	Temp
2022-05-20	4:19	7
2022-05-20	4:34	75
2022-05-20	4:39	5

and

Date	Time	Temp
2022-05-20	4:59	83

Upvotes: 4

Answers (3)

jpsmith

Reputation: 17646

A base R approach would be to use split with cumsum:

split(df[df$Temp != 0, ], 
      cumsum(df$Temp == 0)[df$Temp != 0])

Or a much more readable format suggested by @thelatemail

sel <- df$Temp == 0

split(df[!sel,], cumsum(sel)[!sel])

Output for both:

# $`0`
#   Date Time Temp
# 1 2022-05-20 4:16   32
# 2 2022-05-20 4:17   16
# 
# $`1`
#   Date Time Temp
# 4 2022-05-20 4:19    7
# 5 2022-05-20 4:34   75
# 6 2022-05-20 4:39    5
# 
# $`3`
#   Date Time Temp
# 9 2022-05-20 4:59   83

(Using these data)

df <- read.table(text = "Date   Time    Temp
2022-05-20  4:16    32
2022-05-20  4:17    16
2022-05-20  4:18    0
2022-05-20  4:19    7
2022-05-20  4:34    75
2022-05-20  4:39    5
2022-05-20  4:48    0
2022-05-20  4:49    0
2022-05-20  4:59    83", header = TRUE)

Upvotes: 3

Tim G

Reputation: 4122

You can do this in dplyr where a group column remains

x = data.frame(num = 1:20, t = rep(c(23,16,5,0),5))

library(dplyr)    
x <- x %>%
  mutate(group = cumsum(lag(t, default = 0) == 0 & t > 0))%>% 
  filter(t > 0)    
split_list <- split(x, x$group)

or in base R

sl <- split(x[x$t > 0, ] , cumsum(c(1, diff(x$t == 0) > 0))[x$t > 0])

giving

Upvotes: 2

ThomasIsCoding

Reputation: 102519

With base R, you can try rle

split(
    subset(
        transform(
            df,
            grp = with(
                rle(Temp > 0),
                rep(cumsum(values), lengths)
            )
        ), Temp > 0
    ), ~grp
)

which gives

$`1`
        Date Time Temp grp
1 2022-05-20 4:16   32   1
2 2022-05-20 4:17   16   1

$`2`
        Date Time Temp grp
4 2022-05-20 4:19    7   2
5 2022-05-20 4:34   75   2
6 2022-05-20 4:39    5   2

$`3`
        Date Time Temp grp
9 2022-05-20 4:59   83   3

With dplyr, you can try consecutive_id

library(dplyr)
df %>%
    mutate(grp = consecutive_id(Temp > 0)) %>%
    filter(Temp > 0) %>%
    group_split(grp, .keep = FALSE)

which gives

[[1]]
# A tibble: 2 × 3
  Date       Time   Temp
  <chr>      <chr> <int>
1 2022-05-20 4:16     32
2 2022-05-20 4:17     16

[[2]]
# A tibble: 3 × 3
  Date       Time   Temp
  <chr>      <chr> <int>
1 2022-05-20 4:19      7
2 2022-05-20 4:34     75
3 2022-05-20 4:39      5

[[3]]
# A tibble: 1 × 3
  Date       Time   Temp
  <chr>      <chr> <int>
1 2022-05-20 4:59     83

data

> dput(df)
structure(list(Date = c("2022-05-20", "2022-05-20", "2022-05-20",
"2022-05-20", "2022-05-20", "2022-05-20", "2022-05-20", "2022-05-20",
"2022-05-20"), Time = c("4:16", "4:17", "4:18", "4:19", "4:34",
"4:39", "4:48", "4:49", "4:59"), Temp = c(32L, 16L, 0L, 7L, 75L,
5L, 0L, 0L, 83L)), class = "data.frame", row.names = c(NA, -9L
))

Upvotes: 2

Split up dataframe into list of dataframes based on breaks in one column

Answers (3)

data

Related Questions