Reputation: 137
I am working with the scale data from GeomLine objects. In this I am trying to find the data for the disjoint lines.
The data might look something like this
structure(list(x = c(96, 150, 150, 157, 163, 165, 165, 165, 166,
167, 168, 170, 178, 178, 184, 213), y = c(NA, 49, NA, NA,
NA, 75, NA, 45, 50, NA, 55, 56.2, 55, 57, 50, NA)), class = "data.frame",
row.names = c(NA, -16L))
#> x y
#> 1 96 NA
#> 2 150 49.0
#> 3 150 NA
#> 4 157 NA
#> 5 163 NA
#> 6 165 75.0
#> 7 165 NA
#> 8 165 45.0
#> 9 166 50.0
#> 10 167 NA
#> 11 168 55.0
#> 12 170 56.2
#> 13 178 55.0
#> 14 178 57.0
#> 15 184 50.0
#> 16 213 NA
I need to find the groups of this data that have at least two sequential rows without NAs. Then turn that into a list of df with these values.
I have gotten it to look something like this:
dplyr::filter(tidyr::drop_na(dplyr::mutate(structure(list(x = c(96,
150, 150, 157, 163, 165, 165, 165, 166, 167, 168, 170, 178,
178, 184, 213), y = c(NA, 49, NA, NA, NA, 75, NA, 45, 50,
NA, 55, 56.2, 55, 57, 50, NA)), class = "data.frame", row.names = c(NA,
-16L)), row_id = dplyr::row_number()), y), row_id == dplyr::lead(row_id) -
1 | row_id == dplyr::lag(row_id) + 1)
#> x y row_id
#> 1 165 45.0 8
#> 2 166 50.0 9
#> 3 168 55.0 11
#> 4 170 56.2 12
#> 5 178 55.0 13
#> 6 178 57.0 14
#> 7 184 50.0 15
It is the actual transformation that I am struggling a little bit with.
I can't seem to find a way to do it. I have tried doing more mutations to get it to a point where I could use group_split
yet I can't seem to get it to that point.
This is what I would want the end result to look like though.
list(data.frame(x = c(165, 166), y = c(45, 50)), data.frame(x = c(168,
170, 178, 178, 184), y = c(55, 56.2, 55, 57, 50)))
#> [[1]]
#> x y
#> 1 165 45
#> 2 166 50
#>
#> [[2]]
#> x y
#> 1 168 55.0
#> 2 170 56.2
#> 3 178 55.0
#> 4 178 57.0
#> 5 184 50.0
EDIT:
Thanks to help from answers and in particular @SALAR I have come up with this solution using only dplyr and base R
lineData |>
group_by(group = cumsum(is.na(y))) |>
filter(!(is.na(y) & n() > 1)) |>
group_split() |>
Filter(function(x) nrow(x) >= 2, x = _)
Upvotes: 5
Views: 77
Reputation: 2140
You can do it using tidyverse family:
df %>%
# Group the data frame by consecutive NAs in the `y` column
group_by(group = cumsum(is.na(y))) %>%
# Remove groups that have more than one NA in the `y` column
filter(!(is.na(y) & n() > 1)) %>%
# Split the data frame into a list of data frames based on the groups
split(.$group) %>%
# Keep only those data frames with more than two rows
keep(~nrow(.) >= 2)
output:
$`5`
# A tibble: 2 × 3
# Groups: group [1]
x y group
<dbl> <dbl> <int>
1 165 45 5
2 166 50 5
$`6`
# A tibble: 5 × 3
# Groups: group [1]
x y group
<dbl> <dbl> <int>
1 168 55 6
2 170 56.2 6
3 178 55 6
4 178 57 6
5 184 50 6
Upvotes: 2
Reputation: 887691
Using data.table
library(data.table)
d1 <- setDT(df1)[, if(.N > 1 && all(!is.na(y))) .(x, y), .(grp = rleid(is.na(y)))]
split(d1[, -1, with = FALSE], d1$grp)
-output
$`6`
x y
1: 165 45
2: 166 50
$`8`
x y
1: 168 55.0
2: 170 56.2
3: 178 55.0
4: 178 57.0
5: 184 50.0
Upvotes: 1
Reputation: 174393
You could do
library(dplyr)
df %>%
mutate(rle = data.table::rleid(is.na(y))) %>%
group_by(rle) %>%
filter(!is.na(y) & n() > 1) %>%
ungroup() %>%
split(.$rle) %>%
lapply(`[`, 1:2)
#> $`6`
#> # A tibble: 2 x 2
#> x y
#> <dbl> <dbl>
#> 1 165 45
#> 2 166 50
#>
#> $`8`
#> # A tibble: 5 x 2
#> x y
#> <dbl> <dbl>
#> 1 168 55
#> 2 170 56.2
#> 3 178 55
#> 4 178 57
#> 5 184 50
Created on 2023-02-07 with reprex v2.0.2
Upvotes: 1