MAJ

Reputation: 497

Split data frame based on if all rows in a group have a certain value

I have a data frame like this:

df <- data.frame(x = c(0,0,1,1,2,2,3,3,4,4,5,5), y = c(0,1,1,1,0,0,0,1,1,1,0,0))

How can I split the data into two data frames: one containing the rows of every x group in which all y values equal 1, and one containing the remaining rows?

df
   x y
1  0 0
2  0 1
3  1 1 # x = 1: all y = 1
4  1 1 #
5  2 0
6  2 0
7  3 0
8  3 1
9  4 1 # x = 4: all y = 1
10 4 1 #
11 5 0
12 5 0

The two resulting data frames would then look like:

df1 <- data.frame(x = c(1,1,4,4), y = c(1,1,1,1))
df1
  x y
1 1 1
2 1 1
3 4 1
4 4 1

df2 <- data.frame(x = c(0,0,2,2,3,3,5,5), y = c(0,1,0,0,0,1,0,0))
df2
  x y
1 0 0
2 0 1
3 2 0
4 2 0
5 3 0
6 3 1
7 5 0
8 5 0

Upvotes: 5

Views: 331

Answers (5)

ThomasIsCoding

Reputation: 102700

A base R variant using split().

If your y column contains only 0 and 1, you can run the code below (thanks @Henrik):

> split(df, ~ave(y, x) == 1)
$`FALSE`
   x y
1  0 0
2  0 1
5  2 0
6  2 0
7  3 0
8  3 1
11 5 0
12 5 0

$`TRUE`
   x y
3  1 1
4  1 1
9  4 1
10 4 1

Otherwise, for general values of y, we can try:

> split(df, ~ ave(y == 1, x) == 1)
$`FALSE`
   x y
1  0 0
2  0 1
5  2 0
6  2 0
7  3 0
8  3 1
11 5 0
12 5 0

$`TRUE`
   x y
3  1 1
4  1 1
9  4 1
10 4 1
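
To illustrate why the two variants above differ, here is a minimal sketch on made-up data where y is not limited to 0 and 1. The data frame `df_demo` and the names `mean_is_one` and `all_are_one` are hypothetical and only serve this example. The first variant tests the group mean of y, so a group with y values 0 and 2 is wrongly accepted; the second tests the share of rows with y == 1, which equals 1 only when every row matches.

```r
# Hypothetical data: y is not restricted to 0 and 1
df_demo <- data.frame(x = c(1, 1, 2, 2), y = c(0, 2, 1, 1))

# Variant 1: group mean of y. For x = 1 the mean is (0 + 2) / 2 = 1,
# although no y in that group equals 1.
mean_is_one <- ave(df_demo$y, df_demo$x) == 1

# Variant 2: group mean of the logical (y == 1), i.e. the share of matching rows.
all_are_one <- ave(df_demo$y == 1, df_demo$x) == 1

mean_is_one  # TRUE TRUE TRUE TRUE   (group x = 1 is misclassified)
all_are_one  # FALSE FALSE TRUE TRUE
```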

Upvotes: 3

AndrewGB

Reputation: 16876

Here is a data.table option:

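library(data.table)
library(dplyr)  # loaded only for the %>% pipe

# setDT() converts df to a data.table by reference;
# group is TRUE for every row whose x group has all y equal to 1
setDT(df)[, group := all(y == 1), by = x] %>%
  split(., by = "group", keep.by = FALSE)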

Output

$`FALSE`
   x y
1: 0 0
2: 0 1
3: 2 0
4: 2 0
5: 3 0
6: 3 1
7: 5 0
8: 5 0

$`TRUE`
   x y
1: 1 1
2: 1 1
3: 4 1
4: 4 1

Upvotes: 1

PaulS

Reputation: 25528

Another possible solution, based on dplyr:

library(dplyr)

df <- data.frame(x = c(0,0,1,1,2,2,3,3,4,4,5,5), y = c(0,1,1,1,0,0,0,1,1,1,0,0))

# Keep only the x groups in which every y equals 1
df1 <- df %>%
  group_by(x) %>%
  filter(all(y == 1)) %>%
  ungroup()

# Everything not in df1
df2 <- df %>%
  anti_join(df1, by = c("x", "y"))

list(df1, df2 %>% as_tibble())

#> [[1]]
#> # A tibble: 4 × 2
#>       x     y
#>   <dbl> <dbl>
#> 1     1     1
#> 2     1     1
#> 3     4     1
#> 4     4     1
#> 
#> [[2]]
#> # A tibble: 8 × 2
#>       x     y
#>   <dbl> <dbl>
#> 1     0     0
#> 2     0     1
#> 3     2     0
#> 4     2     0
#> 5     3     0
#> 6     3     1
#> 7     5     0
#> 8     5     0

Upvotes: 2

Onyambu

Reputation: 79338

In base R:

split(df, ~ave(y == 1, x, FUN = all))
$`FALSE`
   x y
1  0 0
2  0 1
5  2 0
6  2 0
7  3 0
8  3 1
11 5 0
12 5 0

$`TRUE`
   x y
3  1 1
4  1 1
9  4 1
10 4 1
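
If the goal is the two named data frames from the question rather than a list, the split result can be unpacked. This is a minimal sketch, not part of the original answer: the names `all_one` and `parts` are hypothetical, the grouping flag is passed to split() as a plain vector instead of a formula, and the row names are reset only to match the numbering shown in the question.

```r
df <- data.frame(x = c(0,0,1,1,2,2,3,3,4,4,5,5),
                 y = c(0,1,1,1,0,0,0,1,1,1,0,0))

# Logical flag per row: TRUE when every y in that row's x group equals 1
all_one <- ave(df$y == 1, df$x, FUN = all)

# split() returns a list with elements named "FALSE" and "TRUE"
parts <- split(df, all_one)

df1 <- parts[["TRUE"]]   # groups where all y == 1
df2 <- parts[["FALSE"]]  # all remaining rows

# Optional: renumber the rows from 1, as in the question's expected output
rownames(df1) <- NULL
rownames(df2) <- NULL
```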

In tidyverse:

library(tidyverse)
df %>%
  group_by(x) %>%
  mutate(s = all(y == 1)) %>%
  ungroup() %>%
  group_split(s, .keep = FALSE)

[[1]]
# A tibble: 8 x 2
      x     y
  <dbl> <dbl>
1     0     0
2     0     1
3     2     0
4     2     0
5     3     0
6     3     1
7     5     0
8     5     0

[[2]]
# A tibble: 4 x 2
      x     y
  <dbl> <dbl>
1     1     1
2     1     1
3     4     1
4     4     1

Upvotes: 7

dash2

Reputation: 2262

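With dplyr, flag each group and then filter on the flag:

library(dplyr)

# Flag every row whose x group has all y equal to 1, then drop the grouping
df <- df |> group_by(x) |> mutate(all_y_1 = all(y == 1)) |> ungroup()

df1 <- df |> filter(all_y_1) |> select(-all_y_1)
df2 <- df |> filter(!all_y_1) |> select(-all_y_1)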

Upvotes: 1
