Reputation: 423

How to subset R data frame based on duplicates in one column and unique values in another

This seems pretty straightforward but I am stumped. I have a data frame that looks like this:

df1 %>% head()
  values  paired
  <ch>     <int>
1 apples      1         
2 x           1          
3 oranges     2          
4 z           2          
5 bananas     3          
6 y           3 
7 apples      4
8 p           4

I would like to create a new data frame that extracts all paired values based on a search criteria. So if I want all pairs that correspond to apples I would like to end up with something like this:

df1 %>% head()
  values  paired
  <ch>     <int>
1 apples      1         
2 x           1                   
3 apples      4
4 p           4

I have tried using:

new_pairs <- df1 %>%
  arrange(values, paired) %>%
  filter(duplicated(paired) == TRUE,
         values=="apples")

But I am getting only the apple rows back

Upvotes: 0

Answers (3)

Martin Morgan

Reputation: 46886

In base R, find the values of the pairs of interest

pairs = subset(df1, values %in% "apples")$paired

and create a subset of the data

subset(df1, paired %in% pairs)

Upvotes: 0

AndrewGB

Reputation: 16876

Here is another tidyverse possibility. I filter for the rows that have apples and also keep the rows that immediately follow apples.

library(tidyverse)

df %>%
  filter((values == "apples" |
            row_number() %in% c(which(
              values == "apples", arr.ind = TRUE
            ) + 1)))

Output

  values paired
1 apples      1
2      x      1
3 apples      4
4      p      4

Here is a data.table option (subset is only used to change the order of the columns):

library(data.table)

dt <- as.data.table(df)
subset(dt[, .SD[any(values == "apples")], by = paired], select = c("values", "paired"))

   values paired
1: apples      1
2:      x      1
3: apples      4
4:      p      4

Data

df <-
    structure(list(
      values = c("apples", "x", "oranges", "z", "bananas",
                 "y", "apples", "p"),
      paired = c(1L, 1L, 2L, 2L, 3L, 3L, 4L, 4L)
    ),
    class = "data.frame",
    row.names = c(NA,-8L))

Upvotes: 1

neilfws

Reputation: 33812

You'll need to group on the paired variable before filtering.

How about:

df1 %>% 
  group_by(paired) %>% 
  filter("apples" %in% values) %>%
  ungroup()

Result:

# A tibble: 4 x 2
  values paired
  <chr>   <int>
1 apples      1
2 x           1
3 apples      4
4 p           4

Your data:

df1 <- structure(list(values = c("apples", "x", "oranges", "z", "bananas", "y", "apples", "p"), 
                      paired = c(1L, 1L, 2L, 2L, 3L, 3L, 4L, 4L)), 
                 class = "data.frame", 
                 row.names = c("1", "2", "3", "4", "5", "6", "7", "8"))

Upvotes: 3

How to subset R data frame based on duplicates in one column and unique values in another

Answers (3)

Related Questions