Christoffer

Reputation: 661

Subset nested data frame with dplyr

How can I subset an inner data frame of a nested data frame with dplyr?

I have the following nested data frame:

library(dplyr)

# Initialise nested data frame
d <- tibble(group = c("A", "B"),
            data = rep(list(NA), 2))

set.seed(1)
d$data[[1]] <- data.frame(x = seq(1:10),
                          y = rnorm(10))
d$data[[2]] <- data.frame(x = seq(1:15),
                          y = rnorm(15),
                          z = runif(15))

Suppose that I only want the rows in the data frame for group == "A" where y >= 0, while the data frame for group == "B" stays intact. Edit: The two resulting data frames should have the same variables after the operation.

I was thinking of doing something like the line below, combined with mutate, but filter(y >= 0) does not work here: after select(data) the result is still a tibble with a single list column, so there is no column y to filter on. How should I do it?

d %>% filter(group == "A") %>% select(data) %>% filter(y >= 0)

Upvotes: 5

Views: 2133

Answers (2)

akrun

Reputation: 887981

We could do the filtering inside map2, which passes each group label (.x) together with its inner data frame (.y):

library(tidyverse)
d %>%
  mutate(data = map2(group, data, ~ .y %>%
                       filter(!(.x == "A" & y < 0))))
 # A tibble: 2 x 2
 #  group data                 
 #  <chr> <list>               
 #1 A     <data.frame [6 × 2]> 
 #2 B     <data.frame [15 × 3]>

Stating the condition the other way round, in terms of the rows to keep, it would be

out <- d %>%
  mutate(data = map2(group, data, ~ .y %>%
                       filter((.x == "A" & y >= 0) | .x != "A")))
out
# A tibble: 2 x 2 
#  group data                 
#  <chr> <list>               
#1 A     <data.frame [6 × 2]> 
#2 B     <data.frame [15 × 3]>

map(d$data, dim)
#[[1]]
#[1] 10  2

#[[2]]
#[1] 15  3

map(out$data, dim)
#[[1]]
#[1] 6 2

#[[2]]
#[1] 15  3
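
The same idea can also be written with an explicit if/else inside map2, which some readers may find easier to follow than a combined logical condition. This is only a sketch under the question's setup: the named arguments g and df are illustrative names, not part of the original code, and the data are rebuilt here so the block runs on its own.

```r
library(dplyr)
library(purrr)

# Rebuild the nested data frame from the question (same seed, same draw order)
set.seed(1)
d <- tibble(
  group = c("A", "B"),
  data = list(
    data.frame(x = 1:10, y = rnorm(10)),
    data.frame(x = 1:15, y = rnorm(15), z = runif(15))
  )
)

# Filter the inner data frame only when its group is "A";
# every other inner data frame is returned unchanged.
out <- d %>%
  mutate(data = map2(group, data, function(g, df) {
    if (g == "A") filter(df, y >= 0) else df
  }))

# Row counts of the inner data frames after the operation
map_int(out$data, nrow)
```

Because group B is passed through untouched, its columns (including z) are never evaluated against the condition, so this variant would also work if an untouched inner data frame had no y column at all.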

Upvotes: 4

AntoniosK

Reputation: 16121

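You want to filter based on two variables (group and y). However, one of them (y) lives inside a nested variable (data). It's much easier to access both variables via filter if you first unnest your data. You can then nest your data again if you really need to. Note that unnesting row-binds the inner data frames, so group A gains a z column filled with NA, which is why both nested tibbles in the output have 3 columns.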

library(tidyverse)

# Initialise nested data frame
d <- tibble(group = c("A", "B"),
            data = rep(list(NA), 2))

set.seed(1)
d$data[[1]] <- data.frame(x = seq(1:10),
                          y = rnorm(10))
d$data[[2]] <- data.frame(x = seq(1:15),
                          y = rnorm(15),
                          z = runif(15))
d %>%
  unnest(data) %>%                    # unnest data
  filter(!(group == "A" & y < 0)) %>% # exclude rows where y < 0 for group A
  group_by(group) %>%                 # for each group
  nest()                              # nest data

# # A tibble: 2 x 2
#     group data             
#     <chr> <list>           
#   1 A     <tibble [6 x 3]> 
#   2 B     <tibble [15 x 3]>

Upvotes: 2
