Reputation: 661
How can I subset an inner data frame of a nested data frame with dplyr?
I have the following nested data frame:
library(dplyr)

# Initialise nested data frame
d <- tibble(group = c("A", "B"),
            data = rep(list(NA), 2))
set.seed(1)
d$data[[1]] <- data.frame(x = 1:10,
                          y = rnorm(10))
d$data[[2]] <- data.frame(x = 1:15,
                          y = rnorm(15),
                          z = runif(15))
Suppose that I only want the rows where y >= 0 in the data frame for group == "A", while the data frame for group == "B" stays intact. Edit: The two resulting data frames should have the same variables after the operation as before it.
I was thinking of something like the line below, combined with mutate, but the filter(y >= 0) step does not work here. How should I do it?
d %>% filter(group == "A") %>% select(data) %>% filter(y >= 0)
Upvotes: 5
Views: 2133
Reputation: 887981
We could do the filtering inside map2, which walks over group and data in parallel:
library(tidyverse)

d %>%
  mutate(data = map2(group, data, ~
    .y %>%
      filter(!(.x == "A" & y < 0))))
# A tibble: 2 x 2
# group data
# <chr> <list>
#1 A <data.frame [6 × 2]>
#2 B <data.frame [15 × 3]>
Using the reverse comparison (stating which rows to keep rather than which to drop), it would be:
out <- d %>%
  mutate(data = map2(group, data, ~
    .y %>%
      filter((.x == "A" & y >= 0) | .x != "A")))
out
# A tibble: 2 x 2
# group data
# <chr> <list>
#1 A <data.frame [6 × 2]>
#2 B <data.frame [15 × 3]>
map(d$data, dim)
#[[1]]
#[1] 10 2
#[[2]]
#[1] 15 3
map(out$data, dim)
#[[1]]
#[1] 6 2
#[[2]]
#[1] 15 3
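The same idea can also be written with an explicit if/else, so that only the data frame for group "A" goes through filter and the others are returned as they are. This is only a sketch of an alternative, not part of the answer above; the argument names g and df are made up for illustration, and the data is rebuilt in one step so the block runs on its own.

```r
library(dplyr)
library(purrr)

# Rebuild the nested data frame from the question
set.seed(1)
d <- tibble(
  group = c("A", "B"),
  data = list(
    data.frame(x = 1:10, y = rnorm(10)),
    data.frame(x = 1:15, y = rnorm(15), z = runif(15))
  )
)

# g and df are illustrative names for one group label and its inner data frame.
# Only group "A" is filtered; every other data frame is passed through untouched.
out <- d %>%
  mutate(data = map2(group, data, function(g, df) {
    if (g == "A") filter(df, y >= 0) else df
  }))

map(out$data, dim)
```

Because the data frame for "B" never goes through filter, it keeps its own columns, which matches the requirement in the question's edit.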
Upvotes: 4
Reputation: 16121
You want to filter based on two variables (group and y). However, one of the variables (y) is part of a nested variable (data). It's much easier to access those two variables via filter if you first unnest your data. You can then nest your data again if you really need to.
library(tidyverse)

# Initialise nested data frame
d <- tibble(group = c("A", "B"),
            data = rep(list(NA), 2))
set.seed(1)
d$data[[1]] <- data.frame(x = 1:10,
                          y = rnorm(10))
d$data[[2]] <- data.frame(x = 1:15,
                          y = rnorm(15),
                          z = runif(15))

d %>%
  unnest(data) %>%                         # unnest data (column named explicitly, as newer tidyr versions expect)
  filter(!(group == "A" & y < 0)) %>%      # exclude rows where y < 0 for group A
  group_by(group) %>%                      # for each group
  nest()                                   # nest data
# # A tibble: 2 x 2
# group data
# <chr> <list>
# 1 A <tibble [6 x 3]>
# 2 B <tibble [15 x 3]>
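Note that unnesting pads group A with a z column full of NA, which is why its data frame comes back with 3 columns. If each inner data frame should keep only its original variables, one possible follow-up is to drop the columns that are entirely NA after re-nesting. This is a sketch under an explicit assumption: a column that is all NA within a group is taken to be padding, so a genuinely all-NA column would be dropped too. It also assumes dplyr 1.0 or later (for where) and tidyr 1.0 or later (for nest(data = -group)); df and col are illustrative names.

```r
library(dplyr)
library(tidyr)
library(purrr)

# Rebuild the nested data frame from the question
set.seed(1)
d <- tibble(
  group = c("A", "B"),
  data = list(
    data.frame(x = 1:10, y = rnorm(10)),
    data.frame(x = 1:15, y = rnorm(15), z = runif(15))
  )
)

out <- d %>%
  unnest(data) %>%                       # group A gets z = NA here
  filter(!(group == "A" & y < 0)) %>%    # exclude rows where y < 0 for group A
  nest(data = -group) %>%                # nest everything except group
  # Assumption: an all-NA column within a group is padding from unnest()
  mutate(data = map(data, function(df) {
    select(df, where(function(col) !all(is.na(col))))
  }))

map(out$data, names)
```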
Upvotes: 2