Subset whole nested dataframe structure R

Question

I have a dataset with nested structures in R (some cells are arrays from its original JSON structure).

set.seed(123)
data = list()
data$nested_df_1 = data.frame(a = letters[1:10]
                              , b = round(rnorm(10), 0))
data$nested_df_2 = list()
data$nested_df_2$nested_df_2_1 = data.frame(c = letters[11:20]
                                            , d = sample(-100:100, 10))

Now I want to subset the whole list data so that it only includes all instances (= all rows in all structures) where data$nested_df_1$b >= 0.

> data$nested_df_1
   a  b
1  a -1
2  b  0
3  c  2
4  d  0
5  e  0
6  f  2
7  g  0
8  h -1
9  i -1
10 j  0

Thus, rows 1, 8 and 9 would need to be removed from the whole structure (i.e. from both data$nested_df_1 and data$nested_df_2$nested_df_2_1).

If I just wanted this for the data$nested_df_1 dataframe, I could do:

data$nested_df_1 = data$nested_df_1[data$nested_df_1$b >= 0, ]

(The indices remain constant, i.e. if row_i in data$nested_df_1 meets the criterion, then this is also true for row_i in data$nested_df_2$nested_df_2_1).

But how can I do the subset for the whole nested structure?

akrun · Accepted Answer

We can create a logical index and loop over the list: if an element is a data.frame, subset it directly; otherwise loop over that element's own list and subset each data.frame inside it. This assumes the list is nested at most two levels deep.

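# logical index: TRUE for the rows to keep
i1 <- data$nested_df_1$b >= 0
# returns a new list (assign the result to keep it); depth-2 nesting assumed
lapply(data, function(x) if(is.data.frame(x)) subset(x, i1) else
        lapply(x, function(y) subset(y, i1)))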
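
If the nesting depth is not known in advance, the same idea can be written recursively. The following is only a sketch under stated assumptions, not part of the original answer: `subset_nested` and `data_sub` are hypothetical names, and every data.frame in the structure is assumed to have the same number of rows as the logical index.

```r
# Hypothetical helper (not from the original answer): apply a logical row
# index to every data.frame found at any depth of a nested list.
subset_nested <- function(x, keep) {
  if (is.data.frame(x)) {
    # check data.frame first: a data.frame is also a list
    # assumes nrow(x) == length(keep)
    x[keep, , drop = FALSE]
  } else if (is.list(x)) {
    # plain list: recurse into each element, names are preserved
    lapply(x, subset_nested, keep = keep)
  } else {
    # anything else (vectors, NULL, ...) is returned unchanged
    x
  }
}

# Example data from the question
set.seed(123)
data <- list()
data$nested_df_1 <- data.frame(a = letters[1:10], b = round(rnorm(10), 0))
data$nested_df_2 <- list()
data$nested_df_2$nested_df_2_1 <- data.frame(c = letters[11:20],
                                             d = sample(-100:100, 10))

keep <- data$nested_df_1$b >= 0
data_sub <- subset_nested(data, keep)
```

Indexing with `x[keep, , drop = FALSE]` is used here instead of `subset()` because the index is built outside the data.frame, so there is no need for `subset()`'s non-standard evaluation; for an index without missing values the two return the same rows.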
