Dec
Dec

Reputation: 57

How do I subset a dataframe using values from another dataframe?

Let's say I have one dataframe that represents a subset brief. Let's call this dataframe brief. Here, the values are the subset rules.

library(tidyverse)

brief <- data.frame(apple = 1, orange = 2, pear = 3)

In a second dataframe, called types, this is the data I'd like to subset.

types <- data.frame(apple_cake = rnorm(5,0,1),
                    apple_pie = rnorm(5,0, 1),
                    apple_ice = rnorm(5,0, 1),
                    orange_cake = rnorm(5,0,1),
                    orange_pie = rnorm(5, 0, 1),
                    orange_ice = rnorm(5,0, 1),
                    pear_cake = rnorm(5,0,1),
                    pear_pie = rnorm(5, 0, 1),
                    pear_ice = rnorm(5,0, 1)
)

In my example, I'd like to subset specific apple, orange, and pear food items using dataframe one against dataframe two. I'd like to use the values from the first dataframe brief to subset the columns from the second dataframe types.

I would like to end up with a final dataframe as follows:

final <- types %>% 
  select(apple_cake, orange_pie, pear_ice)

In the final solution, apple item 1, orange item 2, and pear item 3 remain.

I've tried various dplyr functions but to no avail. I have also looked for similar solutions but do not think they address my problem because these examples might have similar column names across two dataframes.

Upvotes: 5

Views: 88

Answers (1)

akrun
akrun

Reputation: 887501

In base R, we split the data columns into a list of data.frame having similar prefix by splitting on the suffix removed column names, Extract ([) the columns from the corresponding index in brief data with Map and cbind the columns in the list with do.call

do.call(cbind,  Map(`[`, split.default(types, trimws(names(types), 
       whitespace = "_.*")), brief))

-output

 apple_cake   orange_pie   pear_ice
1  0.9065347  0.661994002  0.1988529
2  1.0429968  0.736754956  0.9714787
3  0.5517950 -0.003860467  0.4486231
4 -0.4304989  1.160817987 -0.3410640
5 -0.1701651  0.489398131 -1.1846220

Or another option is to loop over the data.frame brief with imap, select the columns with a select-helper (starts_with) on the names of the brief data column (.y) and then select based on the value index of the brief column (.x)

library(dplyr)
library(purrr)
imap_dfc(brief, ~ types %>%
     select(starts_with(.y)) %>% 
     select(all_of(.x)))
  apple_cake   orange_pie   pear_ice
1  0.9065347  0.661994002  0.1988529
2  1.0429968  0.736754956  0.9714787
3  0.5517950 -0.003860467  0.4486231
4 -0.4304989  1.160817987 -0.3410640
5 -0.1701651  0.489398131 -1.1846220

Upvotes: 2

Related Questions