LLL

Reputation: 743

Subsetting a data frame using variable names and a condition stored in another data frame

I'd like to subset my main data frame (df_main) based on a list AND a condition that sit in a separate data frame (df_keep), so that I end up with a data frame like df_goal.

I'd like to keep a variable in df_main if its name is on the list of variable names (df_keep$keep_var) AND its corresponding value in df_keep$othvar is NA or "r".

My approach seems to work up until the last line, and I don't know why. Thanks for any help!

# Starting point 
df_main <- data.frame(coat=c(1:5),hanger=c(1:5),book=c(1:5),dvd=c(1:5),bookcase=c(1:5),
                                 clock=c(1:5),bottle=c(1:5),curtains=c(1:5),wall=c(1:5))
df_keep <- data.frame(keep_var=c("coat","hanger","book","wall","bottle"),othvar=c("r","w","r","w",NA))


# Goal
df_goal <- data.frame(coat=c(1:5),book=c(1:5),bottle=c(1:5))


# Attempt
df_keep$othvar[is.na(df_keep$othvar)] <- "r"   # everything in othvar that's NA I want to keep so I recode it to "r"
df_keep <- df_keep %>% filter(othvar == "r")  # keep everything that's "r"
df_main <- df_main[df_keep$keep_var]  # subset my df_main using updated df_keep  

Upvotes: 1

Views: 37

Answers (2)

steveb

Reputation: 5532

Here is a dplyr solution

library(dplyr)
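# Filter on 'othvar', then convert the factor column to a character vector.
keep.vec <- as.character(
    (df_keep %>% dplyr::filter(is.na(othvar) | othvar == 'r'))$keep_var
)
# all_of() marks keep.vec as an external character vector (assumes dplyr >= 1.0.0).
df_main %>% dplyr::select(dplyr::all_of(keep.vec))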

##   coat book bottle
## 1    1    1      1
## 2    2    2      2
## 3    3    3      3
## 4    4    4      4
## 5    5    5      5

Upvotes: 0

Luke C

Reputation: 10316

You can get rows where your conditions are met in df_keep like this:

conditions_met <- df_keep$othvar == "r" | is.na(df_keep$othvar)

> conditions_met
[1]  TRUE FALSE  TRUE FALSE  TRUE
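
The is.na() term matters because == returns NA, not FALSE, when it meets a missing value. A minimal sketch of the difference, using a standalone vector othvar as a stand-in for df_keep$othvar (an assumption for illustration); %in% is shown as one possible alternative that treats NA as matchable:

```r
# Stand-in for df_keep$othvar (hypothetical standalone vector).
othvar <- c("r", "w", "r", "w", NA)

# Plain comparison propagates the missing value.
eq_only <- othvar == "r"                   # TRUE FALSE TRUE FALSE NA

# OR-ing with is.na() turns the NA into TRUE, because NA | TRUE is TRUE.
with_na <- othvar == "r" | is.na(othvar)   # TRUE FALSE TRUE FALSE TRUE

# %in% never returns NA and matches NA against NA.
via_in <- othvar %in% c("r", NA)           # TRUE FALSE TRUE FALSE TRUE
```

Indexing with a logical vector that still contains NA would produce an NA entry in the result, so one of the last two forms is the safer choice.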

You can then use these to get only the correct rows in df_keep$keep_var:

kept_rows <- df_keep$keep_var[conditions_met]

> kept_rows
[1] coat   book   bottle

Now, just return only the columns in df_main whose names match those in kept_rows:

df_main[, as.character(kept_rows)]
  coat book bottle
1    1    1      1
2    2    2      2
3    3    3      3
4    4    4      4
5    5    5      5

Or in one line:

> df_main[, as.character(df_keep$keep_var[df_keep$othvar == "r" |
+                                           is.na(df_keep$othvar)])]
  coat book bottle
1    1    1      1
2    2    2      2
3    3    3      3
4    4    4      4
5    5    5      5
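
If df_keep might list names that do not exist in df_main, the subset above stops with an "undefined columns selected" error. A possible guard, sketched with a smaller df_main and a hypothetical extra name "sofa" that is not in the original data; wanted, present, absent and result are illustrative names:

```r
df_main <- data.frame(coat = 1:5, hanger = 1:5, book = 1:5, bottle = 1:5)
df_keep <- data.frame(
  keep_var = c("coat", "hanger", "book", "sofa", "bottle"),  # "sofa" is hypothetical
  othvar   = c("r", "w", "r", NA, NA),
  stringsAsFactors = FALSE
)

# Names whose othvar is "r" or NA.
wanted <- df_keep$keep_var[df_keep$othvar == "r" | is.na(df_keep$othvar)]

# Split into names that exist in df_main and names that do not.
present <- intersect(wanted, names(df_main))
absent  <- setdiff(wanted, names(df_main))

# drop = FALSE keeps a data frame even if only one column survives.
result <- df_main[, present, drop = FALSE]
```

Here absent can be inspected or reported before subsetting, rather than letting the indexing step fail.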

Note that as.character is needed because your example dataset does not set stringsAsFactors = FALSE, so in R versions before 4.0.0 (where the default is TRUE) keep_var is stored as a factor. If your real data holds character vectors rather than factors, you can drop the as.character call. E.g.:

df_main <-
  data.frame(
    coat = c(1:5),
    hanger = c(1:5),
    book = c(1:5),
    dvd = c(1:5),
    bookcase = c(1:5),
    clock = c(1:5),
    bottle = c(1:5),
    curtains = c(1:5),
    wall = c(1:5),
    stringsAsFactors = FALSE
  )

df_keep <-
  data.frame(
    keep_var = c("coat", "hanger", "book", "wall", "bottle"),
    othvar = c("r", "w", "r", "w", NA),
    stringsAsFactors = FALSE
  )

df_goal <- data.frame(coat = c(1:5),
                      book = c(1:5),
                      bottle = c(1:5))


df_main[, df_keep$keep_var[df_keep$othvar == "r" |
                             is.na(df_keep$othvar)]]
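
To see why the last line of the original attempt misbehaves when keep_var is a factor, here is a minimal sketch. It assumes a shortened df_main and builds keep explicitly as a factor to mimic the pre-4.0.0 default; keep, wrong and right are illustrative names:

```r
# Assumption: a shortened df_main and a factor built by hand for illustration.
df_main <- data.frame(coat = 1:5, hanger = 1:5, book = 1:5, dvd = 1:5, bottle = 1:5)
keep <- factor(c("coat", "book", "bottle"))

levels(keep)       # levels are sorted alphabetically: "book" "bottle" "coat"
as.integer(keep)   # underlying integer codes: 3 1 2

# Indexing with the factor uses the integer codes, i.e. columns 3, 1 and 2.
wrong <- df_main[keep]
names(wrong)       # "book" "coat" "hanger"

# Converting to character first selects columns by name.
right <- df_main[as.character(keep)]
names(right)       # "coat" "book" "bottle"
```

The factor version silently returns the wrong columns instead of raising an error, which is why the attempt appears to work until the result is inspected.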

Upvotes: 1
