I'd like to subset my main data frame (df_main) based on a list AND a condition that sit in a separate data frame (df_keep), so that I end up with a data frame like df_goal.
I'd like to keep a variable in df_main if its name is on the list of variable names (df_keep$keep_var) AND its corresponding value in df_keep$othvar is NA or "r".
My approach seems to work up until the last line, and I don't know why. Thanks for any help!
# Starting point
df_main <- data.frame(coat=c(1:5),hanger=c(1:5),book=c(1:5),dvd=c(1:5),bookcase=c(1:5),
clock=c(1:5),bottle=c(1:5),curtains=c(1:5),wall=c(1:5))
df_keep <- data.frame(keep_var=c("coat","hanger","book","wall","bottle"),othvar=c("r","w","r","w",NA))
# Goal
df_goal <- data.frame(coat=c(1:5),book=c(1:5),bottle=c(1:5))
# Attempt
library(dplyr) # provides %>% and filter()
df_keep$othvar[is.na(df_keep$othvar)] <- "r" # everything in othvar that's NA I want to keep so I recode it to "r"
df_keep <- df_keep %>% filter(othvar == "r") # keep everything that's "r"
df_main <- df_main[df_keep$keep_var] # subset my df_main using updated df_keep
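A minimal sketch of the most likely cause, assuming keep_var is a factor (data.frame() converted strings to factors by default before R 4.0.0; the sketch sets stringsAsFactors = TRUE explicitly). R documents that indexing by a factor is equivalent to indexing by its integer codes, not by the printed labels. The sketch repeats the attempt with base R indexing in place of dplyr::filter() so it runs without extra packages; the names codes, wrong and right are illustrative only.

```r
# Question's data, assuming keep_var and othvar are factors
df_main <- data.frame(coat = 1:5, hanger = 1:5, book = 1:5, dvd = 1:5,
                      bookcase = 1:5, clock = 1:5, bottle = 1:5,
                      curtains = 1:5, wall = 1:5)
df_keep <- data.frame(keep_var = c("coat", "hanger", "book", "wall", "bottle"),
                      othvar = c("r", "w", "r", "w", NA),
                      stringsAsFactors = TRUE)

# Same steps as the attempt, with base R row indexing instead of filter()
df_keep$othvar[is.na(df_keep$othvar)] <- "r"
df_keep <- df_keep[df_keep$othvar == "r", ]

# Levels are sorted: book, bottle, coat, hanger, wall,
# so coat, book, bottle are stored as the codes 3, 1, 2
codes <- as.integer(df_keep$keep_var)

# Indexing by the factor uses those codes: columns 3, 1, 2 of df_main
wrong <- names(df_main[df_keep$keep_var])                # "book" "coat" "hanger"

# Converting to character first indexes by name, as intended
right <- names(df_main[as.character(df_keep$keep_var)])  # "coat" "book" "bottle"
```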
Here is a dplyr solution:
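library(dplyr)
# Keep the rows where othvar is NA or 'r', then convert keep_var from factor to character.
keep.vec <- as.character(
  (df_keep %>% dplyr::filter(is.na(othvar) | othvar == 'r'))$keep_var
)
# all_of() selects columns by the names stored in an external character vector
df_main %>% dplyr::select(dplyr::all_of(keep.vec))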
## coat book bottle
## 1 1 1 1
## 2 2 2 2
## 3 3 3 3
## 4 4 4 4
## 5 5 5 5
You can get the rows of df_keep where your conditions are met like this:
conditions_met <- df_keep$othvar == "r" | is.na(df_keep$othvar)
> conditions_met
[1] TRUE FALSE TRUE FALSE TRUE
You can then use conditions_met to keep only the matching entries of df_keep$keep_var:
kept_rows <- df_keep$keep_var[conditions_met]
> kept_rows
[1] coat book bottle
Now just return the columns of df_main whose names match those in kept_rows:
df_main[, as.character(kept_rows)]
coat book bottle
1 1 1 1
2 2 2 2
3 3 3 3
4 4 4 4
5 5 5 5
Or in one line:
> df_main[, as.character(df_keep$keep_var[df_keep$othvar == "r" |
+ is.na(df_keep$othvar)])]
coat book bottle
1 1 1 1
2 2 2 2
3 3 3 3
4 4 4 4
5 5 5 5
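Both versions include the is.na() term because comparing NA with "r" gives NA rather than FALSE, and an NA index returns an NA element instead of dropping it. The sketch below uses standalone copies of the two columns (keep_var and othvar here are hypothetical plain vectors, not taken from the answer) and also shows %in% as an alternative, since match() treats NA as a value that can be matched.

```r
# Standalone copies of the two df_keep columns (hypothetical variables)
keep_var <- c("coat", "hanger", "book", "wall", "bottle")
othvar <- c("r", "w", "r", "w", NA)

# Comparing with NA gives NA, not FALSE: TRUE FALSE TRUE FALSE NA
eq <- othvar == "r"

# Indexing with that NA yields an NA element: "coat" "book" NA
with_na <- keep_var[eq]

# Adding is.na() turns the NA into TRUE, so "bottle" is kept
with_is_na <- keep_var[othvar == "r" | is.na(othvar)]

# %in% matches NA against NA directly, giving the same result
with_in <- keep_var[othvar %in% c("r", NA)]
```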
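Note that as.character() is needed because your example data frames are created without stringsAsFactors = FALSE (before R 4.0.0, data.frame() converted strings to factors by default), so keep_var is a factor, and indexing with a factor uses its integer codes rather than its labels. If your real data holds characters rather than factors, you should be able to drop as.character(). For example: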
df_main <-
data.frame(
coat = c(1:5),
hanger = c(1:5),
book = c(1:5),
dvd = c(1:5),
bookcase = c(1:5),
clock = c(1:5),
bottle = c(1:5),
curtains = c(1:5),
wall = c(1:5),
stringsAsFactors = FALSE
)
df_keep <-
data.frame(
keep_var = c("coat", "hanger", "book", "wall", "bottle"),
othvar = c("r", "w", "r", "w", NA),
stringsAsFactors = FALSE
)
df_goal <- data.frame(coat = c(1:5),
book = c(1:5),
bottle = c(1:5))
df_main[, df_keep$keep_var[df_keep$othvar == "r" |
is.na(df_keep$othvar)]]
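If this selection is needed more than once, it could be wrapped in a small function. The sketch below is a hypothetical helper (select_kept and its allowed argument are not from the answer): it keeps as.character() so it works whether keep_var is a factor or character, uses %in% for the NA test, and passes drop = FALSE so that a single matching column still comes back as a data frame.

```r
# Hypothetical helper: keep the columns of `main` listed in keep$keep_var
# whose keep$othvar value is one of `allowed` (NA can be included)
select_kept <- function(main, keep, allowed = c("r", NA)) {
  # as.character() makes this work for factor and character keep_var alike
  cols <- as.character(keep$keep_var[keep$othvar %in% allowed])
  # drop = FALSE keeps a data frame even when only one column matches
  main[, cols, drop = FALSE]
}

df_main <- data.frame(coat = 1:5, hanger = 1:5, book = 1:5, dvd = 1:5,
                      bookcase = 1:5, clock = 1:5, bottle = 1:5,
                      curtains = 1:5, wall = 1:5)
df_keep <- data.frame(keep_var = c("coat", "hanger", "book", "wall", "bottle"),
                      othvar = c("r", "w", "r", "w", NA))

res <- select_kept(df_main, df_keep)
names(res)  # "coat" "book" "bottle"
```

With allowed = NA_character_ only bottle matches, and thanks to drop = FALSE the result is still a one-column data frame rather than a plain vector.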