Reputation: 83
I have a dataframe where I want to replace the variables
Then I'd like to delete the variables age1_corr_1, age1_corr_2, ..., age1_corr_n. I have figured out how to do the first part (change the values) in a loop but couldn't figure out how to delete the variables after. Any suggestion?
Sample data
y <- data.frame("age_1" = c(5,1,1,10), "age1_corr_1" = c(1,NA,NA,0), "age_2" = c(1,2,3,4), "age1_corr_2" = c(NA, NA, 10, 9),
"age_3" = c(4,3,2,5), "age1_corr_3" = c(NA,NA,NA,6), "age_4" = c(1,4,2,7), "age1_corr_4" = c(NA, NA, NA,NA))
The code that will change values of age_n based on age1_corr_n
for(i in 1:4){
cname1 <- paste0("age_",i)
cname2 <- paste0("age1_corr_",i)
y[,cname1] <- ifelse(!is.na(y[,cname2]), y[,cname2], y[,cname1])
}
The output I'd like to have is
age_1 age_2 age_3 age_4
1 1 1 4 1
2 1 2 3 4
3 1 10 2 2
4 0 9 6 7
Upvotes: 0
Views: 2568
Reputation: 5287
Here are two other approaches that can handle an arbitrary number of columns. For this specific example dataset, make sure that the 4th column is correctly represented as a float with y$age1_corr_4 <- as.numeric(y$age1_corr_4)
).
Like Dan Hall's response, one approach keeps the columns you want...
library(magrittr)
coalesce_corr1 <- function( index ) {
name_age <- paste0("age_" , index)
name_corr <- paste0("age1_corr_", index)
y %>%
dplyr::mutate(
!!name_age := dplyr::coalesce(.data[[name_corr]], .data[[name_age]])
) %>%
dplyr::select(!!name_age)
}
1:4 %>%
purrr::map(coalesce_corr) %>%
dplyr::bind_cols()
...and the other drops the columns you don't want.
z <- y
coalesce_corr2 <- function( index ) {
name_age <- paste0( "age_" , index)
name_corr <- paste0( "age1_corr_", index)
z <<- z %>%
dplyr::mutate(
!!name_age := dplyr::coalesce(.data[[!!name_corr]], .data[[!!name_age]])
)
z[[name_corr]] <<- NULL
}
1:4 %>%
purrr::walk(coalesce_corr2)
z
I wish this last one didn't require a global variable (that uses <<-
), and for this reason, I actually recommend Dan's approaches, but I wanted to try out quosures for output variables.
Upvotes: 0
Reputation: 5287
Do the real work with dplyr::coalesce()
(description: "Given a set of vectors, coalesce() finds the first non-missing value at each position."). Then drop the columns with dplyr::select()
, using a negative sign in front of the columns you don't need anymore.
library(magrittr)
y %>%
dplyr::mutate(
age1_corr_4 = as.numeric(age1_corr_4), # Delete this line if it's already a numeric/floating data type.
age_1 = dplyr::coalesce(age1_corr_1, age_1),
age_2 = dplyr::coalesce(age1_corr_2, age_2),
age_3 = dplyr::coalesce(age1_corr_3, age_3),
age_4 = dplyr::coalesce(age1_corr_4, age_4)
) %>%
dplyr::select(
-age1_corr_1, -age1_corr_2, -age1_corr_3, -age1_corr_4
)
Produces
age_1 age_2 age_3 age_4
1 1 1 4 1
2 1 2 3 4
3 1 10 2 2
4 0 9 6 7
Edit: I apologize, I focused on the coalesce part of the task and ignored the n part of the task.
Upvotes: 1
Reputation: 7600
You have several options if there is a pattern to the columns you want to remove (or conversely, the ones you want to keep).
Here's the data you provided:
y <- data.frame("age_1" = c(5,1,1,10), "age1_corr_1" = c(1,NA,NA,0), "age_2" = c(1,2,3,4), "age1_corr_2" = c(NA, NA, 10, 9),
"age_3" = c(4,3,2,5), "age1_corr_3" = c(NA,NA,NA,6), "age_4" = c(1,4,2,7), "age1_corr_4" = c(NA, NA, NA,NA))
Here's a dplyr example of how to get only those columns that follow the pattern age_N
, where N is 1, 2, 3, or 4:
library(dplyr)
x <- select(y, paste("age", 1:4, sep = "_"))
Alternatively, you could choose the pattern for the columns you DON'T want:
x <- select(y, -grep("_corr_", current_vars()))
This uses the following strategy: * you can select for everything BUT a column or set of columns by adding a minus sign first. * current_vars() is a helper function in dplyr that evaluates to all the variable names for the data (here, y)
Upvotes: 1