Reputation: 282
I would like to find a tidy way to carry out a data cleaning step that I have to do for multiple pairs of columns.
df <- data.frame(apple = c("Yes", NA, NA, "Yes", NA),
apple_NO = c(NA, "No_1", "No_1", NA, "No_2"),
berry = c("Yes", "Yes", NA, NA, "Yes"),
berry_NO = c(NA, NA, "No_1", "No_1", NA),
coconut = c(NA, "Yes", "Yes", "Yes", NA),
coconut_NO = c("No_2", NA, NA, NA, "No_2"),
dinosaur = c("Yes", NA, NA, NA, "Yes"),
dinosaur_NO = c(NA, "No_2", "No_1", "No_2", NA))
> df
apple apple_NO berry berry_NO coconut coconut_NO dinosaur dinosaur_NO
1 Yes <NA> Yes <NA> <NA> No_2 Yes <NA>
2 <NA> No_1 Yes <NA> Yes <NA> <NA> No_2
3 <NA> No_1 <NA> No_1 Yes <NA> <NA> No_1
4 Yes <NA> <NA> No_1 Yes <NA> <NA> No_2
5 <NA> No_2 Yes <NA> <NA> No_2 Yes <NA>
cols <- c("apple", "berry", "coconut", "dinosaur")
cols_NO <- c("apple_NO", "berry_NO", "coconut_NO", "dinosaur_NO")
I would like to clean the values in columns in cols_NO
and assign new values to the columns in cols
For example, if I just had one column pair to clean, I would do something like:
df <- df %>%
mutate(apple = case_when(apple_NO == "No_1" ~ "None left",
apple_NO == "No_2" ~ "Finished",
TRUE ~ apple))
I would also like to do this with berry
and berry_NO
, and coconut
and coconut_NO
etc.
The output I want would look something like this:
apple apple_NO berry berry_NO coconut coconut_NO dinosaur dinosaur_NO
1 Yes <NA> Yes <NA> Finished No_2 Yes <NA>
2 None left No_1 Yes <NA> Yes <NA> Finished No_2
3 None left No_1 None left No_1 Yes <NA> None left No_1
4 Yes <NA> None left No_1 Yes <NA> None left No_2
5 Finished No_2 Yes <NA> Finished No_2 Yes <NA>
I think there's a solution somewhere along the lines of using map
or map2
or mapply
and parallel lists, but I've not used those before and can't seem to find similar solutions that I can use, featuring a list of columns on the left and right hand side of the =
in mutate
.
Thanks!
EDIT:
This gets me close but I would still need to replace or mutate_at
this to my main dataframe. My real data would benefit from using grepl
so I've just left that in.
fun.casewhen <- function(cols, cols_NO){
case_when(grepl("No_1", cols_NO) == TRUE ~ "None left",
grepl("No_2", cols_NO) == TRUE ~ "Finished",
TRUE ~ cols)
}
dftest <- map2(df %>% select(cols), df1 %>% select(cols_NO), ~ fun.casewhen (.x, .y))
The resulting dftest is made up of lists of each of the columns in cols
, but with the correct values.
Upvotes: 1
Views: 243
Reputation: 66415
Here's a tidyverse solution relying on multiple pivots. I'm sure this isn't the most concise method, but including as another option.
df %>%
mutate(row = row_number()) %>%
pivot_longer(-row) %>%
separate(name, c("group", "keep"), sep = "_") %>%
pivot_wider(names_from = keep, values_from = value) %>%
mutate(`NA` = case_when(NO == "No_1" ~ "None left",
NO == "No_2" ~ "Finished",
TRUE ~ `NA`)) %>%
pivot_longer(-c(row, group)) %>%
unite("col", c(group, name)) %>%
pivot_wider(names_from = col, values_from = value)
row apple_NA apple_NO berry_NA berry_NO coconut_NA coconut_NO dinosaur_NA dinosaur_NO
<int> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
1 1 Yes NA Yes NA Finished No_2 Yes NA
2 2 None left No_1 Yes NA Yes NA Finished No_2
3 3 None left No_1 None left No_1 Yes NA None left No_1
4 4 Yes NA None left No_1 Yes NA Finished No_2
5 5 Finished No_2 Yes NA Finished No_2 Yes NA
Upvotes: 1
Reputation: 282
I have found a way to assign the values to my df, albeit not in the best tidyr style:
fun.casewhen <- function(cols, cols_NO){
case_when(grepl("No_1", cols_NO) == TRUE ~ "None left",
grepl("No_2", cols_NO) == TRUE ~ "Finished",
TRUE ~ cols)
}
df[cols] <- map2(df %>% select(cols), df1 %>% select(cols_NO), ~ fun.casewhen (.x, .y))
I was missing the [cols]
in df[cols]
before. Nonetheless, it would be great to know if there is a solution more in-line with tidyr standards where the data can be cleaned using mutate
and pipes.
Upvotes: 0
Reputation: 27732
here is a data.table
+ rlist
approach
library( data.table )
library( rlist )
data.table::setDT(df)
L <- split.default( df, gsub( "_NO", "", names(df) ) )
rlist::list.cbind( lapply( L, function(x) x[,1 := data.table:: fcoalesce(x) ] ) )
Upvotes: 1