Reputation: 282

Tidy way to mutate multiple columns with two parallel lists of column names

I would like to find a tidy way to carry out a data cleaning step that I have to do for multiple pairs of columns.

df <- data.frame(apple = c("Yes", NA, NA, "Yes", NA),
                 apple_NO = c(NA, "No_1", "No_1", NA, "No_2"),
                 berry = c("Yes", "Yes", NA, NA, "Yes"),
                 berry_NO = c(NA, NA, "No_1", "No_1", NA),
                 coconut = c(NA, "Yes", "Yes", "Yes", NA),
                 coconut_NO = c("No_2", NA, NA, NA, "No_2"),
                 dinosaur = c("Yes", NA, NA, NA, "Yes"),
                 dinosaur_NO = c(NA, "No_2", "No_1", "No_2", NA))
> df
  apple apple_NO berry berry_NO coconut coconut_NO dinosaur dinosaur_NO
1   Yes     <NA>   Yes     <NA>    <NA>       No_2      Yes        <NA>
2  <NA>     No_1   Yes     <NA>     Yes       <NA>     <NA>        No_2
3  <NA>     No_1  <NA>     No_1     Yes       <NA>     <NA>        No_1
4   Yes     <NA>  <NA>     No_1     Yes       <NA>     <NA>        No_2
5  <NA>     No_2   Yes     <NA>    <NA>       No_2      Yes        <NA>

cols <- c("apple", "berry", "coconut", "dinosaur")
cols_NO <- c("apple_NO", "berry_NO", "coconut_NO", "dinosaur_NO")

I would like to clean the values in columns in cols_NO and assign new values to the columns in cols

For example, if I just had one column pair to clean, I would do something like:

df <- df %>%
  mutate(apple = case_when(apple_NO == "No_1" ~ "None left",
                           apple_NO == "No_2" ~ "Finished",
                           TRUE ~ apple))

I would also like to do this with berry and berry_NO, and coconut and coconut_NO etc.

The output I want would look something like this:

      apple apple_NO     berry berry_NO  coconut coconut_NO  dinosaur dinosaur_NO
1       Yes     <NA>       Yes     <NA> Finished       No_2       Yes        <NA>
2 None left     No_1       Yes     <NA>      Yes       <NA>  Finished        No_2
3 None left     No_1 None left     No_1      Yes       <NA> None left        No_1
4       Yes     <NA> None left     No_1      Yes       <NA> None left        No_2
5  Finished     No_2       Yes     <NA> Finished       No_2       Yes        <NA>

I think there's a solution somewhere along the lines of using map or map2 or mapply and parallel lists, but I've not used those before and can't seem to find similar solutions that I can use, featuring a list of columns on the left and right hand side of the = in mutate.

Thanks!

EDIT:

This gets me close but I would still need to replace or mutate_at this to my main dataframe. My real data would benefit from using grepl so I've just left that in.

fun.casewhen <- function(cols, cols_NO){
  case_when(grepl("No_1", cols_NO) == TRUE ~ "None left",
            grepl("No_2", cols_NO) == TRUE ~ "Finished",
            TRUE ~ cols)
  }


dftest <- map2(df %>% select(cols), df1 %>% select(cols_NO), ~ fun.casewhen (.x, .y))

The resulting dftest is made up of lists of each of the columns in cols, but with the correct values.

Upvotes: 1

Answers (3)

Jon Spring

Reputation: 66415

Here's a tidyverse solution relying on multiple pivots. I'm sure this isn't the most concise method, but including as another option.

df %>%
  mutate(row = row_number()) %>%
  pivot_longer(-row) %>%
  separate(name, c("group", "keep"), sep = "_") %>%
  pivot_wider(names_from = keep, values_from = value) %>%
  mutate(`NA` = case_when(NO == "No_1" ~ "None left",
                          NO == "No_2" ~ "Finished",
                          TRUE ~ `NA`)) %>%
  pivot_longer(-c(row, group)) %>%
  unite("col", c(group, name)) %>%
  pivot_wider(names_from = col, values_from = value)



    row apple_NA  apple_NO berry_NA  berry_NO coconut_NA coconut_NO dinosaur_NA dinosaur_NO
  <int> <chr>     <chr>    <chr>     <chr>    <chr>      <chr>      <chr>       <chr>      
1     1 Yes       NA       Yes       NA       Finished   No_2       Yes         NA         
2     2 None left No_1     Yes       NA       Yes        NA         Finished    No_2       
3     3 None left No_1     None left No_1     Yes        NA         None left   No_1       
4     4 Yes       NA       None left No_1     Yes        NA         Finished    No_2       
5     5 Finished  No_2     Yes       NA       Finished   No_2       Yes         NA

Upvotes: 1

A. Piong

Reputation: 282

I have found a way to assign the values to my df, albeit not in the best tidyr style:

fun.casewhen <- function(cols, cols_NO){
  case_when(grepl("No_1", cols_NO) == TRUE ~ "None left",
            grepl("No_2", cols_NO) == TRUE ~ "Finished",
            TRUE ~ cols)
  }


df[cols] <- map2(df %>% select(cols), df1 %>% select(cols_NO), ~ fun.casewhen (.x, .y))

I was missing the [cols] in df[cols] before. Nonetheless, it would be great to know if there is a solution more in-line with tidyr standards where the data can be cleaned using mutate and pipes.

Upvotes: 0

Wimpel

Reputation: 27732

here is a data.table + rlist approach

library( data.table )
library( rlist )
data.table::setDT(df)
L <- split.default( df, gsub( "_NO", "", names(df) ) )
rlist::list.cbind( lapply( L, function(x) x[,1 := data.table:: fcoalesce(x) ] ) )

Upvotes: 1

Tidy way to mutate multiple columns with two parallel lists of column names

Answers (3)

Related Questions