stats_noob
stats_noob

Reputation: 5897

Specifying "Mutate" Options in DPLYR

I am working with the R programming language.

I have the following data:

sample_data = data.frame(col1 = c("james", "john", "henry"),
                         col2 = c("123 forest road", "jason", "tim"),
                         col3 = c("NA", "124 valley street", "peter"),
                         col4 = c("newyork 6", "newyork7",  "125 ocean road"),
                         col5 = c("NA", "NA", "newyork8")
)
   col1            col2              col3           col4     col5
1 james 123 forest road                NA      newyork 6       NA
2  john           jason 124 valley street       newyork7       NA
3 henry             tim             peter 125 ocean road newyork8

I would like to format this data like this:

desired_result = data.frame(col1 = c("james", "john", "henry"), 
                            var1 = c("123 forest road", "124 valley street", "125 ocean road"),
                            var2 = c("newyork 6", "newyork7", "newyork8") )

   col1              var1      var2
1 james   123 forest road newyork 6
2  john 124 valley street  newyork7
3 henry    125 ocean road  newyork8

In a previous question (Scanning and Replacing Values of Rows in R), I learned how to partly answer this problem:

# for each row, select first appearing column that contains at least one number
tmp1 <- sample_data %>% 
  mutate(across(col2:col5, ~case_when(str_detect(.x, "^\\d") ~ .x, 
                                      TRUE ~ NA_character_)), 
  var1 = coalesce(col2, col3, col4, col5)) %>% 
  select(col1, var1 )

I tried to extend this answer for grabbing the second part - I thought by adjusting the arguments of the "mutate" statement, for each row I could then grab the "second appearing column" containing at least one number:

tmp2 <- sample_data %>% 
      mutate(across(col3:col5, ~case_when(str_detect(.x, "^\\d") ~ .x, 
                                          TRUE ~ NA_character_)), 
      var2 = coalesce(col3, col3, col4, col5)) %>% 
      select(col1, var2 )

But this is not returning the desired column (containing "newyork 6, newyork7, newyork8").

Thank you!

Upvotes: 1

Views: 48

Answers (1)

AndrewGB
AndrewGB

Reputation: 16836

One option would be to use the same method; however, we can do the same steps a second time but only return values that end with a number. Then, we use bind_cols to combine the two sets together. If you do not have duplicate names in col1, then it would be better to use a left_join with the two.

library(tidyverse)

sample_data %>%
  mutate(across(col2:col5, ~case_when(str_detect(.x, "^\\d") ~ .x,
                                      TRUE ~ NA_character_)),
         var1 = coalesce(col2, col3, col4, col5)) %>%
  select(col1, var1 ) %>%
  bind_cols(., sample_data %>%
              mutate(across(col2:col5, ~case_when(str_detect(.x, "\\d$") ~ .x,
                                                  TRUE ~ NA_character_)),
                     var2 = coalesce(col2, col3, col4, col5)) %>%
              select(var2))

Output

   col1              var1      var2
1 james   123 forest road newyork 6
2  john 124 valley street  newyork7
3 henry    125 ocean road  newyork8

If you would like to break it up (as it is a lot in one go), then you can just keep the parts separate, then still use bind_cols on the second one.

df1 <- sample_data %>%
  mutate(across(col2:col5, ~case_when(str_detect(.x, "^\\d") ~ .x,
                                      TRUE ~ NA_character_)),
         var1 = coalesce(col2, col3, col4, col5)) %>%
  select(col1, var1 )

df2 <- sample_data %>%
  mutate(across(col2:col5, ~case_when(str_detect(.x, "\\d$") ~ .x,
                                      TRUE ~ NA_character_)),
         var2 = coalesce(col2, col3, col4, col5)) %>%
  select(var2) %>% 
  bind_cols(df1, .)

Upvotes: 2

Related Questions