Reputation: 5897
I am working with the R programming language.
I have the following data:
sample_data = data.frame(col1 = c("james", "john", "henry"),
col2 = c("123 forest road", "jason", "tim"),
col3 = c("NA", "124 valley street", "peter"),
col4 = c("newyork 6", "newyork7", "125 ocean road"),
col5 = c("NA", "NA", "newyork8")
)
col1 col2 col3 col4 col5
1 james 123 forest road NA newyork 6 NA
2 john jason 124 valley street newyork7 NA
3 henry tim peter 125 ocean road newyork8
I would like to format this data like this:
desired_result = data.frame(col1 = c("james", "john", "henry"),
var1 = c("123 forest road", "124 valley street", "125 ocean road"),
var2 = c("newyork 6", "newyork7", "newyork8") )
col1 var1 var2
1 james 123 forest road newyork 6
2 john 124 valley street newyork7
3 henry 125 ocean road newyork8
In a previous question (Scanning and Replacing Values of Rows in R), I learned how to partly answer this problem:
# for each row, select first appearing column that contains at least one number
tmp1 <- sample_data %>%
mutate(across(col2:col5, ~case_when(str_detect(.x, "^\\d") ~ .x,
TRUE ~ NA_character_)),
var1 = coalesce(col2, col3, col4, col5)) %>%
select(col1, var1 )
I tried to extend this answer for grabbing the second part - I thought by adjusting the arguments of the "mutate" statement, for each row I could then grab the "second appearing column" containing at least one number:
tmp2 <- sample_data %>%
mutate(across(col3:col5, ~case_when(str_detect(.x, "^\\d") ~ .x,
TRUE ~ NA_character_)),
var2 = coalesce(col3, col3, col4, col5)) %>%
select(col1, var2 )
But this is not returning the desired column (containing "newyork 6, newyork7, newyork8").
Thank you!
Upvotes: 1
Views: 48
Reputation: 16836
One option would be to use the same method; however, we can do the same steps a second time but only return values that end with a number. Then, we use bind_cols
to combine the two sets together. If you do not have duplicate names in col1
, then it would be better to use a left_join
with the two.
library(tidyverse)
sample_data %>%
mutate(across(col2:col5, ~case_when(str_detect(.x, "^\\d") ~ .x,
TRUE ~ NA_character_)),
var1 = coalesce(col2, col3, col4, col5)) %>%
select(col1, var1 ) %>%
bind_cols(., sample_data %>%
mutate(across(col2:col5, ~case_when(str_detect(.x, "\\d$") ~ .x,
TRUE ~ NA_character_)),
var2 = coalesce(col2, col3, col4, col5)) %>%
select(var2))
Output
col1 var1 var2
1 james 123 forest road newyork 6
2 john 124 valley street newyork7
3 henry 125 ocean road newyork8
If you would like to break it up (as it is a lot in one go), then you can just keep the parts separate, then still use bind_cols
on the second one.
df1 <- sample_data %>%
mutate(across(col2:col5, ~case_when(str_detect(.x, "^\\d") ~ .x,
TRUE ~ NA_character_)),
var1 = coalesce(col2, col3, col4, col5)) %>%
select(col1, var1 )
df2 <- sample_data %>%
mutate(across(col2:col5, ~case_when(str_detect(.x, "\\d$") ~ .x,
TRUE ~ NA_character_)),
var2 = coalesce(col2, col3, col4, col5)) %>%
select(var2) %>%
bind_cols(df1, .)
Upvotes: 2