Raed Hamed

Reputation: 327

How can I automate this simple conditional column operation in R?

I have a data frame that looks like the following:

df <- tibble(term = c(
  rep("a:b", 2),
  rep("b:a", 2),
  rep("c:d", 2),
  rep("d:c", 2),
  rep("g:h", 2),
  rep("h:g", 2)
)) 

I would like to add an extra column to this data frame that takes the same value for any two terms that contain the same characters in reversed order, separated by a ":" (i.e. a:b and b:a would be coded the same way, and likewise c:d and d:c and all the other pairs).

I thought of something like the following:

df %>%
  mutate(term_adjusted = case_when(grepl("a:b|b:a", term) ~ "a:b"))

but I have a large number of these pairs in my dataset and would like a way to automate that, hence my question:

How can I do this operation automatically without having to hard code for each pair separately?

Thank you!

Upvotes: 1

Views: 64

Answers (2)

Ronak Shah

Reputation: 388982

A tidyverse option: split each term on ":", order the two halves with pmin()/pmax(), and paste them back together.

library(dplyr)
library(tidyr)

df %>%
  separate(term, c('term1', 'term2'), sep = ':', remove = FALSE) %>%
  mutate(col1 = pmin(term1, term2), col2 = pmax(term1, term2)) %>%
  unite(result, col1, col2, sep = ':') %>%
  select(term, result)

#  term  result
#   <chr> <chr> 
# 1 a:b   a:b   
# 2 a:b   a:b   
# 3 b:a   a:b   
# 4 b:a   a:b   
# 5 c:d   c:d   
# 6 c:d   c:d   
# 7 d:c   c:d   
# 8 d:c   c:d   
# 9 g:h   g:h   
#10 g:h   g:h   
#11 h:g   g:h   
#12 h:g   g:h   

Upvotes: 1

ktiu

Reputation: 2626

How about:

library(dplyr)

your_data %>%
  mutate(term_adjusted = term %>%
                           strsplit(":") %>%
                           purrr::map_chr(~ .x %>%
                                           sort() %>%
                                           paste(collapse = ":")))

Base R option (the native pipe |> needs R 4.1 or later):

your_data$term_adjusted <- your_data$term |>
                             strsplit(":") |>
                             lapply(sort) |>
                             lapply(paste, collapse = ":") |>
                             unlist()

Either returns:

# A tibble: 12 x 2
   term  term_adjusted
   <chr> <chr>
 1 a:b   a:b
 2 a:b   a:b
 3 b:a   a:b
 4 b:a   a:b
 5 c:d   c:d
 6 c:d   c:d
 7 d:c   c:d
 8 d:c   c:d
 9 g:h   g:h
10 g:h   g:h
11 h:g   g:h
12 h:g   g:h
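
If the same normalization is needed in several places, the sorting step can be wrapped in a small helper. This is only a sketch and not part of the code above: `normalize_pair` and its `sep` argument are hypothetical names I made up for illustration, and it relies on base R only (`strsplit`, `sort`, `paste`, `vapply`).

```r
# Hypothetical helper (name and arguments are assumptions, not from the answer):
# split each term on the separator, sort the pieces, and paste them back,
# so that reversed pairs such as "a:b" and "b:a" get the same label.
normalize_pair <- function(term, sep = ":") {
  parts <- strsplit(term, sep, fixed = TRUE)
  # vapply guarantees one character string per input element
  vapply(parts, function(p) paste(sort(p), collapse = sep), character(1))
}

term <- c("a:b", "b:a", "c:d", "d:c", "g:h", "h:g")
normalize_pair(term)
# [1] "a:b" "a:b" "c:d" "c:d" "g:h" "g:h"
```

Since it returns a plain character vector, it should also work inside `mutate(term_adjusted = normalize_pair(term))`. Keep in mind that `sort()` follows the current locale's collation, so labels with mixed case may be ordered differently on different systems.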

Upvotes: 3
