pdubois

Reputation: 7790

How to assign vectors into multiple variables in dplyr mutate

I have the following data frame:

library(tidyverse)

dat <-structure(list(motif_name_binned = c("Ddit3::Cebpa/MA0019.1/Jaspar.instid_chr1:183286845-183287245.bin1", 
"Ddit3::Cebpa/MA0019.1/Jaspar.instid_chr1:183286845-183287245.bin2", 
"Ddit3::Cebpa/MA0019.1/Jaspar.instid_chr1:183286845-183287245.bin3"
), motif_score = c(6.816695, 6.816695, 6.816695)), row.names = c(NA, 
-3L), class = c("tbl_df", "tbl", "data.frame"), .Names = c("motif_name_binned", 
"motif_score"))

dat

Which gives this:

> dat
# A tibble: 3 x 2
                                                  motif_name_binned motif_score
                                                              <chr>       <dbl>
1 Ddit3::Cebpa/MA0019.1/Jaspar.instid_chr1:183286845-183287245.bin1    6.816695
2 Ddit3::Cebpa/MA0019.1/Jaspar.instid_chr1:183286845-183287245.bin2    6.816695
3 Ddit3::Cebpa/MA0019.1/Jaspar.instid_chr1:183286845-183287245.bin3    6.816695

I can get what I want by extracting values from motif_name_binned with this code:

dat %>% 
  mutate(motif = str_match(motif_name_binned,"^(.*?)\\/.*?")[,2], 
         inst =  str_match(motif_name_binned,"^.*?\\/.*?\\/.*?\\.instid_(.*?)\\.bin\\d+")[,2],
         binno = as.integer(str_match(motif_name_binned,"^.*?\\/.*?\\/.*?\\.bin(\\d+)")[,2])) 

Which gives:

# A tibble: 3 x 5
                                                  motif_name_binned motif_score        motif                     inst binno
                                                              <chr>       <dbl>        <chr>                    <chr> <int>
1 Ddit3::Cebpa/MA0019.1/Jaspar.instid_chr1:183286845-183287245.bin1    6.816695 Ddit3::Cebpa chr1:183286845-183287245     1
2 Ddit3::Cebpa/MA0019.1/Jaspar.instid_chr1:183286845-183287245.bin2    6.816695 Ddit3::Cebpa chr1:183286845-183287245     2
3 Ddit3::Cebpa/MA0019.1/Jaspar.instid_chr1:183286845-183287245.bin3    6.816695 Ddit3::Cebpa chr1:183286845-183287245     3

But notice that I have to run a regex three times and assign each result to a column one by one, even though a single regex captures all three parts:

str_match(motif_name_binned,"^(.*?)\\/.*?\\/.*?\\.instid_(.*?)\\.bin(\\d+)")[,c(2,3,4)]
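For reference, here is a minimal sketch (assuming stringr is installed) of what that single call returns. `str_match()` gives a character matrix with one row per input string: the first column holds the full match and the remaining columns hold the capture groups, which is why `[,c(2,3,4)]` selects the motif, instance and bin number. The vector `x` below simply rebuilds the three example strings.

```r
library(stringr)

# The three example strings from dat$motif_name_binned
x <- paste0("Ddit3::Cebpa/MA0019.1/Jaspar.instid_chr1:183286845-183287245.bin", 1:3)

# One regex with three capture groups: motif, instance id, bin number
m <- str_match(x, "^(.*?)\\/.*?\\/.*?\\.instid_(.*?)\\.bin(\\d+)")

dim(m)    # 3 rows (one per string) x 4 columns (full match + 3 groups)
m[, 2:4]  # only the capture groups, still as character
```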

How can I incorporate this latter, all-in-one regex into dplyr mutate()?

Upvotes: 2

Views: 517

Answers (1)

akuiper

Reputation: 214957

You can use tidyr::extract() to turn the capturing groups in the regular expression into new columns; remove = FALSE keeps the original motif_name_binned column:

library(tidyr)
dat %>% 
    extract(motif_name_binned, c('motif', 'inst', 'binno'), regex = "^(.*?)\\/.*?\\/.*?\\.instid_(.*?)\\.bin(\\d+)", remove = FALSE)

# A tibble: 3 x 5
#                                                  motif_name_binned        motif                     inst binno motif_score
#*                                                             <chr>        <chr>                    <chr> <chr>       <dbl>
#1 Ddit3::Cebpa/MA0019.1/Jaspar.instid_chr1:183286845-183287245.bin1 Ddit3::Cebpa chr1:183286845-183287245     1    6.816695
#2 Ddit3::Cebpa/MA0019.1/Jaspar.instid_chr1:183286845-183287245.bin2 Ddit3::Cebpa chr1:183286845-183287245     2    6.816695
#3 Ddit3::Cebpa/MA0019.1/Jaspar.instid_chr1:183286845-183287245.bin3 Ddit3::Cebpa chr1:183286845-183287245     3    6.816695
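Note that `extract()` returns every captured group as character, which is why `binno` prints as `<chr>` above while the question's version used `as.integer()`. A minimal sketch, assuming tibble, dplyr and tidyr are installed: passing `convert = TRUE` makes `extract()` run `type.convert()` on the new columns, so `binno` comes back as an integer while `motif` and `inst` stay character. The `dat` below is a rebuilt copy of the question's data.

```r
library(tibble)
library(dplyr)  # for %>%
library(tidyr)

# Rebuilt copy of the question's data frame
dat <- tibble(
  motif_name_binned = paste0(
    "Ddit3::Cebpa/MA0019.1/Jaspar.instid_chr1:183286845-183287245.bin", 1:3),
  motif_score = 6.816695
)

res <- dat %>%
  tidyr::extract(motif_name_binned, c("motif", "inst", "binno"),
                 regex = "^(.*?)\\/.*?\\/.*?\\.instid_(.*?)\\.bin(\\d+)",
                 remove = FALSE,   # keep motif_name_binned
                 convert = TRUE)   # type.convert() turns "1", "2", "3" into integers

res
```

In recent tidyr releases `extract()` is documented as superseded by `separate_wider_regex()`, but it remains available.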

Upvotes: 4