Reputation: 45
I have some patterns
A <- c("A..A","A.A","AA")
B <- c("B..B","B.B","BB")
and some sequences with their frequencies in a data.frame
Seq freq
CACCA 1
CAACC 2
BCCBC 3
I need to match the patterns to the sequences, then extract the matched pattern and record which group it came from, as follows
Seq freq Pattern From
CACCA 1 A..A A
CAACC 2 AA A
BCCBC 3 B..B B
I used grep to match the patterns, but it only returns the whole sequence. How can I extract the matched pattern and get the pattern group?
Thank you!
Upvotes: 2
Views: 57
Reputation: 51612
You will need to put A and B in a data frame and stack it so it's in long format.
d1 <- stack(data.frame(A, B, stringsAsFactors = FALSE))
# values ind
#1 A..A A
#2 A.A A
#3 AA A
#4 B..B B
#5 B.B B
#6 BB B
# Use gsub to convert Seq to the same format as A and B:
# replace every letter C-Z with a space, trim the ends, then turn the remaining spaces into dots
df$v1 <- gsub(' ', '.', trimws(gsub('[C-Z]', ' ', df$Seq)))
#which gives [1] "A..A" "AA" "B..B"
df$From <- d1$ind[match(df$v1, d1$values)]
df
# Seq freq v1 From
#1 CACCA 1 A..A A
#2 CAACC 2 AA A
#3 BCCBC 3 B..B B
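If you would rather keep the regex matching you started with in grep, here is a minimal sketch of an alternative. It assumes the patterns are meant as regular expressions (so "." matches any single character, not only C), that df has the Seq and freq columns from the question, and that the first matching pattern in the order A then B is the one you want. The names pats and first_hit are just illustrative.

```r
A <- c("A..A", "A.A", "AA")
B <- c("B..B", "B.B", "BB")
df <- data.frame(Seq = c("CACCA", "CAACC", "BCCBC"),
                 freq = 1:3,
                 stringsAsFactors = FALSE)

# Long-format lookup table: one row per pattern, with its group in 'ind'
pats <- stack(list(A = A, B = B))

# For each sequence, index of the first pattern that matches it as a regex
# (NA if no pattern matches)
first_hit <- vapply(df$Seq, function(s) {
  match(TRUE, vapply(pats$values, grepl, logical(1), x = s))
}, integer(1), USE.NAMES = FALSE)

df$Pattern <- pats$values[first_hit]
df$From    <- as.character(pats$ind[first_hit])
df
```

Unlike the gsub conversion, this does not depend on the other letters being in the C-Z range, but a sequence that matches several patterns only gets the first one reported.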
Upvotes: 1