Xiao-yan Pan

How to match, extract and assign a pattern

I have some patterns

A <- c("A..A","A.A","AA")
B <- c("B..B","B.B","BB")

and some sequences with their frequencies in a data.frame:

Seq    freq
CACCA     1
CAACC     2
BCCBC     3

I need to match the patterns against the sequences, extract the matched pattern, and record which group (A or B) it came from, as follows:

Seq    freq  Pattern  From
CACCA     1  A..A     A
CAACC     2  AA       A
BCCBC     3  B..B     B

I used grep to match the patterns, but it only returns the whole sequence. How can I extract the matched pattern and get its pattern group?
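For reference, a minimal sketch of the difference, using the question's sequences as a plain character vector (the names `seqs`, `whole` and `matched` are illustrative only): `grep()` with `value = TRUE` returns the whole matching elements, while `regmatches()` combined with `regexpr()` returns just the matched substring. In a regular expression `.` matches any single character, so `"A..A"` is read as "A, any two characters, A".

```r
seqs <- c("CACCA", "CAACC", "BCCBC")

# grep() reports which elements match; value = TRUE returns those elements whole
whole <- grep("A..A", seqs, value = TRUE)

# regexpr() records where the first match in each element starts (-1 if none);
# regmatches() uses that to pull out only the matched text, dropping non-matches
matched <- regmatches(seqs, regexpr("A..A", seqs))

whole    # "CACCA"
matched  # "ACCA"
```

This extracts the matched text (`"ACCA"`), not the pattern string or its group, so a lookup from sequence to pattern is still needed.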

Thank you!

Answers (1)

Sotos

Put A and B in a data frame and stack it into long format, so that each pattern sits next to the name of its group:

d1 <- stack(data.frame(A, B, stringsAsFactors = FALSE))
#  values ind
#1   A..A   A
#2    A.A   A
#3     AA   A
#4   B..B   B
#5    B.B   B
#6     BB   B    
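`data.frame(A, B)` needs A and B to have the same length; otherwise it either errors or silently recycles the shorter vector. As a sketch, `stack()` also accepts a named list, which has no such restriction. The vectors `pats_A` and `pats_B` below are hypothetical, with the B group shortened to two patterns:

```r
pats_A <- c("A..A", "A.A", "AA")
pats_B <- c("B..B", "BB")   # hypothetical: B has fewer patterns than A

# stack() on a named list: one row per pattern, ind holds the list element's name
d_uneven <- stack(list(A = pats_A, B = pats_B))
d_uneven
```

The result has the same `values`/`ind` columns as `d1`, so the `match()` lookup works on it unchanged.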

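#assuming the sequences are in a data frame called df (data from the question)
df <- data.frame(Seq = c("CACCA", "CAACC", "BCCBC"), freq = c(1, 2, 3),
                 stringsAsFactors = FALSE)

#use gsub to blank out every uppercase letter other than A and B, trim the ends,
#then turn the remaining inner blanks into dots so Seq has the same format as A and B
df$v1 <- gsub(' ', '.', trimws(gsub('[C-Z]', ' ', df$Seq)))
#which gives [1] "A..A" "AA"   "B..B"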
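To see what the nested calls do, here is a sketch that runs the same three steps one at a time on the question's sequences (`step1` to `step3` are illustrative names):

```r
seqs <- c("CACCA", "CAACC", "BCCBC")

step1 <- gsub("[C-Z]", " ", seqs)  # C..Z become blanks: " A  A" " AA  " "B  B "
step2 <- trimws(step1)             # drop leading/trailing blanks: "A  A" "AA" "B  B"
step3 <- gsub(" ", ".", step2)     # inner blanks become dots: "A..A" "AA" "B..B"
```

Because only the outer blanks are trimmed, the number of dots records the gap between the first and last A (or B), which is the form the patterns in A and B use.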

df$From <- d1$ind[match(df$v1, d1$values)]
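`match()` returns `NA` when a value has no counterpart, so a sequence whose reduced form is not one of the patterns gets `NA` in `From` rather than an error. A small sketch with a hypothetical reduced value `"A...A"` (what a sequence such as `"ACCCA"` would become):

```r
d_demo <- stack(data.frame(A = c("A..A", "A.A", "AA"),
                           B = c("B..B", "B.B", "BB"),
                           stringsAsFactors = FALSE))

reduced <- c("A..A", "A...A")   # second value is not one of the patterns
from <- d_demo$ind[match(reduced, d_demo$values)]
from   # first element is A, second is NA
```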

df
#    Seq freq   v1 From
#1 CACCA    1 A..A    A
#2 CAACC    2   AA    A
#3 BCCBC    3 B..B    B
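If you would rather keep using regular expressions, as in the question's `grep` attempt, a different technique is to test each pattern with `grepl()` and take the first one that matches. This is a sketch, not the approach above; `pats`, `seq_df` and `hit` are hypothetical names:

```r
pats <- stack(list(A = c("A..A", "A.A", "AA"),
                   B = c("B..B", "B.B", "BB")))
seq_df <- data.frame(Seq = c("CACCA", "CAACC", "BCCBC"), freq = c(1, 2, 3),
                     stringsAsFactors = FALSE)

# For each sequence, find the first pattern (in pats order) that matches
# anywhere in it, reading "." as the regex "any single character".
# which(...)[1] is NA_integer_ when no pattern matches.
hit <- vapply(seq_df$Seq, function(s) {
  which(vapply(pats$values, grepl, logical(1), x = s, USE.NAMES = FALSE))[1]
}, integer(1), USE.NAMES = FALSE)

seq_df$Pattern <- pats$values[hit]
seq_df$From <- pats$ind[hit]
seq_df
```

For this data it gives the desired Pattern and From columns, but the semantics differ from the gsub approach: here `.` also matches A or B, and a pattern may match only part of a sequence, so pattern order matters. For example, `"ACCAA"` would get `"A..A"` here but `NA` from the gsub approach.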
