Matching changing patterns of character strings using grep

Question

I have a data.frame which includes trial data on maize hybrids and on maize inbred lines. Each inbred line has a unique entry in this data frame. Additionally, I have maize hybrids, which are the result of the cross of two inbred lines.

I extracted all entries for inbred lines and all entries for hybrids and stored them in two separate vectors. The vector 'inbred' contains the coded entry names for each inbred line:

inbred <- c("F124", "L258", "F309", "P045", "D634", "D183-43", "F205-35")

The vector 'hybrid' contains the coded entry names for each hybrid:

hybrid <- c("F124xP045", "F124xD183-43", "F309xP045", "F205-35xL258", "F309xD634")

Each hybrid has two inbred lines as parents, hence each string in the 'hybrid' vector consists of a first inbred line, which is separated from the second inbred line by an 'x'.

My goal is to find out which inbred lines are a parental component of any of the hybrid lines. The number of occurrences is not of interest to me. Ultimately, I would like to get a new vector of unique inbred lines that are part of at least one hybrid and use them for a PCA.

I tried to use the grep() function to search for every character string from the vector 'inbred' in the vector 'hybrid', and used the unique() function to drop hits that occur multiple times. My particular problem was that the pattern changes every time, since I search for a different inbred line in my 'hybrid' vector on each pass.

This is the code that I used to get unique matches.

unique.parents <- unique(grep(paste(inbred, collapse = "|"), hybrid, value = TRUE))
unique.parents
#[1] "F124xP045"    "F124xD183-43" "F309xP045"    "F205-35xL258" "F309xD634"

My approach only returned the hybrids that contain any of the inbred lines I tried to match, not the names of the inbred lines themselves.

Fabio Marroni · Accepted Answer

I assume that, as you said, "each string in the 'hybrid' vector consists of a first inbred line, which is separated from the second inbred line by an 'x'". Thus, you just need to split each hybrid name using "x" as the separator, unlist the result, and keep the unique items. Note that this only works as long as no inbred name contains an "x" itself.

It's easy:

unique.parents <- unique(unlist(strsplit(hybrid,split="x")))
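If you also want to make sure that every parent you recover really has an entry in 'inbred', a possible extension is sketched below. It is only a sketch under the assumptions stated in the question (parents are separated by a literal "x"); the names used.inbred and unknown.parents are made up for illustration.

```r
inbred <- c("F124", "L258", "F309", "P045", "D634", "D183-43", "F205-35")
hybrid <- c("F124xP045", "F124xD183-43", "F309xP045", "F205-35xL258", "F309xD634")

# Split every hybrid name at the literal "x"; fixed = TRUE turns off
# regular-expression interpretation of the separator.
parents <- unique(unlist(strsplit(hybrid, split = "x", fixed = TRUE)))

# Inbred lines that are a parent of at least one hybrid,
# kept in the order in which they appear in 'inbred'.
used.inbred <- inbred[inbred %in% parents]

# Parents that occur in a hybrid name but have no entry in 'inbred'
# (hypothetical sanity check, e.g. to catch typos in the coded names).
unknown.parents <- setdiff(parents, inbred)
```

With the example data all seven inbred lines turn out to be parents of some hybrid, and unknown.parents is empty. Matching whole names after splitting also avoids the partial hits that a substring search with grep() could produce if one code is a prefix of another.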
