user3354212

Reputation: 1112

Extract partial string based on pattern in R

I would like to extract part of each string in a character vector, but I don't know how to define the pattern. Thank you for your help.

library(stringr)
names = c("GAPIT..flowerdate.GWAS.Results.csv","GAPIT..flwrcolor.GWAS.Results.csv",
"GAPIT..height.GWAS.Results.csv","GAPIT..matdate.GWAS.Results.csv")
# I want to extract out "flowerdate", "flwrcolor", "height" and "matdate"
traits <- str_extract_all(string = names, pattern = "..*.")
# the result is not what I want.

Upvotes: 2

Views: 224

Answers (4)

G. Grothendieck

Reputation: 270195

Here are a few solutions. The first two do not use regular expressions at all. The last one uses a single gsub:

1) read.table. This assumes the desired string is always the 3rd field:

read.table(text = names, sep = ".", as.is = TRUE)[[3]]

2) strsplit. This assumes the desired string has more than 3 characters and is lower case:

sapply(strsplit(names, "[.]"), Filter, f = function(x) nchar(x) > 3 & tolower(x) == x)

3) gsub. This assumes that two dots precede the string and that one dot plus junk not containing two successive dots comes afterwards:

gsub(".*[.]{2}|[.].*", "", names)
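As a quick sanity check, the three approaches can be run side by side on the vector from the question. This is only a sketch: it reuses the question's variable name names, and by_field, by_filter and by_gsub are names made up here for illustration.

```r
# Input vector copied from the question
names <- c("GAPIT..flowerdate.GWAS.Results.csv", "GAPIT..flwrcolor.GWAS.Results.csv",
           "GAPIT..height.GWAS.Results.csv", "GAPIT..matdate.GWAS.Results.csv")

# 1) third dot-separated field (the empty field between the two dots is field 2)
by_field <- read.table(text = names, sep = ".", as.is = TRUE)[[3]]

# 2) keep the pieces that are longer than 3 characters and entirely lower case
by_filter <- sapply(strsplit(names, "[.]"), Filter,
                    f = function(x) nchar(x) > 3 & tolower(x) == x)

# 3) delete everything up to the double dot, then everything from the next dot on
by_gsub <- gsub(".*[.]{2}|[.].*", "", names)

# TRUE when all three approaches agree on this input
all_agree <- identical(by_field, by_filter) && identical(by_field, by_gsub)
print(all_agree)
```

Which one to prefer depends on which assumption is most likely to hold for your real file names, for example fixed field position versus a lower-case trait name.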

REVISED Added additional solutions.

Upvotes: 1

Sven Hohenstein

Reputation: 81733

Use sub:

sub(".*\\.{2}(.+?)\\..*", "\\1", names)
# [1] "flowerdate" "flwrcolor"  "height"     "matdate"   
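To see why the lazy quantifier in (.+?) matters, here is a small sketch comparing it with a greedy capture and with a negated character class. The variable names x, lazy, greedy and negated are made up for illustration, and perl = TRUE is my addition so the lazy/greedy behaviour follows PCRE rules.

```r
x <- "GAPIT..flowerdate.GWAS.Results.csv"

# lazy (.+?) stops at the first dot after the double dot
lazy <- sub(".*\\.{2}(.+?)\\..*", "\\1", x, perl = TRUE)

# greedy (.+) runs on to the last dot, so the capture is too long
greedy <- sub(".*\\.{2}(.+)\\..*", "\\1", x, perl = TRUE)

# a negated class captures "anything but a dot" and needs no lazy quantifier
negated <- sub(".*\\.{2}([^.]+)\\..*", "\\1", x, perl = TRUE)

print(c(lazy = lazy, greedy = greedy, negated = negated))
```

Here lazy and negated both give "flowerdate", while greedy gives "flowerdate.GWAS.Results".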

Upvotes: 2

Jilber Urbina

Reputation: 61214

You can also use regmatches, which returns the first run of lower-case letters in each string:

> regmatches(c, regexpr("[[:lower:]]+", c))
[1] "flowerdate" "flwrcolor"  "height"     "matdate"   

I encourage you not to use c as a variable name, because it masks the base function c().

Upvotes: 4

user3354212

Reputation: 1112

I borrowed Roman Luštrik's answer to my previous question, “How to extract out a partial name as new column name in a data frame”. It splits each string on dots and takes the third piece:

traits <- unlist(lapply(strsplit(names, "\\."), "[[", 3))

Upvotes: 2
