My example data:
l1
[1] "xmms-1.2.11-x86_64-5" "xmms-1.2.11-x86_64-6"
[3] "xmodmap-1.0.10-x86_64-1" "xmodmap-1.0.9-x86_64-1"
[5] "xmodmap3-1.0.10-x86_64-1" "xmodmap3-1.0.9-x86_64-1"
I am using R and would like a regular expression that captures just the characters before the first dash, such as:
xmms
xmms
xmodmap
xmodmap
xmodmap3
xmodmap3
Since I am using R, I assumed the regex had to be Perl-compatible.
I thought I could do this using a lookbehind on the dash. This is the pattern I tried:
grepl("(?<=[a-z0-9])-", l1, perl = TRUE)
but it just matches the whole string. I think if I had the first dash as a capture group, I could maybe use the lookbehind, but I don't know how to build the regex with both the lookbehind and the capture group.
I looked at some other questions for possible answers, and it seems I may need a non-greedy quantifier. I tried
grepl("(?<=[a-z0-9])-/.+?(?=-)/", l1, perl = TRUE)
but that didn't work either.
I'm open to other suggestions on how to capture the first set of characters before the dash. I'm currently in base R, but I'm fine with using any packages, like stringr.
1) Base R
An option is sub from base R: match the - followed by any characters (.*) and replace them with a blank ("").
sub("-.*", "", l1)
#[1] "xmms" "xmms" "xmodmap" "xmodmap" "xmodmap3" "xmodmap3"
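One property of the sub approach worth knowing: an element that contains no dash at all is returned unchanged, because sub leaves non-matching strings alone. A minimal sketch; the second input, "nodash", is a made-up value for illustration only.

```r
# Hypothetical input: one package string from the question plus one with no dash
x <- c("xmms-1.2.11-x86_64-5", "nodash")

# Remove the first "-" and everything after it; strings without "-" are left as-is
res <- sub("-.*", "", x)
# res is c("xmms", "nodash")
```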
Or capture as a group
sub("(\\w+).*", "\\1", l1)
#[1] "xmms" "xmms" "xmodmap" "xmodmap" "xmodmap3" "xmodmap3"
Or with regmatches/regexpr
regmatches(l1, regexpr('\\w+', l1))
#[1] "xmms" "xmms" "xmodmap" "xmodmap" "xmodmap3" "xmodmap3"
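A caveat on the \w+ patterns: they assume the package name consists only of word characters (letters, digits, underscore). If a name contains anything else, such as + or ., \w+ stops early, whereas a "not a dash" class ([^-]+) keeps going until the first -. A small sketch with a hypothetical package string, gtk+2-..., chosen only to show the difference:

```r
# Hypothetical package string whose name contains a non-word character ("+")
x <- "gtk+2-2.24.32-x86_64-3"

# \\w+ stops at the first non-word character, so only "gtk" is captured
w <- sub("(\\w+).*", "\\1", x)
# w is "gtk"

# [^-]+ keeps everything up to the first dash
d <- sub("^([^-]+).*", "\\1", x)
# d is "gtk+2"
```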
or using trimws
trimws(l1, "right", whitespace = "-.*")
#[1] "xmms" "xmms" "xmodmap" "xmodmap" "xmodmap3" "xmodmap3"
Or using read.table
read.table(text = l1, sep="-", header = FALSE, stringsAsFactors = FALSE)$V1
#[1] "xmms" "xmms" "xmodmap" "xmodmap" "xmodmap3" "xmodmap3"
Or with strsplit
sapply(strsplit(l1, "-"), `[`, 1)
#[1] "xmms" "xmms" "xmodmap" "xmodmap" "xmodmap3" "xmodmap3"
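A variant of the strsplit approach, as a stylistic alternative rather than a change in result for this data: vapply guarantees a character vector back, and fixed = TRUE treats the separator as a literal dash instead of a regex.

```r
l1 <- c("xmms-1.2.11-x86_64-5", "xmms-1.2.11-x86_64-6", "xmodmap-1.0.10-x86_64-1",
        "xmodmap-1.0.9-x86_64-1", "xmodmap3-1.0.10-x86_64-1", "xmodmap3-1.0.9-x86_64-1")

# Split each string on the literal "-" and keep the first piece;
# character(1) makes vapply raise an error if any result is not a single string
first <- vapply(strsplit(l1, "-", fixed = TRUE), `[`, character(1), 1)
# first is c("xmms", "xmms", "xmodmap", "xmodmap", "xmodmap3", "xmodmap3")
```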
2) stringr
Or with word from stringr
library(stringr)
word(l1, 1, sep = "-")
#[1] "xmms" "xmms" "xmodmap" "xmodmap" "xmodmap3" "xmodmap3"
Or with str_remove
str_remove(l1, "-.*")
#[1] "xmms" "xmms" "xmodmap" "xmodmap" "xmodmap3" "xmodmap3"
3) stringi
Or with stri_extract_first from stringi
library(stringi)
stri_extract_first(l1, regex = "\\w+")
#[1] "xmms" "xmms" "xmodmap" "xmodmap" "xmodmap3" "xmodmap3"
Note: grep/grepl is for detecting a pattern in a string. For replacing or extracting a substring, use sub/regexpr/regmatches in base R.
l1 <- c("xmms-1.2.11-x86_64-5", "xmms-1.2.11-x86_64-6", "xmodmap-1.0.10-x86_64-1",
"xmodmap-1.0.9-x86_64-1", "xmodmap3-1.0.10-x86_64-1", "xmodmap3-1.0.9-x86_64-1"
)
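To make the detect-versus-extract distinction concrete: grepl with the question's lookbehind pattern only reports whether each element contains a match, so it returns a logical vector rather than any text. regexpr, by contrast, returns the position of the first match, which can be passed to substr. A minimal base R sketch; it assumes every element contains at least one dash, as noted in the comments.

```r
l1 <- c("xmms-1.2.11-x86_64-5", "xmms-1.2.11-x86_64-6", "xmodmap-1.0.10-x86_64-1",
        "xmodmap-1.0.9-x86_64-1", "xmodmap3-1.0.10-x86_64-1", "xmodmap3-1.0.9-x86_64-1")

# grepl only detects: one TRUE/FALSE per element, never the matched text
has_dash <- grepl("(?<=[a-z0-9])-", l1, perl = TRUE)
# every element has a dash preceded by a letter or digit, so all are TRUE

# regexpr returns the start position of the first match (-1 if there is none)
pos <- regexpr("-", l1, fixed = TRUE)

# Keep everything before the first dash; assumes every element has a dash,
# since pos = -1 would give substr(x, 1, -2), i.e. an empty string
before <- substr(l1, 1, pos - 1)
# before is c("xmms", "xmms", "xmodmap", "xmodmap", "xmodmap3", "xmodmap3")
```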
I guess the simplest regex to match what you're after would be
^[^-]+
Match the start of the string (^) and then at least one character (the +) that isn't a - ([^-]).
If you need to capture it, add surrounding parentheses.
^([^-]+)
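To use the parenthesised pattern from R, one option is base R's regexec, which records the full match and each capture group; regmatches then returns those pieces. A sketch assuming the same l1 vector as in the question:

```r
l1 <- c("xmms-1.2.11-x86_64-5", "xmms-1.2.11-x86_64-6", "xmodmap-1.0.10-x86_64-1",
        "xmodmap-1.0.9-x86_64-1", "xmodmap3-1.0.10-x86_64-1", "xmodmap3-1.0.9-x86_64-1")

# regexec stores the full match plus capture group 1 for each element
m <- regmatches(l1, regexec("^([^-]+)", l1))

# Each list element is c(full match, group 1); keep the group
pkg <- vapply(m, `[`, character(1), 2)
# pkg is c("xmms", "xmms", "xmodmap", "xmodmap", "xmodmap3", "xmodmap3")
```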
You could also extract everything up to the first occurrence of "-". Using base R sub:
sub("(.*?)-.*", "\\1", l)
#[1] "xmms" "xmms" "xmodmap" "xmodmap" "xmodmap3" "xmodmap3"
Or with stringr::str_extract
stringr::str_extract(l, "(.*?)(?=-)")
#[1] "xmms" "xmms" "xmodmap" "xmodmap" "xmodmap3" "xmodmap3"
data
l <- c("xmms-1.2.11-x86_64-5","xmms-1.2.11-x86_64-6","xmodmap-1.0.10-x86_64-1",
"xmodmap-1.0.9-x86_64-1","xmodmap3-1.0.10-x86_64-1" ,"xmodmap3-1.0.9-x86_64-1")
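The lookahead used with str_extract also works in base R, provided the PCRE engine is switched on with perl = TRUE; this is essentially the lookaround idea from the question, applied to extraction rather than detection. A minimal sketch reusing the l data above:

```r
l <- c("xmms-1.2.11-x86_64-5", "xmms-1.2.11-x86_64-6", "xmodmap-1.0.10-x86_64-1",
       "xmodmap-1.0.9-x86_64-1", "xmodmap3-1.0.10-x86_64-1", "xmodmap3-1.0.9-x86_64-1")

# (?=-) requires a dash to follow without including it in the match;
# .*? is lazy, so the match ends at the first dash
m <- regmatches(l, regexpr("^.*?(?=-)", l, perl = TRUE))
# m is c("xmms", "xmms", "xmodmap", "xmodmap", "xmodmap3", "xmodmap3")
```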