Reputation: 247
I have
str=c("00005.profit", "00005.profit-in","00006.profit","00006.profit-in")
and I want to get
"00005.profit" "00006.profit"
How can I achieve this using grep in R?
Upvotes: 1
Views: 4267
Reputation: 226971
I'm actually interpreting your question differently. I think you might want
grep("[0-9]+\\.profit$",str,value=TRUE)
That is, if you only want the strings that end with "profit". The $ special character stands for "end of string", so it excludes cases that have additional characters at the end. The "\\." means "I really want to match a dot, not any character at all" (a "." by itself would match any character). You weren't entirely clear about your target pattern: you might prefer "0+[1-9]\\.profit$" (one or more zeros followed by a single non-zero digit), or even "0{4}[1-9]\\.profit$" (4 zeros followed by a single non-zero digit).
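To see how the three patterns differ, here is a small sketch. The last two input strings are made up for illustration and are not part of the original question.

```r
# The question's four strings plus two hypothetical ones
x <- c("00005.profit", "00005.profit-in", "00006.profit", "00006.profit-in",
       "00012.profit", "005.profit")

# Any run of digits, a literal dot, then "profit" at the end of the string
any_digits <- grep("[0-9]+\\.profit$", x, value = TRUE)

# One or more zeros, then a single non-zero digit, directly before ".profit"
zeros_then_digit <- grep("0+[1-9]\\.profit$", x, value = TRUE)

# Four zeros directly before the non-zero digit (the pattern is not anchored
# at the start, so a longer run of zeros would also match)
four_zeros <- grep("0{4}[1-9]\\.profit$", x, value = TRUE)

print(any_digits)        # "00005.profit" "00006.profit" "00012.profit" "005.profit"
print(zeros_then_digit)  # "00005.profit" "00006.profit" "005.profit"
print(four_zeros)        # "00005.profit" "00006.profit"
```

All three give the same result on the question's data; they only diverge once the numeric part varies in length or content.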
Upvotes: 3
Reputation: 368629
Here is one way:
R> s <- c("00005.profit", "00005.profit-in","00006.profit","00006.profit-in")
R> unique(gsub("([0-9]+\\.profit).*", "\\1", s))
[1] "00005.profit" "00006.profit"
We define a regular expression as digits followed by .profit, which we capture by wrapping the expression in parentheses. The \\1 then recalls the first such captured group, and since the replacement contains nothing else, that is all we get back. The unique() then reduces the four items to two unique ones.
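The capture-and-recall step can be looked at in isolation. The following is a minimal sketch, assuming the dot is meant literally; the string "00007xprofit-in" is a made-up input used only to show what an unescaped dot would let through.

```r
s <- c("00005.profit", "00005.profit-in", "00006.profit", "00006.profit-in")

# The parenthesised group captures the digits plus ".profit"; the trailing .*
# consumes whatever follows, and "\\1" replaces the whole match with the group
trimmed <- sub("([0-9]+\\.profit).*", "\\1", s)
print(trimmed)          # still four elements, with duplicates
print(unique(trimmed))  # duplicates dropped

# Hypothetical input: an unescaped "." matches any character, so the "x" is
# accepted; with "\\." the pattern does not match and the string is unchanged
loose  <- sub("([0-9]+.profit).*",   "\\1", "00007xprofit-in")
strict <- sub("([0-9]+\\.profit).*", "\\1", "00007xprofit-in")
print(loose)   # "00007xprofit"
print(strict)  # "00007xprofit-in"
```

Here sub is used rather than gsub because the pattern can match at most once per string, so the two behave the same on this data.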
Upvotes: 8
Reputation: 93938
Dirk's answer is pretty much the ideal generalisable answer, but here are a couple of other options based on the fact that your example always has a - character starting the part you wish to chop off:
1: gsub to return everything prior to the -
gsub("(.+)-.+","\\1",str)
2: strsplit on - and keep only the first part.
sapply(strsplit(str,"-"),head,1)
Both return:
[1] "00005.profit" "00005.profit" "00006.profit" "00006.profit"
You can then wrap either call in unique to drop the duplicates:
unique(gsub("(.+)-.+","\\1",str))
unique(sapply(strsplit(str,"-"),head,1))
These will then return:
[1] "00005.profit" "00006.profit"
Another non-generalisable solution would be to just take the first 12 characters (assuming string length for the part you want to keep doesn't change):
unique(substr(str,1,12))
[1] "00005.profit" "00006.profit"
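To illustrate why the fixed-width approach is the least general of the three, here is a sketch with a made-up six-digit ID (not from the original question). vapply is used in place of sapply only to make the return type explicit.

```r
# Hypothetical input with a longer numeric part
y <- c("000123.profit", "000123.profit-in")

# Regex: keep everything before the last "-"; strings without "-" are unchanged
by_regex <- unique(gsub("(.+)-.+", "\\1", y))

# Split on a literal "-" and keep the first piece of each string
by_split <- unique(vapply(strsplit(y, "-", fixed = TRUE), `[`, character(1), 1))

# Fixed width: first 12 characters, which now cuts off the final "t"
by_substr <- unique(substr(y, 1, 12))

print(by_regex)   # "000123.profit"
print(by_split)   # "000123.profit"
print(by_substr)  # "000123.profi"
```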
Upvotes: 4