Ff Yy

Reputation: 247

How to use grep in R to get the specified character?

I have

str <- c("00005.profit", "00005.profit-in", "00006.profit", "00006.profit-in")

and I want to get

 "00005.profit"  "00006.profit"

How can I achieve this using grep in R?

Upvotes: 1

Views: 4267

Answers (3)

Ben Bolker

Reputation: 226971

I'm actually interpreting your question differently. I think you might want

grep("[0-9]+\\.profit$",str,value=TRUE)

That is, if you only want the strings that end with profit. The $ special character stands for "end of string", so it excludes strings that have additional characters after profit. The \\. means "I really want to match a literal dot" (a . by itself would match any character). You weren't entirely clear about your target pattern: you might prefer "0+[1-9]\\.profit$" (one or more zeros followed by a single non-zero digit), or even "0{4}[1-9]\\.profit$" (4 zeros followed by a single non-zero digit).
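To see how the three patterns differ, here is a small sketch run against the question's vector plus two extra strings ("00015.profit" and "005.profit"). Those two strings and the variable names are assumptions added purely for illustration; they are not part of the original data.

```r
# Question's data plus two hypothetical strings added for illustration
x <- c("00005.profit", "00005.profit-in", "00006.profit", "00006.profit-in",
       "00015.profit", "005.profit")

# Any run of digits, a literal dot, then "profit" at the end of the string
any_digits <- grep("[0-9]+\\.profit$", x, value = TRUE)

# One or more zeros immediately followed by a single non-zero digit
zeros_then_digit <- grep("0+[1-9]\\.profit$", x, value = TRUE)

# Four zeros immediately followed by a single non-zero digit
four_zeros <- grep("0{4}[1-9]\\.profit$", x, value = TRUE)

print(any_digits)        # keeps every string ending in <digits>.profit
print(zeros_then_digit)  # drops "00015.profit" (a 1, not a 0, precedes the 5)
print(four_zeros)        # additionally drops "005.profit" (only two zeros)
```

None of these patterns is anchored at the start with ^, so the last one would also accept a string with more than four leading zeros.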

Upvotes: 3

Dirk is no longer here

Reputation: 368629

Here is one way:

R> s <- c("00005.profit", "00005.profit-in", "00006.profit", "00006.profit-in")
R> unique(gsub("([0-9]+\\.profit).*", "\\1", s))
[1] "00005.profit" "00006.profit"

We define a regular expression as digits followed by .profit, which we capture by wrapping the expression in parentheses. The \\1 then recalls the first such captured group, and since the replacement contains nothing else, that is all we get back. The unique() then reduces the four items to two unique ones.
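As a rough illustration of how captured groups and backreferences behave, the sketch below uses two groups instead of one. The pattern and the variable names (ids, words, swapped) are hypothetical and chosen only for this example.

```r
s <- c("00005.profit", "00005.profit-in", "00006.profit", "00006.profit-in")

# Group 1 captures the digits, group 2 the lowercase word after the dot;
# the trailing .* consumes any suffix such as "-in"
pattern <- "([0-9]+)\\.([a-z]+).*"

ids     <- sub(pattern, "\\1", s)      # keep only group 1
words   <- sub(pattern, "\\2", s)      # keep only group 2
swapped <- sub(pattern, "\\2.\\1", s)  # reorder the two groups

print(unique(ids))
print(unique(words))
print(unique(swapped))
```

Because the pattern matches the whole string, everything outside the referenced groups is discarded in the replacement.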

Upvotes: 8

thelatemail

Reputation: 93938

Dirk's answer is pretty much the ideal generalisable answer, but here are a couple of other options based on the fact that your example always has a - character starting the part you wish to chop off:

1: gsub to return everything prior to the -

gsub("(.+)-.+","\\1",str)

2: strsplit on - and keep only the first part.

sapply(strsplit(str,"-"),head,1)

Both return:

[1] "00005.profit" "00005.profit" "00006.profit" "00006.profit"

which you can then wrap in unique() to drop the duplicates:

unique(gsub("(.+)-.+","\\1",str))
unique(sapply(strsplit(str,"-"),head,1))

These will then return:

[1] "00005.profit" "00006.profit"

Another non-generalisable solution would be to just take the first 12 characters (assuming string length for the part you want to keep doesn't change):

unique(substr(str,1,12))
[1] "00005.profit" "00006.profit"
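To show why the fixed-width approach is fragile, here is a minimal sketch comparing it with stripping everything from the first hyphen onward. The second input string, with a six-digit prefix, is a made-up example and does not come from the question.

```r
# One string from the question plus a hypothetical one with a longer prefix
x <- c("00005.profit-in", "000123.profit-in")

# Fixed width: correct only while the wanted part is exactly 12 characters
fixed <- substr(x, 1, 12)

# Pattern based: remove the first "-" and everything after it
stripped <- sub("-.*$", "", x)

print(fixed)     # the second element is cut off one character short
print(stripped)  # both elements keep the full "<digits>.profit" part
```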

Upvotes: 4
