user3350235

Reputation: 1

How to extract a portion of a string in R

I have a string "HAD BEEN SEARCHING FOR MY 1ST ANDROID SMARTPHONE FOR OVER 6 MONTHS. WITH BUDGET OF 10K WAS ALL SET TO BUY MMX 110 BUT THEN THOUGHT TO WAIT FOR MMX 116. BUT WITH MMX 116 AT 16K, I TOO HAD TO INCREASE MY BUDGET FROM 15K AND STARTED TO LOOK OUT FOR OTHER SMARTPHONE OPTIONS FROM SAMSUNG, IPHONE, HTC ETC."

I want to extract only those portions that contain the word "SMARTPHONE". Each portion must be delimited on both sides by a full stop or a comma; any combination of the two is fine.

I tried the following R code:

y=grep("[,.]?[[:alnum:]]+(SMARTPHONE)[[:alnum:]]+[,.]",x,perl=TRUE, value=TRUE)

but it is not giving me the desired result.

Upvotes: 0

Views: 130

Answers (2)

Kay

Reputation: 2854

Is this what you want? Split the string on commas and full stops, then keep only the pieces that contain SMARTPHONE:

> a <- "HAD BEEN SEARCHING FOR MY 1ST ANDROID SMARTPHONE FOR OVER 6 MONTHS. WITH BUDGET OF 10K WAS ALL SET TO BUY MMX 110 BUT THEN THOUGHT TO WAIT FOR MMX 116. BUT WITH MMX 116 AT 16K, I TOO HAD TO INCREASE MY BUDGET FROM 15K AND STARTED TO LOOK OUT FOR OTHER SMARTPHONE OPTIONS FROM SAMSUNG, IPHONE, HTC ETC."
> b <- unlist(strsplit(a, "[,.]"))
> (c <- b[grep("SMARTPHONE", b)])
[1] "HAD BEEN SEARCHING FOR MY 1ST ANDROID SMARTPHONE FOR OVER 6 MONTHS"                                         
[2] " I TOO HAD TO INCREASE MY BUDGET FROM 15K AND STARTED TO LOOK OUT FOR OTHER SMARTPHONE OPTIONS FROM SAMSUNG"
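
If you would rather pull the pieces out with a single regular expression, closer to your original attempt, something like the following might work. It is only a sketch: `x` is a shortened, hypothetical version of your string, and the pattern assumes that commas and full stops are the only delimiters. `trimws()` removes the leading space left behind after a delimiter.

```r
# Shortened sample string (hypothetical, for illustration only)
x <- "MY 1ST ANDROID SMARTPHONE FOR OVER 6 MONTHS. WITH BUDGET OF 10K. I TOO HAD TO LOOK OUT FOR OTHER SMARTPHONE OPTIONS FROM SAMSUNG, IPHONE, HTC ETC."

# Match every run of non-delimiter characters that contains the word
pieces <- regmatches(x, gregexpr("[^,.]*SMARTPHONE[^,.]*", x))[[1]]

# Strip leading/trailing whitespace from each piece
pieces <- trimws(pieces)
print(pieces)
```

Unlike `grep(..., value = TRUE)`, which returns whole elements of the input vector, `regmatches()` returns only the matched substrings.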

Upvotes: 1

philshem

Reputation: 25381

Wouldn't you want to split on all commas and full stops and then check whether each element contains SMARTPHONE?

See here for splitting strings. And then here for partial string matching.

One thing to beware of is splitting on the full stop (period): this also splits after abbreviations such as "Mr."
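
One possible way around the abbreviation problem is a negative lookbehind, so that a full stop is not treated as a delimiter when it follows a known abbreviation. This is a rough sketch only: the sample text `txt` and the abbreviation list (MR, MRS) are made up for illustration, and the pattern would also skip a full stop after any longer word that happens to end in those letters.

```r
# Hypothetical sample text containing an abbreviation
txt <- "MR. SMITH BOUGHT A SMARTPHONE, THEN RETURNED IT. HE WANTED A TABLET."

# Split on commas, and on full stops NOT preceded by MR or MRS
# (the abbreviation list is an assumption; extend it as needed)
parts <- strsplit(txt, "(?<!MR)(?<!MRS)\\.|,", perl = TRUE)[[1]]
parts <- trimws(parts)

# Keep only the pieces that contain the word
hits <- parts[grepl("SMARTPHONE", parts, fixed = TRUE)]
print(hits)
```

Here the full stop after "MR" stays inside the first piece instead of cutting it in two.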

Upvotes: 1
