Reputation: 1
I have a string "HAD BEEN SEARCHING FOR MY 1ST ANDROID SMARTPHONE FOR OVER 6 MONTHS. WITH BUDGET OF 10K WAS ALL SET TO BUY MMX 110 BUT THEN THOUGHT TO WAIT FOR MMX 116. BUT WITH MMX 116 AT 16K, I TOO HAD TO INCREASE MY BUDGET FROM 15K AND STARTED TO LOOK OUT FOR OTHER SMARTPHONE OPTIONS FROM SAMSUNG, IPHONE, HTC ETC."
I want to extract only those portions which contain the word "SMARTPHONE". Each portion must lie between two full stops or two commas; a portion bounded by a comma on one side and a full stop on the other is also fine.
I tried the following R code, where x holds the string above:
y=grep("[,.]?[[:alnum:]]+(SMARTPHONE)[[:alnum:]]+[,.]",x,perl=TRUE, value=TRUE)
but it does not give me the desired result.
Upvotes: 0
Views: 130
Reputation: 2854
Is that what you want?
> a <- "HAD BEEN SEARCHING FOR MY 1ST ANDROID SMARTPHONE FOR OVER 6 MONTHS. WITH BUDGET OF 10K WAS ALL SET TO BUY MMX 110 BUT THEN THOUGHT TO WAIT FOR MMX 116. BUT WITH MMX 116 AT 16K, I TOO HAD TO INCREASE MY BUDGET FROM 15K AND STARTED TO LOOK OUT FOR OTHER SMARTPHONE OPTIONS FROM SAMSUNG, IPHONE, HTC ETC."
> b <- unlist(strsplit(a, "[,.]"))
> (c <- b[grep("SMARTPHONE", b)])
[1] "HAD BEEN SEARCHING FOR MY 1ST ANDROID SMARTPHONE FOR OVER 6 MONTHS"
[2] " I TOO HAD TO INCREASE MY BUDGET FROM 15K AND STARTED TO LOOK OUT FOR OTHER SMARTPHONE OPTIONS FROM SAMSUNG"
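For what it's worth, the pattern in the question cannot match here: `[[:alnum:]]+` does not match spaces, and in this text SMARTPHONE has a space on both sides. Even with a matching pattern, `grep(..., value=TRUE)` returns whole elements of the input vector, not the matched portion. If you would rather extract the portions in one step instead of splitting first, here is a minimal sketch using base R's `regmatches()` and `gregexpr()`. The names `x`, `pattern` and `portions` are just illustrative, and it assumes commas and full stops are the only delimiters you care about.

```r
x <- "HAD BEEN SEARCHING FOR MY 1ST ANDROID SMARTPHONE FOR OVER 6 MONTHS. WITH BUDGET OF 10K WAS ALL SET TO BUY MMX 110 BUT THEN THOUGHT TO WAIT FOR MMX 116. BUT WITH MMX 116 AT 16K, I TOO HAD TO INCREASE MY BUDGET FROM 15K AND STARTED TO LOOK OUT FOR OTHER SMARTPHONE OPTIONS FROM SAMSUNG, IPHONE, HTC ETC."

# A run of characters that are neither comma nor full stop,
# containing the literal word SMARTPHONE somewhere inside it.
pattern <- "[^,.]*SMARTPHONE[^,.]*"

# gregexpr() finds every match; regmatches() pulls out the matched text.
portions <- regmatches(x, gregexpr(pattern, x))[[1]]

# Drop the leading space left behind after a delimiter.
portions <- trimws(portions)
print(portions)
```

This should give the same two portions as the `strsplit` approach, just without the leading space on the second one.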
Upvotes: 1
Reputation: 25381
Wouldn't you want to split on all commas and full stops and then check whether each element contains SMARTPHONE?
See here for splitting strings. And then here for partial string matching.
One thing to beware of is splitting on the 'period', which would split after abbreviations, such as Mr.
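To make the abbreviation caveat concrete, here is a small sketch of the split-then-filter idea on a made-up sentence (the string `s` is a hypothetical example, not from the question). It uses only base R's `strsplit()` and `grepl()`.

```r
# Hypothetical input containing an abbreviation with a period.
s <- "MR. SMITH BOUGHT A SMARTPHONE. IT WAS CHEAP."

# Split on every comma or full stop.
parts <- unlist(strsplit(s, "[,.]"))
print(parts)

# Keep only the pieces that contain the word; fixed = TRUE treats it
# as a literal string rather than a regular expression.
hits <- parts[grepl("SMARTPHONE", parts, fixed = TRUE)]
print(hits)
```

Note that "MR" ends up as its own piece, so the portion that contains SMARTPHONE has lost the title. If your real data has abbreviations like this, you would need to protect them before splitting or use a more selective delimiter pattern.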
Upvotes: 1