Reputation: 1473
I have data like the following:
PLAYSTORE BANGKOK
FLOAT@THE BAY SINGAPORE
YANTRA SINGAPORE
AIRASIA_QS9DQQL SINGAPORE
I want to remove the last word from each string if it is in the list of cities I am looking for, using this:
sub('(?i)^(.*)\\b(singapore|stockholm|singapor|bangkok|kuala lumpur|london|tokyo)$','\\2', merch_desc$desc2 )
But neither \1 nor \2 works, and I get the full string back. Is there a way to correct this?
I want two outputs: one vector with the company names and another with the locations.
merch_desc$merch:
PLAYSTORE
FLOAT@THE BAY
YANTRA
AIRASIA_QS9DQQL
merch_desc$loc:
BANGKOK
SINGAPORE
SINGAPORE
SINGAPORE
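A minimal sketch of getting both vectors from a single pattern, assuming the descriptions are a plain character vector (desc2 below is a stand-in for merch_desc$desc2) and that a listed city is always the final word(s): call sub() twice with the same pattern, keeping \\1 for the company and \\2 for the location. It uses ignore.case = TRUE instead of the inline (?i) flag and perl = TRUE for the lazy (.*?) group; setting rows without a listed city to NA is an assumption about the desired behaviour.

```r
# Descriptions from the question (stand-in for merch_desc$desc2)
desc2 <- c("PLAYSTORE BANGKOK", "FLOAT@THE BAY SINGAPORE",
           "YANTRA SINGAPORE", "AIRASIA_QS9DQQL SINGAPORE")

# Trim first so that "$" is not blocked by trailing spaces
x <- trimws(desc2)

# Group 1: company (shortest prefix), group 2: a listed city at the very end
pat <- "^(.*?)\\s+(singapore|stockholm|singapor|bangkok|kuala lumpur|london|tokyo)$"

hit   <- grepl(pat, x, ignore.case = TRUE, perl = TRUE)
merch <- sub(pat, "\\1", x, ignore.case = TRUE, perl = TRUE)  # unmatched rows stay whole
loc   <- sub(pat, "\\2", x, ignore.case = TRUE, perl = TRUE)
loc[!hit] <- NA  # assumption: no listed city means no location

merch
loc
```

Because the alternation sits inside the pattern, a two-word city such as kuala lumpur is captured whole, and \\2 returns the text as it appears in the data (upper case here).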
It seems strange that it works on a single string but not on a data frame column:
test$desc2
[1] "qoo10 singapore " "bill payment via internet banking" "mcdonald's restaurants singapore "
[4] "hdb season parking singapore " "grabtaxi pte ltd singapore "
This does not work:
sub('^.* (singapore|stockholm|singapor|bangkok|kuala lumpur|london|tokyo)$', '\\1', test$desc2 )
[1] "qoo10 singapore " "bill payment via internet banking" "mcdonald's restaurants singapore "
[4] "hdb season parking singapore " "grabtaxi pte ltd singapore "
But this works:
sub('^.* (singapore|stockholm|singapor|bangkok|kuala lumpur|london|tokyo)$', '\\1', 'tigerair y843km singapore' )
[1] "singapore"
Edit 2:
Use trimws(). Without trimws() the pattern does not match, because the strings in test$desc2 end with trailing spaces, so $ never falls right after the city name.
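A small sketch of why Edit 2 helps, using the values printed for test$desc2 above (assumed to have exactly one trailing space where the printout shows one): without trimming, the end-of-string anchor sits after the trailing space, so the pattern never matches and sub() returns every element unchanged; after trimws() the city is the final token and \\1 is returned.

```r
# Values copied from the printed test$desc2; note the trailing spaces
desc2 <- c("qoo10 singapore ", "bill payment via internet banking",
           "mcdonald's restaurants singapore ",
           "hdb season parking singapore ", "grabtaxi pte ltd singapore ")

pat <- "^.* (singapore|stockholm|singapor|bangkok|kuala lumpur|london|tokyo)$"

# No trimming: "$" comes after the trailing space, so nothing matches
raw <- sub(pat, "\\1", desc2)

# After trimws() the city ends the string and the capture group is returned
trimmed <- sub(pat, "\\1", trimws(desc2))

raw
trimmed
```

The row without a city (bill payment via internet banking) is left untouched in both cases, since sub() returns non-matching elements as they are.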
Thanks, Manish
Upvotes: 3
Views: 1408
Reputation: 887951
We can capture the substrings as groups using sub in the pattern, then add a delimiter (,) between the capture groups in the replacement and use that as sep in read.table. If there are leading/lagging spaces, remove them with str_trim from stringr by looping through the columns.
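library(stringr)
# Put a comma between everything before the last whitespace run and the final word,
# then let read.table split each line on that comma
d1 <- read.table(text = sub('(.*)\\s+(\\S+)$', '\\1,\\2', v1), sep = ',')
# Trim leading/lagging spaces in every column
d1[] <- lapply(d1, str_trim)
d1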
# V1 V2
#1 PLAYSTORE BANGKOK
#2 FLOAT@THE BAY SINGAPORE
#3 YANTRA SINGAPORE
#4 AIRASIA_QS9DQQL SINGAPORE
Or, as suggested by @RichardScriven, a base R option for trimming leading/lagging spaces is trimws:
d1[] <- lapply(d1, trimws)
Data:
v1 <- c('PLAYSTORE BANGKOK', 'FLOAT@THE BAY SINGAPORE',
        'YANTRA SINGAPORE',
        'AIRASIA_QS9DQQL SINGAPORE')
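The same sub() plus read.table() approach can be applied to the question's test$desc2 values, but two details matter there: the strings must be trimmed first (the final \\S+ cannot match a trailing space), and read.table() needs quote = "" because its default quote characters include the apostrophe in mcdonald's. A minimal base R sketch under those assumptions (comment.char = "" is an extra precaution in case a description contains #):

```r
# Values copied from the printed test$desc2 in the question
desc2 <- c("qoo10 singapore ", "bill payment via internet banking",
           "mcdonald's restaurants singapore ",
           "hdb season parking singapore ", "grabtaxi pte ltd singapore ")

# Same split as above: prefix, a comma, then the last word
split_txt <- sub("(.*)\\s+(\\S+)$", "\\1,\\2", trimws(desc2))

# quote = "" so the apostrophe in "mcdonald's" is not read as a quote character
d1 <- read.table(text = split_txt, sep = ",", quote = "", comment.char = "",
                 stringsAsFactors = FALSE)
d1
```

This splits on the last word regardless of the city list, so the non-merchant row gets banking as its location; restricting V2 to known cities would need the city alternation from the question in place of \\S+.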
Upvotes: 3