How to modify regex expression in R?

Question

I have a CSV file as

AdvertiserName,Market
Wells Fargo,Gary INMetro Chicago IL Metro
EMC,Los Angeles CAMetro Boston MA Metro
Apple,Cupertino CA Metro

and the expression in regex is

res <- 
 gsub('(.*) ([A-Z]{2})*Metro (.*) ([A-Z]{2}) .*','\1,\2:\3,\4',
  xx$Market)

And now the 'Market' column is like 'Gary IN MetroChicago IL Metro' instead of 'Gary INMetro Chicago IL Metro' and the CSV file is like

AdvertiserName,CampaignName
Wells Fargo,Gary IN MetroChicago IL Metro
EMC,Los Angeles CA MetroBoston MA Metro
Apple,Cupertino CA Metro

How to modify the expression in regex so as to get the desired output as

AdvertiserName,City,State
Wells Fargo,Gary,IN
Wells Fargo,Chicago,IL
EMC,Los Angeles,CA
EMC,Boston,MA
Apple,Cupertino,CA

New to R. Any help is appreciated.

Sven Hohenstein · Accepted Answer

Here's a way with strsplit:

# read file
dat <- read.csv("filename.csv", stringsAsFactors = FALSE)

# split strings
splitted <- strsplit(dat$CampaignName, 
                     "( (?=[A-Z]{2}))|((?<=[A-Z]{2}) [A-Z][a-z]+)", perl = TRUE)
# [[1]]
# [1] "Gary"    "IN"      "Chicago" "IL"     
#
# [[2]]
# [1] "Los Angeles" "CA"          "Boston"      "MA"         
#
# [[3]]
# [1] "Cupertino" "CA"       

# create one data frame
setNames(as.data.frame(do.call(rbind, 
                               mapply(cbind, 
                                      dat$AdvertiserName, 
                                      lapply(splitted, function(x)
                                        matrix(x, ncol = 2, byrow = TRUE))))), 
         c("AdvertiserName", "City", "State"))
#   AdvertiserName        City State
# 1    Wells Fargo        Gary    IN
# 2    Wells Fargo     Chicago    IL
# 3            EMC Los Angeles    CA
# 4            EMC      Boston    MA
# 5          Apple   Cupertino    CA

How to modify regex expression in R?

Answers (2)

Related Questions