Split column with multiple delimiters

Question

I am trying to determine in R how to split a column that has multiple fields with multiple delimiters.

From an API, I get a column in a data frame called "Location". It has multiple location identifiers in it. Here are a few example entries. (Edit: I added a couple more.)

6540 BENNINGTON AVE
Kansas City, MO 64133
(39.005620414000475, -94.50998643299965)

4284 E 61ST ST
Kansas City, MO 64130
(39.014638172000446, -94.5335298549997)


3002 SPRUCE AVE
Kansas City, MO 64128
(39.07083265200049, -94.53320606399967)


6022 E Red Bridge Rd
Kansas City, MO 64134
(38.92458893200046, -94.52090062499968)

The above are the entries in rows 1-4 of the "Location" column.

I want to split this into address, city, state, zip, latitude and longitude columns. Some fields are separated by a space or tab, while others are separated by a comma. Also, nothing is fixed width.

I have looked at the reshape package, but it seems I need a single delimiter. I can't use a space (or can I?) as the address has spaces in it.

Thoughts?

Jota · Accepted Answer

If your data is not like this, let everyone know by adding code that we can copy and paste into R to reproduce it (see how easily the sample data below can be copied and pasted into R?).

Sample data:

location <- c(
"6540 BENNINGTON AVE
Kansas City, MO 64133
(39.005620414000475, -94.50998643299965)",

"456 POOH LANE
New York City, NY 10025
(40, -90)")

location
#[1] "6540 BENNINGTON AVE\nKansas City, MO 64133\n(39.005620414000475, -94.50998643299965)"
#[2] "456 POOH LANE\nNew York City, NY 10025\n(40, -90)"

A solution:

# Insert a comma between the state abbreviation and the zip code
step1 <- gsub("([[:alpha:]]{2}) ([[:digit:]]{5})", "\\1,\\2", location)
# get rid of parentheses
step2 <- gsub("\\(|\\)", "", step1)
# split on "\n", ",", and ", "
strsplit(step2, "\n|,|, ")

#[[1]]
#[1] "6540 BENNINGTON AVE" "Kansas City"         "MO"                
#[4] "64133"               "39.005620414000475"  "-94.50998643299965"

#[[2]]
#[1] "456 POOH LANE"  "New York City" "NY"           "10025"        
#[5] "40"            "-90"
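
To get from that list to the address, city, state, zip, lat and long columns asked for in the question, the pieces can be bound into a data frame. The following is only a sketch, not part of the original answer: `split_location` is a hypothetical helper name, and it assumes every entry yields exactly six fields and that a two-letter word followed by a space and five digits only occurs at the state/zip boundary. The coordinates are labelled lat then long because that is the order they appear in the sample data.

```r
# Hypothetical helper (not from the original answer): applies the same three
# steps and returns one row per location string.
split_location <- function(location) {
  # Insert a comma between the state abbreviation and the zip code
  step1 <- gsub("([[:alpha:]]{2}) ([[:digit:]]{5})", "\\1,\\2", location)
  # Remove the parentheses around the coordinates
  step2 <- gsub("[()]", "", step1)
  # Split on a newline or a comma, then trim any leftover whitespace
  parts <- lapply(strsplit(step2, "\n|,"), trimws)
  # Assumption: every entry splits into exactly six fields
  stopifnot(all(lengths(parts) == 6))
  out <- as.data.frame(do.call(rbind, parts), stringsAsFactors = FALSE)
  names(out) <- c("address", "city", "state", "zip", "lat", "long")
  # Coordinates become numeric; zip stays character to keep leading zeros
  out$lat <- as.numeric(out$lat)
  out$long <- as.numeric(out$long)
  out
}

location <- c(
  "6540 BENNINGTON AVE\nKansas City, MO 64133\n(39.005620414000475, -94.50998643299965)",
  "456 POOH LANE\nNew York City, NY 10025\n(40, -90)"
)
result <- split_location(location)
str(result)
```

If some rows have missing or extra fields, the `stopifnot` check fails instead of silently misaligning columns, which is usually the safer behaviour for data coming from an API.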
