Using tidyr and regex to create two columns

Question

I want to separate one column of data that contains a code and a place name into two separate columns. I tried using tidyr's separate command, but ran into some difficulties with the regex (I haven't used them before and am not able to figure out what I did wrong/how regex really works).

The data are pretty consistent in terms of format. Most observations start with a code and are followed by a location. Occasionally there is an observation that is just a location (no code). Here is a sample of the data:

df <- read.table(text = c("
obs  name
1    01-220 location 1
2    05-23 town 3
3    District 2"), header = T)

I use the following code:

df <- df %>% separate(name, into = c("location_code", "location_name"), sep = "([0-9] [A-z])")

Resulting in (notice that the last number from location_code and first letter from location_name are missing):

obs   location_code    location_name
1     01-22           ocation 1
2     05-2            own 3
3     District 2       NA

My desired output is:

# obs   location_code    location_name
# 1     01-220           location 1
# 2     05-23            town 3
# 3     NA               District 2

Thanks in advance!

akrun · Accepted Answer

We can specify the sep with a regex lookaround.

separate(df, name, into = c("location_code", "location_time"),
                                  "(?<=([0-9] )|\b)(?=[A-Za-z])")
#   obs location_code location_time
#1   1       01-220     location 1
#2   2        05-23         town 3
#3   3                  District 2

Or with extract

extract(df, name, into = c("location_code", "location_time"), "([0-9-]*)\s*(.*)")
#  obs location_code location_time
#1   1        01-220    location 1
#2   2         05-23        town 3
#3   3                  District 2

data

df <- structure(list(obs = 1:3, name = c("01-220 location 1", "05-23 town 3", 
 "District 2")), .Names = c("obs", "name"), class = "data.frame", row.names = c(NA, 
 -3L))

Using tidyr and regex to create two columns

Answers (1)

data

Related Questions