Reputation: 10578
I have a df that looks like this:
df <- data.frame(
x = c(
"800 Block of MAIN ST",
"100 Block of CHESTNUT AV",
"BAY ST / WELLINGTON ST",
"LARKIN ST / ELLIS ST",
"MAPLE ST / WELLINGTON ST",
"MEANDERING RD / MAIN ST"),
y = rnorm(6))
I want to extract the first street name and the last street type.
Desired Output:
x y x.1 x.2
1 800 Block of MAIN ST -0.6745405 MAIN ST
2 100 Block of CHESTNUT AV -1.1316017 CHESTNUT AV
3 BAY ST / WELLINGTON ST 1.2887577 BAY ST
4 LARKIN ST / ELLIS ST 1.4606264 LARKIN ST
5 MAPLE ST / WELLINGTON ST 0.6538595 MAPLE ST
6 MEANDERING RD / MAIN ST 0.8472322 MEANDERING ST
Upvotes: 3
Views: 137
Reputation: 92300
Here's a similar solution using a single regex expression combined with the new tstrsplit
function from the development version of data.table
library(data.table) # v1.9.5+
setDT(df)[, c("street", "type") :=
tstrsplit(sub(".*?([A-Z]{3,}).*([A-Z]{2,})", "\\1,\\2", x), ",")]
df
# x y street type
# 1: 800 Block of MAIN ST -1.4391801 MAIN ST
# 2: 100 Block of CHESTNUT AV 1.4917789 CHESTNUT AV
# 3: BAY ST / WELLINGTON ST -0.0369405 BAY ST
# 4: LARKIN ST / ELLIS ST 0.7320230 LARKIN ST
# 5: MAPLE ST / WELLINGTON ST 0.7189120 MAPLE ST
# 6: MEANDERING RD / MAIN ST -0.9836794 MEANDERING ST
Basically, the idea here is to capture both groups within a single sub
call, concatenate them with a comma (you can choose something else if you like) and then perform a transpose sting split (tstrsplit
) in order to convert them into two separate columns while creating them by reference (using the :=
operator)
Upvotes: 3
Reputation: 17621
df <- within(df, st_name <- sub(".*?([A-Z]{3,}).*", "\\1", x, perl=TRUE))
df <- within(df, st_type <- sub(".+? ([A-Z]+)$", "\\1", x, perl=TRUE))
# x y st_name st_type
#1 800 Block of MAIN ST 1.92908789 MAIN ST
#2 100 Block of CHESTNUT AV 0.02487045 CHESTNUT AV
#3 BAY ST / WELLINGTON ST -2.33411242 BAY ST
#4 LARKIN ST / ELLIS ST -1.17946144 LARKIN ST
#5 MAPLE ST / WELLINGTON ST 0.12913797 MAPLE ST
#6 MEANDERING RD / MAIN ST -0.94150930 MEANDERING ST
Or if you aren't fond of using within
:
df$st_name <- sub(".*?([A-Z]{3,}).*", "\\1", df$x, perl=TRUE)
df$st_type <- sub(".+? ([A-Z]+)$", "\\1", df$x, perl=TRUE)
Upvotes: 3
Reputation: 28461
library(stringr)
df[,c("street", "type")] <- list(str_extract(df$x, "[A-Z]{3,}"), str_extract(df$x, "[A-Z]+$"))
# x y street type
# 1 800 Block of MAIN ST 0.7787495 MAIN ST
# 2 100 Block of CHESTNUT AV -0.7069777 CHESTNUT AV
# 3 BAY ST / WELLINGTON ST -0.2365061 BAY ST
# 4 LARKIN ST / ELLIS ST 0.1399500 LARKIN ST
# 5 MAPLE ST / WELLINGTON ST -0.3423978 MAPLE ST
# 6 MEANDERING RD / MAIN ST 0.6434969 MEANDERING ST
Upvotes: 4