Reputation: 340
I'm working on a class project using a Chicago crime data set and R. One of the attributes in the data set is Block, which contains partial addresses where the incident occurred. For example:
+--------------------------+
| Block |
+--------------------------+
| 45xx N Locust Grove St |
| 65xx Hawthorne Ave |
+--------------------------+
The values in Block vary in length, but I want to create a new variable containing the street type (St, Ave, Blvd, etc.). I have tried using the separate function from tidyr:
df <- df %>%
  separate(Block, into = c("partial.address", "type"),
           sep = " ", extra = "merge", fill = "left")
However, this returns the number, 45xx, as the partial.address value, and the rest of the string ends up in type. How can I select the street type from the address?
I'm hoping to get something like this as output:
+--------------------------+-------------+
| partial.address | type |
+--------------------------+-------------+
| 45xx N Locust Grove | St |
| 65xx Hawthorne | Ave |
+--------------------------+-------------+
Upvotes: 1
Views: 192
Reputation: 388807
You can use extract:
tidyr::extract(df, Block, c("partial.address", "type"), "(.*)(St|Ave)")
# partial.address type
#1 45xx N Locust Grove St
#2 65xx Hawthorne Ave
Or using stringr:
library(dplyr)
library(stringr)
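df %>%
  # Anchor with $ so that only a street type at the end of Block matches
  # (an unanchored pattern would pick up the "St" in a name like "Stony").
  mutate(type = str_extract(Block, '(St|Ave)$'),
         # Remove the matched suffix, then trim the space left behind.
         partial.address = str_trim(str_remove(Block, '(St|Ave)$')))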
You can include more patterns in (St|Ave) if you have more.
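If the list of street types grows, it may be easier to build the pattern from a character vector than to type the alternation by hand. The following is only a sketch: street_types is a hypothetical vector (not part of the original data) that you would extend to match your own data, and the pattern assumes the street type is always the final word of Block.

```r
library(tidyr)

# Example data copied from the "data" section of this answer.
df <- data.frame(Block = c("45xx N Locust Grove St", "65xx Hawthorne Ave"))

# Hypothetical list of street types; extend it to cover your data.
street_types <- c("St", "Ave", "Blvd", "Rd", "Dr")

# Builds "^(.*)\\s+(St|Ave|Blvd|Rd|Dr)$": everything before the last
# whitespace goes to the first group, the street type to the second.
pattern <- paste0("^(.*)\\s+(", paste(street_types, collapse = "|"), ")$")

result <- extract(df, Block, c("partial.address", "type"), pattern)
result
```

Rows whose Block does not end in one of the listed types get NA in both new columns, which makes it easy to spot street types that are still missing from the vector.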
If we want to capture the last word of each Block, we can use:
df %>%
  mutate(type = str_extract(Block, '\\w+$'),
         partial.address = str_remove(Block, type))
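One caveat with the last-word version: str_remove(Block, type) removes the first place the type text occurs in the string, and it leaves the space before the removed word in place. A possible variant, sketched here on the assumption that the street type is always the final word, removes the trailing word together with the whitespace in front of it:

```r
library(dplyr)
library(stringr)

# Example data copied from the "data" section of this answer.
df <- data.frame(Block = c("45xx N Locust Grove St", "65xx Hawthorne Ave"))

result <- df %>%
  mutate(
    # Last run of word characters in the string.
    type = str_extract(Block, "\\w+$"),
    # Drop that last word and the whitespace before it.
    partial.address = str_remove(Block, "\\s+\\w+$")
  )
result
```

Because both patterns are anchored with $, a street name that happens to contain the type text earlier in the string (for example "St" inside "Stony") is left untouched.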
data
df <- structure(list(Block = c("45xx N Locust Grove St", "65xx Hawthorne Ave"
)), class = "data.frame", row.names = c(NA, -2L))
Upvotes: 2