gernworm

Reputation: 340

Select values from R dataframe column

I'm working on a class project using a Chicago crime data set and R. One of the attributes in the data set is Block, which contains the partial address where each incident occurred. For example:

+--------------------------+
|           Block          |
+--------------------------+
|  45xx N Locust Grove St  |
|   65xx Hawthorne Ave     |
+--------------------------+

The values in Block vary in length, but I want to create a new variable holding the street type (St, Ave, Blvd, etc.). I have tried using the separate function from tidyr.

df <- df %>%
   separate(Block, into = c("partial.address", "type"),
            sep = " ", extra = "merge", fill = "left")

However, this returns the number, 45xx, as the partial.address value and the remaining value is in type. How can I select the street type from the address?

I'm hoping to get something like this as output:

+--------------------------+-------------+
|     partial.address      |     type    |
+--------------------------+-------------+
|  45xx N Locust Grove     |      St     |
|   65xx Hawthorne         |     Ave     |
+--------------------------+-------------+

Upvotes: 1

Views: 192

Answers (1)

Ronak Shah

Reputation: 388807

You can use extract:

# Anchor the street type at the end of the string so the space before it is dropped
tidyr::extract(df, Block, c("partial.address", "type"), "(.*)\\s+(St|Ave)$")

#      partial.address  type
#1 45xx N Locust Grove    St
#2      65xx Hawthorne   Ave

Or using stringr:

library(dplyr)
library(stringr)

df %>%
  mutate(type = str_extract(Block, '\\b(St|Ave)$'),
         # remove the trailing street type together with the space before it
         partial.address = str_remove(Block, '\\s*\\b(St|Ave)$'))

Add further street types to the (St|Ave) alternation as needed, e.g. (St|Ave|Blvd).
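
If the list of street types grows, it may be easier to build the pattern from a vector than to edit the regex by hand. A minimal sketch, assuming a hypothetical street_types vector (the names in it are illustrative, not taken from the Chicago data) and a third made-up address to show that "St" inside a street name is left alone:

```r
library(stringr)

# Hypothetical list of street types; extend it to match your data
street_types <- c("St", "Ave", "Blvd", "Dr", "Rd", "Pl")

# Collapse into an alternation anchored at the end of the string,
# so "St" at the start of a name such as "Stony" is not matched
type_pattern <- paste0("\\b(", paste(street_types, collapse = "|"), ")$")

# Two rows from the question plus one made-up row for illustration
blocks <- c("45xx N Locust Grove St", "65xx Hawthorne Ave",
            "12xx S Stony Island Blvd")

types <- str_extract(blocks, type_pattern)
types
```

Rows whose last word is not in street_types come back as NA, which can be a handy way to spot types you have not listed yet.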


If we want to capture the last word of each Block, whatever it is, we can use:

df %>%
  mutate(type = str_extract(Block, '\\w+$'),
         # drop the last word and the whitespace before it
         partial.address = str_remove(Block, '\\s*\\w+$'))
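
The same last-word split can also be done without extra packages. A rough base R sketch using sub(), assuming every Block contains at least one space (a value with no whitespace would be returned unchanged by both calls); the blocks, type and partial_address names here are just for illustration:

```r
# The two example rows from the question
blocks <- c("45xx N Locust Grove St", "65xx Hawthorne Ave")

# Keep only the last whitespace-delimited word
type <- sub("^.*\\s+(\\S+)$", "\\1", blocks)

# Drop the last word and the whitespace in front of it
partial_address <- sub("\\s+\\S+$", "", blocks)

data.frame(partial.address = partial_address, type = type)
```

Note that this treats whatever comes last as the type, so it only gives street types if every address actually ends with one.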

data

df <- structure(list(Block = c("45xx N Locust Grove St", "65xx Hawthorne Ave"
)), class = "data.frame", row.names = c(NA, -2L))

Upvotes: 2
