Find month and year inside string

Question

I have a column of strings, some of which contain a month name, a year, or both:

df <- data.frame(STRINGS = c("January 2017 Blah Blah",
                         "February Blah Blah",
                         "2016 Yeah Yeah",
                         "March Bleck",
                         "Stuff"))

> df
                 STRINGS
1 January 2017 Blah Blah
2     February Blah Blah
3         2016 Yeah Yeah
4            March Bleck
5                  Stuff

All years range from 2015 to 2017.

I would like to output the following:

                 STRINGS           MONTH         YEAR
1 January 2017 Blah Blah         January         2017
2     February Blah Blah        February           NA
3         2016 Yeah Yeah              NA         2016
4            March Bleck           March           NA
5                  Stuff              NA           NA

What is the easiest way to do this?

To start, I have

months <- c("January", "February", "March", "April", "May", "June",
              "July", "August", "September", "October", "November", "December")
years <- c(2015, 2016, 2017)

Accepted Answer

A solution using dplyr, rebus, and stringr. Note that str_extract returns only the first match in each string (and NA when there is none), so this assumes at most one month and one year per row.

library(dplyr)
library(rebus)
library(stringr)

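df2 <- df %>%
  # Convert to character in case STRINGS was read in as a factor
  mutate(STRINGS = as.character(STRINGS)) %>%
  # or1() builds a regex that matches any one of the supplied alternatives
  mutate(MONTH = str_extract(STRINGS, or1(months)),
         YEAR = str_extract(STRINGS, or1(years)))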
df2
                 STRINGS    MONTH YEAR
1 January 2017 Blah Blah  January 2017
2     February Blah Blah February <NA>
3         2016 Yeah Yeah     <NA> 2016
4            March Bleck    March <NA>
5                  Stuff     <NA> <NA>
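
For comparison, here is a rough base R sketch of the same idea that needs no extra packages. It is not part of the original answer: the helper name extract_first is made up for illustration, and the sketch uses R's built-in month.name vector in place of the months vector from the question. Like the answer above, it keeps only the first match per row, and it also converts YEAR to integer.

```r
# Sample data from the question
df <- data.frame(STRINGS = c("January 2017 Blah Blah",
                             "February Blah Blah",
                             "2016 Yeah Yeah",
                             "March Bleck",
                             "Stuff"),
                 stringsAsFactors = FALSE)

years <- 2015:2017

# Hypothetical helper: first match of `pattern` in each string, NA if none
extract_first <- function(x, pattern) {
  m <- regexpr(pattern, x)            # position of first match, -1 if no match
  hit <- as.vector(m) != -1
  out <- rep(NA_character_, length(x))
  out[hit] <- regmatches(x, m)        # regmatches() returns matched rows only
  out
}

# Collapse the alternatives into one regex, e.g. "2015|2016|2017"
month_pattern <- paste(month.name, collapse = "|")
year_pattern  <- paste(years, collapse = "|")

df$MONTH <- extract_first(df$STRINGS, month_pattern)
df$YEAR  <- as.integer(extract_first(df$STRINGS, year_pattern))
```

One caveat with either approach: a plain alternation also matches inside longer words (for example "May" in "Maybe"), so wrapping the pattern in word boundaries may be worth considering if the real data is messier than the sample.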
