Reputation: 1187
I have a column of strings in which months and years appear in some of the entries:
df <- data.frame(STRINGS = c("January 2017 Blah Blah",
                             "February Blah Blah",
                             "2016 Yeah Yeah",
                             "March Bleck",
                             "Stuff"))
> df
                 STRINGS
1 January 2017 Blah Blah
2     February Blah Blah
3         2016 Yeah Yeah
4            March Bleck
5                  Stuff
All years range from 2015 to 2017.
I would like to output the following:
                 STRINGS    MONTH YEAR
1 January 2017 Blah Blah  January 2017
2     February Blah Blah February   NA
3         2016 Yeah Yeah       NA 2016
4            March Bleck    March   NA
5                  Stuff       NA   NA
What is the easiest way to do this?
To start, I have
months <- c("January", "February", "March", "April", "May", "June",
            "July", "August", "September", "October", "November", "December")
years <- c(2015, 2016, 2017)
Upvotes: 0
Views: 162
Reputation: 39154
A solution using dplyr, rebus, and stringr. Note that it assumes at most one matching month and one matching year per row, because str_extract returns only the first match.
library(dplyr)
library(rebus)
library(stringr)
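df2 <- df %>%
  # STRINGS may be a factor (the data.frame default before R 4.0), so convert it first
  mutate(STRINGS = as.character(STRINGS)) %>%
  # or1() joins the candidates into a single "this or that" pattern;
  # str_extract() returns the first match, or NA when nothing matches
  mutate(MONTH = str_extract(STRINGS, or1(months)),
         YEAR = str_extract(STRINGS, or1(years)))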
df2
                 STRINGS    MONTH YEAR
1 January 2017 Blah Blah  January 2017
2     February Blah Blah February <NA>
3         2016 Yeah Yeah     <NA> 2016
4            March Bleck    March <NA>
5                  Stuff     <NA> <NA>
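For comparison, here is a minimal base R sketch of the same idea without dplyr, rebus, or stringr. The helper name extract_first is hypothetical (it is not part of any package). It builds the alternation pattern by hand, adds word boundaries as an extra assumption so that a year is not picked out of a longer number, and converts YEAR to integer. Like the version above, it keeps only the first match per row.

```r
# Same lookup vectors and data as in the question
months <- c("January", "February", "March", "April", "May", "June",
            "July", "August", "September", "October", "November", "December")
years <- c(2015, 2016, 2017)

df <- data.frame(STRINGS = c("January 2017 Blah Blah",
                             "February Blah Blah",
                             "2016 Yeah Yeah",
                             "March Bleck",
                             "Stuff"),
                 stringsAsFactors = FALSE)

# Hypothetical helper: first occurrence of any element of `choices` in each
# element of `x`, or NA when none is found
extract_first <- function(x, choices) {
  # e.g. "\\b(2015|2016|2017)\\b"; the word boundaries are an added assumption
  pattern <- paste0("\\b(", paste(choices, collapse = "|"), ")\\b")
  m <- regexpr(pattern, x, perl = TRUE)   # -1 marks "no match"
  out <- rep(NA_character_, length(x))
  out[m != -1] <- regmatches(x, m)        # regmatches() returns matched rows only
  out
}

df$MONTH <- extract_first(df$STRINGS, months)
df$YEAR  <- as.integer(extract_first(df$STRINGS, years))
df
```

If a row can contain several months or years, gregexpr() with regmatches() (or stringr's str_extract_all()) would return every match instead of just the first.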
Upvotes: 3