dataframe breakdown by year

Question

I have a dataset on county executives and their year of inauguration. I need to break down which year each executive was inaugurated.

The problem is that the notation under the "year" variable is inconsistent.

For instance, let's say I start with this:

df <- data.frame(year= c(2000, "from 2001 to 2002", "01-feb-2003", 2000, "01-jan-2002", "from 2004 to 2005"),
                  executive.name= c("Johnson", "Smith", "Alleghany", "Roberts", "Clarke", "Tollson"),
                  district= rep(c(1001, 1002), each=3))

I want it to look like this:

df.neat <- data.frame(year= c(2000, 2001, 2003, 2000, 2002, 2004),
                  executive.name= c("Johnson", "Smith", "Alleghany", "Roberts", "Clarke", "Tollson"),
                  district= rep(c(1001, 1002), each=3))

Note how the inauguration cycles do not always align across districts (2000, 2001, and 2003 for district 1001; 2000, 2002, and 2004 for district 1002).

LMc · Accepted Answer

library(dplyr)
library(stringr)

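# Extract the first run of four digits from each entry and convert it to a number.
# The backslash must be doubled inside an R string literal ("\\d", not "\d").
df |>
  mutate(year = as.numeric(str_extract(year, "\\d{4}")))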
#   year executive.name district
# 1 2000        Johnson     1001
# 2 2001          Smith     1001
# 3 2003      Alleghany     1001
# 4 2000        Roberts     1002
# 5 2002         Clarke     1002
# 6 2004        Tollson     1002
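
For anyone who would rather avoid the extra packages, the same idea can be sketched in base R. This is only a sketch, and it assumes (as in the example data) that the first four-digit run in each entry is the inauguration year. The helper names `year_chr`, `m`, and `year_num` are made up for illustration. Entries with no four-digit run are left as NA.

```r
df <- data.frame(year = c(2000, "from 2001 to 2002", "01-feb-2003", 2000, "01-jan-2002", "from 2004 to 2005"),
                 executive.name = c("Johnson", "Smith", "Alleghany", "Roberts", "Clarke", "Tollson"),
                 district = rep(c(1001, 1002), each = 3))

# Work on a character copy (also covers the case where the column is a factor)
year_chr <- as.character(df$year)

# regexpr() gives the position of the first match per element, or -1 if none
m <- regexpr("[0-9]{4}", year_chr)

# regmatches() returns only the matched substrings, so fill an NA vector
# at the positions that actually matched
year_num <- rep(NA_real_, length(year_chr))
year_num[m != -1] <- as.numeric(regmatches(year_chr, m))

df$year <- year_num
```

For ranges such as "from 2001 to 2002", both versions keep the first year, which matches the desired output in the question.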
