Helena

Reputation: 87

R removing selected characters from a string

Apologies if this is a duplicate, but the solutions I have seen do not solve my issue.

I have a data frame (df). One of its variables (df$Year) includes a list of years, such as:

 > df$Year

 Year
 2001–                       
 2013–                     
 2016–                      
 2003–                      
 2012–2013                      
 2013–                      
 1993–2007, 2010–

When an entry contains multiple years, I want to keep only the last one (e.g. '2010' rather than '1993–2007, 2010–') and get rid of the dash. I have tried:

unlist(str_extract_all(df$Year, "[[:digit:]]4$"))

but this does not seem to work.

Any hint?

Upvotes: 1

Views: 34

Answers (1)

Tim Biegeleisen

Reputation: 520968

We can use sub() for a one-liner:

df$Year <- sub(".*(\\d{4})\\–?", "\\1", df$Year)
df$Year

[1] "2001" "2013" "2016" "2003" "2013" "2013" "2010"


Note that the dashes in your year ranges are en dashes (U+2013), not the regular ASCII hyphen (-), so the pattern has to match that character.
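
As for why the attempt in the question returns nothing: "[[:digit:]]4$" matches a single digit followed by a literal 4 at the very end of the string. The repetition count needs braces ({4}), and the $ anchor cannot match when the string ends in a dash. If you would rather extract the years than substitute, here is a minimal base R sketch. It assumes the sample values from the question; the names years, matches and last_year are placeholders of mine, with years standing in for df$Year.

```r
# Hypothetical stand-in for df$Year, built from the values in the question.
# "\u2013" is the en dash used in the original data.
years <- c("2001\u2013", "2013\u2013", "2016\u2013", "2003\u2013",
           "2012\u20132013", "2013\u2013", "1993\u20132007, 2010\u2013")

# Collect every run of four digits found in each string.
matches <- regmatches(years, gregexpr("[0-9]{4}", years))

# Keep the last match per element; return NA if an element has no year.
last_year <- vapply(
  matches,
  function(m) if (length(m) > 0) m[length(m)] else NA_character_,
  character(1)
)

last_year
# [1] "2001" "2013" "2016" "2003" "2013" "2013" "2010"
```

Because this only looks for digits, it does not care which kind of dash the data contains. Wrap the result in as.integer() if you need numeric years.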

Upvotes: 2
