sree

Reputation: 35

How to extract the year from filenames in R

I have a list of files and need to extract the year from each filename.

The file names are:

[1] "2014_by_country_and_type_Enlarged_Europe.xlsx"            
[2] "20140211_02_2012_vo_By_Country_Enlarged_Europe.xls"       
[3] "20150219_2013_vo_By_Country_Enlarged_Europe.xlsx"

My attempt:

 regmatches(files, regexpr("[0-9].*[0-9]", files))

But it returns:

 [1] "2014"
 [2] "20140211_02_2012"
 [3] "20150219_2013"

The output I need is:

 2014
 2012
 2013

Upvotes: 1

Views: 582

Answers (2)

s_baldur

Reputation: 33488

Simple regex with gsub():

gsub(".*(\\d{4})_.+", "\\1", files)
[1] "2014" "2012" "2013"

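The leading .* is greedy, so the capture group picks up the last four-digit run that is followed by an underscore, and the whole filename is replaced by that capture.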
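To see why the greedy prefix matters, here is a small sketch that compares it with a lazy prefix. It uses sub() rather than gsub() (equivalent here, since the pattern consumes the whole string) and assumes the filenames are stored in a vector called files, as in the question; the names last_run and first_run are only illustrative.

```r
files <- c("2014_by_country_and_type_Enlarged_Europe.xlsx",
           "20140211_02_2012_vo_By_Country_Enlarged_Europe.xls",
           "20150219_2013_vo_By_Country_Enlarged_Europe.xlsx")

# Greedy prefix: .* consumes as much as it can, so the capture group
# ends up on the LAST four digits that are followed by "_".
last_run <- sub(".*(\\d{4})_.+", "\\1", files)

# Lazy prefix (needs perl = TRUE): .*? consumes as little as it can, so
# the capture group lands on the FIRST four digits followed by "_",
# which here is part of the date stamp rather than the year.
first_run <- sub("^.*?(\\d{4})_.+", "\\1", files, perl = TRUE)

print(last_run)   # "2014" "2012" "2013"
print(first_run)  # "2014" "0211" "0219"
```

Note that sub() returns a string unchanged when the pattern does not match, so a filename without a four-digit run before an underscore would come back as is.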

Upvotes: 1

PKumar

Reputation: 11128

You may try this:

regmatches(x, regexpr("(\\d{4})(?=_([a-zA-Z]+))", x, perl = TRUE))

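Assumption: the year is the four-digit number that is followed by an underscore and then letters.

The positive lookahead has the form year_digits(?=underscore_then_letters): it matches four digits only when they are followed by an underscore and at least one letter, and the underscore and letters do not become part of the match. Digit runs such as 0211 in the second name are skipped because the underscore after them is followed by a digit, not a letter.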

Output:

[1] "2014" "2012" "2013"

data:

x <- c("2014_by_country_and_type_Enlarged_Europe.xlsx", "20140211_02_2012_vo_By_Country_Enlarged_Europe.xls", 
"20150219_2013_vo_By_Country_Enlarged_Europe.xlsx")
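One thing to watch with regmatches() and regexpr(): elements without a match are dropped, so the result can be shorter than the input. The sketch below wraps the same lookahead in a helper that keeps one result per filename. The function name extract_year and the extra file "readme.txt" are hypothetical, added only for illustration, and the input is assumed to contain no NA values.

```r
x <- c("2014_by_country_and_type_Enlarged_Europe.xlsx",
       "20140211_02_2012_vo_By_Country_Enlarged_Europe.xls",
       "20150219_2013_vo_By_Country_Enlarged_Europe.xlsx")

# Hypothetical helper: returns NA for names where no year is found,
# so the output always lines up with the input.
extract_year <- function(x) {
  # Four digits followed by "_" and at least one letter (lookahead, PCRE).
  m <- regexpr("\\d{4}(?=_[a-zA-Z]+)", x, perl = TRUE)
  hit <- as.vector(m != -1)        # regexpr() gives -1 where nothing matched
  out <- rep(NA_character_, length(x))
  out[hit] <- regmatches(x, m)     # regmatches() returns matched elements only
  out
}

# "readme.txt" is a made-up name with no year in it.
years <- extract_year(c(x, "readme.txt"))
print(years)  # "2014" "2012" "2013" NA
```

Wrapping the result in as.integer() gives numeric years if they are needed for sorting or filtering.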

Upvotes: 1
