Reputation: 35
I have a list of files and need to extract the year from each filename.
The file names are:
[1] "2014_by_country_and_type_Enlarged_Europe.xlsx"
[2] "20140211_02_2012_vo_By_Country_Enlarged_Europe.xls"
[3] "20150219_2013_vo_By_Country_Enlarged_Europe.xlsx"
Query:
regmatches(files, regexpr("[0-9].*[0-9]", files))
But it returns:
[1] "2014"
[2] "20140211_02_2012"
[3] "20150219_2013"
I need the output to be:
2014
2012
2013
Upvotes: 1
Views: 582
Reputation: 33488
Simple regex with gsub():
gsub(".*(\\d{4})_.+", "\\1", files)
[1] "2014" "2012" "2013"
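It captures a 4-digit number that is followed by an underscore (_). Because the leading .* is greedy, the last such number in each name is the one that is kept.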
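As a minimal sketch of how this behaves, assuming the file names are stored in a vector called files as in the question: sub() is enough here since the pattern spans the whole string, and the result can be converted to integers. The name "notes.txt" is a hypothetical input added only to show what happens when nothing matches.

```r
files <- c(
  "2014_by_country_and_type_Enlarged_Europe.xlsx",
  "20140211_02_2012_vo_By_Country_Enlarged_Europe.xls",
  "20150219_2013_vo_By_Country_Enlarged_Europe.xlsx"
)

# The greedy ".*" consumes as much as it can, so the capture group ends up
# on the last run of four digits that is followed by "_".
years <- as.integer(sub(".*(\\d{4})_.+", "\\1", files))

# Hypothetical name without a year: when the pattern does not match,
# sub() returns the input unchanged rather than NA.
no_year <- sub(".*(\\d{4})_.+", "\\1", "notes.txt")
```

Since a non-matching name comes back unchanged, it may be worth checking the result (for example with grepl("^\\d{4}$", ...)) before treating it as a year.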
Upvotes: 1
Reputation: 11128
You may try this:
regmatches(x, regexpr("(\\d{4})(?=_([a-zA-Z]+))", x, perl = TRUE))
Assumption: the year is the four-digit number that is followed by an underscore and then letters.
The positive lookahead works like this: digits_of_year(?=underscore_with_alphabets) matches digits_of_year only when it is followed by underscore_with_alphabets, without making underscore_with_alphabets part of the match.
Output:
[1] "2014" "2012" "2013"
Data:
x <- c("2014_by_country_and_type_Enlarged_Europe.xlsx", "20140211_02_2012_vo_By_Country_Enlarged_Europe.xls",
"20150219_2013_vo_By_Country_Enlarged_Europe.xlsx")
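One thing to watch with this approach: regmatches() combined with regexpr() silently drops elements that have no match, so the result can be shorter than the input. The following is a rough sketch of one way to keep the result aligned with the input; the extra name "notes.txt" is a hypothetical entry added only to illustrate the no-match case, and the capture groups are omitted because the lookahead alone is sufficient.

```r
x <- c(
  "2014_by_country_and_type_Enlarged_Europe.xlsx",
  "20140211_02_2012_vo_By_Country_Enlarged_Europe.xls",
  "20150219_2013_vo_By_Country_Enlarged_Europe.xlsx",
  "notes.txt"  # hypothetical name with no year
)

# regexpr() returns -1 for elements where the pattern is not found.
m <- regexpr("\\d{4}(?=_[a-zA-Z]+)", x, perl = TRUE)

# Start with NA everywhere, then fill in only the positions that matched,
# so years[i] always corresponds to x[i].
years <- rep(NA_character_, length(x))
years[m != -1] <- regmatches(x, m)
```

With this, years has the same length as x, and names without a year show up as NA instead of disappearing.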
Upvotes: 1