Reputation: 1876
I'm trying to extract the year from a string with this format:
dataset_name = 'ALTVALLEDAOSTA000020191001.json'
I tried:
dataset_name[/<\b(19|20)\d{2}\b>/, 1]
/\b(19|20)\d{2}\b/.match(dataset_name)
I'm still reading the docs but so far I'm not able to achieve the result I want. I'm really bad at regex.
Upvotes: 1
Views: 242
Reputation: 160551
There are many ways to get to Rome.
Starting with:
foo = 'ALTVALLEDAOSTA000020191001.json'
Stripping the extension to get the basename, then using a regex anchored at the end of the string:
ymd = /(\d{4})(\d{2})(\d{2})$/
ext = File.extname(foo)
File.basename(foo, ext) # => "ALTVALLEDAOSTA000020191001"
File.basename(foo, ext)[ymd, 1] # => "2019"
File.basename(foo, ext)[ymd, 2] # => "10"
File.basename(foo, ext)[ymd, 3] # => "01"
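Using a regex against the entire filename to grab just the year (the trailing \d{4} skips the month and day; without it the greedy .* would leave only the last four digits, "1001", for the capture):
ymd = /^.*(\d{4})\d{4}/
foo[ymd, 1] # => "2019"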
or extracting the year, month and day:
ymd = /^.*(\d{4})(\d{2})(\d{2})/
foo[ymd, 1] # => "2019"
foo[ymd, 2] # => "10"
foo[ymd, 3] # => "01"
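The numbered groups can also be given names, which may read more clearly than indexes 1 to 3. This is only a sketch: the method name `split_ymd` and the group names `year`, `month` and `day` are my own and not part of any library.

```ruby
# Hypothetical helper: returns [year, month, day] as strings, or nil if
# the name does not contain eight consecutive digits.
def split_ymd(name)
  # The greedy .* pushes the match to the last run of eight digits.
  m = /^.*(?<year>\d{4})(?<month>\d{2})(?<day>\d{2})/.match(name)
  m && [m[:year], m[:month], m[:day]]
end

p split_ymd('ALTVALLEDAOSTA000020191001.json')
```

Returning nil for a non-matching name lets the caller decide how to handle unexpected filenames.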
Using String's unpack:
ymd = '@18A4'
foo.unpack(ymd) # => ["2019"]
or:
ymd = '@18A4A2A2'
foo.unpack(ymd) # => ["2019", "10", "01"]
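The '@18' offset is tied to this particular prefix length. If the prefix can vary but the name always ends in yyyymmdd.json, the offset could be computed from the end instead. A rough sketch, assuming plain ASCII names (unpack counts bytes); `unpack_ymd` is a hypothetical helper name:

```ruby
# Hypothetical helper: the date starts 13 bytes from the end
# (8 digits + ".json"), so derive the '@' offset from the string size.
def unpack_ymd(name)
  name.unpack("@#{name.bytesize - 13}A4A2A2")
end

p unpack_ymd('ALTVALLEDAOSTA000020191001.json')
```

Note that this does no validation: it returns whatever bytes sit at those positions, and it will raise for names shorter than 13 bytes.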
If the strings have a consistent length and format, I'd work with unpack because, if I remember right, it is the fastest, followed by String slicing, then anchored regular expressions, with unanchored regular expressions trailing.
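Since that ordering is from memory, it is worth measuring on your own Ruby version. Here is a rough timing sketch; `time_it`, `compare` and the `CANDIDATES` table are hypothetical names of mine, and the two regexes are just one way to write an anchored and an unanchored variant.

```ruby
# Hypothetical helper: runs the block n times and returns elapsed seconds.
def time_it(n)
  start = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  n.times { yield }
  Process.clock_gettime(Process::CLOCK_MONOTONIC) - start
end

# Each candidate extracts the year from a name shaped like the one above.
CANDIDATES = {
  'unpack'           => ->(s) { s.unpack('@18A4').first },
  'slice'            => ->(s) { s[18, 4] },
  'anchored regex'   => ->(s) { s[/(\d{4})\d{4}\.json\z/, 1] },
  'unanchored regex' => ->(s) { s[/(\d{4})\d{4}\.json/, 1] }
}

# Returns a hash of label => seconds for n runs of each candidate.
def compare(name, n = 100_000)
  CANDIDATES.map do |label, extract|
    [label, time_it(n) { extract.call(name) }]
  end.to_h
end

compare('ALTVALLEDAOSTA000020191001.json').each do |label, secs|
  puts format('%-18s %.4fs', label, secs)
end
```

The numbers will vary by machine and Ruby version, so treat any single run as indicative only.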
Upvotes: 1
Reputation: 56865
Since your dataset name always ends in yyyymmdd.json, you can slice the year out by counting from the end of the string: it occupies the four characters from index -13 up to, but not including, index -9:
irb(main):001:0> dataset_name = 'ALTVALLEDAOSTA000020191001.json'
irb(main):002:0> dataset_name[-13...-9]
=> "2019"
You can also use a regex if you want a bit more precision:
irb(main):003:0> dataset_name =~ /(\d{4})\d{4}\.json$/
=> 18
irb(main):004:0> $1
=> "2019"
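If you also want to be sure the eight digits form a real calendar date, one option is to hand them to Date.strptime from the standard library. A minimal sketch; `dataset_date` is a hypothetical method name, and returning nil for bad input is just one possible design choice:

```ruby
require 'date'

# Hypothetical helper: returns a Date for names ending in yyyymmdd.json,
# or nil when the name has no such suffix or the digits are not a valid date.
def dataset_date(name)
  digits = name[/(\d{8})\.json\z/, 1]
  return nil unless digits

  Date.strptime(digits, '%Y%m%d')
rescue ArgumentError # also covers Date::Error on newer Rubies
  nil
end

date = dataset_date('ALTVALLEDAOSTA000020191001.json')
puts date.year if date
```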
Upvotes: 1