Debosmit Ray

Reputation: 5413

Extract date string from (more) complex string (possibly a regex match)

I have a string template that looks like 'my_index-{year}'.
I do something like string_template.format(year=year), where year is some string. The result is a string that looks like my_index-2011.

Now, to my question: I have a string like my_index-2011 and my template 'my_index-{year}'. What might be a slick way to extract the {year} portion?

[Note: I know of the existence of the parse library.]

Upvotes: 2

Views: 151

Answers (5)

James K. Lowden

Reputation: 7837

Yes, a regex would be helpful here.

In [1]: import re
In [2]: s = 'my_string-2014'
In [3]: print( re.search(r'\d{4}', s).group(0) )
2014

Edit: I should have mentioned your regex can be more sophisticated. You can haul out a subcomponent of a more specific string, for example:

In [4]: print( re.search(r'my_string-(\d{4})$', s).group(1) )
2014

Given the problem you presented, I think any "find the year" formula should be expressible in terms of a regular expression.
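To tie this back to the template in the question, here is a minimal sketch of building such a regex from 'my_index-{year}' itself. The helper names template_to_regex and extract are made up for illustration, and the sketch assumes placeholders are simple {name} fields with no format specs.

```python
import re

def template_to_regex(template):
    # Hypothetical helper. re.split with a capturing group returns literal
    # text at even positions and placeholder names at odd positions.
    pieces = re.split(r"\{(\w+)\}", template)
    pattern = ""
    for i, piece in enumerate(pieces):
        if i % 2:
            # Placeholder: becomes a named group (non-greedy match).
            pattern += "(?P<%s>.+?)" % piece
        else:
            # Literal text: escape it so it is matched verbatim.
            pattern += re.escape(piece)
    return re.compile(pattern)

def extract(template, text):
    # Hypothetical helper: dict of placeholder values, or None if no match.
    match = template_to_regex(template).fullmatch(text)
    return match.groupdict() if match else None

print(extract("my_index-{year}", "my_index-2011"))  # {'year': '2011'}
```

Because the groups are non-greedy, a template with several adjacent placeholders can be ambiguous; tightening the group to something like \d{4} for a year is probably wise in that case.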

Upvotes: 2

alecxe

Reputation: 473893

There is a module called parse that provides the opposite of format():

Parse strings using a specification based on the Python format() syntax.

>>> from parse import parse
>>> s = "my_index-2011"
>>> f = "my_index-{year}"
>>> parse(f, s)['year']
'2011'

Alternatively, since you are extracting a year, you could use the dateutil parser in fuzzy mode:

>>> from dateutil.parser import parse
>>> parse("my_index-2011", fuzzy=True).year
2011
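If you would rather avoid a third-party dependency, a rough standard-library sketch is to turn the template into a strptime format. Mapping {year} to %Y is my assumption here, and it only works if the rest of the template contains no literal % characters.

```python
from datetime import datetime

template = "my_index-{year}"
# Assumption: the {year} placeholder always holds a 4-digit year (%Y).
fmt = template.replace("{year}", "%Y")

year = datetime.strptime("my_index-2011", fmt).year
print(year)  # 2011
```

Unlike the fuzzy parser, strptime raises ValueError when the string does not match the template exactly.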

Upvotes: 2

galaxyan

Reputation: 6121

I assume "year" is 4 digits and that you have multiple indexes:

import re

# Sample input: the text to search and the index names to look for
data = 'adsf-1234 fsfdr lkjdfaif ln ewr-1234 adsferggs sfdgrsfgadsf-3456'
idx = ['ewr', 'adsf']

res = ''
# One pattern per index name, e.g. 'ewr-[0-9]{4}'
patterns = ['%s-[0-9]{4}' % index for index in idx]
for index, pattern in zip(idx, patterns):
    # Find every '<index>-<year>' match, then strip the '<index>-' prefix
    res += ' '.join(re.findall(pattern, data)).replace(index + '-', '') + ' '
print(res)

Output:

1234 1234 3456 

Upvotes: 2

Ryan Soklaski

Reputation: 550

You are going to want to use the string method split to split on "-", and then catch the last element as your year:

year = "any_index-2016".split("-")[-1]

Because you take the last element (using -1 as the index), your index name can itself contain hyphens, and you will still extract the year correctly.
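If you also want the index name back in one piece, a small variation is rsplit with maxsplit=1, which splits only at the last hyphen. The sample name below is made up for illustration.

```python
name = "any-hyphenated-index-2016"

# Split once, starting from the right, so only the last hyphen is used.
index, year = name.rsplit("-", 1)

print(index)  # any-hyphenated-index
print(year)   # 2016
```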

Upvotes: 1

John Gordon

Reputation: 33335

Use the split() string method to split the string into two parts around the dash, then grab just the second part. This assumes the index name itself contains no dashes.

mystring = "my_index-2011"
year = mystring.split("-")[1]

Upvotes: 2
