Regular expressions from a list previously specified

Question

I am trying to print, for each article, only the month, which appears on either the 4th or the 5th line. My attempt so far:

m = 'January', 'February', 'March', 'April', 'May', 'June', 'July', 'August', 'September', 'October', 'November', 'December'

for i in range(len(sections)):
    date = re.search(r"[m]",sections[i][1:5])
    print(date)

First problem: I do not know how to build a regular expression from my list "m". Second problem: I want to restrict the search to lines 0-5 of each article.

dawg · Accepted Answer

Given:

>>> txt='''\
... Line 1
... Line 2
... Line 3
... Line 4
... Line 5 April'''

You can get lines i through j (zero-based, with j excluded) using .splitlines()[i:j]:

>>> txt.splitlines()[0:3]
['Line 1', 'Line 2', 'Line 3']

Now just construct a pattern that finds the months. Be sure to use \b to find whole word matches:

>>> import re
>>> months=['January', 'February', 'March', 'April', 'May', 'June', 'July', 'August', 'September', 'October', 'November', 'December']
>>> pat=re.compile("|".join([r"\b{}\b".format(m) for m in months]), re.M)

Then search with your pattern in the slice of target lines:

>>> pat.search("\n".join(txt.splitlines()[0:5]))
<_sre.SRE_Match object at 0x107a2a9f0>
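
To print just the month name rather than the match object, one could call .group() on the result. The following is a minimal sketch, assuming sections is a list of article strings as in the question; the function name first_month and the sample articles are made up for illustration and are not part of the original answer:

```python
import re

MONTHS = ['January', 'February', 'March', 'April', 'May', 'June',
          'July', 'August', 'September', 'October', 'November', 'December']

# One alternation wrapped in \b...\b so only whole words match.
MONTH_PAT = re.compile(r"\b(?:{})\b".format("|".join(MONTHS)))

def first_month(article, max_lines=5):
    """Return the first month name found in the first max_lines lines, or None."""
    head = "\n".join(article.splitlines()[:max_lines])
    match = MONTH_PAT.search(head)
    return match.group() if match else None

# Hypothetical sample data standing in for the asker's articles.
sections = [
    "Title\nAuthor\nSource\nLine 4\nLine 5 April\nbody",
    "Title\nAuthor\nSource\nMarch 3, 2015\nLine 5\nbody",
    "Title\nAuthor\nSource\nLine 4\nLine 5\nPublished in June",
]

for article in sections:
    print(first_month(article))
```

The third sample article mentions a month only on its sixth line, so the search over the first five lines finds nothing and the function returns None for it.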

If you want to capture the line the month appears on, you might search each line of the slice separately instead of joining them.
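
A minimal sketch of that idea, assuming the same sample text as above; the helper name find_month_line is hypothetical and the 1-based line numbering is a choice made here, not something from the original answer:

```python
import re

MONTHS = ['January', 'February', 'March', 'April', 'May', 'June',
          'July', 'August', 'September', 'October', 'November', 'December']
MONTH_PAT = re.compile(r"\b(?:{})\b".format("|".join(MONTHS)))

def find_month_line(text, max_lines=5):
    """Return (line_number, line, month) for the first match, or None.

    Line numbers are 1-based; only the first max_lines lines are searched.
    """
    for lineno, line in enumerate(text.splitlines()[:max_lines], start=1):
        match = MONTH_PAT.search(line)
        if match:
            return lineno, line, match.group()
    return None

txt = "Line 1\nLine 2\nLine 3\nLine 4\nLine 5 April"
print(find_month_line(txt))
```

Searching line by line keeps the line index available at the moment of the match, which the joined-string approach loses.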
