Reputation: 1642
I am trying to print, for each article, only the month, which appears in either the 4th or the 5th line. This is my attempt so far:
m = 'January', 'February', 'March', 'April', 'May', 'June', 'July', 'August', 'September', 'October', 'November', 'December'
for i in range(len(sections)):
    date = re.search(r"[m]",sections[i][1:5])
    print(date)
First problem: I do not know how to search for the names in my list "m" with a regular expression. Second problem: I want to restrict the search to lines 0-5 of each article.
Upvotes: 0
Views: 51
Reputation: 104102
Given:
>>> txt='''\
... Line 1
... Line 2
... Line 3
... Line 4
... Line 5 April'''
You can get lines i through j (zero-based, with j itself excluded) with .splitlines()[i:j]:
>>> txt.splitlines()[0:3]
['Line 1', 'Line 2', 'Line 3']
Now just construct a pattern that finds the months. Be sure to use \b to find whole word matches:
>>> import re
>>> months=['January', 'February', 'March', 'April', 'May', 'June', 'July', 'August', 'September', 'October', 'November', 'December']
>>> pat=re.compile("|".join([r"\b{}\b".format(m) for m in months]), re.M)
Then search with your pattern in the slice of target lines:
>>> pat.search("\n".join(txt.splitlines()[0:5]))
<_sre.SRE_Match object at 0x107a2a9f0>
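The search returns a match object rather than the month itself. A minimal sketch of pulling the name out, assuming the same sample txt as above and a month list written with 'May' and 'June' as separate entries; search() returns None when nothing matches, so the result is guarded:

```python
import re

txt = "Line 1\nLine 2\nLine 3\nLine 4\nLine 5 April"
months = ['January', 'February', 'March', 'April', 'May', 'June', 'July',
          'August', 'September', 'October', 'November', 'December']
pat = re.compile("|".join(r"\b{}\b".format(m) for m in months))

# Search only the first five lines of the text.
match = pat.search("\n".join(txt.splitlines()[0:5]))

# search() returns None when no month is found, so guard before .group().
month = match.group() if match else None
print(month)  # April
```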
If you want to capture the line it appears on, you might search each target line separately and record the index of the first line that matches.
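One way to capture the line is to search line by line instead of joining the slice. This is only a sketch: find_month_line is a hypothetical helper name, and the sample text is the same one used above.

```python
import re

MONTHS = ['January', 'February', 'March', 'April', 'May', 'June', 'July',
          'August', 'September', 'October', 'November', 'December']
# One alternation wrapped in a single pair of word boundaries.
MONTH_RE = re.compile(r"\b(?:" + "|".join(MONTHS) + r")\b")

def find_month_line(text, first=0, last=5):
    """Return (line_index, month) for the first month in lines[first:last], or None."""
    lines = text.splitlines()[first:last]
    for index, line in enumerate(lines, start=first):
        match = MONTH_RE.search(line)
        if match:
            return index, match.group()
    return None

txt = "Line 1\nLine 2\nLine 3\nLine 4\nLine 5 April"
result = find_month_line(txt)
print(result)  # (4, 'April')
```

The index is zero-based, so 4 here means the fifth line.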
Upvotes: 2
Reputation: 4837
It depends on what sections is. I assume it's a multiline string:
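import re
# m is the tuple of month names from the question
sections = 'some sections here'
pattern = r'\b(?:' + '|'.join(m) + r')\b'  # word boundaries on both sides of every name
dates = re.findall(pattern, ' '.join(sections.splitlines()[0:5]))  # lines 0-4 only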
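If sections is instead a list with one multiline string per article, as the loop in the question suggests, the same idea can be applied per article. A rough sketch under that assumption; the two sample articles are made up for illustration:

```python
import re

MONTHS = ('January', 'February', 'March', 'April', 'May', 'June', 'July',
          'August', 'September', 'October', 'November', 'December')
MONTH_RE = re.compile(r"\b(?:" + "|".join(MONTHS) + r")\b")

# Hypothetical sample data: each article is one multiline string.
sections = [
    "Title A\nAuthor\nSource\nMarch 3, 2015\nBody text",
    "Title B\nAuthor\nSource\nSection\n12 October 2014\nBody mentioning May",
]

months_found = []
for article in sections:
    # Only the first five lines (indices 0-4) of each article are searched.
    head = " ".join(article.splitlines()[0:5])
    months_found.append(MONTH_RE.findall(head))

print(months_found)  # [['March'], ['October']]
```

The "May" in the second article is ignored because it sits on the sixth line, outside the slice.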
Upvotes: 1