Reputation: 9348
I am currently working on extracting values from strings.
Each string contains three rows: the first is always a person's name, the second is a date and time in a fixed format, and the third is a note (which may start with numbers or letters). There are blank rows in between.
I only want the names. My idea is to use a regular expression to locate the date and time, then take the content before it. However, the month names (March, June, February, etc.) vary in length.
sample 1:
Mike Alley
26 February 2005 12:12 AM
50 grams of tobacco
sample 2:
Pichy Lop Annz
22 June 2001 02:06 PM
Lighter and cigar
...
...
...
What would be the best way to achieve the goal?
Upvotes: 0
Views: 56
Reputation: 174706
You could try the following regex, which captures the line immediately before each date/time line.
>>> import re
>>> s = '''
Mike Alley
26 February 2005 12:12 AM
50 grams of tobacco
sample 2:
Pichy Lop Annz
22 June 2001 02:06 PM
Lighter and cigar
...'''
>>> re.findall(r'(?m)^(\S.*\S)\s*\n\s*\d{1,2}\s+\S+\s+\d{4}\s+\d{1,2}:\d{1,2}\s+[AP]M', s)
['Mike Alley', 'Pichy Lop Annz']
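To make the pattern above easier to read, here is a sketch of the same expression written with `re.VERBOSE` so each piece can carry a comment. The names `NAME_BEFORE_DATE` and `extract_names` are only illustrative and do not come from the original answer; the sample string assumes blank rows between the three rows, as described in the question.

```python
import re

# Same pattern as above, split into commented pieces (hypothetical name).
NAME_BEFORE_DATE = re.compile(r"""
    ^(\S.*\S)           # captured: the line's text without surrounding whitespace
    \s*\n\s*            # line break, plus any blank rows that follow
    \d{1,2}\s+          # day of the month
    \S+\s+              # month name, whatever its length
    \d{4}\s+            # four-digit year
    \d{1,2}:\d{1,2}\s+  # hour and minute
    [AP]M               # AM or PM
""", re.MULTILINE | re.VERBOSE)


def extract_names(text):
    """Return every line that is directly followed by a date/time line."""
    return NAME_BEFORE_DATE.findall(text)


# Assumed input shape: name, date/time, note, with blank rows in between.
sample = "Mike Alley\n\n26 February 2005 12:12 AM\n\n50 grams of tobacco\n"
print(extract_names(sample))
```

Because the month is matched with `\S+`, its length does not matter. Note that `\S.*\S` needs at least two characters, so a one-character name would not be captured.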
Upvotes: 1
Reputation: 70732
If the string is always in this format, you could simply use the following:
s.splitlines()[0]
If blank lines may precede the line containing the name:
s.strip().splitlines()[0]
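As a small sketch of this approach, assuming each string holds exactly one record (name, date/time, note), the helper below wraps the same `strip().splitlines()[0]` idea and also guards against an empty string. The function name `first_line` is hypothetical.

```python
def first_line(record):
    """Return the first non-blank line of a single record, or None if empty."""
    # strip() removes leading and trailing whitespace, including blank rows,
    # so the name ends up as the first element after splitting.
    lines = record.strip().splitlines()
    return lines[0].strip() if lines else None


# Assumed input: one record with blank rows before and between the rows.
record = "\n\nPichy Lop Annz\n\n22 June 2001 02:06 PM\n\nLighter and cigar\n"
print(first_line(record))
```

This only works when each string contains a single record; for one long string holding many records, a pattern-based search is the better fit.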
Upvotes: 3