Mark K

Reputation: 9348

Python: using a regular expression to extract text

I am currently working on extracting text from strings.

Each string contains 3 rows: the 1st row is always a person's name, the 2nd row is a date and time (fixed format), and the 3rd row is a note (which could start with numbers or letters). There are blank rows in between.

I only want the names. So I am thinking of using a regular expression to locate the date and time, then pick the content before it. However, the month names here (e.g. March, June, February) are of different lengths.

sample 1:

Mike Alley

26 February 2005 12:12 AM

50 grams of tobacco



sample 2:

Pichy Lop Annz

22 June 2001 02:06 PM

Lighter and cigar
...
...
...

What would be the best way to achieve the goal?

Upvotes: 0

Views: 56

Answers (2)

Avinash Raj

Reputation: 174706

You could try the below.

>>> import re
>>> s = '''
Mike Alley

26 February 2005 12:12 AM

50 grams of tobacco



sample 2:

Pichy Lop Annz

22 June 2001 02:06 PM

Lighter and cigar
...'''
>>> re.findall(r'(?m)^(\S.*\S)\s*\n\s*\d{1,2}\s+\S+\s+\d{4}\s+\d{1,2}:\d{1,2}\s+[AP]M', s)
['Mike Alley', 'Pichy Lop Annz']
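
For readability, the same pattern can be written with re.VERBOSE so that each piece is commented. This is only a sketch of the one-liner above; the names NAME_BEFORE_TIMESTAMP and extract_names are made up for illustration. Note that \S+ matches a month name of any length, which is what handles "June" versus "February", and that (\S.*\S) requires the name to be at least two characters long.

```python
import re

# Hypothetical name; the pattern mirrors the one-liner above, piece by piece.
NAME_BEFORE_TIMESTAMP = re.compile(
    r"""
    ^(\S.*\S)            # name line: starts and ends with a non-space character
    \s*\n\s*             # end of the name line plus any blank lines
    \d{1,2}\s+           # day of month
    \S+\s+               # month name of any length
    \d{4}\s+             # four-digit year
    \d{1,2}:\d{1,2}\s+   # hour and minute
    [AP]M                # AM or PM
    """,
    re.MULTILINE | re.VERBOSE,
)


def extract_names(text):
    """Return every line that is directly followed by a date/time line."""
    return NAME_BEFORE_TIMESTAMP.findall(text)


sample = (
    "Mike Alley\n\n26 February 2005 12:12 AM\n\n50 grams of tobacco\n\n\n\n"
    "Pichy Lop Annz\n\n22 June 2001 02:06 PM\n\nLighter and cigar\n"
)
print(extract_names(sample))
```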

Upvotes: 1

hwnd

Reputation: 70732

If the string is always in this format, you could simply use the following:

s.splitlines()[0]

If it's possible you may have blank lines before the line containing the name:

s.strip().splitlines()[0]
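
Wrapped in a small helper, that might look like the sketch below. It assumes each string holds exactly one record, as described in the question; the function name name_from_record is hypothetical, and returning an empty string for blank input is a choice made here to avoid an IndexError.

```python
def name_from_record(record):
    """Return the first non-blank line of a single record, or '' if there is none."""
    # strip() drops leading/trailing blank lines, so the name is the first line left.
    lines = record.strip().splitlines()
    return lines[0].strip() if lines else ""


record = "\n\nMike Alley\n\n26 February 2005 12:12 AM\n\n50 grams of tobacco\n"
print(name_from_record(record))
```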

Upvotes: 3
