dvasques

Reputation: 13

How to regex the beginning and the end of a sentence - python

I have a list of strings containing dates, country, and city:

myList = ["(1922, May, 22; USA; CHICAGO)","(1934, June, 15; USA; BOSTON)"]

I want to extract only the date and the city (cities are always in capital letters). So far I have this:

import re

pattern_i = re.compile(r"[^;]+")
pattern_f = re.compile(r";\s\b([A-Z]+)\)")

for info in myList:
    mi = re.match(pattern_i, info)
    mf = re.match(pattern_f, info)

    print(mi)
    print(mf)

I am getting:

<re.Match object; span=(0, 14), match='(1922, May, 22'>
None
<re.Match object; span=(0, 15), match='(1934, June, 15'>
None

I've tried so many things and can't seem to find a solution. What am I missing here?

Upvotes: 0

Views: 879

Answers (4)

Code Maniac

Reputation: 37755

"Thanks! But I am still curious: why am I getting None for mf?"

From the Python re documentation: "Python offers two different primitive operations based on regular expressions: re.match() checks for a match only at the beginning of the string, while re.search() checks for a match anywhere in the string (this is what Perl does by default)."


re.match only looks for a match at the beginning of the string. The pattern you're trying to match isn't at the start of the string, so you get None. re.search is one option for finding a match anywhere in the string.
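A minimal sketch of the difference, using the city pattern from the question on one of its sample strings (the variable names here are only illustrative):

```python
import re

info = "(1922, May, 22; USA; CHICAGO)"
city_pattern = re.compile(r";\s\b([A-Z]+)\)")

# match() only tries the pattern at position 0, where the string has "(" and not ";"
m_match = city_pattern.match(info)

# search() scans forward through the string and finds "; CHICAGO)" near the end
m_search = city_pattern.search(info)

print(m_match)            # None
print(m_search.group(1))  # CHICAGO
```

The pattern itself was fine; only the choice of re.match kept it from being found.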


As I suggested, split is a better option here: split by ; and take the first and last elements to get the desired output.

Upvotes: 0

CMMCD

Reputation: 360

Regex is overkill for data with simple, consistent formatting. This can be done easily using the built-in string manipulation functions.

for entry in myList:
    date, country, city = [x.strip() for x in entry[1:-1].split(';')]
    print(date, city)

# Explanation
entry[1:-1]             # Strip off the parentheses
entry[1:-1].split(';')  # Split into a list of strings on the ';' character
x.strip()               # Strip extra whitespace

Upvotes: 1

Quang Hoang

Reputation: 150735

You can use pandas:

import pandas as pd

# Raw string so the backslashes reach the regex engine unchanged
p = r'\((?P<date>.*);.*;(?P<city>.*)\)'

pd.Series(myList).str.extract(p)

Output:

             date      city
0   1922, May, 22   CHICAGO
1  1934, June, 15    BOSTON

Upvotes: 0

Jacek Rojek

Reputation: 1122

Regex for date: ^\(([^;]+)

Regex for city: ([A-Z]+)\)$
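As a rough sketch of how these two patterns might be applied, here is a loop over the list from the question. It uses re.search, because the city pattern is anchored at the end of the string rather than the start; the names my_list, date_re, city_re and results are assumptions made for illustration.

```python
import re

my_list = ["(1922, May, 22; USA; CHICAGO)", "(1934, June, 15; USA; BOSTON)"]

date_re = re.compile(r"^\(([^;]+)")   # everything after "(" up to the first ";"
city_re = re.compile(r"([A-Z]+)\)$")  # capital letters right before the final ")"

results = []
for info in my_list:
    # search() is needed for city_re, since its match is not at position 0
    date = date_re.search(info).group(1)
    city = city_re.search(info).group(1)
    results.append((date, city))

print(results)
```

Note that [A-Z]+ captures only a single run of capital letters, so a multi-word city such as NEW YORK would come back as YORK. The sketch also assumes every string matches both patterns; otherwise search() returns None and .group(1) raises AttributeError.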

Upvotes: 0
