Regular expression to clean up names

Question

I have two dataframes of names. The dataframes are longer, but I am using the top three rows of each as examples.

First list name examples: 
JOSEPH W. JOHN
MIMI N. ALFORD
WANG E. Li

Second list name examples:
AAMIR, DENNIS M
MAHAMMED, LINDA X
ABAD, FARLEY J

I need to extract the first name from both dataframes. How can I extract them with one regular expression?

The result should be:
list 1
JOSEPH
MIMI
WANG

list 2
DENNIS
LINDA
FARLEY

My current code is re.search(r'(?<=,)\w+', df['name']), but it doesn't return the right names. Is it possible to write a regular expression (or two) in Python to extract those names?

alani · Accepted Answer

It appears that what you want is the first sequence of word characters that has no comma anywhere after it on the line, rather than one that has a comma immediately before it. Your positive look-behind also cannot match the second list as written, because the comma there is followed by a space rather than a word character. So instead of a positive look-behind assertion, you will want a negative look-ahead assertion.

Try using this as your regex:

r'\w+(?!.*,)'
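
To see the difference between the two assertions, here is a minimal sketch using only the standard re module. The variable names (lookbehind, lookahead, samples) are just for illustration, and the two sample strings are taken from the question.

```python
import re

# Pattern from the question: word characters immediately preceded by a comma.
lookbehind = re.compile(r'(?<=,)\w+')
# Pattern suggested here: word characters with no comma later on the line.
lookahead = re.compile(r'\w+(?!.*,)')

samples = ["JOSEPH W. JOHN", "AAMIR, DENNIS M"]

for name in samples:
    behind = lookbehind.search(name)
    ahead = lookahead.search(name)
    # search() returns None when nothing matches, so guard before .group().
    print(name, "|",
          behind.group() if behind else None, "|",
          ahead.group() if ahead else None)
```

The look-behind finds nothing in either string: the first has no comma at all, and in the second the character right after the comma is a space. The negative look-ahead skips every word that still has a comma somewhere to its right, so it lands on the first name in both formats.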

Apply this using:

import re
df['name'].apply(lambda name: re.search(r'\w+(?!.*,)', name).group())

Applying the above to this example dataframe:

                name   foo
0     JOSEPH W. JOHN     1
1     MIMI N. ALFORD     3
2         WANG E. Li     3
3    AAMIR, DENNIS M     3
4  MAHAMMED, LINDA X     3
5     ABAD, FARLEY J     3

gives:

0    JOSEPH
1      MIMI
2      WANG
3    DENNIS
4     LINDA
5    FARLEY
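
If some rows might not match at all (an empty string, for example), re.search returns None and calling .group() on it raises an error. One possible alternative, sketched here rather than taken from the original answer, is pandas' Series.str.extract, which returns NaN for non-matching rows. The dataframe below is rebuilt from the example table, and the variable name first_names is only illustrative.

```python
import pandas as pd

# Example data copied from the table above; "foo" is just a filler column.
df = pd.DataFrame({
    "name": [
        "JOSEPH W. JOHN",
        "MIMI N. ALFORD",
        "WANG E. Li",
        "AAMIR, DENNIS M",
        "MAHAMMED, LINDA X",
        "ABAD, FARLEY J",
    ],
    "foo": [1, 3, 3, 3, 3, 3],
})

# str.extract requires a capture group, so \w+ is wrapped in parentheses.
# expand=False returns a Series; rows with no match become NaN.
first_names = df["name"].str.extract(r"(\w+)(?!.*,)", expand=False)
print(first_names)
```

This uses the same negative look-ahead, so it should give the same names as the apply version for the rows shown here.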
