Keep original string values after pandas str.extract() if the regex doesn't match

Question

My input data:

df=pd.DataFrame({'A':['adam','monica','joe doe','michael mo'], 'B':['david','valenti',np.nan,np.nan]})
print(df)
            A        B
0        adam    david
1      monica  valenti
2     joe doe      NaN
3  michael mo      NaN

I need to extract strings after space, to a second column, but when I use my code...:

df['B'] = df['A'].str.extract(r'( [a-zA-Z](.*))')
print(df)
            A     B
0        adam   NaN
1      monica   NaN
2     joe doe   doe
3  michael mo    mo

...I receive NaN in each cell where value has not been extracted. How to avoid it? I tried to extract only from rows where NaN exist using this code:

df.loc[df.B.isna(),'B'] = df.loc[df.B.isna(),'A'].str.extract(r'( [a-zA-Z](.*))')

ValueError: Incompatible indexer with DataFrame

Expected output:

            A     B
0        adam   david
1      monica   valenti
2     joe doe   doe
3  michael mo    mo

jezrael · Accepted Answer

I think solution should be simplify - split by spaces and get second lists and pass to Series.fillna function:

df['B'] = df['B'].fillna(df['A'].str.split().str[1])
print (df)
            A        B
0        adam    david
1      monica  valenti
2     joe doe      doe
3  michael mo       mo

Detail:

print (df['A'].str.split().str[1])
0    NaN
1    NaN
2    doe
3     mo
Name: A, dtype: object

Your solution should be changed:

df['B'] = df['A'].str.extract(r'( [a-zA-Z](.*))')[0].fillna(df.B)
print (df)
            A        B
0        adam    david
1      monica  valenti
2     joe doe      doe
3  michael mo       mo

Better solution wich changed regex and expand=False for Series:

df['B'] = df['A'].str.extract(r'( [a-zA-Z].*)', expand=False).fillna(df.B)
print (df)
            A        B
0        adam    david
1      monica  valenti
2     joe doe      doe
3  michael mo       mo

Detail:

print (df['A'].str.extract(r'( [a-zA-Z].*)', expand=False))
0     NaN
1     NaN
2     doe
3      mo
Name: A, dtype: object

EDIT:

For extract also values from first column simpliest is use:

df1 = df['A'].str.split(expand=True)

df['A'] = df1[0]
df['B'] = df['B'].fillna(df1[1])
print (df)
         A        B
0     adam    david
1   monica  valenti
2      joe      doe
3  michael       mo

Keep original string values after pandas str.extract() if the regex doesn't match

Answers (2)

Related Questions

Keep original string values after pandas str.extract() if the regex doesn&#39;t match

Answers (2)

Related Questions

Keep original string values after pandas str.extract() if the regex doesn't match