Partial string substitution in pandas series

Question

I am trying to do a string replacement in a pandas dataframe. I need to loop over individual columns, so it's basically a replacement in a series:

In [105]: df = pd.DataFrame([['0 - abc', 1, 5], ['0 - abc - xyz', 2, 3]], columns=['col1','col2','col3'])

In [106]: df
Out[106]:
            col1  col2  col3
0        0 - abc     1     5
1  0 - abc - xyz     2     3

In [107]: for col in df.columns:
     ...:     df[col] = df[col].replace(to_replace='".*"|^0', value=df['col3'], inplace=False, regex=True)
     ...:

In [108]: df
Out[108]:
   col1  col2  col3
0     5     1     5
1     3     2     3

Instead of the above df, I am expecting this result:

In [110]: df_result
Out[110]:
            col1  col2  col3
0        5 - abc     1     5
1  3 - abc - xyz     2     3

That is, in '0 - abc', only the '0' at the beginning should be replaced with '5' (the value of col3 in the same row), not the entire string.

What am I missing in my regex? Is there an alternate way to accomplish this kind of string substitution in pandas? Thanks.

cs95 · Accepted Answer

Converting df['col3'] to str using .astype fixes your problem:

In [836]: df.iloc[:, 0].replace('^0', df['col3'].astype(str), regex=True)
Out[836]: 
0          5 - abc
1    3 - abc - xyz
Name: col1, dtype: object

I've simplified your regex as well, although I'm not 100% certain it'll fit all your use cases:

^0

This will only match a leading zero and substitute that. You can incorporate this into your code as needed.
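If you would rather not depend on how Series.replace handles a Series passed as the replacement value, here is a minimal sketch of an alternative that pairs each string with its col3 value explicitly and calls re.sub row by row. The helper name replace_leading_zero and the string_cols list are made up for illustration; adjust the list to whichever columns actually hold strings in your data.

```python
import re

import pandas as pd

df = pd.DataFrame(
    [['0 - abc', 1, 5], ['0 - abc - xyz', 2, 3]],
    columns=['col1', 'col2', 'col3'],
)


def replace_leading_zero(strings, values):
    """Replace a leading '0' in each string with the value from the same row."""
    return [
        # A function replacement inserts str(v) literally, with no escape handling.
        re.sub(r'^0', lambda _match, v=v: str(v), s)
        for s, v in zip(strings, values)
    ]


# Hypothetical list of the columns that contain strings; numeric columns are skipped.
string_cols = ['col1']

result = df.copy()
for col in string_cols:
    result[col] = replace_leading_zero(result[col], result['col3'])

print(result)
```

This is plain Python under the hood, so it will be slower than a vectorized call on large frames, but the row-to-row pairing is explicit and only the leading '0' is touched.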
