Reputation: 675
I am trying to do a string replacement in a pandas DataFrame. I need to loop over individual columns, so it's basically a replacement in a Series:
In [105]: df = pd.DataFrame([['0 - abc', 1, 5], ['0 - abc - xyz', 2, 3]], columns=['col1','col2','col3'])
In [106]: df
Out[106]:
            col1  col2  col3
0        0 - abc     1     5
1  0 - abc - xyz     2     3
In [107]: for col in df.columns:
     ...:     df[col] = df[col].replace(to_replace='".*"|^0', value=df['col3'], inplace=False, regex=True)
     ...:
In [108]: df
Out[108]:
   col1  col2  col3
0     5     1     5
1     3     2     3
Instead of the above df, I expect this result:
In [110]: df_result
Out[110]:
            col1  col2  col3
0        5 - abc     1     5
1  3 - abc - xyz     2     3
That is, in '0 - abc', only the '0' in the beginning should get replaced with '5' and not the entire string.
What am I missing in my regex? Is there an alternate way to accomplish this kind of string substitution in pandas? Thanks.
Upvotes: 1
Views: 352
Reputation: 403278
Converting df['col3'] to str using .astype fixes your problem:
In [836]: df.iloc[:, 0].replace('^0', df['col3'].astype(str), regex=True)
Out[836]:
0          5 - abc
1    3 - abc - xyz
Name: col1, dtype: object
I've simplified your regex as well, although I'm not 100% certain it'll fit all your use cases:
^0
This will only match a leading zero and substitute that. You can incorporate this into your code as needed.
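As a minimal sketch of one way to fold this into the column loop from the question: the version below pairs each cell with the col3 value from the same row and applies the standard-library re.sub to string cells only. It is an alternative to Series.replace, not the method shown above, and replace_leading_zero is a hypothetical helper name, not part of pandas.

```python
import re

import pandas as pd

df = pd.DataFrame(
    [['0 - abc', 1, 5], ['0 - abc - xyz', 2, 3]],
    columns=['col1', 'col2', 'col3'],
)


def replace_leading_zero(cell, replacement):
    """Hypothetical helper: swap a leading '0' in a string cell for `replacement`."""
    if not isinstance(cell, str):
        return cell  # leave numeric cells untouched
    # A function as the replacement avoids backslash processing of the text.
    return re.sub(r'^0', lambda match: str(replacement), cell)


for col in df.columns:
    # Pair every cell with the col3 value from the same row.
    df[col] = [replace_leading_zero(cell, rep) for cell, rep in zip(df[col], df['col3'])]

print(df)
```

Because non-string cells are returned unchanged, the loop can run over every column without converting col2 or col3, and only a '0' at the very start of a string is replaced.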
Upvotes: 1