Reputation: 69
For example, I have a data frame like this:
import pandas as pd
nums = {'amount': ['0324','S123','0010', None, '0030', 'SA40', 'SA24']}
df = pd.DataFrame(nums)
I need to remove all leading zeros and replace None values with zeros.
I did it with loops, but for large frames it is not fast enough. I'd like to rewrite it using vectorized operations.
Upvotes: 1
Views: 8978
Reputation: 8816
I see there is already a nice answer from @Epsi95, but you can also try a regex with a character set:
>>> df['amount'].str.replace(r'^[0]*', '', regex=True).fillna('0')
0 324
1 S123
2 10
3 0
4 30
5 SA40
6 SA24
^[0]*
^ asserts the position at the start of the string
[0] matches a single character from the set in brackets, which here contains only 0
* matches the previous token between zero and unlimited times, as many times as possible, giving back as needed (greedy)
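To see what the pattern does without a data frame, here is a minimal sketch using Python's standard `re` module on plain strings. The helper name `strip_leading_zeros` is hypothetical and only mimics the `str.replace(...).fillna('0')` chain for a single value; it is not part of pandas.

```python
import re

# Same pattern as in the answer: zero or more '0' characters anchored at the start.
LEADING_ZEROS = re.compile(r'^[0]*')

def strip_leading_zeros(value):
    """Hypothetical helper: handle one value the way the pandas chain handles each row."""
    if value is None:
        # Plays the role of .fillna('0')
        return '0'
    # Plays the role of .str.replace(r'^[0]*', '', regex=True)
    return LEADING_ZEROS.sub('', value)

samples = ['0324', 'S123', '0010', None, '0030', 'SA40', 'SA24']
cleaned = [strip_leading_zeros(v) for v in samples]
print(cleaned)  # ['324', 'S123', '10', '0', '30', 'SA40', 'SA24']
```

Because `*` also allows zero repetitions, the pattern matches an empty string at the start of values such as 'S123', so those values pass through unchanged.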
Upvotes: 3
Reputation: 23217
If you want to make sure that a single 0, or the last 0 of a string made up only of zeros, is not removed, you can use:
df['amount'] = df['amount'].str.replace(r'^(0+)(?!$)', '', regex=True).fillna('0')
The negative lookahead (?!$) ensures that the matched substring (the leading zeros) does not include the last 0 of the string, which effectively keeps that last 0.
Input Data
nums = {'amount': ['0324','S123','0010', None, '0030', 'SA40', 'SA24', '0', '000']}
df = pd.DataFrame(nums)
amount
0 0324
1 S123
2 0010
3 None
4 0030
5 SA40
6 SA24
7 0 <== Added a single 0 here
8 000 <== Added a sequence of all 0's here
Output
print(df)
amount
0 324
1 S123
2 10
3 0
4 30
5 SA40
6 SA24
7 0 <== Single 0 is not removed
8 0 <== Last 0 is kept
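The effect of the lookahead can also be checked on plain strings with the standard `re` module. This is only a sketch of the matching behaviour; the constant name `KEEP_LAST_ZERO` is made up for illustration.

```python
import re

# Leading zeros, but the match may not end at the end of the string,
# so at least one character (possibly a final '0') is always left over.
KEEP_LAST_ZERO = re.compile(r'^(0+)(?!$)')

samples = ['0324', '0010', 'SA40', '0', '000']
cleaned = [KEEP_LAST_ZERO.sub('', s) for s in samples]
print(cleaned)  # ['324', '10', 'SA40', '0', '0']
```

For '000' the greedy 0+ first takes all three zeros, the lookahead fails at the end of the string, and the engine backtracks to two zeros. For a single '0' there is nothing to give back, so the pattern does not match at all and the value is left alone.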
Upvotes: 0
Reputation: 60
Step by step:
Remove all leading zeros:
Use str.lstrip, which returns a copy of the string with the leading characters removed (based on the string argument passed).
Here:
df['amount'] = df['amount'].str.lstrip('0')
For more, see https://www.programiz.com/python-programming/methods/string/lstrip
Replace None with zeros:
Use fillna, which also handles missing values other than None (such as NaN).
Here:
df['amount'] = df['amount'].fillna(value='0')
For more, see https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.fillna.html
Result in one line:
df['amount'] = df['amount'].str.lstrip('0').fillna(value='0')
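One thing worth checking with this approach is what happens to values that consist only of zeros: `str.lstrip('0')` strips them down to an empty string. The sketch below assumes you want such values to become '0' as well; the extra rows '0' and '000' and the final `replace('', '0')` step are additions for illustration, not part of the answer above.

```python
import pandas as pd

# Sample data from the question, plus two all-zero values added for illustration.
df = pd.DataFrame({'amount': ['0324', 'S123', '0010', None, '0030',
                              'SA40', 'SA24', '0', '000']})

# The one-liner from the answer: strip leading zeros, then fill missing values.
one_liner = df['amount'].str.lstrip('0').fillna(value='0')

# Assumed extra step: turn the empty strings left by all-zero values into '0'.
result = one_liner.replace('', '0')

print(one_liner.tolist())
print(result.tolist())
```

If all-zero values cannot occur in your data, the extra `replace` step is unnecessary.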
Upvotes: 0
Reputation: 9047
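You can try str.replace (pass regex=True explicitly, since newer pandas versions treat the pattern as a literal string by default):
df['amount'].str.replace(r'^(0+)', '', regex=True).fillna('0')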
0 324
1 S123
2 10
3 0
4 30
5 SA40
6 SA24
Name: amount, dtype: object
Upvotes: 4