Reputation: 740
I'm using pandas to remove all but the last period in each string, like so:
s = pd.Series(['1.234.5','123.5','2.345.6','678.9'])
counts = s.str.count(r'\.')
target = counts==2
target
0 True
1 False
2 True
3 False
dtype: bool
s = s[target].str.replace(r'\.', '', n=1, regex=True)
s
0 1234.5
2 2345.6
dtype: object
My desired output, however, is:
0 1234.5
1 123.5
2 2345.6
3 678.9
dtype: object
The replace call combined with the target mask seems to drop the values that weren't replaced, and I can't see how to remedy this.
Upvotes: 9
Views: 3128
Reputation: 402593
str.replace
This regex pattern with str.replace should do nicely.
s.str.replace(r'\.(?=.*?\.)', '', regex=True)
0 1234.5
1 123.5
2 2345.6
3 678.9
dtype: object
The idea is to remove every period that is followed, somewhere later in the string, by another period, so only the last one survives. Here's a breakdown of the regular expression used.
\. # '.'
(?= # positive lookahead
.*? # match anything
\. # look for '.'
)
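To see the lookahead on its own, here is a minimal sketch using Python's built-in re module instead of pandas. The helper name keep_last_period and the extra sample '1.2.3.4' are illustrative assumptions, not part of the original answer.

```python
import re

# A '.' is matched only if another '.' appears somewhere after it.
PATTERN = re.compile(r'\.(?=.*?\.)')

def keep_last_period(text):
    """Hypothetical helper: strip every period except the final one."""
    return PATTERN.sub('', text)

samples = ['1.234.5', '123.5', '2.345.6', '678.9', '1.2.3.4']
cleaned = [keep_last_period(x) for x in samples]
print(cleaned)
```

Because the lookahead consumes no characters, the final period is never part of a match and is left in place, however many periods precede it.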
np.vectorize
If you want to do this using count, it isn't impossible, but it is a challenge. You can make this easier with np.vectorize. First, define a function,
def foo(r, c):
    return r.replace('.', '', c)
Vectorize it,
v = np.vectorize(foo)
Now, call the function v, passing s and the counts to replace.
pd.Series(v(s, s.str.count(r'\.') - 1))
0 1234.5
1 123.5
2 2345.6
3 678.9
dtype: object
Keep in mind that this is basically a glorified loop.
The plain Python equivalent of vectorize would be,
r = []
for x, y in zip(s, s.str.count(r'\.') - 1):
    r.append(x.replace('.', '', y))
pd.Series(r)
0 1234.5
1 123.5
2 2345.6
3 678.9
dtype: object
Or, using a list comprehension:
pd.Series([x.replace('.', '', y) for x, y in zip(s, s.str.count(r'\.') - 1)])
0 1234.5
1 123.5
2 2345.6
3 678.9
dtype: object
Upvotes: 9
Reputation: 12610
You want to replace the masked items and keep the rest untouched. That's exactly what Series.where does, except that it replaces the values where the condition is False, so you need to negate the mask.
s.where(~target, s.str.replace(r'\.', '', n=1, regex=True))
Or you can make the changes in place by assigning to the masked values. This is probably cheaper, but it overwrites the original Series.
s[target] = s[target].str.replace(r'\.', '', n=1, regex=True)
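For reference, both variants can be run end to end as a rough sketch. It rebuilds s and target from the question, and it assumes a pandas version whose str.replace accepts the regex keyword. The names replaced, result and s2 are introduced here only for illustration.

```python
import pandas as pd

s = pd.Series(['1.234.5', '123.5', '2.345.6', '678.9'])
target = s.str.count(r'\.') == 2  # True where a string has two periods

# Variant 1: Series.where keeps values where the condition is True,
# so the negated mask keeps the untouched rows and swaps in the rest.
replaced = s.str.replace(r'\.', '', n=1, regex=True)
result = s.where(~target, replaced)
print(result.tolist())

# Variant 2: assign to the masked rows only (done on a copy here,
# so that the original s stays available for comparison).
s2 = s.copy()
s2[target] = s2[target].str.replace(r'\.', '', n=1, regex=True)
print(s2.tolist())
```

Both variants keep all four rows, unlike s[target].str.replace(...) on its own, which returns only the rows selected by the mask.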
Upvotes: 0