seanysull

Reputation: 740

Replace all but last occurrences of a character in a string with pandas

I'm using pandas to remove all but the last period in a string, like so:

import pandas as pd

s = pd.Series(['1.234.5','123.5','2.345.6','678.9'])
counts = s.str.count(r'\.')
target = counts==2
target
0     True
1    False
2     True
3    False
dtype: bool

s = s[target].str.replace('.', '', n=1, regex=False)
s
0    1234.5
2    2345.6
dtype: object

My desired output, however, is:

0    1234.5
1     123.5
2    2345.6
3     678.9
dtype: object

Combining replace with the mask target seems to drop the unreplaced values, and I can't see how to remedy this.

Upvotes: 9

Views: 3128

Answers (2)

cs95

Reputation: 402593

Regex-based with str.replace

This regex pattern with str.replace should do nicely.

s.str.replace(r'\.(?=.*?\.)', '', regex=True)

0    1234.5
1     123.5
2    2345.6
3     678.9
dtype: object

The idea is to remove every period that is followed by another period later in the string. The last period has no period after it, so it is left alone. Here's a breakdown of the regular expression used.

\.     # '.'
(?=    # positive lookahead
.*?    # any characters, as few as possible
\.     # look for '.'
)
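
To see the lookahead at work outside pandas, here is a minimal sketch using the standard re module on plain strings. The helper name keep_last_period and the extra sample '1.2.3.4' are illustrative assumptions, not part of the original answer.

```python
import re

# A period that is followed, somewhere later in the string, by another period.
NOT_LAST_PERIOD = re.compile(r'\.(?=.*?\.)')

def keep_last_period(text):
    """Remove every period except the final one (hypothetical helper)."""
    return NOT_LAST_PERIOD.sub('', text)

samples = ['1.234.5', '123.5', '2.345.6', '678.9', '1.2.3.4']
cleaned = [keep_last_period(x) for x in samples]
print(cleaned)
```

Because the lookahead only checks for a later period without consuming it, each period is tested independently and only the final one fails the test.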

Fun with np.vectorize

Doing this with count is possible, but awkward, because each string needs its own replacement count. np.vectorize makes this easier. First, define a function:

def foo(r, c):
    # Remove the first c occurrences of '.' from the string r.
    return r.replace('.', '', c)

Vectorize it:

import numpy as np

v = np.vectorize(foo)

Now call v, passing s and, for each string, the number of periods to remove (one fewer than the total).

pd.Series(v(s, s.str.count(r'\.') - 1))

0    1234.5
1     123.5
2    2345.6
3     678.9
dtype: object

Keep in mind that np.vectorize is essentially a Python-level loop, so it is a convenience rather than a performance gain.


Loopy/List Comprehension

The plain-Python equivalent of the vectorized call would be:

r = []
for x, y in zip(s, s.str.count(r'\.') - 1):
    r.append(x.replace('.', '', y))

pd.Series(r)

0    1234.5
1     123.5
2    2345.6
3     678.9
dtype: object

Or, using a list comprehension:

pd.Series([x.replace('.', '', y) for x, y in zip(s, s.str.count(r'\.') - 1)])

0    1234.5
1     123.5
2    2345.6
3     678.9
dtype: object
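
The count-based variants above all rely on the third argument of str.replace, which limits how many occurrences are replaced. A minimal standalone sketch of that idea follows. The function name drop_all_but_last and the sample '42' are assumptions added for illustration. The explicit guard is a design choice: a negative count tells str.replace to replace every occurrence, which happens to be harmless when the string has no periods, but the guard makes the intent obvious.

```python
def drop_all_but_last(text, char='.'):
    """Remove every occurrence of char except the last one."""
    extra = text.count(char) - 1
    if extra <= 0:
        # Zero or one occurrence: nothing to remove. Guarding here also
        # avoids passing -1, which str.replace treats as "replace all".
        return text
    # Replace only the first `extra` occurrences, leaving the last intact.
    return text.replace(char, '', extra)

values = ['1.234.5', '123.5', '2.345.6', '678.9', '42']
result = [drop_all_but_last(v) for v in values]
print(result)
```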

Upvotes: 9

Stop harming Monica

Reputation: 12610

You want to replace the masked items and keep the rest untouched. That's exactly what Series.where does, except that it keeps values where the condition is True and replaces them where it is False, so you need to negate the mask.

s.where(~target, s.str.replace('.', '', n=1, regex=False))

Or you can make the changes in place by assigning to the masked values. This is probably cheaper, but it overwrites the original Series.

s[target] = s[target].str.replace('.', '', n=1, regex=False)
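
For comparison, here is a self-contained sketch that runs both approaches on the question's data. It assumes a pandas version that accepts the regex keyword on str.replace, and the names replaced, result and s_copy are illustrative. The in-place variant works on a copy so that the original Series stays available.

```python
import pandas as pd

s = pd.Series(['1.234.5', '123.5', '2.345.6', '678.9'])
target = s.str.count(r'\.') == 2  # True for rows containing two periods

# Remove the first literal period from every row ...
replaced = s.str.replace('.', '', n=1, regex=False)

# ... but keep the original value wherever the row is not a target.
result = s.where(~target, replaced)

# In-place alternative, applied to a copy so that s itself is untouched.
s_copy = s.copy()
s_copy[target] = s_copy[target].str.replace('.', '', n=1, regex=False)

print(result.tolist())
print(s_copy.tolist())
```

Both approaches keep the full index, which is what the masked selection in the question lost.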

Upvotes: 0
