running man

Reputation: 1467

pandas DataFrame replace() using regex

I have a pandas DataFrame df. I want to replace "↑ " (the arrow followed by a space) with "+", and "↓ " (the arrow followed by a space) with "-". For example, df.a[0] (value "↑ 0.69%") should become "+0.69%".

df['last_month'] = df['last_month'].replace(r"↑ ","") does not work. Why?

import pandas as pd

data = [{"a":"↑ 0.69%","b":"↓ 9.93%"},{"a":"↓ 0.46%","b":"↑ 3.3%"},{"a":"↓ 0.78%","b":"↓ 3.43%"}]
df = pd.DataFrame(data)
df

    a         b
0   ↑ 0.69%   ↓ 9.93%
1   ↓ 0.46%   ↑ 3.3%
2   ↓ 0.78%   ↓ 3.43%

In my raw data, the values are unicode, so it didn't work. In the demo data, the values are str (bytes), so df['last_month'] = df['last_month'].replace(r"↑ ","") actually works, as in MaxU's answer. But how do I replace when the DataFrame values are unicode?

Upvotes: 1

Views: 2427

Answers (3)

MaxU - stand with Ukraine

Reputation: 210832

IIUC:

In [28]: df.replace([r'↑\s*', r'↓\s*'], ['+', '-'], regex=True)
Out[28]:
        a       b
0  +0.69%  -9.93%
1  -0.46%   +3.3%
2  -0.78%  -3.43%

For Python 2.x:

In [80]: %paste
data = [{"a":u"↑ 0.69%","b":u"↓ 9.93%"},{"a":u"↓ 0.46%","b":u"↑ 3.3%"},{"a":u"↓ 0.78%","b":u"↓ 3.43%"}]
df = pd.DataFrame(data)
df
## -- End pasted text --
Out[80]:
         a        b
0  ↑ 0.69%  ↓ 9.93%
1  ↓ 0.46%   ↑ 3.3%
2  ↓ 0.78%  ↓ 3.43%

In [81]: %paste
df = df.replace([u'↑\s*', u'↓\s*'], [u'+', u'-'], regex=True)
print(df)
## -- End pasted text --
        a       b
0  +0.69%  -9.93%
1  -0.46%   +3.3%
2  -0.78%  -3.43%
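
As a hedged illustration of why the original attempt had no effect on Python 3: without regex=True, replace() compares each cell's whole value against the target, so a substring such as "↑ " is never matched. The sketch below reuses the demo data from the question; the names unchanged and fixed are only illustrative.

```python
import pandas as pd

data = [
    {"a": "↑ 0.69%", "b": "↓ 9.93%"},
    {"a": "↓ 0.46%", "b": "↑ 3.3%"},
    {"a": "↓ 0.78%", "b": "↓ 3.43%"},
]
df = pd.DataFrame(data)

# Without regex=True, replace() only matches entire cell values,
# so the substring "↑ " is not found and the column is left as-is.
unchanged = df["a"].replace("↑ ", "+")

# With regex=True, each pattern is applied inside every string cell.
# Raw strings keep "\s" from being treated as a string escape.
fixed = df.replace([r"↑\s*", r"↓\s*"], ["+", "-"], regex=True)
print(fixed)
```

Using \s* instead of a literal space also covers cells where the arrow is followed by no space or by several.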

Upvotes: 4

running man

Reputation: 1467

I got it: df.replace([u'↑ ', u'↓ '], [u'+', u'-'], regex=True) works.

Upvotes: 0

piRSquared

Reputation: 294218

You can stack, apply the replacements with the str accessor, then unstack.

df.stack().str.replace("↑ ","+").str.replace("↓ ", "-").unstack()
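
A self-contained sketch of the same idea, using the demo data from the question. Passing regex=False is an addition here, chosen so the arrows are treated as literal text regardless of the pandas version's default; the names stacked and result are only illustrative.

```python
import pandas as pd

data = [
    {"a": "↑ 0.69%", "b": "↓ 9.93%"},
    {"a": "↓ 0.46%", "b": "↑ 3.3%"},
    {"a": "↓ 0.78%", "b": "↓ 3.43%"},
]
df = pd.DataFrame(data)

# stack() turns the frame into one Series indexed by (row, column),
# which makes the .str accessor available for every cell at once.
stacked = df.stack()

result = (
    stacked
    .str.replace("↑ ", "+", regex=False)  # literal substring replacement
    .str.replace("↓ ", "-", regex=False)
    .unstack()  # restore the original row/column layout
)
print(result)
```

This relies on every cell being a string; non-string cells would come back as NaN from the str accessor.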

Upvotes: 2
