Reputation: 8849
How to apply a regex to a data frame column?
import pandas as pd
df = pd.DataFrame({'col1': ['negative', 'positive', 'neutral', 'neutral', 'positive']})
cdict = {'n.*': -1, 'p.*': 0}
df['col2'] = df['col1'].map(cdict)
print(df.head())
Current output is:
       col1  col2
0  negative   NaN
1  positive   NaN
2   neutral   NaN
3   neutral   NaN
4  positive   NaN
But the expected output is:
       col1  col2
0  negative    -1
1  positive     0
2   neutral    -1
3   neutral    -1
4  positive     0
Upvotes: 2
Views: 67
Reputation: 34076
To be honest, you don't need a dict for this at all, which also saves some space. Use numpy.select with Series.str.startswith:
In [1927]: import numpy as np
In [1928]: conds = [df.col1.str.startswith('n'), df.col1.str.startswith('p')]
In [1929]: choices = [-1, 0]
In [1930]: df['col2'] = np.select(conds, choices)
In [1931]: df
Out[1931]:
col1 col2
0 negative -1
1 positive 0
2 neutral -1
3 neutral -1
4 positive 0
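One caveat with this approach: np.select fills rows that match none of the conditions with its default of 0, which here is the same value as the 'p' code. A minimal sketch of making such rows visible, assuming a made-up sentinel of -99 and an extra 'unknown' row that is not in the question's data:

```python
import numpy as np
import pandas as pd

# 'unknown' is a hypothetical extra row that matches neither condition.
df = pd.DataFrame({"col1": ["negative", "positive", "neutral", "unknown"]})

conds = [df["col1"].str.startswith("n"), df["col1"].str.startswith("p")]
choices = [-1, 0]

# Without default=..., unmatched rows would get 0, the same as the 'p' rows.
# -99 is an arbitrary sentinel chosen for illustration.
df["col2"] = np.select(conds, choices, default=-99)
print(df)
```

Any value that cannot collide with a real code would work as the sentinel.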
Upvotes: 2
Reputation: 75080
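Series.map looks up each value as an exact dictionary key, so the regex patterns in cdict never match and every row becomes NaN. Use Series.replace with regex=True instead, which treats the dict keys as patterns:
df['col2'] = df['col1'].replace(cdict, regex=True)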
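To see the difference concretely, here is a small sketch of why map with a dict gives NaN, plus a function-based lookup that does the pattern matching by hand. The helper name lookup is my own and not part of pandas; it returns the code of the first pattern that matches from the start of the string.

```python
import re
import pandas as pd

df = pd.DataFrame({"col1": ["negative", "positive", "neutral", "neutral", "positive"]})
cdict = {"n.*": -1, "p.*": 0}

# map with a dict is an exact key lookup: 'negative' is not the key 'n.*',
# so nothing is found and every row ends up as NaN.
exact = df["col1"].map(cdict)

def lookup(value):
    """Hypothetical helper: return the code of the first matching pattern."""
    for pattern, code in cdict.items():
        if re.match(pattern, value):
            return code
    return None  # no pattern matched

# map also accepts a function, which is called once per value.
df["col2"] = df["col1"].map(lookup)
print(df)
```

This is slower than a vectorized call on large frames, but it keeps full control over what happens when no pattern matches.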
Upvotes: 4