Reputation: 4623
I have a dataset like this
data = pd.DataFrame({ 'a' : [5, 5, '2 bad']})
I want to convert this to
{ 'a.digits' : [5, 5, 2], 'a.text' : [nan, nan, 'bad']}
I can get 'a.digits' as below:
data['a.digits'] = data['a'].replace('[^0-9]', '', regex = True)
5 2
2 1
Name: a, dtype: int64
When I do
data['a'] = data['a'].replace('[^\D]', '', regex = True)
or
data['a'] = data['a'].replace('[^a-zA-Z]', '', regex = True)
I get
5 2
bad 1
Name: a, dtype: int64
What's wrong? How do I remove the digits?
Upvotes: 0
Views: 87
Reputation: 137
Assuming there is a space between 2 and the word bad, you can do this:
data['Text'] = data['a'].str.split(' ').str[1]
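For reference, here is a minimal sketch of how that split behaves on the sample frame from the question. It assumes the column mixes plain integers and strings, as in the question; the `.str` accessor returns NaN for the non-string values, so only the '2 bad' row gets a text part.

```python
import pandas as pd

data = pd.DataFrame({'a': [5, 5, '2 bad']})

# .str methods skip non-string values (the int 5) and yield NaN for them.
parts = data['a'].str.split(' ')

# Take the second piece of each split; rows that were NaN stay NaN.
data['Text'] = parts.str[1]
print(data)
```

Note that this only works when the text is exactly the second space-separated token.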
Upvotes: 0
Reputation: 16619
Would something like this suffice?
In [8]: import numpy as np
In [9]: import re
In [10]: data['a.digits'] = data['a'].apply(lambda x: int(re.sub(r'[\D]', '', str(x))))
In [12]: data['a.text'] = data['a'].apply(lambda x: re.sub(r'[\d]', '', str(x)))
In [13]: data.replace(r'^\s*$', np.nan, regex=True)
Out[13]:
       a  a.digits  a.text
0      5         5     NaN
1      5         5     NaN
2  2 bad         2     bad
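A vectorized variant of the same idea, as a rough sketch rather than the definitive way to do it. It assumes every value starts with a run of digits, optionally followed by non-digit text, as in the sample data. The cast to `str` matters because regex-based string operations only touch string values, so the integer 5 would otherwise pass through unchanged.

```python
import pandas as pd

data = pd.DataFrame({'a': [5, 5, '2 bad']})

# Cast everything to str so the integers are processed too.
as_text = data['a'].astype(str)

# Named groups become the column names of the extracted frame:
# a leading run of digits, then any trailing non-digit text (may be empty).
parts = as_text.str.extract(r'^\s*(?P<digits>\d+)\s*(?P<text>\D*)$')

data['a.digits'] = parts['digits'].astype(int)
# Trim whitespace and turn empty strings into NaN.
data['a.text'] = parts['text'].str.strip().replace('', float('nan'))
print(data)
```

Unlike the `apply` version, the text column here has its surrounding whitespace stripped, so the last row holds 'bad' rather than ' bad'.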
Upvotes: 2