Reputation: 75
I have the Titanic dataset, and I want to extract the title from people's names using the pandas str.split function.
>>> data.Title = data.Name.str.split('[,.]').str.get(1)
>>> data.Title
which results in the following, which looks just fine:
0 Mr
1 Mrs
2 Miss
3 Mrs
4 Mr
5 Mr
6 Mr
7 Master
8 Mrs
...
Name: Name, Length: 1309, dtype: object
It seems like each row has only one string, which is Mr or Mrs or something else. But if I index only one row, it shows this:
>>> data.Name.str.split('[,.]').str.get(1)[0]
0 Mr
0 Mr
Name: Name, dtype: object
I have no idea why this is happening, and I can't filter the dataframe either:
>>> data.Title == 'Mr'
0 False
1 False
2 False
3 False
4 False
5 False
6 False
7 False
8 False
...
Upvotes: 0
Views: 68
Reputation: 863361
data.Name.str.split('[,.]').str.get(1)[0]
means select all rows whose index label is 0. If the index contains duplicated labels, you get more than one row back. So it is necessary to create a unique index:
data = data.reset_index(drop=True)
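To illustrate the duplicated-label behaviour, here is a minimal sketch. It assumes the duplicates come from concatenating two frames without resetting the index, which is only a guess about how the asker's data was built; the names train, test, dup, first and unique are hypothetical.

```python
import pandas as pd

# Hypothetical stand-in for two Titanic files concatenated as-is
train = pd.DataFrame({"Name": ["Braund, Mr. Owen Harris", "Cumings, Mrs. John Bradley"]})
test = pd.DataFrame({"Name": ["Kelly, Mr. James"]})
data = pd.concat([train, test])  # index labels are now 0, 1, 0

titles = data["Name"].str.split("[,.]").str.get(1)
dup = titles[0]          # label-based lookup: a Series holding both rows labelled 0
first = titles.iloc[0]   # position-based lookup: always a single value

data = data.reset_index(drop=True)  # index labels become 0, 1, 2
unique = data["Name"].str.split("[,.]").str.get(1)[0]  # now a single string
```

If only one value is needed and the index should stay untouched, .iloc[0] is an alternative because it selects by position instead of by label.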
For the second problem, the split leaves a leading whitespace in each title (' Mr' instead of 'Mr'), so it is necessary to remove it with strip:
data['Title'] = data.Name.str.split('[,.]').str.get(1).str.strip()
All together:
data = data.reset_index(drop=True)
data['Title'] = data.Name.str.split('[,.]').str.get(1).str.strip()
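As a small check of why the comparison returned only False, the sketch below uses three made-up rows in the Titanic name format (an assumption, not the asker's real data) and compares the titles before and after stripping. The variable names raw, raw_values, raw_matches and mr_rows are hypothetical.

```python
import pandas as pd

# Made-up rows in the "Surname, Title. Given names" format
data = pd.DataFrame(
    {"Name": ["Braund, Mr. Owen Harris", "Heikkinen, Miss. Laina", "Kelly, Mr. James"]}
)

raw = data["Name"].str.split("[,.]").str.get(1)
raw_values = raw.tolist()            # each title still starts with a space
raw_matches = int((raw == "Mr").sum())  # no match, because ' Mr' != 'Mr'

# Bracket assignment creates a real column; strip removes the surrounding whitespace
data["Title"] = raw.str.strip()
mr_rows = data[data["Title"] == "Mr"]  # the filter now finds the matching rows
```

Printing a Series hides the leading space because values are right-aligned, so inspecting raw.tolist() is a quick way to see it.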
Upvotes: 2