gunjanpatait

Reputation: 91

Transform the values of the features of a dataframe

I want to apply the following transformations to the values:

  1. The 'Name' column should show only the titles (e.g. Miss, Mr).
  2. The 'Cabin' column should contain only the first letter (e.g. 'C' instead of the whole 'C54').

Finally, please help me with a general solution for similar problems. Thank you. (This was in a Jupyter notebook and I didn't know how to present the code properly.)

categoric.head()
output:
    Name                                                Cabin   
0   Braund, Mr. Owen Harris                             A23 
1   Cumings, Mrs. John Bradley (Florence Briggs Th...   C85 
2   Heikkinen, Miss. Laina                              C54
3   Futrelle, Mrs. Jacques Heath (Lily May Peel)        C123    
4   Allen, Mr. William Henry                            B231

Upvotes: 0

Views: 36

Answers (2)

WhiteHat

Reputation: 120

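# Keep the text between ', ' and the first '.', i.e. the title
categoric.Name = categoric.Name.apply(lambda x: x.split(', ')[1].split('.')[0])
# Series has no .slice method; string slicing goes through the .str accessor
categoric.Cabin = categoric.Cabin.str.slice(0, 1)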

Upvotes: 1

ALollz

Reputation: 59519

pandas has an entire set of methods related to String Handling for Series.

For the cabins, you need to slice off the first letter:

categoric.Cabin.str[0]

#0    A
#1    C
#2    C
#3    C
#4    B
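To store the result back in the frame, assign it to the column. The sketch below is only an illustration: the `categoric` frame here is a hypothetical stand-in with one missing cabin added (an assumption, not data from the question) to show that `.str[0]` leaves missing values missing instead of raising an error.

```python
import pandas as pd

# Hypothetical stand-in for the question's `categoric` frame;
# the missing value in row 2 is an assumption added for illustration.
categoric = pd.DataFrame({"Cabin": ["A23", "C85", None, "C123", "B231"]})

# .str[0] takes the first character of each string; missing entries stay missing.
categoric["Cabin"] = categoric["Cabin"].str[0]

cabin_is_missing = categoric["Cabin"].isna().tolist()
print(categoric)
```

By contrast, a plain `apply(lambda x: x[0])` would fail on the missing entry, which is one reason to prefer the `.str` accessor.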

To get the titles, you can use .str.extract with a capturing group that lists the possible values separated by the vertical bar. Since . has a special meaning in regex patterns, you need to escape it by preceding it with \:

categoric.Name.str.extract(r'(Mr\.|Mrs\.|Miss\.)')

#       0
#0    Mr.
#1   Mrs.
#2  Miss.
#3   Mrs.
#4    Mr.
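For the more general case asked about in the question, a pattern that does not enumerate every title may help. This is a minimal sketch, assuming every name follows the "Surname, Title. Given names" layout; the small frame is a stand-in built from rows shown in the question, and the variable name `titles` is just illustrative.

```python
import pandas as pd

# Stand-in frame using names that appear in full in the question's output.
categoric = pd.DataFrame({
    "Name": [
        "Braund, Mr. Owen Harris",
        "Heikkinen, Miss. Laina",
        "Futrelle, Mrs. Jacques Heath (Lily May Peel)",
        "Allen, Mr. William Henry",
    ]
})

# Capture everything between the first comma and the next period.
# expand=False returns a Series rather than a one-column DataFrame.
titles = categoric["Name"].str.extract(r",\s*([^.]+)\.", expand=False)

print(titles.tolist())
```

Because the period sits outside the capturing group, the result holds the bare title (Mr, Miss, Mrs), and names that do not match the assumed layout come back as missing values.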

Upvotes: 1
