Python pandas upper() does not work for string columns

Question

Hi, I'm working with the Kaggle Titanic data. I use apply(lambda x: x.upper()) on multiple columns, but it doesn't work.

I put the data on my Google Drive, and you can download it here.

I tested each column separately. They are all object dtype (I think that means str; please correct me if I'm wrong), but some columns raise 'float' object has no attribute 'upper'.

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

train = pd.read_csv('train.csv', header=0)

train.loc[:, ['Name', 'Sex', 'Ticket', 'Cabin', 'Embarked']].dtypes
# Name        object
# Sex         object
# Ticket      object
# Cabin       object
# Embarked    object
# dtype: object

train.loc[:, ['Name', 'Sex', 'Ticket', 'Cabin', 'Embarked']].apply(lambda x: x.upper())
# does not work

# try each column
train.loc[:, 'Name'].apply(lambda x: x.upper())  # works
train.loc[:, 'Sex'].apply(lambda x: x.upper())  # works
train.loc[:, 'Ticket'].apply(lambda x: x.upper())  # works
train.loc[:, 'Cabin'].apply(lambda x: x.upper())  # AttributeError: 'float' object has no attribute 'upper'
train.loc[:, 'Embarked'].apply(lambda x: x.upper())  # AttributeError: 'float' object has no attribute 'upper'

Any help is appreciated. Thanks!

Anton Protopopov · Accepted Answer

It's because your columns Cabin and Embarked contain NaN values, which are of type float. An object column can hold values of mixed Python types. You can check this by applying type to each value:

In [355]: train.Cabin.apply(lambda x: type(x))[:10]
Out[355]:
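0    <class 'float'>
1      <class 'str'>
2    <class 'float'>
3      <class 'str'>
4    <class 'float'>
5    <class 'float'>
6      <class 'str'>
7    <class 'float'>
8    <class 'float'>
9    <class 'float'>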
Name: Cabin, dtype: object
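
The same check can be reproduced without the Titanic file. This is only a minimal sketch: the `cabin` Series is made-up sample data (not taken from train.csv) that mixes strings and NaN the way the Cabin column does, and `dtype=object` is set explicitly as an assumption so the result does not depend on the pandas version's default string dtype.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data standing in for train.Cabin
cabin = pd.Series([np.nan, 'c85', np.nan, 'c123'], name='Cabin', dtype=object)

# The column dtype is object even though some elements are floats (NaN)
print(cabin.dtype)

# Inspect the Python type of every element
types = cabin.apply(lambda x: type(x))
print(types.tolist())

# isna() marks exactly the rows where x.upper() would raise AttributeError
missing = cabin.isna()
print(missing.tolist())
```

The rows flagged by `isna()` are the ones whose element type is float, which is why a plain `x.upper()` fails on them.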

So you could use str.upper, which skips NaN by default. Or you could use fillna to replace the NaN values with the empty string '', which has an upper method, and then use your lambda function:

In [363]: train.Cabin.fillna('').apply(lambda x: x.upper())[:5]
Out[363]:
0
1     C85
2
3    C123
4
Name: Cabin, dtype: object

In [365]: train.Cabin.str.upper()[:5]
Out[365]:
0     NaN
1     C85
2     NaN
3    C123
4     NaN
Name: Cabin, dtype: object

Or, if you'd like to keep NaN as a string, you could fillna with the string 'NaN':

In [369]: train.Cabin.fillna('NaN').apply(lambda x: x.upper())[:5]
Out[369]:
0     NAN
1     C85
2     NAN
3    C123
4     NAN
Name: Cabin, dtype: object
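
For the multi-column call in the question, note that DataFrame.apply passes each column to the function as a Series, so x.upper() fails there even without NaN, because a Series has no upper method. A minimal sketch of upper-casing several columns at once with the .str accessor follows; the small `df` frame is made-up data standing in for train.csv, and the column list is only an illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for a few Titanic columns
df = pd.DataFrame({
    'Sex': ['male', 'female', 'female'],
    'Cabin': [np.nan, 'c85', np.nan],
    'Embarked': ['s', 'c', np.nan],
}, dtype=object)

cols = ['Sex', 'Cabin', 'Embarked']

# Each column arrives in the lambda as a Series;
# .str.upper() upper-cases the strings and leaves NaN as NaN
upper = df[cols].apply(lambda s: s.str.upper())
print(upper)
```

To write the result back, something like `df[cols] = upper` should work, again assuming the selected columns hold only strings and NaN.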
