Reputation: 4204
Hi, I'm working with the Kaggle Titanic data. I use apply(lambda x: x.upper()) to work on multiple columns, but it doesn't work.
I put the data on my Google Drive, and you can download it here.
I tested each column, and all of them are of object type (I think that means str; please correct me if I'm wrong). But some columns report 'float' object has no attribute 'upper'.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
train = pd.read_csv('train.csv', header=0)
train.loc[:, ['Name', 'Sex', 'Ticket', 'Cabin', 'Embarked']].dtypes
# Name object
# Sex object
# Ticket object
# Cabin object
# Embarked object
# dtype: object
train.loc[:, ['Name', 'Sex', 'Ticket', 'Cabin', 'Embarked']].apply(lambda x: x.upper())
# does not work
# try each column
train.loc[:, 'Name'].apply(lambda x: x.upper()) # works
train.loc[:, 'Sex'].apply(lambda x: x.upper()) # works
train.loc[:, 'Ticket'].apply(lambda x: x.upper()) # works
train.loc[:, 'Cabin'].apply(lambda x: x.upper()) # AttributeError: 'float' object has no attribute 'upper'
train.loc[:, 'Embarked'].apply(lambda x: x.upper()) # AttributeError: 'float' object has no attribute 'upper'
Any help is appreciated. Thanks!
Upvotes: 2
Views: 5394
Reputation: 31672
It's because your columns Cabin and Embarked contain NaN values, which are of type float. You can check this by applying type to each element of the column:
In [355]: train.Cabin.apply(lambda x: type(x))[:10]
Out[355]:
0 <class 'float'>
1 <class 'str'>
2 <class 'float'>
3 <class 'str'>
4 <class 'float'>
5 <class 'float'>
6 <class 'str'>
7 <class 'float'>
8 <class 'float'>
9 <class 'float'>
Name: Cabin, dtype: object
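To find out which columns are affected without checking them one by one, you can count the missing values per column. Below is a minimal sketch; the `sample` frame and its values are made-up stand-ins for a few rows of train.csv, not the real data.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame standing in for train.csv (values are made up)
sample = pd.DataFrame(
    {
        "Sex": ["male", "female", "female"],
        "Cabin": [np.nan, "C85", np.nan],
        "Embarked": ["S", np.nan, "Q"],
    },
    dtype=object,
)

# Number of missing values in each column
missing_counts = sample.isna().sum()

# Names of the Python types actually stored in one object column
cabin_types = sample["Cabin"].map(lambda v: type(v).__name__).value_counts()

print(missing_counts)
print(cabin_types)
```

Any column with a non-zero missing count will raise the 'float' error when you call x.upper() on each element.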
So you could use str.upper, which handles NaN by default.
Or you could use fillna to replace your NaN values with the empty string '', which has an upper method, and then use your lambda function:
In [363]: train.Cabin.fillna('').apply(lambda x: x.upper())[:5]
Out[363]:
0
1 C85
2
3 C123
4
Name: Cabin, dtype: object
In [365]: train.Cabin.str.upper()[:5]
Out[365]:
0 NaN
1 C85
2 NaN
3 C123
4 NaN
Name: Cabin, dtype: object
Or, if you'd like to keep NaN as a string, you could fillna with the string 'NaN':
In [369]: train.Cabin.fillna('NaN').apply(lambda x: x.upper())[:5]
Out[369]:
0 NAN
1 C85
2 NAN
3 C123
4 NAN
Name: Cabin, dtype: object
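For the multi-column case in the question, note that DataFrame.apply passes each column to the function as a whole Series, so x.upper() fails there even without NaN. A minimal sketch of one way to handle several columns at once, using the .str accessor inside apply; the `train_sample` frame and its values are made up for illustration:

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame (made-up values), not the real train.csv
train_sample = pd.DataFrame(
    {
        "Name": ["Doe, Mr. John", "Roe, Miss. Jane"],
        "Cabin": [np.nan, "c85"],
        "Embarked": ["s", np.nan],
    },
    dtype=object,
)

text_cols = ["Name", "Cabin", "Embarked"]

# Each column arrives as a Series, so call the vectorized .str.upper() on it;
# missing values stay NaN instead of raising an error
upper_df = train_sample[text_cols].apply(lambda col: col.str.upper())

print(upper_df)
```

If you need the result back in the original frame, assigning it with train_sample[text_cols] = upper_df should work.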
Upvotes: 5
Reputation: 4899
Missing values are present in those columns. These are represented by numpy.nan, which is a float. If you use .str.upper() instead of .apply(lambda x: x.upper()), pandas will skip the missing values and will not produce an error.
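A small sketch of the difference, on a made-up Series rather than the real Cabin column. It also shows a guarded lambda as an alternative, in case you want to keep using apply; that variant is my own suggestion, not something required by pandas.

```python
import numpy as np
import pandas as pd

# numpy.nan is an ordinary Python float, which has no upper() method
nan_type = type(np.nan)

# Hypothetical stand-in for a column with missing values
cabin = pd.Series(["c85", np.nan, "c123"], dtype=object)

# Vectorized string method: missing values are passed through as NaN
via_str = cabin.str.upper()

# Guarded lambda: only call upper() on actual strings
via_guard = cabin.apply(lambda x: x.upper() if isinstance(x, str) else x)

print(nan_type)
print(via_str)
print(via_guard)
```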
Upvotes: 1