statBeginner

Reputation: 849

Pandas error while finding unique values in a column with name changed

I am working on the Titanic survival dataset. After reading the data, I rename one of the columns and then try to work with it. The new name shows up when I print the column names, but I cannot use it to access the column, as shown below.

import pandas as pd
titanic = pd.read_excel("titanic.xls", "titanic")
print(titanic.columns.values)

which gives me:

['pclass' 'survived' 'name' 'sex' 'age' 'sibsp' 'parch' 'ticket' 'fare'
 'cabin' 'embarked' 'boat' 'body' 'home.dest']

Now, I change one of the column names:

titanic.columns.values[-1] = 'home'
print(titanic.columns.values)

where the output reflects the changed name:

['pclass' 'survived' 'name' 'sex' 'age' 'sibsp' 'parch' 'ticket' 'fare'
 'cabin' 'embarked' 'boat' 'body' 'home']

Now, if I try to print unique values from the columns,

print(pd.unique(titanic.name))

I get the desired output:

['Allen, Miss. Elisabeth Walton' ... ]

but here,

print(pd.unique(titanic.home))

I get,

AttributeError: 'DataFrame' object has no attribute 'home'

Upvotes: 2

Views: 1599

Answers (1)

economy

Reputation: 4251

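The in-place edit never reaches the DataFrame's column lookup. titanic.columns.values hands back the NumPy array underneath the column Index, so writing to it changes what gets printed. The Index itself is meant to be immutable, though, and its cached label lookup can be left out of date. That is why 'home' appears in the printed array while titanic.home raises an AttributeError.

The fix is to give the DataFrame a complete new set of labels by assigning to titanic.columns (or by using rename), rather than mutating the existing array.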

Using the suggested method, this is how it works:

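# Take a real copy of the labels as a plain Python list.
newCols = titanic.columns.tolist()
newCols[-1] = 'home'
# Assigning to .columns builds a fresh Index from the edited labels.
titanic.columns = newCols

The names are first copied out, edited, and then assigned back to the columns in one step.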
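
As an alternative to reassigning the whole list, DataFrame.rename can change a single label by name. A minimal sketch, assuming a small made-up DataFrame in place of titanic.xls (the values below are placeholders, not real Titanic data):

```python
import pandas as pd

# Stand-in for the Titanic data; the values are placeholders for illustration only.
titanic = pd.DataFrame({
    "name": ["passenger 1", "passenger 2", "passenger 3"],
    "home.dest": ["place A", "place B", "place A"],
})

# rename returns a new DataFrame whose column Index is rebuilt with the new label.
titanic = titanic.rename(columns={"home.dest": "home"})

print(list(titanic.columns))
# Both bracket access and attribute access now see the renamed column.
print(list(pd.unique(titanic["home"])))
print(list(pd.unique(titanic.home)))
```

Renaming by label avoids relying on the column's position, so it keeps working if the column order changes.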

Upvotes: 1
