Richard

Reputation: 65550

Why does `set_index` create an index label for the column name?

I have a CSV file which begins like this:

```
Year,Boys,Girls
1996,333490,315995
1997,329577,313518
1998,325903,309998
```

When I read it into pandas and set an index, it isn't doing quite what I expect:

```python
import pandas as pd

df = pd.read_csv('../data/myfile.csv')
df.set_index('Year', inplace=True)
df.head()
```

Why does the output show an extra header row holding the column label `Year`, with blank values next to it? Shouldn't that label simply disappear?


Also, I'm not clear on how to retrieve the values for 1998. If I try `df.loc['1998']`, I get an error: `KeyError: 'the label [1998] is not in the [index]'`.

Upvotes: 4

Views: 1075

Answers (1)

mtoto

Reputation: 24198

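That extra row is not data: it is the name of the index (`Year`, carried over from the column you passed to `set_index`), which pandas prints on its own line. If you don't want it displayed, set the name attribute of your index to `None`: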

```python
df.index.names = [None]
df.head()
#       Boys    Girls
#1996   333490  315995
#1997   329577  313518
#1998   325903  309998
```
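A minimal, self-contained sketch of the above, assuming the CSV sample from the question is inlined as a string (`csv_text` and the `io.StringIO` wrapper are stand-ins for `'../data/myfile.csv'`). It checks that `set_index` stores `'Year'` as the index name, clears it in place, and shows `rename_axis(None)` as a non-mutating alternative:

```python
import io

import pandas as pd

# Stand-in for the CSV file in the question (assumption: same three sample rows).
csv_text = """Year,Boys,Girls
1996,333490,315995
1997,329577,313518
1998,325903,309998
"""

df = pd.read_csv(io.StringIO(csv_text))
df.set_index('Year', inplace=True)

# set_index keeps the original column name as the index name;
# this is what pandas prints on the extra header line.
name_before = df.index.name
print(name_before)  # Year

# Option 1: clear the name in place, as in the answer.
df.index.names = [None]
print(df.index.name)  # None

# Option 2: build a new frame whose index has no name,
# leaving the source frame untouched.
df2 = pd.read_csv(io.StringIO(csv_text)).set_index('Year').rename_axis(None)
print(df2.head())
```

Only the displayed label changes; the index values (1996, 1997, 1998) and the columns stay the same.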

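As for retrieving the data for 1998: `read_csv` parsed the `Year` column as integers, so the index labels are the integers 1996, 1997, 1998 rather than strings. Simply lose the quotes: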

```python
df.loc[1998]
#Boys     325903
#Girls    309998
#Name: 1998, dtype: int64
```
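A short sketch of why the quotes matter, again assuming the question's sample rows inlined as `csv_text`. It uses `read_csv(..., index_col='Year')`, a swapped-in one-step equivalent of `set_index('Year')`, then checks the index dtype, compares an integer lookup with a string lookup, and shows an explicit `astype(str)` conversion for anyone who really wants string labels:

```python
import io

import pandas as pd

# Stand-in for the CSV file in the question (assumption: same three sample rows).
csv_text = """Year,Boys,Girls
1996,333490,315995
1997,329577,313518
1998,325903,309998
"""

# index_col='Year' sets the index while reading, like set_index('Year').
df = pd.read_csv(io.StringIO(csv_text), index_col='Year')
print(df.index.dtype)  # int64

# Integer label: matches the parsed index values.
row = df.loc[1998]
girls_1998 = df.loc[1998, 'Girls']

# String label: '1998' is not equal to the integer 1998, so the lookup fails.
string_lookup_failed = False
try:
    df.loc['1998']
except KeyError as exc:
    string_lookup_failed = True
    print('KeyError:', exc)

# If string labels are really wanted, convert the index explicitly.
df_str = df.copy()
df_str.index = df_str.index.astype(str)
row_str = df_str.loc['1998']
```

Converting the index is usually unnecessary; keeping integer years also keeps range slicing such as `df.loc[1997:1998]` numeric.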

Upvotes: 3
