Reputation:
I have a dataset with information on cities in the United States, and I want to give it a two-level index of state and city. I've been trying to use the MultiIndex approach from the documentation, which goes something like this:
import pandas as pd

lists = [list(df['state']), list(df['city'])]
tuples = list(zip(*lists))
index = pd.MultiIndex.from_tuples(tuples)
new_df = pd.DataFrame(df, index=index)
The output is a new DataFrame with the correct index, but it's full of np.nan values. Any idea what's going on?
Upvotes: 1
Views: 2255
Reputation: 30991
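When you reindex a DataFrame with a new index (which is what pd.DataFrame(df, index=index) does when df is already a DataFrame), pandas works roughly as follows: for each label in the new index, if that label is present in the old index, the corresponding row is copied; otherwise the row is filled with NaN.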
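A minimal sketch of reindexing on a toy frame (the column name population and its values are made up for illustration): rows whose label exists in the old index are copied over, and any label missing from the old index produces a row of NaN.

```python
import pandas as pd

# Hypothetical toy data with the default integer index 0, 1, 2
df = pd.DataFrame({"population": [100, 200, 300]})

# Labels 1 and 2 exist in the old index, so their rows are copied;
# label 5 does not exist, so its row is filled with NaN
reindexed = df.reindex([1, 2, 5])
print(reindexed)
```

Note that the integer column becomes float, because NaN cannot be stored in an integer column.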
Your DataFrame probably starts out with a default index (a sequence of integers starting from 0). In that case none of the old index labels appear in the new index (a MultiIndex of (state, city) tuples), so every row of the resulting DataFrame is full of NaNs.
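The sketch below reproduces the problem with made-up rows (the population column and all values are hypothetical). It runs the question's code with the missing parenthesis restored, then contrasts it with assigning the MultiIndex directly to .index, which relabels the rows by position instead of aligning them by label, so the data is kept.

```python
import pandas as pd

# Hypothetical sample data with a default index (0, 1, 2)
df = pd.DataFrame({
    "state": ["NY", "NY", "CA"],
    "city": ["New York", "Buffalo", "Los Angeles"],
    "population": [8_300_000, 255_000, 3_900_000],
})

# The approach from the question, with the syntax error fixed
lists = [list(df["state"]), list(df["city"])]
tuples = list(zip(*lists))
index = pd.MultiIndex.from_tuples(tuples)

# Constructor with an existing DataFrame plus an index -> reindex.
# No (state, city) tuple matches the old labels 0, 1, 2, so every cell is NaN.
new_df = pd.DataFrame(df, index=index)
print(new_df)

# Assigning to .index relabels rows by position instead, so values survive
relabelled = df.copy()
relabelled.index = index
print(relabelled)
```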
Instead, you probably want to turn the two columns of interest into the index, i.e. run:
df.set_index(['state', 'city'], inplace=True)
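A minimal sketch of what that produces, using hypothetical sample rows (the population column and its values are made up). It uses the non-inplace form of set_index, which returns a new DataFrame rather than modifying df; the state and city columns move into a MultiIndex, and rows can then be selected by a full (state, city) key or by state alone.

```python
import pandas as pd

# Hypothetical sample data; column names follow the question
df = pd.DataFrame({
    "state": ["NY", "NY", "CA"],
    "city": ["New York", "Buffalo", "Los Angeles"],
    "population": [8_300_000, 255_000, 3_900_000],
})

# Move the two columns into a (state, city) MultiIndex; the existing rows
# keep their values because nothing is reindexed. sort_index() is optional.
indexed = df.set_index(["state", "city"]).sort_index()

# Select a single value by its full (state, city) key
print(indexed.loc[("NY", "Buffalo"), "population"])  # 255000

# Select every city in one state via the first index level
print(indexed.loc["NY"])
```

Sorting the MultiIndex is not required for these lookups, but a sorted index keeps label-based slicing efficient on larger datasets.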
Upvotes: 1