user14487166

Why does reindexing a pandas DataFrame give me an empty DataFrame?

I have a dataset with information on cities in the United States, and I want to give it a two-level index of state and city. I've been trying to use the MultiIndex approach from the documentation, which goes something like this:

import pandas as pd

lists = [list(df['state']), list(df['city'])]  # two parallel lists of labels
tuples = list(zip(*lists))                     # [(state, city), ...]
index = pd.MultiIndex.from_tuples(tuples)
new_df = pd.DataFrame(df, index=index)

The output is a new DataFrame with the correct index but it's full of np.nan values. Any idea what's going on?

Upvotes: 1

Views: 2255

Answers (1)

Valdi_Bo

Reputation: 30991

When you build a DataFrame from an existing DataFrame and pass a new index (as pd.DataFrame(df, index=index) does), pandas reindexes it: rows are aligned by label rather than simply relabelled. Roughly:

  • For each label in the new index, pandas looks it up in the current index.
  • If the label is found, the matching existing row is taken.
  • Existing rows whose labels do not appear in the new index are dropped.
  • The result follows the order of the new index, so rows may be reordered.
  • If a label in the new index is absent from the DataFrame, the corresponding row contains only NaN values.
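A minimal sketch of that alignment, using a made-up three-row DataFrame (the labels 'a', 'b', 'c', 'z' and the column name x are hypothetical, chosen only for illustration):

```python
import numpy as np
import pandas as pd

# Hypothetical toy data with string labels.
df = pd.DataFrame({"x": [1, 2, 3]}, index=["a", "b", "c"])

# 'c' and 'a' exist in the old index, 'z' does not; 'b' is not requested.
out = df.reindex(["c", "a", "z"])
print(out)
# Rows come back in the new order ('c', 'a', 'z'), 'b' is dropped,
# and the 'z' row is NaN because that label had no matching row.
```

Note that x turns into a float column in the result, because pandas had to introduce NaN.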

Perhaps your DataFrame initially has the default index (a RangeIndex of integers starting from 0)? In that case none of the labels in the new index (actually a MultiIndex of (state, city) tuples) is present in the old index, so every row of the resulting DataFrame is full of NaNs.
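To reproduce this, here is a sketch with a small hypothetical stand-in for the dataset (the population column and the sample rows are invented). MultiIndex.from_arrays is used as a shorter way to build the same (state, city) tuples as the zip/from_tuples code in the question:

```python
import pandas as pd

# Hypothetical sample data; the real dataset is not shown in the question.
df = pd.DataFrame({
    "state": ["NY", "NY", "CA"],
    "city": ["New York", "Buffalo", "Los Angeles"],
    "population": [8_300_000, 275_000, 3_900_000],
})
print(df.index)  # RangeIndex(start=0, stop=3, step=1)

index = pd.MultiIndex.from_arrays([df["state"], df["city"]])

# Passing an existing DataFrame together with index= reindexes it.
# Labels like ('NY', 'New York') do not exist in the RangeIndex 0..2,
# so every cell of the result is NaN.
new_df = pd.DataFrame(df, index=index)
print(new_df)
```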

Maybe you should set the index to the two columns of interest, i.e. run:

df.set_index(['state', 'city'], inplace=True)
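For comparison, a sketch of the set_index route on the same hypothetical sample data. It uses the non-inplace form, which returns a new DataFrame. The sort_index() call is an optional extra step, not part of the answer above; sorting the MultiIndex keeps label lookups efficient:

```python
import pandas as pd

# Same hypothetical sample data as a stand-in for the real dataset.
df = pd.DataFrame({
    "state": ["NY", "NY", "CA"],
    "city": ["New York", "Buffalo", "Los Angeles"],
    "population": [8_300_000, 275_000, 3_900_000],
})

# set_index moves the existing column values into the index, so each row
# keeps its own data; no label lookup against the old index happens.
new_df = df.set_index(["state", "city"]).sort_index()
print(new_df)

# Rows can now be selected by (state, city), or by state alone.
print(new_df.loc[("NY", "Buffalo"), "population"])  # 275000
print(new_df.loc["NY"])
```

Unlike the constructor call in the question, state and city are no longer ordinary columns afterwards; df.reset_index() moves them back if needed.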

Upvotes: 1
