Reputation: 166
In Pandas, I am trying to filter out rows with specific dates (set as first level of a multiindex) in a dataframe.
Once filtered, I'd like to check whether the last index value for the first level matches with my latest date. However, I can't get Pandas to return the right value.
An example may be helpful. I first create the original df with multiindex:
import pandas as pd

index = pd.date_range('2016-01-01', freq='B', periods=10), ["AAPL", "GOOG"]
df = pd.DataFrame(index=pd.MultiIndex.from_product(index))
print(df)
Then I filter out specific dates:
start, end = df.index.levels[0][1], df.index.levels[0][-4]
print(start, end)
Now, I create my filtered df only including dates from start till end:
df2 = df.loc[start:end]
df2
This looks fine, as expected. "01/11/2016" is my last index date.
Then, when I check the last index value for the first level (0), it returns "01/14/2016" instead of my chosen end date ("01/11/2016").
print(df2.index.levels[0][-1])
How can I get the last date from df2? Am I missing something or is this a bug?
Upvotes: 3
Views: 751
Reputation: 2382
The behaviour you are seeing is intentional: slicing a pandas.DataFrame selects rows but does not prune the levels of its MultiIndex, so df2.index.levels[0] still lists every date from the original df. To get the behaviour you want, you can use the remove_unused_levels() method, introduced in Pandas 0.20.0:
# Update index to remove values that are not used
df2.index = df2.index.remove_unused_levels()
Once you do this, the following two lines give the same output:
# Print the last value in index
print(df2.index.levels[0][-1])
# Print the last value in the slice
print(end)
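As a rough end-to-end check, the sketch below rebuilds the example from the question and compares the last value of level 0 before and after pruning. The names dates, stale_last, pruned and fresh_last are my own, chosen for illustration, and the pruned index is kept in a separate variable rather than assigned back to df2.

```python
import pandas as pd

# Rebuild the frame from the question: 10 business days x 2 tickers
dates = pd.date_range('2016-01-01', freq='B', periods=10)
df = pd.DataFrame(index=pd.MultiIndex.from_product([dates, ["AAPL", "GOOG"]]))

start, end = df.index.levels[0][1], df.index.levels[0][-4]
df2 = df.loc[start:end]

# Before pruning: level 0 still lists every date of the original frame
stale_last = df2.index.levels[0][-1]

# After pruning (requires pandas >= 0.20.0): only dates present in df2 remain
pruned = df2.index.remove_unused_levels()
fresh_last = pruned.levels[0][-1]

print(stale_last == dates[-1])  # last date of the original df
print(fresh_last == end)        # last date of the slice
```

Both comparisons should print True if the levels behave as described above.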
To explain a bit more: once the unused levels have been removed, df2.index.levels[0] gives you the distinct index values that are actually used. As IanS pointed out, if you want the part of the index that is actually being used (as opposed to the distinct values), then you can use df2.index.get_level_values(0). In the above example, that would give each date twice, since each was used once for each of 'AAPL' and 'GOOG'. Taking the final value (via [-1]) of either of these gives the same value.
Upvotes: 3
Reputation: 16251
Look at df2.index, it is not what you think. Its levels only hold the distinct values needed to reconstruct the multi-index, and they can include values that no longer appear in any row after slicing.
If you want to access index values, use get_level_values:
df2.index.get_level_values(0)
Then df2.index.get_level_values(0)[-1] should return what you expected.
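A minimal sketch of this approach on the frame from the question follows. The names dates, level0, last_date and used_dates are mine, and the .unique() call is an extra step I added to show how to recover the distinct dates without touching the levels.

```python
import pandas as pd

# Same setup as in the question
dates = pd.date_range('2016-01-01', freq='B', periods=10)
df = pd.DataFrame(index=pd.MultiIndex.from_product([dates, ["AAPL", "GOOG"]]))
start, end = dates[1], dates[-4]
df2 = df.loc[start:end]

# One entry per row of df2, so each date appears once per ticker
level0 = df2.index.get_level_values(0)
last_date = level0[-1]

# Distinct dates actually present in the slice
used_dates = level0.unique()

print(last_date == end)
print(len(level0), len(used_dates))
```

Since get_level_values reads the values row by row, it does not depend on remove_unused_levels() and so should also work on pandas versions older than 0.20.0.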
Upvotes: 1