Reputation: 3893
I'm trying to slice a pandas DataFrame indexed by a PeriodIndex with a list of strings, and I get unexpected results.
import pandas as pd
import numpy as np
idx = pd.period_range(1991,1993,freq='A')
df = pd.DataFrame(np.arange(9).reshape(3,3),index=idx)
print(df.loc[['1991','1993'],:])
results in:
KeyError: "None of [['1991', '1993']] are in the [index]"
If the last line is switched to:
print(df.ix[['1991','1993'],:])
The output is:
Out[128]:
        0    1    2
1991  NaN  NaN  NaN
1993  NaN  NaN  NaN
If instead of a PeriodIndex I use a list of strings as the index:
idx = [str(year) for year in range(1991,1994)]
df = pd.DataFrame(np.arange(9).reshape(3,3),index=idx)
print(df.loc[['1991','1993'],:])
Then the output is as expected:
Out[127]:
      0  1  2
1991  0  1  2
1993  6  7  8
So my question is: how do I slice a pandas DataFrame that has a PeriodIndex?
Upvotes: 1
Views: 3507
Reputation: 879681
Pandas doesn't convert the strings into Periods for you, so you have to be more explicit. You could use:
In [38]: df.loc[[pd.Period('1991'), pd.Period('1993')], :]
Out[38]:
      0  1  2
1991  0  1  2
1993  6  7  8
or
In [39]: df.loc[list(map(pd.Period, ['1991', '1993'])), :]
Out[39]:
      0  1  2
1991  0  1  2
1993  6  7  8
or
In [40]: df.loc[[idx[0], idx[-1]], :]
Out[40]:
      0  1  2
1991  0  1  2
1993  6  7  8
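If you would rather keep writing the labels as strings, the conversion can be wrapped in a small helper. This is a minimal sketch, not a pandas API: the function name select_periods is hypothetical, and it assumes the strings can be parsed at the index's own frequency. The range is built from two Period objects so pandas takes the annual frequency from them, which sidesteps the 'A' vs 'Y' alias difference between pandas versions.

```python
import numpy as np
import pandas as pd

# Same frame as in the question. pd.Period('1991') is parsed as an annual
# period, and period_range takes its freq from the Period endpoints.
idx = pd.period_range(pd.Period('1991'), pd.Period('1993'))
df = pd.DataFrame(np.arange(9).reshape(3, 3), index=idx)


def select_periods(frame, labels):
    """Hypothetical helper: select rows of a PeriodIndex-ed frame by string labels.

    The strings are parsed into a PeriodIndex using the frame's own index
    frequency, so .loc compares Periods with Periods instead of strings.
    """
    wanted = pd.PeriodIndex(labels, freq=frame.index.freq)
    return frame.loc[wanted, :]


result = select_periods(df, ['1991', '1993'])
print(result)
```

Because the frequency comes from frame.index rather than being hard-coded, the same helper should also work for, say, a monthly index, as long as the labels are strings that parse at that frequency.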
By the way, when you pass an arbitrary list of items to df.loc, Pandas returns a new sub-DataFrame with a copy of the values from df. This is not a slice. To slice, you would need to use the slicing notation a:b. For example,
In [64]: df.loc[pd.Period('1991'):pd.Period('1993'):2, :]
Out[64]:
      0  1  2
1991  0  1  2
1993  6  7  8
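The distinction is important because in NumPy, basic slices return views while list-based (non-slice) indexing returns copies. Pandas generally follows the same pattern, although it does not guarantee that a given slice is a view.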
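The view/copy difference is easiest to see directly in NumPy. This sketch uses np.shares_memory to check whether a result shares its buffer with the original array, then writes through each result to show the effect. It illustrates NumPy behaviour only; it is not a statement about what a particular pandas indexing call returns.

```python
import numpy as np

arr = np.arange(9).reshape(3, 3)

# Basic slicing (start:stop:step) returns a view onto arr's buffer.
sliced = arr[0:3:2]
# Indexing with a list of positions ("fancy" indexing) returns a new array.
picked = arr[[0, 2]]

print(np.shares_memory(arr, sliced))  # True
print(np.shares_memory(arr, picked))  # False

# Writing through the view changes arr; writing to the copy does not.
sliced[0, 0] = 100
picked[1, 0] = -1
print(arr[0, 0], arr[2, 0])  # 100 6
```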
Upvotes: 3