Reputation: 151
Assume we have the following dataframe:
d = {'col1': ['a1', 'b1', 'c1', 'a1'], 'col2': ['a2', 'b2', 'b2', 'c2'], 'year':[2011, 2011, 2012, 2012], 'rank':[1, 2, 1, 2]}
df = pd.DataFrame(data=d).set_index(['year', 'rank']).sort_index()
col1 col2
year rank
2011 1 a1 a2
2 b1 b2
2012 1 c1 b2
2 a1 c2
How can I select all columns where col1 != 'a1'
or year != 2011
?
If year
wouldn't be an index, I could do this by
df[(df.col1 != 'a1') | (df.year != 2011)]
However, as year is an index, df.year
would throw an AttributeError.
How can I formulate the condition for the index? Thanks in advance!
Upvotes: 2
Views: 4500
Reputation: 5757
You can try
df1 = df[df.index.get_level_values('year').isin([2011])]
df2 = df[df.col1 == 'a1']
result = pd.concat([df1,df2]).drop_duplicates()
Output
col1 col2
year rank
2011 1 a1 a2
2 b1 b2
2012 2 a1 c2
Upvotes: 0
Reputation: 1696
You can access the index by the method df.index.get_level_values
, e.g. you can gain the searched result by
In [29]: df[(df.col1 != 'a1') | (df.index.get_level_values('year') != 2011)]
Out[29]:
col1 col2
year rank
2011 2 b1 b2
2012 1 c1 b2
2 a1 c2
Some Sidenote:
The comparison df.index.get_level_values('year') != 2011
will be an numpy array, therefore we need to get the values from the pd.Series
for comparing with df.col1 != 'a1'
(in some older pandas versions you may have used to acess the values with .values
or similar, because comparing of an series with index with some array was not possible. Howewer, at least with 0.24 and above this is not necessary anymore).
Upvotes: 3
Reputation: 17911
You can use the method query()
that treats both the index and columns of the frame as a column:
df.query("col1 != 'a1' | year != 2011")
Output:
col1 col2
year rank
2011 2 b1 b2
2012 1 c1 b2
2 a1 c2
Upvotes: 3
Reputation: 917
You can access the index through loc
and iloc
operators.
df[df['col1'] != 'a1'].loc[2011]
To access both the year and rank index together, df.loc[2011,1]
, which will output a1 and a2
Upvotes: 0