Babypopo
Babypopo

Reputation: 151

Pandas: Selecting rows by condition on column and index

Assume we have the following dataframe:

d = {'col1': ['a1', 'b1', 'c1', 'a1'], 'col2': ['a2', 'b2', 'b2', 'c2'], 'year':[2011, 2011, 2012, 2012], 'rank':[1, 2, 1, 2]}
df = pd.DataFrame(data=d).set_index(['year', 'rank']).sort_index()

          col1 col2
year rank          
2011 1      a1   a2
     2      b1   b2
2012 1      c1   b2
     2      a1   c2

How can I select all columns where col1 != 'a1' or year != 2011?

If year wouldn't be an index, I could do this by

df[(df.col1 != 'a1') | (df.year != 2011)]

However, as year is an index, df.year would throw an AttributeError.

How can I formulate the condition for the index? Thanks in advance!

Upvotes: 2

Views: 4500

Answers (4)

abhilb
abhilb

Reputation: 5757

You can try

df1 = df[df.index.get_level_values('year').isin([2011])]
df2 = df[df.col1 == 'a1']
result = pd.concat([df1,df2]).drop_duplicates()

Output

        col1    col2
year    rank        
2011    1   a1  a2
        2   b1  b2
2012    2   a1  c2

Upvotes: 0

Nico Albers
Nico Albers

Reputation: 1696

You can access the index by the method df.index.get_level_values, e.g. you can gain the searched result by

In [29]: df[(df.col1 != 'a1') | (df.index.get_level_values('year') != 2011)]
Out[29]:
          col1 col2
year rank
2011 2      b1   b2
2012 1      c1   b2
     2      a1   c2

Some Sidenote:

The comparison df.index.get_level_values('year') != 2011 will be an numpy array, therefore we need to get the values from the pd.Series for comparing with df.col1 != 'a1' (in some older pandas versions you may have used to acess the values with .values or similar, because comparing of an series with index with some array was not possible. Howewer, at least with 0.24 and above this is not necessary anymore).

Upvotes: 3

Mykola Zotko
Mykola Zotko

Reputation: 17911

You can use the method query() that treats both the index and columns of the frame as a column:

df.query("col1 != 'a1' | year != 2011")

Output:

          col1 col2
year rank          
2011 2      b1   b2
2012 1      c1   b2
     2      a1   c2

Upvotes: 3

Raunaq Jain
Raunaq Jain

Reputation: 917

You can access the index through loc and iloc operators.

df[df['col1'] != 'a1'].loc[2011]

To access both the year and rank index together, df.loc[2011,1], which will output a1 and a2

Upvotes: 0

Related Questions