pdubois

Reputation: 7790

Selecting rows - based on a list - from a DF with duplicated columns

I have the following DataFrame:

import pandas as pd
rep = pd.DataFrame.from_items([('Probe', ['x', 'y', 'z']), ('Gene', ['foo', 'bar', 'qux']), ('Probe',['x','y','z']), ("RP",[1.00,2.33,4.5])], orient='columns')

Which produces:

In [11]: rep
Out[11]:
  Probe Gene Probe    RP
0     x  foo     x  1.00
1     y  bar     y  2.33
2     z  qux     z  4.50

Note that the Probe column is duplicated. What I want to do is select the rows whose Probe value appears in a list:

ls = ["x", "z", "i"]

Yielding this:

  Probe Gene Probe    RP
0     x  foo     x  1.00
2     z  qux     z  4.50

Note that the result should keep all the columns of the original DataFrame above, including the duplicate.

Why does this fail?

In [9]: rep[rep[[0]].isin(ls)]
ValueError: cannot reindex from a duplicate axis

What's the right way to do it? Is there an alternative to isin?

Upvotes: 1

Views: 76

Answers (2)

awhan

Reputation: 540

You should probably mention whether the values in the list ls all belong to one fixed column, say Probe in this case. If so, the following works:

rep.loc[rep.Probe.isin(ls).iloc[:, 1]]
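As a rough sketch of why a positional step is needed here: selecting the duplicated label Probe returns a two-column DataFrame rather than a Series, so isin also returns a DataFrame. The variant below is not the one-liner above; it is an alternative that assumes a row should be kept if either Probe column matches, and it reduces the result with any(axis=1). The frame is rebuilt from rows (my assumption, so that the snippet runs on a recent pandas).

```python
import pandas as pd

# Same data as in the question, built from rows so that the duplicate
# "Probe" label is accepted by the constructor.
rep = pd.DataFrame(
    [["x", "foo", "x", 1.00],
     ["y", "bar", "y", 2.33],
     ["z", "qux", "z", 4.50]],
    columns=["Probe", "Gene", "Probe", "RP"],
)
ls = ["x", "z", "i"]

# A duplicated label selects BOTH columns, so this is a DataFrame.
probes = rep["Probe"]
print(probes.shape)

# isin is element-wise on the DataFrame; any(axis=1) collapses it to one
# boolean per row (True if either Probe column matches).
mask = probes.isin(ls).any(axis=1)

# The mask is a Series on the (unique) row index, so .loc can align it.
result = rep.loc[mask]
print(result)
```

If the two Probe columns could ever disagree, picking one of them by position instead of using any(axis=1) would be the stricter choice.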

Upvotes: 1

Andy Hayden

Reputation: 375455

You should use iloc here:

In [11]: rep.iloc[rep.iloc[:, 0].isin(ls).values]
Out[11]:
  Probe Gene Probe   RP
0     x  foo     x  1.0
2     z  qux     z  4.5

This first creates the boolean vector from the first column, selected by position (as a one-dimensional array rather than a DataFrame), and you can use this as a row mask:

In [12]: rep.iloc[:, 0].isin(ls).values
Out[12]: array([ True, False,  True], dtype=bool)
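For anyone running this on a recent pandas, here is a self-contained sketch of the same positional-mask idea. Two things in it are my assumptions rather than part of the session above: the frame is built from a list of rows (DataFrame.from_items was removed in pandas 1.0), and to_numpy() is used to get the plain array.

```python
import pandas as pd

# Same data as in the question, built from rows so that the duplicate
# "Probe" label is accepted by the constructor.
rep = pd.DataFrame(
    [["x", "foo", "x", 1.00],
     ["y", "bar", "y", 2.33],
     ["z", "qux", "z", 4.50]],
    columns=["Probe", "Gene", "Probe", "RP"],
)
ls = ["x", "z", "i"]

# Pick the first column by POSITION. This is a 1-D Series even though
# the label "Probe" occurs twice.
first_probe = rep.iloc[:, 0]

# One boolean per row, as a plain NumPy array (no labels attached).
mask = first_probe.isin(ls).to_numpy()

# A boolean array used with iloc filters rows without any label
# alignment, so the duplicated column names never come into play.
result = rep.iloc[mask]
print(result)
```

The value "i" in ls simply matches nothing, so it has no effect on the result.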

Upvotes: 1
