Reputation: 7790
I have the following DataFrame:
import pandas as pd
rep = pd.DataFrame.from_items([('Probe', ['x', 'y', 'z']), ('Gene', ['foo', 'bar', 'qux']), ('Probe',['x','y','z']), ("RP",[1.00,2.33,4.5])], orient='columns')
Which produces:
In [11]: rep
Out[11]:
Probe Gene Probe RP
0 x foo x 1.00
1 y bar y 2.33
2 z qux z 4.50
Note that the Probe column is duplicated. What I want to do is select rows based on a list:
ls = ["x", "z", "i"]
Yielding this:
Probe Gene Probe RP
0 x foo x 1.00
2 z qux z 4.50
Note that the result should keep all the columns of the original DataFrame above, including the duplicate.
Why does this fail?
In [9]: rep[rep[[0]].isin(ls)]
ValueError: cannot reindex from a duplicate axis
What's the right way to do it? Is there an alternative to isin?
Upvotes: 1
Reputation: 540
You should probably mention whether the list in question, ls, contains values belonging to a fixed column, say Probe in this case. If it does, then the following works:
rep.ix[rep.Probe.isin(ls).ix[:,1]]
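For readers on a recent pandas release, where .ix and DataFrame.from_items are no longer available, here is a rough equivalent of the line above. It is only a sketch: building the frame with the plain constructor and swapping in .loc/.iloc are assumptions on my part, not part of the original answer.

```python
import pandas as pd

# Rebuild the frame from the question; passing `columns` explicitly
# keeps the duplicated "Probe" label (assumed stand-in for from_items).
rep = pd.DataFrame(
    [["x", "foo", "x", 1.00],
     ["y", "bar", "y", 2.33],
     ["z", "qux", "z", 4.50]],
    columns=["Probe", "Gene", "Probe", "RP"],
)
ls = ["x", "z", "i"]

# Because the label is duplicated, rep["Probe"] is a two-column DataFrame,
# so isin() returns a boolean DataFrame; take one of its columns by position.
mask = rep["Probe"].isin(ls).iloc[:, 1]

# A boolean Series aligned on the row index selects the matching rows.
result = rep.loc[mask]
print(result)
```

The selected rows keep all four columns, including both Probe columns.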
Upvotes: 1
Reputation: 375455
You should use iloc here:
In [11]: rep.iloc[rep.iloc[:, 0].isin(ls).values]
Out[11]:
Probe Gene Probe RP
0 x foo x 1.0
2 z qux z 4.5
This first creates the boolean vector (as a one-dimensional array rather than a DataFrame), with one entry per row, and you can use this as a mask. Note that the column is selected by position with iloc[:, 0]; iloc[0] would return the first row instead:
In [12]: rep.iloc[:, 0].isin(ls).values
Out[12]: array([ True, False,  True], dtype=bool)
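To try the positional approach end to end, something like the following should work as a self-contained script. The frame is rebuilt with the ordinary DataFrame constructor (an assumption, since from_items is not available in recent pandas releases), and the length check is only there to illustrate that the mask must have one entry per row.

```python
import pandas as pd

# Frame from the question, built with the plain constructor (assumed
# replacement for from_items); the "Probe" label appears twice.
rep = pd.DataFrame(
    [["x", "foo", "x", 1.00],
     ["y", "bar", "y", 2.33],
     ["z", "qux", "z", 4.50]],
    columns=["Probe", "Gene", "Probe", "RP"],
)
ls = ["x", "z", "i"]

# Select the first column by position, which sidesteps the duplicate label,
# then convert the boolean Series to a plain NumPy array.
mask = rep.iloc[:, 0].isin(ls).values

# The mask needs one boolean per row, not one per column.
assert len(mask) == len(rep)

# iloc accepts a boolean array and keeps the rows where it is True.
result = rep.iloc[mask]
print(result)
```

Selecting by position rather than by label is the design choice that matters here: any label-based lookup of "Probe" returns both columns at once.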
Upvotes: 1