Covich
Covich

Reputation: 2814

Pandas DataFrame : selection of multiple elements in several columns

I have this Python Pandas DataFrame DF :

DICT = {  'letter': ['A','B','C','A','B','C','A','B','C'],
          'number': [1,1,1,2,2,2,3,3,3],
          'word'  : ['one','two','three','three','two','one','two','one','three']}

DF = pd.DataFrame(DICT)

Which looks like :

  letter  number   word
0      A       1    one
1      B       1    two
2      C       1  three
3      A       2  three
4      B       2    two
5      C       2    one
6      A       3    two
7      B       3    one
8      C       3  three

And I want to extract the lines

  letter  number   word
       A       1    one
       B       2    two
       C       3  three

First I tired :

DF[(DF['letter'].isin(("A","B","C"))) & 
    DF['number'].isin((1,2,3))        &
    DF['word'].isin(('one','two','three'))]

Of course it didn't work, and everything has been selected

Then I tested :

Bool = DF[['letter','number','word']].isin(("A",1,"one"))
DF[np.all(Bool,axis=1)]

Good, it works ! but only for one line ... If we take the next step and give an iterable to .isin() :

Bool = DF[['letter','number','word']].isin((("A",1,"one"),
                                            ("B",2,"two"),
                                            ("C",3,"three")))

Then it fails, the Boolean array is full of False ...

What I'm doing wrong ? Is there a more elegant way to do this selection based on several columns ?

(Anyway, I want to avoid a for loop, because the real DataFrames I'm using are really big, so I'm looking for the fastest optimal way to do the job)

Upvotes: 1

Views: 1384

Answers (1)

jezrael
jezrael

Reputation: 863351

Idea is create new DataFrame with all triple values and then merge with original DataFrame:

L = [("A",1,"one"),
     ("B",2,"two"),
     ("C",3,"three")]

df1 = pd.DataFrame(L, columns=['letter','number','word'])
print (df1)
  letter  number   word
0      A       1    one
1      B       2    two
2      C       3  three

df = DF.merge(df1)
print (df)
  letter  number   word
0      A       1    one
1      B       2    two
2      C       3  three

Another idea is create list of tuples, convert to Series and then compare by isin:

s = pd.Series(list(map(tuple, DF[['letter','number','word']].values.tolist())),index=DF.index)
df1 = DF[s.isin(L)]
print (df1)
  letter  number   word
0      A       1    one
4      B       2    two
8      C       3  three

Upvotes: 1

Related Questions