CezarySzulc
CezarySzulc

Reputation: 2007

Python pandas -> select by condition in columns name

I have df with column names: 'a', 'b', 'c' ... 'z'.

print(my_df.columns)
Index(['a', 'b', 'c', ... 'y', 'z'],
  dtype='object', name=0)

I have function which determine which columns should be displayed. For example:

start = con_start()
stop = con_stop()
print(my_df.columns >= start) & (my_df <= stop)

My result is:

[False False ... False False False False  True  True
True  True False False]

My goal is display dataframe only with columns that satisfy my condition. If start = 'a' and stop = 'b', I want to have:

0                                      a              b         
index1       index2                                                  
New York     New York           0.000000       0.000000          
California   Los Angeles   207066.666667  214466.666667     
Illinois     Chicago       138400.000000  143633.333333     
Pennsylvania Philadelphia   53000.000000   53633.333333      
Arizona      Phoenix       111833.333333  114366.666667 

Upvotes: 6

Views: 22624

Answers (5)

Petr Matuska
Petr Matuska

Reputation: 573

If your conditions are on a similar level of complexity as you shown in your example there is no need to use any additional function but just do filtering e.g.

sweet_and_red_fruit = fruit[(fruit[sweet == 1) & (fruit["colour"] == "red")]
print(sweet_and_red_fruit)

OR if you want to just print

print(fruit[(fruit[sweet == 1) & (fruit["colour"] == "red")])

Upvotes: 0

piRSquared
piRSquared

Reputation: 294278

I want to make this robust and with as few assumptions as possible.

option 1
use iloc with array slicing
Assumptions:

  • my_df.columns.is_unique evaluates to True
  • columns are already in order

start = df.columns.get_loc(con_start())
stop = df.columns.get_loc(con_stop())

df.iloc[:, start:stop + 1]

option 2
use loc with boolean slicing
Assumptions:

  • column values are comparable

start = con_start()
stop = con_stop()

c = df.columns.values
m = (start <= c) & (stop >= c)

df.loc[:, m]

Upvotes: 4

Binyamin Even
Binyamin Even

Reputation: 3382

assuming result is your [true/false] array and that letters is [a...z]:

res=[letters[i] for i,r in enumerate(result) if r]
new_df=df[res]

Upvotes: 0

acidtobi
acidtobi

Reputation: 1365

Generate a list of colums to display:

cols = [x for x in my_df.columns if start <= x <= stop]

Use only these columns in your DataFrame:

my_df[cols]

Upvotes: 1

Scott Boston
Scott Boston

Reputation: 153460

You can use slicing to achieve this with .loc:

 df.loc[:,'a':'b']

Upvotes: 8

Related Questions