Research100
Research100

Reputation: 57

Filter a Pandas DataFrame Through a Dictionary

I have a pandas dataframe with a number of columns. (Please see the table below.) Each column belongs to a group categorization G1-G4. This mapping of groups is in a data dictionary. If I want to filter the columns in the pandas dataframe based on the grouping data dictionary, for example to filter out all G3 and produce the resulting table below, what is the most efficient way of doing this?

enter image description here

Thank you.

Upvotes: 0

Views: 3537

Answers (2)

cfort
cfort

Reputation: 2776

I'm late to the party, but here's an another way to do this:

import pandas as pd
import numpy as np

filter_dict = {'T1': 'G1', 'T2': 'G3', 'T3': 'G3', 'T4': 'G1', 'T5': 'G2'}

#some play data
dta = np.random.rand(10,5)
rows = pd.date_range('1-Jan-2001', periods=10)
cols = ['T'+ str(x) for x in range(1,6)]
df = pd.DataFrame(dta, rows, cols)

'''
                  T1        T2        T3        T4        T5
2001-01-01  0.981817  0.590287  0.421105  0.060628  0.007345
2001-01-02  0.643111  0.813921  0.301809  0.414912  0.116028
2001-01-03  0.366478  0.582521  0.997483  0.642552  0.662618
2001-01-04  0.485125  0.062310  0.278239  0.838958  0.174067
2001-01-05  0.953075  0.612431  0.204880  0.258736  0.551282
2001-01-06  0.633442  0.852040  0.338404  0.137305  0.114481
2001-01-07  0.828730  0.477185  0.456451  0.560955  0.713365
2001-01-08  0.860316  0.809366  0.656306  0.728917  0.383045
2001-01-09  0.492210  0.639744  0.511768  0.623203  0.495040
2001-01-10  0.956224  0.188009  0.938048  0.999993  0.556607
'''

#convert wide to long cleanup the index and col names
df1 = df.stack()\
        .reset_index()\
        .rename(columns={'level_0':'date','level_1':'ticker', 0:'val'})\
        .reset_index('date')

#map ticker to group       
df1['grp'] = df1.ticker.map(filter_dict) 

#select stuff not in group 3 and pivot to wide format
df1[df1.grp != 'G3'].pivot('date','ticker','val')

'''
ticker            T1        T4        T5
date                                    
2001-01-01  0.981817  0.060628  0.007345
2001-01-02  0.643111  0.414912  0.116028
2001-01-03  0.366478  0.642552  0.662618
2001-01-04  0.485125  0.838958  0.174067
2001-01-05  0.953075  0.258736  0.551282
2001-01-06  0.633442  0.137305  0.114481
2001-01-07  0.828730  0.560955  0.713365
2001-01-08  0.860316  0.728917  0.383045
2001-01-09  0.492210  0.623203  0.495040
2001-01-10  0.956224  0.999993  0.556607
'''

Upvotes: 1

LateCoder
LateCoder

Reputation: 2283

Say the DataFrame is:

In [7]: df = pd.DataFrame({'T1': [1], 'T2': [2], 'T3': [3], 'T4': [4], 'T5': [5]})

In [8]: df
Out[8]:
   T1  T2  T3  T4  T5
0   1   2   3   4   5

And say the filtering dict is:

In [9]: filter_dict = {'T1': 'G1', 'T2': 'G3', 'T3': 'G3', 'T4': 'G1', 'T5': 'G2'}

You can select all columns except those belonging to G3 like this:

In [6]: df[[col_name for col_name, group in filter_dict.items() if group != 'G3']]
Out[6]:
   T5  T4  T1
0   5   4   1

Upvotes: 2

Related Questions