Reputation: 57
I have a pandas dataframe with a number of columns. (Please see the table below.) Each column belongs to a group categorization G1-G4. This mapping of groups is in a data dictionary. If I want to filter the columns in the pandas dataframe based on the grouping data dictionary, for example to filter out all G3 and produce the resulting table below, what is the most efficient way of doing this?
Thank you.
Upvotes: 0
Views: 3537
Reputation: 2776
I'm late to the party, but here's an another way to do this:
import pandas as pd
import numpy as np
filter_dict = {'T1': 'G1', 'T2': 'G3', 'T3': 'G3', 'T4': 'G1', 'T5': 'G2'}
#some play data
dta = np.random.rand(10,5)
rows = pd.date_range('1-Jan-2001', periods=10)
cols = ['T'+ str(x) for x in range(1,6)]
df = pd.DataFrame(dta, rows, cols)
'''
T1 T2 T3 T4 T5
2001-01-01 0.981817 0.590287 0.421105 0.060628 0.007345
2001-01-02 0.643111 0.813921 0.301809 0.414912 0.116028
2001-01-03 0.366478 0.582521 0.997483 0.642552 0.662618
2001-01-04 0.485125 0.062310 0.278239 0.838958 0.174067
2001-01-05 0.953075 0.612431 0.204880 0.258736 0.551282
2001-01-06 0.633442 0.852040 0.338404 0.137305 0.114481
2001-01-07 0.828730 0.477185 0.456451 0.560955 0.713365
2001-01-08 0.860316 0.809366 0.656306 0.728917 0.383045
2001-01-09 0.492210 0.639744 0.511768 0.623203 0.495040
2001-01-10 0.956224 0.188009 0.938048 0.999993 0.556607
'''
#convert wide to long cleanup the index and col names
df1 = df.stack()\
.reset_index()\
.rename(columns={'level_0':'date','level_1':'ticker', 0:'val'})\
.reset_index('date')
#map ticker to group
df1['grp'] = df1.ticker.map(filter_dict)
#select stuff not in group 3 and pivot to wide format
df1[df1.grp != 'G3'].pivot('date','ticker','val')
'''
ticker T1 T4 T5
date
2001-01-01 0.981817 0.060628 0.007345
2001-01-02 0.643111 0.414912 0.116028
2001-01-03 0.366478 0.642552 0.662618
2001-01-04 0.485125 0.838958 0.174067
2001-01-05 0.953075 0.258736 0.551282
2001-01-06 0.633442 0.137305 0.114481
2001-01-07 0.828730 0.560955 0.713365
2001-01-08 0.860316 0.728917 0.383045
2001-01-09 0.492210 0.623203 0.495040
2001-01-10 0.956224 0.999993 0.556607
'''
Upvotes: 1
Reputation: 2283
Say the DataFrame is:
In [7]: df = pd.DataFrame({'T1': [1], 'T2': [2], 'T3': [3], 'T4': [4], 'T5': [5]})
In [8]: df
Out[8]:
T1 T2 T3 T4 T5
0 1 2 3 4 5
And say the filtering dict is:
In [9]: filter_dict = {'T1': 'G1', 'T2': 'G3', 'T3': 'G3', 'T4': 'G1', 'T5': 'G2'}
You can select all columns except those belonging to G3 like this:
In [6]: df[[col_name for col_name, group in filter_dict.items() if group != 'G3']]
Out[6]:
T5 T4 T1
0 5 4 1
Upvotes: 2