Reputation: 305
so I am trying to filter a pandas data frame by values in a row. Basically I have a df where one row contains the name of building, eg. Education, K-12, office, church, etc..
I want to filter a new dataframe based on these values. Eg. I want to 'extract' the columns where the cell value equals 'Education, K-12'. How do I do this?
I searched far and wide but most chain filtering seemed based upon column value. This is not supposed to be based on column value.
Thanks!
SAN ANTONIO, TX SAN ANTONIO, TX.1 SAN ANTONIO, TX.2 SAN ANTONIO, TX.3 \
0 Commercial Commercial Commercial Commercial
1 Fossil Fuel Fossil Fuel Fossil Fuel Fossil Fuel
2 Education, K-12 Education, K-12 Education, K-12 Education, K-12
.. ... ... ... ...
SAN ANTONIO, TX.429 SAN ANTONIO, TX.430 SAN ANTONIO, TX.431
0 Commercial Commercial Commercial
1 Electric Electric Electric
2 Office, Large Office, Large Office, Large
.. ... ... ...
[745 rows x 432 columns]>
Upvotes: 1
Views: 175
Reputation: 143098
My first idea is to transpose dataframe
transposed = dt.T
to get `Education, K-12 in column
0 1 2
SAN ANTONIO, TX Commercial Fossil Fuel Education, K-12
SAN ANTONIO, TX.1 Commercial Fossil Fuel Education, K-12
SAN ANTONIO, TX.2 Commercial Fossil Fuel Office, Large
SAN ANTONIO, TX.3 Commercial Fossil Fuel Education, K-12
and then search in rows
transposed[ transposed[2] == 'Education, K-12' ].index
Minimal workig example.
I use io.StringIO
only to simulate file in memory but you should use normal filename.
text = '''SAN ANTONIO, TX;SAN ANTONIO, TX.1;SAN ANTONIO, TX.2;SAN ANTONIO, TX.3
Commercial;Commercial;Commercial;Commercial
Fossil Fuel;Fossil Fuel;Fossil Fuel;Fossil Fuel
Education, K-12;Education, K-12;Office, Large;Education, K-12'''
import io
import pandas as pd
df = pd.read_csv(io.StringIO(text), sep=';')
print('\n--- df ---\n')
print(df)
transposed = df.T
print('\n--- transposed ---\n')
print(transposed)
print('\n--- names ---\n')
cols = transposed[ transposed[2] == 'Education, K-12' ].index
print(cols)
print('\n--- columns ---\n')
print(df[ cols ])
Result
--- df ---
SAN ANTONIO, TX SAN ANTONIO, TX.1 SAN ANTONIO, TX.2 SAN ANTONIO, TX.3
0 Commercial Commercial Commercial Commercial
1 Fossil Fuel Fossil Fuel Fossil Fuel Fossil Fuel
2 Education, K-12 Education, K-12 Office, Large Education, K-12
--- transposed ---
0 1 2
SAN ANTONIO, TX Commercial Fossil Fuel Education, K-12
SAN ANTONIO, TX.1 Commercial Fossil Fuel Education, K-12
SAN ANTONIO, TX.2 Commercial Fossil Fuel Office, Large
SAN ANTONIO, TX.3 Commercial Fossil Fuel Education, K-12
--- names ---
Index(['SAN ANTONIO, TX', 'SAN ANTONIO, TX.1', 'SAN ANTONIO, TX.3'], dtype='object')
--- columns ---
SAN ANTONIO, TX SAN ANTONIO, TX.1 SAN ANTONIO, TX.3
0 Commercial Commercial Commercial
1 Fossil Fuel Fossil Fuel Fossil Fuel
2 Education, K-12 Education, K-12 Education, K-12
Upvotes: 1
Reputation: 143098
After testing ideas I created this
cols = df.columns[ df.iloc[2] == 'Education, K-12' ]
df[ cols ]
I get only one row iloc[2]
so I get Series
and I can compare values in Series
with 'Education, K-12'
- this gives True/False
value for every item in this row and I can use it to filter columns.
Minimal working example.
I use io.StringIO
only to simulate file in memory but you should use normal filename.
text = '''SAN ANTONIO, TX;SAN ANTONIO, TX.1;SAN ANTONIO, TX.2;SAN ANTONIO, TX.3
Commercial;Commercial;Commercial;Commercial
Fossil Fuel;Fossil Fuel;Fossil Fuel;Fossil Fuel
Education, K-12;Education, K-12;Office, Large;Education, K-12'''
import io
import pandas as pd
df = pd.read_csv(io.StringIO(text), sep=';')
print('\n--- df ---\n')
print(df)
print('\n--- Series ---\n')
print( df.iloc[2] )
print('\n--- mask ---\n')
print( df.iloc[2] == 'Education, K-12' )
print('\n--- names ---\n')
cols = df.columns[ df.iloc[2] == 'Education, K-12' ]
print(cols)
print('\n--- columns ---\n')
print(df[ cols ])
Result:
--- df ---
SAN ANTONIO, TX SAN ANTONIO, TX.1 SAN ANTONIO, TX.2 SAN ANTONIO, TX.3
0 Commercial Commercial Commercial Commercial
1 Fossil Fuel Fossil Fuel Fossil Fuel Fossil Fuel
2 Education, K-12 Education, K-12 Office, Large Education, K-12
--- Series ---
SAN ANTONIO, TX Education, K-12
SAN ANTONIO, TX.1 Education, K-12
SAN ANTONIO, TX.2 Office, Large
SAN ANTONIO, TX.3 Education, K-12
Name: 2, dtype: object
--- mask ---
SAN ANTONIO, TX True
SAN ANTONIO, TX.1 True
SAN ANTONIO, TX.2 False
SAN ANTONIO, TX.3 True
Name: 2, dtype: bool
--- names ---
Index(['SAN ANTONIO, TX', 'SAN ANTONIO, TX.1', 'SAN ANTONIO, TX.3'], dtype='object')
--- columns ---
SAN ANTONIO, TX SAN ANTONIO, TX.1 SAN ANTONIO, TX.3
0 Commercial Commercial Commercial
1 Fossil Fuel Fossil Fuel Fossil Fuel
2 Education, K-12 Education, K-12 Education, K-12
Upvotes: 1
Reputation: 61
I've never seen this use case before. Can't think of an elegant way to do it, but you could first transpose the dataframe, then select just the rows you want, and then transpose it back.
In the example below I made row 7 the one that has the thing you want to filter on. So let's say you want to get rid of columns that have a 'c' in row 7. So basically we need to get rid of 'col2'.
>>> col1=['a','a','a','a','a','b','b','b']
>>> col2=['a','a','a','a','a','b','b','c']
>>> cols=['col1','col2']
>>> values=zip(col1,col2)
>>> import pandas as pd
>>> df=pd.DataFrame(data=values,columns=cols)
>>> df
col1 col2
0 a a
1 a a
2 a a
3 a a
4 a a
5 b b
6 b b
7 b c
>>> dft=df.T
>>> dft
0 1 2 3 4 5 6 7
col1 a a a a a b b b
col2 a a a a a b b c
>>> dff=dft[dft[7]!='c']
>>> dff
0 1 2 3 4 5 6 7
col1 a a a a a b b b
>>> dfo=dff.T
>>> dfo
col1
0 a
1 a
2 a
3 a
4 a
5 b
6 b
7 b
Upvotes: 1