Tuomas Talvitie
Tuomas Talvitie

Reputation: 305

Pandas filter by values in a row?

so I am trying to filter a pandas data frame by values in a row. Basically I have a df where one row contains the name of building, eg. Education, K-12, office, church, etc..

I want to filter a new dataframe based on these values. Eg. I want to 'extract' the columns where the cell value equals 'Education, K-12'. How do I do this?

I searched far and wide but most chain filtering seemed based upon column value. This is not supposed to be based on column value.

Thanks!

          SAN ANTONIO, TX SAN ANTONIO, TX.1 SAN ANTONIO, TX.2 SAN ANTONIO, TX.3  \
0         Commercial        Commercial        Commercial        Commercial   
1        Fossil Fuel       Fossil Fuel       Fossil Fuel       Fossil Fuel   
2    Education, K-12   Education, K-12   Education, K-12   Education, K-12   
..               ...               ...               ...               ...   
 

            SAN ANTONIO, TX.429  SAN ANTONIO, TX.430 SAN ANTONIO, TX.431  
0            Commercial          Commercial          Commercial  
1              Electric            Electric            Electric  
2         Office, Large       Office, Large       Office, Large  
..                  ...                 ...                 ...  


[745 rows x 432 columns]>

Upvotes: 1

Views: 175

Answers (3)

furas
furas

Reputation: 143098

My first idea is to transpose dataframe

transposed = dt.T

to get `Education, K-12 in column

                            0            1                2
SAN ANTONIO, TX    Commercial  Fossil Fuel  Education, K-12
SAN ANTONIO, TX.1  Commercial  Fossil Fuel  Education, K-12
SAN ANTONIO, TX.2  Commercial  Fossil Fuel    Office, Large
SAN ANTONIO, TX.3  Commercial  Fossil Fuel  Education, K-12

and then search in rows

transposed[ transposed[2] == 'Education, K-12' ].index

Minimal workig example.

I use io.StringIO only to simulate file in memory but you should use normal filename.

text = '''SAN ANTONIO, TX;SAN ANTONIO, TX.1;SAN ANTONIO, TX.2;SAN ANTONIO, TX.3
Commercial;Commercial;Commercial;Commercial
Fossil Fuel;Fossil Fuel;Fossil Fuel;Fossil Fuel
Education, K-12;Education, K-12;Office, Large;Education, K-12'''

import io
import pandas as pd

df = pd.read_csv(io.StringIO(text), sep=';')

print('\n--- df ---\n')
print(df)

transposed = df.T

print('\n--- transposed ---\n')
print(transposed)

print('\n--- names ---\n')
cols = transposed[ transposed[2] == 'Education, K-12' ].index 
print(cols)

print('\n--- columns ---\n')
print(df[ cols ])

Result

--- df ---

   SAN ANTONIO, TX SAN ANTONIO, TX.1 SAN ANTONIO, TX.2 SAN ANTONIO, TX.3
0       Commercial        Commercial        Commercial        Commercial
1      Fossil Fuel       Fossil Fuel       Fossil Fuel       Fossil Fuel
2  Education, K-12   Education, K-12     Office, Large   Education, K-12

--- transposed ---
                            0            1                2
SAN ANTONIO, TX    Commercial  Fossil Fuel  Education, K-12
SAN ANTONIO, TX.1  Commercial  Fossil Fuel  Education, K-12
SAN ANTONIO, TX.2  Commercial  Fossil Fuel    Office, Large
SAN ANTONIO, TX.3  Commercial  Fossil Fuel  Education, K-12

--- names ---

Index(['SAN ANTONIO, TX', 'SAN ANTONIO, TX.1', 'SAN ANTONIO, TX.3'], dtype='object')

--- columns ---

   SAN ANTONIO, TX SAN ANTONIO, TX.1 SAN ANTONIO, TX.3
0       Commercial        Commercial        Commercial
1      Fossil Fuel       Fossil Fuel       Fossil Fuel
2  Education, K-12   Education, K-12   Education, K-12

Upvotes: 1

furas
furas

Reputation: 143098

After testing ideas I created this

cols = df.columns[ df.iloc[2] == 'Education, K-12' ]

df[ cols ]

I get only one row iloc[2] so I get Series and I can compare values in Series with 'Education, K-12' - this gives True/False value for every item in this row and I can use it to filter columns.


Minimal working example.

I use io.StringIO only to simulate file in memory but you should use normal filename.

text = '''SAN ANTONIO, TX;SAN ANTONIO, TX.1;SAN ANTONIO, TX.2;SAN ANTONIO, TX.3
Commercial;Commercial;Commercial;Commercial
Fossil Fuel;Fossil Fuel;Fossil Fuel;Fossil Fuel
Education, K-12;Education, K-12;Office, Large;Education, K-12'''

import io
import pandas as pd

df = pd.read_csv(io.StringIO(text), sep=';')

print('\n--- df ---\n')
print(df)

print('\n--- Series ---\n')
print( df.iloc[2] )

print('\n--- mask ---\n')
print( df.iloc[2] == 'Education, K-12' )

print('\n--- names ---\n')
cols = df.columns[ df.iloc[2] == 'Education, K-12' ]
print(cols)

print('\n--- columns ---\n')
print(df[ cols ])

Result:

--- df ---

   SAN ANTONIO, TX SAN ANTONIO, TX.1 SAN ANTONIO, TX.2 SAN ANTONIO, TX.3
0       Commercial        Commercial        Commercial        Commercial
1      Fossil Fuel       Fossil Fuel       Fossil Fuel       Fossil Fuel
2  Education, K-12   Education, K-12     Office, Large   Education, K-12

--- Series ---

SAN ANTONIO, TX      Education, K-12
SAN ANTONIO, TX.1    Education, K-12
SAN ANTONIO, TX.2      Office, Large
SAN ANTONIO, TX.3    Education, K-12
Name: 2, dtype: object

--- mask ---

SAN ANTONIO, TX       True
SAN ANTONIO, TX.1     True
SAN ANTONIO, TX.2    False
SAN ANTONIO, TX.3     True
Name: 2, dtype: bool

--- names ---

Index(['SAN ANTONIO, TX', 'SAN ANTONIO, TX.1', 'SAN ANTONIO, TX.3'], dtype='object')

--- columns ---

   SAN ANTONIO, TX SAN ANTONIO, TX.1 SAN ANTONIO, TX.3
0       Commercial        Commercial        Commercial
1      Fossil Fuel       Fossil Fuel       Fossil Fuel
2  Education, K-12   Education, K-12   Education, K-12

Upvotes: 1

maderman
maderman

Reputation: 61

I've never seen this use case before. Can't think of an elegant way to do it, but you could first transpose the dataframe, then select just the rows you want, and then transpose it back.

In the example below I made row 7 the one that has the thing you want to filter on. So let's say you want to get rid of columns that have a 'c' in row 7. So basically we need to get rid of 'col2'.

>>> col1=['a','a','a','a','a','b','b','b']
>>> col2=['a','a','a','a','a','b','b','c']
>>> cols=['col1','col2']
>>> values=zip(col1,col2)
>>> import pandas as pd
>>> df=pd.DataFrame(data=values,columns=cols)
>>> df
  col1 col2
0    a    a
1    a    a
2    a    a
3    a    a
4    a    a
5    b    b
6    b    b
7    b    c
>>> dft=df.T
>>> dft
      0  1  2  3  4  5  6  7
col1  a  a  a  a  a  b  b  b
col2  a  a  a  a  a  b  b  c
>>> dff=dft[dft[7]!='c']
>>> dff
      0  1  2  3  4  5  6  7
col1  a  a  a  a  a  b  b  b
>>> dfo=dff.T
>>> dfo
  col1
0    a
1    a
2    a
3    a
4    a
5    b
6    b
7    b

Upvotes: 1

Related Questions