Gonza

Reputation: 177

How to filter several columns with LIKE and follow the sequence?

I am trying to build a pandas dataframe by filtering on the column names. The dataframe's columns look like:

NAME_1, EMAIL_1, PHONO_1, POSTALCODE_1, NAME_2, EMAIL_2, PHONO_2, POSTALCODE_2, etc.

I understand that to keep only the "email" columns, I just have to put:

df_temp = df.filter(like='EMAIL')

The result looks like:

EMAIL_1, EMAIL_2, EMAIL_n

So far so good, but I also want to add other columns to the filter and keep the sequence that links the records together. I am looking for something like:

NAME_1, EMAIL_1, NAME_2, EMAIL_2

I tried:

df_temp = df[df.filter(like='NAME') & df.filter(like='EMAIL')]

But it doesn't work.

The following does not return the data as I need it:

df_temp_1 = df.filter(like='NAME')
df_temp_2 = df.filter(like='EMAIL')
df_temp_3 = pd.concat([df_temp_1, df_temp_2], axis=1)

It returns:

NAME_1, NAME_2, EMAIL_1, EMAIL_2, etc.

I want to see it as:

NAME_1, EMAIL_1, NAME_2, EMAIL_2

I would even like to be able to separate them into different dataframes.

Can you recommend an approach?

Thanks in advance.

Regards

Upvotes: 1

Views: 169

Answers (5)

sammywemmy

Reputation: 28709

One option is select_columns from pyjanitor:

# pip install pyjanitor
import janitor 
import pandas as pd

In [106]: columns = ['NAME_1', 'EMAIL_1', 'PHONO_1', 'POSTALCODE_1', 'NAME_2','EMAIL_2', 'PHONO_2', 'POSTALCODE_2']

In [107]: df = pd.DataFrame([], columns = columns)

In [108]: df
Out[108]:
Empty DataFrame
Columns: [NAME_1, EMAIL_1, PHONO_1, POSTALCODE_1, NAME_2, EMAIL_2, PHONO_2, POSTALCODE_2]
Index: []

In [109]: df.select_columns('NAME*', 'EMAIL*')
Out[109]:
Empty DataFrame
Columns: [NAME_1, NAME_2, EMAIL_1, EMAIL_2]
Index: []

The * syntax is based on Unix shell globbing.
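
Note that select_columns groups the matches by pattern (all NAME columns, then all EMAIL columns), as the output above shows. If the original interleaved order matters, a minimal plain-pandas follow-up (a sketch, assuming the df above) is to reindex against the source frame's column order:

out = df.select_columns('NAME*', 'EMAIL*')
# keep only the selected columns, in the source frame's left-to-right order
out = out[[c for c in df.columns if c in out.columns]]
# Columns: [NAME_1, EMAIL_1, NAME_2, EMAIL_2]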

Upvotes: 1

user17242583

Reputation:

Using a regular expression is the best solution, but you can also extract the columns from the two filter calls and join them together with union:

cols = df.filter(like='NAME').columns.union(df.filter(like='EMAIL').columns)

Output:

>>> cols
Index(['EMAIL_1', 'EMAIL_2', 'NAME_1', 'NAME_2'], dtype='object')
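
Note that union sorts the labels, which is why the EMAIL columns come first here. A sketch (plain pandas, assuming the cols from above) for selecting them in the frame's original interleaved order instead:

# index into df with the matched labels, keeping the original column order
df_temp = df[[c for c in df.columns if c in cols]]
# NAME_1, EMAIL_1, NAME_2, EMAIL_2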

Upvotes: 1

BENY

Reputation: 323316

Check with startswith

out = df.loc[:,df.columns.str.startswith(('NAME','EMAIL'))]
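
Because the boolean mask is built from df.columns itself, .loc keeps the frame's original left-to-right order, so the result is already interleaved. A minimal sketch with the column names from the question:

import pandas as pd

df = pd.DataFrame(columns=['NAME_1', 'EMAIL_1', 'PHONO_1', 'POSTALCODE_1',
                           'NAME_2', 'EMAIL_2', 'PHONO_2', 'POSTALCODE_2'])
# select every column whose name starts with NAME or EMAIL
out = df.loc[:, df.columns.str.startswith(('NAME', 'EMAIL'))]
print(out.columns.tolist())
# ['NAME_1', 'EMAIL_1', 'NAME_2', 'EMAIL_2']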

Upvotes: 1

Ynjxsjmh

Reputation: 30050

You can try

out = df.filter(regex='(NAME|EMAIL)_[12]')

If the suffixes are different:

out = df.filter(regex='(NAME_[12]|EMAIL_[3-5])')
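
If the number of suffixes is not known in advance, a slightly more general pattern (a sketch, not part of the original answer) matches any numeric suffix:

# match NAME_<n> and EMAIL_<n> for any number n
out = df.filter(regex=r'^(NAME|EMAIL)_\d+$')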

Upvotes: 1

Corralien

Reputation: 120469

Use regex instead of like:

out = df.filter(regex='^(NAME|EMAIL)')
print(out)

# Output (sample)
   NAME_1  EMAIL_1  NAME_2  EMAIL_2
0       5        9       5        9
1       8        2       3        9
2       8        8       1        5
3       6        7       9        5
4       6        6       4        3
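
The question also asks about splitting the result into separate dataframes per record; a hedged sketch (plain pandas, building on the out above) that groups the filtered columns by their numeric suffix:

# collect the distinct numeric suffixes, then build one frame per suffix
suffixes = sorted({c.rsplit('_', 1)[-1] for c in out.columns})
frames = {s: out.filter(regex=fr'_{s}$') for s in suffixes}
# frames['1'] has NAME_1, EMAIL_1; frames['2'] has NAME_2, EMAIL_2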

Upvotes: 4
