Reputation: 5660
I have a dataframe, df, and a list of strings, cols_needed, which indicate the columns I want to retain in df. The column names in df do not exactly match the strings in cols_needed, so I cannot directly use something like intersection. But the column names do contain the strings in cols_needed. I tried playing around with str.contains but couldn't get it to work. How can I subset df based on cols_needed?
import pandas as pd
df = pd.DataFrame({
'sim-prod1': [1,2],
'sim-prod2': [3,4],
'sim-prod3': [5,6],
'sim_prod4': [7,8]
})
cols_needed = ['prod1', 'prod2']
# What I want to obtain:
sim-prod1 sim-prod2
0 1 3
1 2 4
Upvotes: 3
Views: 2101
Reputation: 28644
A list comprehension could work as well:
# Keep each column once if it contains any of the needed substrings
columns = [col for col in df.columns
           if any(needed in col for needed in cols_needed)]
['sim-prod1', 'sim-prod2']
In [110]: df.loc[:, columns]
Out[110]:
sim-prod1 sim-prod2
0 1 3
1 2 4
Upvotes: 3
Reputation: 59549
With the regex option of filter:
df.filter(regex='|'.join(cols_needed))
sim-prod1 sim-prod2
0 1 3
1 2 4
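One caveat, as a hedged sketch: the joined string is interpreted as a regular expression, so an entry in cols_needed containing characters such as `.` or `+` would be read as regex syntax rather than literal text. Assuming that could happen with real column names, escaping each entry with re.escape before joining keeps the match literal. The data below simply repeats the question's example.

```python
import re

import pandas as pd

df = pd.DataFrame({
    'sim-prod1': [1, 2],
    'sim-prod2': [3, 4],
    'sim-prod3': [5, 6],
    'sim_prod4': [7, 8],
})
cols_needed = ['prod1', 'prod2']

# Escape each substring so regex metacharacters are matched literally,
# then join the pieces into a single alternation pattern.
pattern = '|'.join(re.escape(needed) for needed in cols_needed)

subset = df.filter(regex=pattern)
print(subset.columns.tolist())  # ['sim-prod1', 'sim-prod2']
```

For the names in the question the escaping changes nothing, so the result is the same as with the plain join.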
Upvotes: 3
Reputation: 150745
You can use str.contains with a joined pattern, for example:
df.loc[:,df.columns.str.contains('|'.join(cols_needed))]
Output:
sim-prod1 sim-prod2
0 1 3
1 2 4
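If regex interpretation is not wanted at all, a possible variant (a sketch, not the answer's own method) is to call str.contains once per substring with regex=False and combine the resulting boolean masks. The name mask is introduced here for illustration only.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'sim-prod1': [1, 2],
    'sim-prod2': [3, 4],
    'sim-prod3': [5, 6],
    'sim_prod4': [7, 8],
})
cols_needed = ['prod1', 'prod2']

# Start with "keep nothing", then switch on every column whose name
# contains one of the needed substrings (plain substring match, no regex).
mask = np.zeros(len(df.columns), dtype=bool)
for needed in cols_needed:
    mask |= df.columns.str.contains(needed, regex=False)

subset = df.loc[:, mask]
print(subset.columns.tolist())  # ['sim-prod1', 'sim-prod2']
```

This trades the one-liner for a loop, but it avoids having to think about escaping.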
Upvotes: 3