Reputation: 254
I have a pandas DataFrame in Python, and I am selecting columns based on the condition below:
spike_cols = [col for col in nodes.columns if 'Num' in col]
print(spike_cols)
But I want to check for multiple substrings and pull every column whose name contains any one of them:
spike_cols = [col for col in nodes.columns if ('Num'|'Lice') in col]
print(spike_cols)
But I am getting the error below:
TypeError: unsupported operand type(s) for |: 'str' and 'str'
Upvotes: 0
Views: 1142
Reputation: 9019
You can use Series.str.contains:
df[df.columns[df.columns.str.contains(r'Num|Lice')]]
If all you want is the column names themselves:
df.columns[df.columns.str.contains(r'Num|Lice')].tolist()
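When the substrings come from a list, or might contain regex metacharacters, the pattern can be built programmatically instead of typed by hand. This is a minimal sketch, assuming a hypothetical list named `substrings` and a small example frame; neither is from the question.

```python
import re
import pandas as pd

# Hypothetical example data; the column names are assumptions for illustration.
df = pd.DataFrame({'HelloNum': [1, 2], 'World': [3, 4], 'ExampleLice': [7, 8]})

# Hypothetical list of substrings to look for.
substrings = ['Num', 'Lice']

# Escape each substring so it is matched literally, then join with '|' (regex OR).
pattern = '|'.join(re.escape(s) for s in substrings)

# Boolean mask over the column names, then the matching names as a list.
mask = df.columns.str.contains(pattern)
matching_cols = df.columns[mask].tolist()
print(matching_cols)
```

Escaping matters only if a substring contains characters such as `.` or `(`; for plain words like these the pattern is simply `Num|Lice`.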
Upvotes: 2
Reputation: 42916
You can use DataFrame.filter for this, in combination with its regex argument:
import pandas as pd

# Create example dataframe
df = pd.DataFrame({'HelloNum': [1, 2],
                   'World': [3, 4],
                   'This': [5, 6],
                   'ExampleLice': [7, 8]})
print(df)
   HelloNum  World  This  ExampleLice
0         1      3     5            7
1         2      4     6            8
Apply DataFrame.filter:
print(df.filter(regex='Num|Lice'))
   HelloNum  ExampleLice
0         1            7
1         2            8
Get the column names as a list:
df.filter(regex='Num|Lice').columns.tolist()
['HelloNum', 'ExampleLice']
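For context on why the regex argument is used here: DataFrame.filter also accepts a `like` argument, but that takes a single literal substring, so matching several substrings needs `regex` with alternation. A short sketch of the difference, reusing the example frame above (the variable names `num_only` and `num_or_lice` are made up for illustration):

```python
import pandas as pd

df = pd.DataFrame({'HelloNum': [1, 2],
                   'World': [3, 4],
                   'This': [5, 6],
                   'ExampleLice': [7, 8]})

# like= keeps columns whose name contains one literal substring.
num_only = df.filter(like='Num').columns.tolist()

# regex= keeps columns whose name matches the pattern; '|' means "or".
num_or_lice = df.filter(regex='Num|Lice').columns.tolist()

print(num_only)
print(num_or_lice)
```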
Upvotes: 2
Reputation: 1959
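The | operator is not defined between two strings, which is what raises the error. Test each substring separately and combine the checks with or: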
spike_cols = [col for col in nodes.columns if ('Num' in col or 'Lice' in col)]
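If there are more than a couple of substrings, the same idea can be written with `any()` so the list of substrings is kept in one place. A minimal sketch, assuming a hypothetical `substrings` tuple and a stand-in `nodes` frame (the real `nodes` data is not shown in the question):

```python
import pandas as pd

# Stand-in for the question's `nodes` frame; column names are assumptions.
nodes = pd.DataFrame({'HelloNum': [1, 2], 'World': [3, 4], 'ExampleLice': [7, 8]})

# Hypothetical collection of substrings to look for.
substrings = ('Num', 'Lice')

# Keep a column if at least one substring occurs in its name.
spike_cols = [col for col in nodes.columns
              if any(s in col for s in substrings)]
print(spike_cols)
```

This uses plain substring matching rather than regex, so no escaping is needed.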
Upvotes: 1