Extract a column based on its contents CSV python

Question

I have a csv file like below

h1,h2,h3
1 year,homo sapiens,fibrous tissue
3 minutes,homo sapiens,fibrous tissue
2 hours,homo sapiens,epithelial tissue

I'm trying to get just the column that contains a string I provide. For example, if I say year, the entire column needs to be put into a list like ['1 year', '3 minutes', '2 hours']. I'm totally lost on how to proceed. I'd really appreciate any help.

EDIT: The catch is that the matching data can be in any column.

EdChum · Accepted Answer

Assuming the CSV has already been loaded into a DataFrame df (for example with pd.read_csv), we can use a list comprehension combined with any and str.contains:

In [183]:
# filter the columns for only those that contain our text of interest
cols_of_interest = [col for col in df if any(df[col].str.contains('year'))]
cols_of_interest
Out[183]:
['h1']
In [184]:
# use the list as a column filter
df[cols_of_interest]
Out[184]:
          h1
0     1 year
1  3 minutes
2    2 hours

This tests whether any value in each column contains the text of interest by calling the vectorised str.contains method.
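For completeness, here is a minimal end-to-end sketch that starts from the CSV text and ends with the plain list the question asks for. It makes a few assumptions that are not part of the answer above: io.StringIO stands in for the real file, dtype=str forces every column to text, na=False treats empty cells as non-matches, and regex=False makes the search a literal substring test.

```python
import io
import pandas as pd

# Sample data from the question; io.StringIO stands in for a real file path.
csv_text = """h1,h2,h3
1 year,homo sapiens,fibrous tissue
3 minutes,homo sapiens,fibrous tissue
2 hours,homo sapiens,epithelial tissue
"""

# dtype=str keeps every column as text so the .str accessor works on all of them.
df = pd.read_csv(io.StringIO(csv_text), dtype=str)

# na=False counts missing cells as "no match"; regex=False does a literal substring test.
cols_of_interest = [
    col for col in df
    if df[col].str.contains("year", na=False, regex=False).any()
]

# Turn the first matching column into a plain Python list (empty if nothing matched).
values = df[cols_of_interest[0]].tolist() if cols_of_interest else []
print(cols_of_interest)
print(values)
```

Using the Series method .any() together with na=False avoids a subtle issue with the built-in any: a missing value (NaN) is truthy in Python, so a column with empty cells could otherwise be reported as a match.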

It is easy to wrap the list comprehension in a function that returns the list of matching column names:

In [185]:
# note: the function reads the DataFrame df defined above
def cols_contains(text):
    return [col for col in df if any(df[col].str.contains(text))]

df[cols_contains('year')]
Out[185]:
          h1
0     1 year
1  3 minutes
2    2 hours
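If the goal is the column values as lists rather than a filtered DataFrame, a variation along these lines might help. The helper name columns_as_lists is hypothetical, and passing the DataFrame in as a parameter and casting each column with astype(str) are assumptions added here so that numeric columns do not break the .str accessor.

```python
import pandas as pd

def columns_as_lists(frame, text):
    """Map each column that contains `text` to its values as a list."""
    result = {}
    for col in frame.columns:
        # Cast to str so non-text columns can be searched too.
        # Caveat: missing values become the string 'nan' after the cast.
        as_text = frame[col].astype(str)
        if as_text.str.contains(text, regex=False).any():
            result[col] = frame[col].tolist()
    return result

# Same data as in the question, built inline to keep the sketch self-contained.
df = pd.DataFrame({
    "h1": ["1 year", "3 minutes", "2 hours"],
    "h2": ["homo sapiens"] * 3,
    "h3": ["fibrous tissue", "fibrous tissue", "epithelial tissue"],
})

matches = columns_as_lists(df, "year")
print(matches)
```

Returning a dict keeps the column name next to its values, which is useful when more than one column matches.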
