abn

Reputation: 1373

Extract a CSV column based on its contents in Python

I have a CSV file like the one below:

h1,h2,h3
1 year,homo sapiens,fibrous tissue
3 minutes,homo sapiens,fibrous tissue
2 hours,homo sapiens,epithelial tissue

I'm trying to get just the column that contains a string I provide. For example, if I say year, the entire column should be collected into a list like [1 year,3 minutes,2 hours]. I'm totally lost on how to proceed and would really appreciate any help.

EDIT: The catch is that the matching data can be in any column.

Upvotes: 0

Views: 196

Answers (2)

Anandhakumar R

Reputation: 391

Try this:

with open('your_file.csv', 'r') as f:
    # keep each line without its trailing newline
    x = [line.rstrip('\n') for line in f]

First column:

for i in range(len(x)):
    print(x[i].split(',')[0])

Output:

h1

1 year

3 minutes

2 hours

Second column:

for i in range(len(x)):
    print(x[i].split(',')[1])

Output:

h2

homo sapiens

homo sapiens

homo sapiens
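
The loops above print a column by a fixed index, whereas the question asks for whichever column contains a given string. A minimal sketch of that search using only the standard csv module might look like the following. The function name columns_containing and the in-memory sample are illustrative assumptions, not part of the answer, and the sketch assumes every row has as many fields as the header.

```python
import csv
import io


def columns_containing(rows, text):
    """Return {header: values} for each column where some cell contains text.

    Hypothetical helper: assumes the first row is the header and that
    every later row has the same number of fields.
    """
    rows = list(rows)
    header, body = rows[0], rows[1:]
    result = {}
    for idx, name in enumerate(header):
        values = [row[idx] for row in body]
        # plain substring test, no regular expressions involved
        if any(text in value for value in values):
            result[name] = values
    return result


# Sample data copied from the question; with a real file, pass
# csv.reader(open_file_object) instead of this in-memory stand-in.
sample = io.StringIO(
    "h1,h2,h3\n"
    "1 year,homo sapiens,fibrous tissue\n"
    "3 minutes,homo sapiens,fibrous tissue\n"
    "2 hours,homo sapiens,epithelial tissue\n"
)
matches = columns_containing(csv.reader(sample), "year")
print(matches)  # {'h1': ['1 year', '3 minutes', '2 hours']}
```

Returning a dictionary keeps every matching column, since the search string could appear in more than one of them.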

Upvotes: 1

EdChum

Reputation: 393983

We can use a list comprehension and a combination of any and str.contains:

In [183]:
# assumes the data is already loaded, e.g. import pandas as pd; df = pd.read_csv('your_file.csv')
# filter the columns for only those that contain our text of interest
cols_of_interest = [col for col in df if any(df[col].str.contains('year'))]
cols_of_interest
Out[183]:
['h1']
In [184]:
# use the list as a column filter
df[cols_of_interest]
Out[184]:
          h1
0     1 year
1  3 minutes
2    2 hours

For each column, this tests whether any value contains the text of interest by calling the vectorised str.contains method. Note that str.contains treats its argument as a regular expression by default.

It would be easy to wrap the list comprehension in a function that returns the list:

In [185]:

def cols_contains(text):
    return [col for col in df if any(df[col].str.contains(text))]

df[cols_contains('year')]
Out[185]:
          h1
0     1 year
1  3 minutes
2    2 hours
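
The function above reads df from the enclosing scope and relies on every column holding strings. A slightly more defensive variant, offered only as a sketch, could take the frame as a parameter, cast each column to str so numeric columns do not raise, and pass regex=False so the text is matched literally. The name cols_containing and the in-memory sample are assumptions made for illustration.

```python
import io

import pandas as pd


def cols_containing(frame, text):
    """Names of columns in which at least one cell contains text literally.

    Hypothetical variant of cols_contains. Note that astype(str) turns
    missing values into the string 'nan', which could match such a search.
    """
    return [
        col
        for col in frame.columns
        # astype(str) makes non-string columns searchable;
        # regex=False disables regular-expression interpretation
        if frame[col].astype(str).str.contains(text, regex=False).any()
    ]


# Sample data copied from the question, read from memory instead of a file.
sample = io.StringIO(
    "h1,h2,h3\n"
    "1 year,homo sapiens,fibrous tissue\n"
    "3 minutes,homo sapiens,fibrous tissue\n"
    "2 hours,homo sapiens,epithelial tissue\n"
)
df = pd.read_csv(sample)

cols = cols_containing(df, "year")
print(cols)  # ['h1']

# The question asks for a plain list, which tolist() provides.
as_list = df[cols[0]].tolist()
print(as_list)  # ['1 year', '3 minutes', '2 hours']
```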

Upvotes: 1
