Reputation: 1373
I have a CSV file like the one below:
h1,h2,h3
1 year,homo sapiens,fibrous tissue
3 minutes,homo sapiens,fibrous tissue
2 hours,homo sapiens,epithelial tissue
I'm trying to extract the column that contains a string I provide. For example, if I search for year, the entire column should be collected into a list like [1 year, 3 minutes, 2 hours]. I'm totally lost on how to proceed and would really appreciate any help.
EDIT: The issue is that the matching data can be in any column.
Upvotes: 0
Views: 196
Reputation: 391
Try this
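x = []
# Read every line into a list; the with block closes the file afterwards
with open('your_file.csv', 'r') as f:
    for line in f:
        x.append(line.rstrip('\n'))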
"first column"
for row in x:
    print(row.split(',')[0])
Output:
h1
1 year
3 minutes
2 hours
"Second Column"
for row in x:
    print(row.split(',')[1])
Output:
h2
homo sapiens
homo sapiens
homo sapiens
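The loops above print a column by position, but the question asks for the column whose cells contain a given string, wherever that column is. A minimal sketch of that search using the standard csv module follows. The helper name columns_containing and the inline SAMPLE data are made up for illustration; with a real file you would pass open('your_file.csv', newline='') instead of the StringIO object.

```python
import csv
import io

# Sample data copied from the question (stands in for the real file)
SAMPLE = """h1,h2,h3
1 year,homo sapiens,fibrous tissue
3 minutes,homo sapiens,fibrous tissue
2 hours,homo sapiens,epithelial tissue
"""

def columns_containing(lines, text):
    """Return {header: values} for each column with a cell containing text.

    Hypothetical helper, not part of the csv module.
    """
    rows = list(csv.reader(lines))
    if not rows:
        return {}
    header, body = rows[0], rows[1:]
    matches = {}
    for idx, name in enumerate(header):
        # Skip short rows instead of raising IndexError
        values = [row[idx] for row in body if idx < len(row)]
        if any(text in value for value in values):
            matches[name] = values
    return matches

result = columns_containing(io.StringIO(SAMPLE), 'year')
print(result)  # {'h1': ['1 year', '3 minutes', '2 hours']}
```

The match is a plain, case-sensitive substring test, so searching for tissue would return the h3 column instead.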
Upvotes: 1
Reputation: 393983
Assuming the CSV has been loaded into a pandas DataFrame called df, we can use a list comprehension combined with any and str.contains:
In [183]:
# filter the columns for only those that contain our text of interest
cols_of_interest = [col for col in df if any(df[col].str.contains('year'))]
cols_of_interest
Out[183]:
['h1']
In [184]:
# use the list as a column filter
df[cols_of_interest]
Out[184]:
          h1
0     1 year
1  3 minutes
2    2 hours
This tests whether any value in a column contains the text of interest by calling the vectorised str method contains.
It would be easy to wrap the list comprehension into a function that returned the list:
In [185]:
def cols_contains(text):
    return [col for col in df if any(df[col].str.contains(text))]

df[cols_contains('year')]
Out[185]:
          h1
0     1 year
1  3 minutes
2    2 hours
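For completeness, here is a self-contained sketch that also loads the data and turns the matching column into the plain list the question asks for. It is one possible variant, not the only way to do it: the function name cols_containing and the inline SAMPLE data are assumptions for illustration. It casts every column to str first, since the .str accessor raises on numeric columns, and passes regex=False so the search text is treated literally rather than as a regular expression.

```python
import io
import pandas as pd

# Sample data copied from the question (stands in for pd.read_csv('your_file.csv'))
SAMPLE = """h1,h2,h3
1 year,homo sapiens,fibrous tissue
3 minutes,homo sapiens,fibrous tissue
2 hours,homo sapiens,epithelial tissue
"""

df = pd.read_csv(io.StringIO(SAMPLE))

def cols_containing(frame, text):
    """Names of columns in which at least one cell contains text literally.

    Hypothetical helper. Caveat: astype(str) turns missing values into
    the string 'nan', so searching for 'nan' would match empty cells.
    """
    found = []
    for col in frame.columns:
        cells = frame[col].astype(str)
        if cells.str.contains(text, regex=False).any():
            found.append(col)
    return found

cols = cols_containing(df, 'year')
# Take the first matching column as a plain Python list
values = df[cols[0]].tolist() if cols else []
print(cols, values)  # ['h1'] ['1 year', '3 minutes', '2 hours']
```

If several columns can match, iterate over cols rather than taking only the first one.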
Upvotes: 1