Reputation: 883
I have a very large .csv file like this:
column1,id,column3,column4,words,column6
string,309483,0,0,hi#1,string string ....
string,234234,0.344,0,hello#1,string string ....
...
string,89789,0,.56799,world#1,string string ....
string,212934,0.8967,0,wolf#1 web#1 mouse#3,string string ....
I would like to extract all the words from rows where column3 holds a float greater than 0 and place them into a list. For example, for the data above the output would be:
[hello#1, wolf#1, web#1, mouse#3]
Any idea how to approach this task with pandas? Thanks in advance, guys.
Upvotes: 0
Views: 57
Reputation: 394041
If you wanted a list of all the unique words:
df[df.column3 > 0].words.unique()
You can cast this to a list by doing
list(df[df.column3 > 0].words.unique())
or use the NumPy array's own tolist() method, which will be faster than the above (unique() already returns an array, so no .values is needed):
df[df.column3 > 0].words.unique().tolist()
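As a rough sketch of what this returns, assume the question's sample rows are in a DataFrame named df (built inline here for illustration rather than read from the real file). Note that unique() works on whole cell values, so a cell holding several space-separated words stays a single string:

```python
import pandas as pd

# Hypothetical stand-in for the question's CSV, reduced to the two relevant columns.
df = pd.DataFrame({
    "column3": [0, 0.344, 0, 0.8967],
    "words": ["hi#1", "hello#1", "world#1", "wolf#1 web#1 mouse#3"],
})

# Keep rows where column3 > 0, then collect the distinct cell values.
unique_cells = list(df[df.column3 > 0].words.unique())
print(unique_cells)  # ['hello#1', 'wolf#1 web#1 mouse#3']
```

If individual words are needed, the cell values still have to be split on spaces afterwards.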
Upvotes: 1
Reputation: 2145
Correction:
You can do it with iterrows, but it isn't as concise as the above solution:
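import itertools
# iterrows() yields (index, row) pairs, so row[1] is the row itself.
# The column is named 'column3' (no space) in the question's header.
your_list = list(row[1]['words'].split(' ') for row in dataframe.iterrows() if row[1]['column3'] > 0)
# Flatten the list of per-row word lists into a single list of words.
chain = itertools.chain(*your_list)
your_list = list(chain)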
Upvotes: 1
Reputation: 16249
' '.join(df[df.column3 > 0].words).split(' ')
Result from the test data:
['hello#1', 'wolf#1', 'web#1', 'mouse#3']
The pandas expression in the middle selects the right rows; join concatenates all the values of the words column into one space-separated string, and split breaks that string apart into separate words.
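A self-contained sketch of the join/split approach, assuming the question's sample rows (with the elided column6 text shortened to a placeholder) are read through io.StringIO instead of the real file. The str.split().explode() variant is an alternative added here for comparison, not part of the original answer:

```python
import io
import pandas as pd

# Sample rows from the question; column6 content is a placeholder.
csv_text = """column1,id,column3,column4,words,column6
string,309483,0,0,hi#1,string string
string,234234,0.344,0,hello#1,string string
string,89789,0,.56799,world#1,string string
string,212934,0.8967,0,wolf#1 web#1 mouse#3,string string
"""
df = pd.read_csv(io.StringIO(csv_text))

# Select the words column for rows where column3 is greater than 0.
selected = df.loc[df["column3"] > 0, "words"]

# Join every selected cell into one string, then split on spaces.
joined = " ".join(selected).split(" ")

# Alternative: split each cell into a list and expand to one word per row.
exploded = selected.str.split().explode().tolist()

print(joined)    # ['hello#1', 'wolf#1', 'web#1', 'mouse#3']
print(exploded)  # ['hello#1', 'wolf#1', 'web#1', 'mouse#3']
```

Both variants keep duplicates and assume the words column has no missing values in the selected rows.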
Upvotes: 1