newWithPython

Reputation: 883

How to create a list of specific values with pandas?

I have a very large .csv file like this:

column1,id,column3,column4,words,column6
string,309483,0,0,hi#1,string string ....
string,234234,0.344,0,hello#1,string string ....
...
string,89789,0,.56799,world#1,string string ....
string,212934,0.8967,0,wolf#1 web#1 mouse#3,string string ....

I would like to extract into a list all the words from rows where column3 holds a float greater than 0. For the sample above, the output would be:

[hello#1, wolf#1, web#1, mouse#3]

Any idea how to approach this task with pandas? Thanks in advance, guys.

Upvotes: 0

Views: 57

Answers (3)

EdChum

Reputation: 394041

If you wanted a list of all the unique words:

df[df.column3 > 0].words.unique()

You can cast this to a list by doing

list(df[df.column3 > 0].words.unique())

or call tolist() on the NumPy array that unique() returns, which should be faster than the above:

df[df.column3 > 0].words.unique().tolist()
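
As a rough check against the question's sample rows (the inline CSV below is an assumption standing in for the real file), note that unique() works on whole cell values, so a cell holding several space-separated words stays a single entry:

```python
import io
import pandas as pd

# Inline copy of the question's sample rows (an assumption; the real file is much larger)
csv_text = """column1,id,column3,column4,words,column6
string,309483,0,0,hi#1,string string
string,234234,0.344,0,hello#1,string string
string,89789,0,.56799,world#1,string string
string,212934,0.8967,0,wolf#1 web#1 mouse#3,string string
"""

df = pd.read_csv(io.StringIO(csv_text))

# Keep rows where column3 > 0, then collect the distinct values of the words column
unique_cells = df[df.column3 > 0].words.unique().tolist()
print(unique_cells)  # ['hello#1', 'wolf#1 web#1 mouse#3']
```

If individual words are needed, each cell still has to be split on spaces afterwards.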

Upvotes: 1

kennes

Reputation: 2145

Correction:

You can do it with iterrows, but it isn't as concise as the solution above:

import itertools

# Split the words of every row whose column3 is greater than 0
your_list = [row['words'].split(' ') for _, row in dataframe.iterrows() if row['column3'] > 0]
# Flatten the list of lists into a single list of words
your_list = list(itertools.chain.from_iterable(your_list))

Upvotes: 1

cphlewis

Reputation: 16249

' '.join(df[df.column3 > 0].words).split(' ')

result from test data:

['hello#1', 'wolf#1', 'web#1', 'mouse#3']

The pandas expression in the middle selects the right rows; the values of the words column are then joined into one string and split apart into separate words.
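
A minimal self-contained sketch of the join-and-split approach on the question's sample rows follows. The inline CSV and the variable names are assumptions for illustration. It also shows an alternative using Series.str.split with Series.explode (available in pandas 0.25 and later), which should give the same list on this data:

```python
import io
import pandas as pd

# Inline copy of the question's sample rows (an assumption; the real file is much larger)
csv_text = """column1,id,column3,column4,words,column6
string,309483,0,0,hi#1,string string
string,234234,0.344,0,hello#1,string string
string,89789,0,.56799,world#1,string string
string,212934,0.8967,0,wolf#1 web#1 mouse#3,string string
"""

df = pd.read_csv(io.StringIO(csv_text))

# Select the words column for rows where column3 > 0
selected = df.loc[df["column3"] > 0, "words"]

# Join all selected cells into one string, then split it into individual words
words = " ".join(selected).split(" ")

# Alternative: split each cell into a list, then expand every list element into its own row
words_exploded = selected.str.split(" ").explode().tolist()

print(words)
print(words_exploded)
```

Neither version removes duplicates; wrapping the result in pd.unique(...) or set(...) would do that if needed.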

Upvotes: 1
