Tlund

Reputation: 31

Make a list of all words in pandas dataframe column

I have the following dataframe:

df = pd.DataFrame([{'c1':'Hello world'}, {'c1':'Hello all the world'}])

I want to make a list of all the words contained in the column "c1". The resulting list should look like this:

list=['Hello','world','Hello','all','the','world']

I thought I could iterate over the rows in the dataframe and for each row loop through the words in "c1" and output the words to a list, but I can't make it work.

Upvotes: 2

Views: 6467

Answers (3)

Ukrainian-serge

Reputation: 854

First we create the sample df and an empty list, a_list:

import pandas as pd

df = pd.DataFrame([{'c1': 'Hello world'},
                   {'c1': 'Hello all the world'}])
a_list = []
# Split each row on spaces and add its words to a_list
for value in df.c1.str.split(' '):
    a_list.extend(value)

This for loop iterates over each row in column c1, splits that row's string into a list of words, then adds those words to a_list with the list's .extend method.

What's happening, from left to right and top to bottom:
for : starts the loop.
value : a temporary variable that holds the list of words for the current row of the c1 column.

df.c1 : selects the c1 column from df.
.str.split(' ') : splits each row's string on spaces (' '), turning each row value into a list of words.

a_list.extend(value) : appends every word in value to a_list on each iteration.

Print the list

print(a_list) 
['Hello', 'world', 'Hello', 'all', 'the', 'world']
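
To see why .extend is used here rather than .append, here is a minimal sketch on the same sample data. The names split_rows, flat, and nested are only illustrative and are not part of the answer above.

```python
import pandas as pd

df = pd.DataFrame([{'c1': 'Hello world'},
                   {'c1': 'Hello all the world'}])

# Each element of this Series is the list of words for one row.
split_rows = df['c1'].str.split(' ')

flat = []
nested = []
for words in split_rows:
    flat.extend(words)    # adds each word individually
    nested.append(words)  # adds the whole list as a single element

print(flat)    # ['Hello', 'world', 'Hello', 'all', 'the', 'world']
print(nested)  # [['Hello', 'world'], ['Hello', 'all', 'the', 'world']]
```

With .append you would end up with a list of lists, which is probably not what the question asks for.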

Upvotes: 2

Georgina Skibinski

Reputation: 13377

Try:

df.stack().str.split(r"\W+").explode().tolist()

Outputs:

['Hello', 'world', 'Hello', 'all', 'the', 'world']
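
If the chained call is hard to follow, it can be broken into steps. This is only a sketch on the question's sample data: the intermediate names (stacked, split_words, words) are made up for illustration, and the pattern r"\W+" (one or more non-word characters) is an assumed separator that happens to suit this sample.

```python
import pandas as pd

df = pd.DataFrame([{'c1': 'Hello world'},
                   {'c1': 'Hello all the world'}])

# 1. stack() flattens every cell of the frame into a single Series.
stacked = df.stack()

# 2. str.split() turns each string into a list of words.
split_words = stacked.str.split(r"\W+")

# 3. explode() gives every list element its own row,
#    and tolist() converts the result to a plain Python list.
words = split_words.explode().tolist()

print(words)  # ['Hello', 'world', 'Hello', 'all', 'the', 'world']
```

Because stack() covers the whole frame, this would also pick up words from any other columns; for a single column, starting from df['c1'] should give the same list.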

Upvotes: 4

Mahesh Sinha

Reputation: 91

You can do it like this:

' '.join([i for i in df['c1']]).split()
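
For reference, a small runnable sketch of the same idea on the question's data. The variable names joined and words are illustrative, and the .dropna().astype(str) step is an added assumption for columns that might contain missing or non-string values; the sample data does not need it.

```python
import pandas as pd

df = pd.DataFrame([{'c1': 'Hello world'},
                   {'c1': 'Hello all the world'}])

# Join all rows into one string, separated by single spaces.
# dropna()/astype(str) is a guard: str.join fails on non-string items.
joined = ' '.join(df['c1'].dropna().astype(str))

# split() with no argument splits on any run of whitespace.
words = joined.split()

print(joined)  # Hello world Hello all the world
print(words)   # ['Hello', 'world', 'Hello', 'all', 'the', 'world']
```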

Upvotes: 7
