Reputation: 31
I have the following dataframe:
df = pd.DataFrame([{'c1':'Hello world'}, {'c1':'Hello all the world'}])
I want to make a list with all the words contained in the column "c1". The resulting list should look like this:
list=['Hello','world','Hello','all','the','world']
I thought I could iterate over the rows in the dataframe and for each row loop through the words in "c1" and output the words to a list, but I can't make it work.
Upvotes: 2
Views: 6467
Reputation: 854
First we create the sample df and an empty list, a_list:
import pandas as pd

df = pd.DataFrame([{'c1':'Hello world'},
                   {'c1':'Hello all the world'}])
a_list = []
for value in df.c1.str.split(' '):
    a_list.extend(value)
This for loop iterates through each row in column c1, splits the value in each row into words, then adds those words to a_list via the .extend method.
What's happening, from left to right and top to bottom:
for: starts the loop.
value: a temporary variable that holds the current item on each iteration (here, the list of words from one row of the c1 column).
df.c1: selects the c1 column from df.
.str.split(' '): accesses the string value and splits it wherever there is a space (' '), turning each row's value into a list.
a_list.extend(value): adds the items in value to a_list on each iteration.
Print the list
print(a_list)
['Hello', 'world', 'Hello', 'all', 'the', 'world']
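To see what the loop is iterating over, it can help to look at the intermediate result of the split. The sketch below is only an illustration: the names split_rows, rows_as_lists and words are made up for this example, and the list comprehension is an alternative way of flattening, not part of the loop above.

```python
import pandas as pd

df = pd.DataFrame([{'c1': 'Hello world'}, {'c1': 'Hello all the world'}])

# Each element of this Series is the list of words from one row of c1.
split_rows = df['c1'].str.split(' ')
rows_as_lists = split_rows.tolist()
print(rows_as_lists)

# Flattening with a nested comprehension gives the same result as
# calling a_list.extend(value) once per row.
words = [word for row in rows_as_lists for word in row]
print(words)
```

The first print shows one inner list per row, which is why .extend (rather than .append) is needed to end up with a flat list of words.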
Upvotes: 2
Reputation: 13377
Try:
df.stack().str.split(r"[^\w]+").explode().tolist()
Outputs:
['Hello', 'world', 'Hello', 'all', 'the', 'world']
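For anyone who wants to follow the chain step by step, here is a rough breakdown. It is a sketch under a few assumptions: the intermediate names (stacked, split, exploded, result) are invented for illustration, and the pattern r"\W+" (split on runs of non-word characters) is a simplification chosen for this example rather than the exact pattern used above.

```python
import pandas as pd

df = pd.DataFrame([{'c1': 'Hello world'}, {'c1': 'Hello all the world'}])

# stack() turns the DataFrame into a single Series indexed by (row, column),
# so every string column would be included, not just c1.
stacked = df.stack()

# A multi-character pattern is treated as a regular expression by str.split;
# each string becomes a list of words.
split = stacked.str.split(r"\W+")

# explode() gives every list element its own row in the Series.
exploded = split.explode()

# tolist() converts the Series into a plain Python list.
result = exploded.tolist()
print(result)
```

If only one column is of interest, df['c1'] can be used in place of df.stack().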
Upvotes: 4