Reputation: 1679
I have a df with a column like this:
words
1 ['me']
2 ['they']
4 ['it', 'we', 'it']
5 []
6 ['we', 'we', 'it']
I want it to look like this:
words
1 'me'
2 'they'
4 'it we it'
5 ''
6 'we we it'
I have tried both of these options, but each yields a result identical to the original series.
def join_words(df):
    words_string = ''.join(df.words)
    return words_string
master_df['words_string'] = master_df.apply(join_words, axis=1)
and...
master_df['words_String'] = master_df.words.str.join(' ')
Both of these result in the original df. What am I doing wrong?
Using master_df['words_string'] = master_df['words'].apply(' '.join), I got:
1 [ ' m e ' ]
2 [ ' t h e y ' ]
4 [ ' i t ' , ' w e ' , ' i t ' ]
5 [ ]
6 [ ' w e ' , ' w e ' , ' i t ' ]
Upvotes: 2
Views: 7190
Reputation: 5764
Another option is DataFrame.explode (available since pandas 0.25.0) followed by groupby and an aggregation.
import pandas as pd
# create a list of list of strings
values = [
['me'],
['they'],
['it', 'we', 'it'],
[],
['we', 'we', 'it']
]
# convert to a data frame
df = pd.DataFrame({'words': values})
# explode the cells (with lists) into separate rows having the same index
df2 = df.explode('words')
df2
This creates a long-format table with the following output:
words
0 me
1 they
2 it
2 we
2 it
3 NaN
4 we
4 we
4 it
Now the long-format rows need to be joined back into one string per original index:
# make sure the dtype is string
df2['words'] = df2['words'].astype(str)
# group by the index aggregating all values to a single string
df2.groupby(level=0).agg(' '.join)
giving the output:
words
0 me
1 they
2 it we it
3 nan
4 we we it
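Note that row 3 ends up as the string 'nan' rather than the empty string the question asks for, because explode turns an empty list into NaN and astype(str) then stringifies it. A minimal sketch of one way around this, assuming the cells really are lists, is to fill the NaN with '' before grouping (the variable names exploded and joined are only illustrative):

```python
import pandas as pd

df = pd.DataFrame({'words': [['me'], ['they'], ['it', 'we', 'it'], [], ['we', 'we', 'it']]})

# explode turns the empty list into NaN; replace it with '' before joining
exploded = df['words'].explode().fillna('')

# group by the original index and join each group's values with a space
joined = exploded.groupby(level=0).agg(' '.join)
print(joined.tolist())
```

With this variant the astype(str) step is no longer needed, since every remaining value is already a string.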
Upvotes: 0
Reputation: 150735
Generally I'd advise against eval. Here's another approach for when the elements are strings, not lists:
words.str.extractall(r"'(\w*)'").groupby(level=0)[0].agg(' '.join)
Output (row 5, the empty list, has no matches and is therefore dropped):
1 me
2 they
4 it we it
6 we we it
Name: 0, dtype: object
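If the empty rows should be kept as empty strings, as in the desired output, one possible follow-up is to reindex the result against the original index. This is only a sketch: the Series words below is rebuilt from the question's sample with string cells, which is an assumption about the real data.

```python
import pandas as pd

# sample data as strings that merely look like lists (assumed from the question)
words = pd.Series(
    ["['me']", "['they']", "['it', 'we', 'it']", "[]", "['we', 'we', 'it']"],
    index=[1, 2, 4, 5, 6],
)

joined = (
    words.str.extractall(r"'(\w*)'")[0]   # one row per quoted word
    .groupby(level=0)                     # regroup by the original index
    .agg(' '.join)
    .reindex(words.index, fill_value='')  # restore rows that had no matches
)
print(joined.tolist())
```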
Upvotes: 1
Reputation: 18367
As your edit shows, the rows are not actually lists but strings that look like lists. We can use eval to convert each one to a real list so that the join can then be performed. It seems your sample data is the following:
import pandas as pd

df = pd.DataFrame({'index': [0, 1, 2, 3, 4],
                   'words': ["['me']", "['they']", "['it','we','it']", "[]", "['we','we','it']"]})
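A quick way to check this diagnosis is to look at the Python type of each cell. The sketch below uses two small illustrative Series (as_strings and as_lists are made-up names, not from the question) and also shows why joining a string spaces out every character:

```python
import pandas as pd

as_strings = pd.Series(["['me']", "[]"])  # cells are str
as_lists = pd.Series([['me'], []])        # cells are real lists

# ' '.join iterates over whatever it is given; for a str that means characters
spaced = as_strings.apply(' '.join)
print(spaced.tolist())

# inspect the element types to tell the two cases apart
print(as_strings.map(type).tolist())
print(as_lists.map(type).tolist())
```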
How about this? Use apply with eval to parse each string into a list, then apply ' '.join to each row (list):
df['words'] = df['words'].apply(eval).apply(' '.join)
print(df)
Output:
index words
0 0 me
1 1 they
2 2 it we it
3 3
4 4 we we it
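If running eval on the column contents is a concern, ast.literal_eval from the standard library is a common substitute, since it only parses Python literals and does not execute arbitrary expressions. A minimal sketch on the same assumed sample data:

```python
import ast

import pandas as pd

df = pd.DataFrame({'index': [0, 1, 2, 3, 4],
                   'words': ["['me']", "['they']", "['it','we','it']", "[]", "['we','we','it']"]})

# parse each string into a real list, then join its elements with a space
df['words'] = df['words'].apply(ast.literal_eval).apply(' '.join)
print(df)
```

The result should match the eval version, with the empty list becoming an empty string.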
Upvotes: 4