connor449

Reputation: 1679

Converting list of strings in pandas column into string

I have a df with a column like this:

                       words
1                     ['me']
2                   ['they']
4         ['it', 'we', 'it']
5                         []
6         ['we', 'we', 'it']

I want it to look like this:

                     words
1                     'me'
2                   'they'
4               'it we it'
5                       ''          
6               'we we it'

I have tried both of these options, but each yields a result identical to the original series.

def join_words(df):
    words_string = ''.join(df.words)
    return words_string

master_df['words_string'] = master_df.apply(join_words, axis=1)

and...

master_df['words_string'] = master_df.words.str.join(' ')

Both of these leave the values looking exactly like the original column. What am I doing wrong?

Edit

Using master_df['words_string'] = master_df['words'].apply(' '.join), I got:

1                                     [ ' m e ' ]
2                                 [ ' t h e y ' ]
4             [ ' i t ' ,   ' w e ' ,   ' i t ' ]
5                                             [ ]
6             [ ' w e ' ,   ' w e ' ,   ' i t ' ]
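A minimal sketch of why this happens, assuming the cells hold strings such as "['me']" rather than real Python lists (the two sample Series below are hypothetical). Joining a string iterates over its characters, which reproduces the output above, while Series.str.join does the expected thing when the cells are genuine lists:

```python
import pandas as pd

# Hypothetical sample: cells are strings that merely look like lists
as_strings = pd.Series(["['me']", "['it', 'we', 'it']"])
print(type(as_strings.iloc[0]))  # <class 'str'>

# ' '.join treats a string as a sequence of characters
print(' '.join(as_strings.iloc[0]))  # [ ' m e ' ]

# With genuine lists, Series.str.join joins the list elements
as_lists = pd.Series([['me'], ['it', 'we', 'it'], []])
print(as_lists.str.join(' ').tolist())  # ['me', 'it we it', '']
```

Checking type() on a single cell is a quick way to tell which case applies before choosing one of the answers below.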

Upvotes: 2

Views: 7190

Answers (3)

Matthias

Reputation: 5764

Another idea is to use DataFrame.explode (available since pandas 0.25.0) together with groupby/aggregate.

import pandas as pd

# create a list of lists of strings
values = [
    ['me'],
    ['they'],
    ['it', 'we', 'it'],
    [],
    ['we', 'we', 'it']
]

# convert to a data frame
df = pd.DataFrame({'words': values})

# explode the cells (with lists) into separate rows having the same index
df2 = df.explode('words')
df2

This creates a table in long format, one row per list element, with the following output:

  words
0    me
1  they
2    it
2    we
2    it
3   NaN
4    we
4    we
4    it

Now the long-format table needs to be aggregated back into one string per original row:

# make sure the dtype is string
df2['words'] = df2['words'].astype(str)

# group by the index aggregating all values to a single string
df2.groupby(level=0).agg(' '.join)

giving the output:

      words
0        me
1      they
2  it we it
3       nan
4  we we it
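Note that the empty list in row 3 comes out as the string 'nan' above: explode produces NaN for an empty list, and astype(str) turns that into 'nan'. If an empty string is wanted instead, as in the question, one option is to fill the missing values before joining. A minimal sketch, using the same sample values:

```python
import pandas as pd

values = [['me'], ['they'], ['it', 'we', 'it'], [], ['we', 'we', 'it']]
df = pd.DataFrame({'words': values})

# explode gives NaN for the empty list; replace it with '' before joining
df2 = df.explode('words')
df2['words'] = df2['words'].fillna('')

# one string per original row label
result = df2.groupby(level=0)['words'].agg(' '.join)
print(result)
```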

Upvotes: 0

Quang Hoang

Reputation: 150735

Generally I'd advise against eval. Here's another approach for when the elements are strings, not lists:

master_df['words'].str.extractall(r"'(\w*)'").groupby(level=0)[0].agg(' '.join)

Output:

1          me
2        they
4    it we it
6    we we it
Name: 0, dtype: object
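The output above has no row 5, because extractall finds no quoted tokens in "[]" and so that row never reaches the groupby. A minimal sketch that restores such rows as empty strings with reindex, using a hypothetical Series named words built from the question's sample. It also assumes every token consists only of word characters, since that is all \w matches:

```python
import pandas as pd

# Hypothetical sample mirroring the question: strings that look like lists
words = pd.Series(
    ["['me']", "['they']", "['it', 'we', 'it']", "[]", "['we', 'we', 'it']"],
    index=[1, 2, 4, 5, 6],
)

joined = (
    words.str.extractall(r"'(\w*)'")            # one row per quoted token, MultiIndex (row, match)
         .groupby(level=0)[0]                    # group the captured tokens by original row label
         .agg(' '.join)
         .reindex(words.index, fill_value='')    # rows with no matches, such as "[]", become ''
)
print(joined)
```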

Upvotes: 1

Celius Stingher

Reputation: 18367

Edit:

As your edit shows, the rows are not actually lists but strings that look like lists. We can use eval to parse each string into a list and then perform the join. Your sample data seems to be the following:

import pandas as pd

df = pd.DataFrame({'index':[0,1,2,3,4],
                   'words':["['me']","['they']","['it','we','it']","[]","['we','we','it']"]})

How about this? Use apply with eval to turn each string into a list, then apply ' '.join to each resulting list:

df['words'] = df['words'].apply(eval).apply(' '.join)
print(df)

Output:

   index     words
0      0        me
1      1      they
2      2  it we it
3      3          
4      4  we we it
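Since eval will execute any Python expression it is given, a safer replacement for parsing list-like strings is ast.literal_eval from the standard library, which only accepts Python literals such as lists and strings. A minimal sketch on the same sample data (without the extra index column):

```python
import ast

import pandas as pd

df = pd.DataFrame({'words': ["['me']", "['they']", "['it','we','it']", "[]", "['we','we','it']"]})

# literal_eval parses "['it','we','it']" into a real list without running arbitrary code
df['words'] = df['words'].apply(ast.literal_eval).apply(' '.join)
print(df)
```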

Upvotes: 4
