Baktaawar

Reputation: 7500

Joining the elements of a list within a Series

I have a pandas DataFrame column (a Series) whose elements are lists of strings. The column comes from a PostgreSQL array_agg, so each element is a list, and its type is reported like this:

<type 'list'>

Here is how the first two elements of this column (Series) look:

0    [UMIN Clinical Trial Registry [Website Last up...
1    [Disposition of Patients \n\nSTARTED; Tetracai...
Name: notes, dtype: object

When I do column[0] I get this:

['UMIN Clinical Trial Registry [Website Last updated date: May 26, 2011] \n\nRecruitment status: Not yet recruiting \n\nDate of protocol fixation: 02/01/2011 \n\nAnticipated trial start date: 07/01/2011 \n\nName of primary sponsor: The Second Department of Internal Medicine Tokyo Medical University \n\nSource of funding: OMRON Health Care corporation \n\nhttps://upload.umin.ac.jp/cgi-open-bin/ctr/ctr.cgi?function=brows&action=brows&type=summary&recptno=R000006682&language=E', 'The projected start date 07/01/2011 was removed because that date passed without evidence of trial start.\n\nhttps://upload.umin.ac.jp/cgi-open-bin/ctr/ctr.cgi?function=brows&action=brows&type=summary&recptno=R000006682&language=E']

As you can see, each element of this column is a list of strings. I want a final column where each element, instead of being a list of strings, is a single string made by combining all the strings in that list.

The problem seems to be that each list element is itself effectively a string, since it was created using array_agg, so it is not an iterable I can pass to " ".join(column[0]). I get an error saying that column[0] is not a list but of type 'list'.

How to overcome this?

EDIT:

If I do this:

for x in column:
    s = " ".join(x)
    docs.append(s)
    break

it works. But if I do it for all rows, without the break statement, it throws an error:

for x in column:
    s = " ".join(x)
    docs.append(s)

Error:

<ipython-input-154-556942a06d81> in <module>()
      1 for x in trials_notes.notes:
----> 2     s=" ".join(x)
      3     docs.append(s)
      4
TypeError: can only join an iterable

Upvotes: 1

Views: 267

Answers (1)

Anand S Kumar

Reputation: 90979

You can use Series.str.join() and pass the delimiter to join by as its argument. Example -

newcol = column.str.join(' ')

Demo -

In [3]: import pandas as pd

In [4]: column = pd.Series([['blah1'],['blah2'],['blah123']],name='blah')

In [5]: column.str.join(' ')
Out[5]:
0      blah1
1      blah2
2    blah123
Name: blah, dtype: object

In [7]: type(column[0])
Out[7]: list
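
The TypeError in the question's edit suggests that at least one row is not a list at all, for example a None where array_agg had nothing to aggregate. That is an assumption, since the offending row is not shown. Under that assumption, here is a minimal sketch that first finds the non-list rows and then joins only the real lists. The helper join_notes and the sample data are hypothetical, and mapping non-list rows to an empty string is just one possible choice.

```python
import pandas as pd

def join_notes(value, sep=" "):
    """Join a list/tuple of strings; map anything else (None, NaN) to ''."""
    if isinstance(value, (list, tuple)):
        # str() guards against non-string items inside the list
        return sep.join(str(item) for item in value)
    return ""

# Hypothetical sample data: the second row mimics a missing array_agg result
column = pd.Series(
    [["first note", "second note"], None, ["only note"]],
    name="notes",
)

# Index labels of the rows that would break a plain " ".join(x)
bad_rows = column.index[~column.apply(isinstance, args=((list, tuple),))]

# Row-by-row join that tolerates the non-list rows
joined = column.apply(join_notes)
```

Inspecting bad_rows on the real column should show whether non-list values are actually the cause before deciding how to fill them.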

Upvotes: 2
