Reputation: 7500
I have a pandas DataFrame column (a Series) whose elements are lists of strings. The column is the result of a PostgreSQL array_agg, so each element is a list whose type is reported like this:
<type 'list'>
Here is how the first two elements of this column (Series) look:
0 [UMIN Clinical Trial Registry [Website Last up...
1 [Disposition of Patients \n\nSTARTED; Tetracai...
Name: notes, dtype: object
When I do column[0] I get this:
['UMIN Clinical Trial Registry [Website Last updated date: May 26, 2011] \n\nRecruitment status: Not yet recruiting \n\nDate of protocol fixation: 02/01/2011 \n\nAnticipated trial start date: 07/01/2011 \n\nName of primary sponsor: The Second Department of Internal Medicine Tokyo Medical University \n\nSource of funding: OMRON Health Care corporation \n\nhttps://upload.umin.ac.jp/cgi-open-bin/ctr/ctr.cgi?function=brows&action=brows&type=summary&recptno=R000006682&language=E', 'The projected start date 07/01/2011 was removed because that date passed without evidence of trial start.\n\nhttps://upload.umin.ac.jp/cgi-open-bin/ctr/ctr.cgi?function=brows&action=brows&type=summary&recptno=R000006682&language=E']
As you can see, each element of this column is a list of strings. I want a final column where each element is a single string formed by combining all the strings in its list.
The problem seems to be that each list element was created by array_agg, so it does not behave like an iterable I can pass to " ".join(column[0]). I get an error saying that column[0] is not a list, even though its type is reported as 'list'.
How can I overcome this?
EDIT:
If I do this:
for x in column:
    s = " ".join(x)
    docs.append(s)
    break
it works. But if I run it over all elements, without the break statement, it throws an error:
for x in column:
    s = " ".join(x)
    docs.append(s)
Error:
<ipython-input-154-556942a06d81> in <module>()
      1 for x in trials_notes.notes:
----> 2     s=" ".join(x)
      3     docs.append(s)
      4
TypeError: can only join an iterable
Upvotes: 1
Views: 267
Reputation: 90979
You can use Series.str.join(), passing the delimiter to join by as the argument. Example -
newcol = column.str.join(' ')
Demo -
In [3]: import pandas as pd
In [4]: column = pd.Series([['blah1'],['blah2'],['blah123']],name='blah')
In [5]: column.str.join(' ')
Out[5]:
0 blah1
1 blah2
2 blah123
Name: blah, dtype: object
In [7]: type(column[0])
Out[7]: list
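The loop in the question works for the first row but fails once the break is removed. One possible reason (an assumption, since the question does not show the offending row) is that some later element is not a list at all, for example a None coming from a NULL array_agg result. Here is a rough sketch of how you could check for that and join defensively. The sample data, the `is_list` mask and the empty-string fallback are made up for illustration:

```python
import pandas as pd

# Hypothetical data: the second row mimics a NULL array_agg result.
column = pd.Series([["a", "b"], None, ["c"]], name="notes")

# Boolean mask marking which rows really hold a list.
is_list = pd.Series([isinstance(x, list) for x in column], index=column.index)

# Rows that would make " ".join(x) raise TypeError.
bad_rows = column[~is_list]
print(bad_rows)

# Join real lists; fall back to an empty string for everything else.
docs = column.map(lambda x: " ".join(x) if isinstance(x, list) else "")
print(docs.tolist())
```

If bad_rows comes back empty, the cause is something else and this check does not apply. The empty-string fallback is just one choice; you might prefer to drop those rows instead.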
Upvotes: 2