Natália Resende
Natália Resende

Reputation: 311

convert pandas data frame column values into comma separated strings

i have a data frame that looks like this:

    Column1                      Column2

'['jjhjh', 'adads','adsd']',    'dwdwdqw'
'['adads','adsd']',             'dwdwdqw'
'['jjhjh', 'adads','adsd']',    'dwdwdqw'
'['adads','adsd']',             'dwdwdqw'
'['adads','adsd']',             'dwdwdqw'

Although the items in Column1 look like lists of items, they are strings. I want to remove the square brackets from this string, remove the quotes and replace all values in that column with the same values but as comma separated strings. My desired output would be:

   Column1                   Column2

'jjhjh', 'adads','adsd',    'dwdwdqw'
'adads','adsd',             'dwdwdqw'
'jjhjh', 'adads','adsd',    'dwdwdqw'
'adads','adsd',             'dwdwdqw'
'adads','adsd',             'dwdwdqw'

I tried the following function but it does not replace the elements:

def string_convert(column_name):
  lista=[]
  for i in column_name:
    i=i.strip("[]")
    i=eval(i)
    lista.append(i)
  for m in lista:
    if m == tuple:
      column_name = m[0] + ',' + m[1]
    else:
      column_name = m
  return df['other']

Can anyone help me with this? Thanks in advance.

Upvotes: 0

Views: 2390

Answers (4)

sitting_duck
sitting_duck

Reputation: 3730

By leveraging str.replace():

df.assign(Column1=df['Column1'].str.replace(r"'\[(.*)\]',", r"\1,", regex=True))

                    Column1                 Column2
0  'jjhjh', 'adads','adsd',               'dwdwdqw'
1           'adads','adsd',               'dwdwdqw'
2  'jjhjh', 'adads','adsd',               'dwdwdqw'
3           'adads','adsd',               'dwdwdqw'
4           'adads','adsd',               'dwdwdqw'

Upvotes: 0

Digital Farmer
Digital Farmer

Reputation: 2147

According to the new final needs included by the question creator in her own answer, like a generating a final list with all values to column1, I recommend doing it as follows:

import pandas as pd

df = pd.DataFrame(columns=['column1','column2'])
df['column1'] = ["'['jjhjh', 'adads','adsd']'","'['adads','adsd']'","'['jjhjh', 'adads','adsd']'","'['adads','adsd']'","'['adads','adsd']'"]
df['column2'] = ['dwdwdqw','dwdwdqw','dwdwdqw','dwdwdqw','dwdwdqw']

df['column1'] = df['column1'].replace('\[|\'|\"|\]| ', '', regex=True)

df['column1'] = df['column1'].str.split(',')

final_list = [x for xs in df['column1'].tolist() for x in xs]

print(final_list)

Output:

['jjhjh', 'adads', 'adsd', 'adads', 'adsd', 'jjhjh', 'adads', 'adsd', 'adads', 'adsd', 'adads', 'adsd']

But I confess that there is a risk in this method because if the string you want has a space between the words, they will be removed, so in your case I still think you should be careful with the use.

Greetings from Brazil!

Upvotes: 1

Natália Resende
Natália Resende

Reputation: 311

The best solution for my cases was the code below because I had in each string quotes, double quotes and squared brackets. With this code I could remove everything and after that the For loop I removed the unnecessary spaces and split each row into a list :

df['column1']=df['column1'].replace('\[|\'|\"|\]', '', regex=True)

df['column1']=df['column1'].str.split(',')

lista=[]
for s in df['column1']:
  for a in s:
    lista.append(a)

lista1=[b.strip() for b in lista]

Upvotes: 0

Tyler Anderson
Tyler Anderson

Reputation: 110

This loop worked for me.

for i, row in df.iterrows():
    tmp_val = row['Column1'].replace("'", "").replace("[", "").replace("]", "").split(',')
    row['Column1'] = tmp_val

This loops through each row of the DataFrame and modifies the cell in Column1 by doing a string replace to remove the characters you don't want(square brackets and single quotes). Then .split(',') creates the list and the last line of code replaces the original value with our newly created list.

Upvotes: 1

Related Questions