Reputation: 311
i have a data frame that looks like this:
Column1 Column2
'['jjhjh', 'adads','adsd']', 'dwdwdqw'
'['adads','adsd']', 'dwdwdqw'
'['jjhjh', 'adads','adsd']', 'dwdwdqw'
'['adads','adsd']', 'dwdwdqw'
'['adads','adsd']', 'dwdwdqw'
Although the items in Column1 look like lists of items, they are strings. I want to remove the square brackets from this string, remove the quotes and replace all values in that column with the same values but as comma separated strings. My desired output would be:
Column1 Column2
'jjhjh', 'adads','adsd', 'dwdwdqw'
'adads','adsd', 'dwdwdqw'
'jjhjh', 'adads','adsd', 'dwdwdqw'
'adads','adsd', 'dwdwdqw'
'adads','adsd', 'dwdwdqw'
I tried the following function but it does not replace the elements:
def string_convert(column_name):
lista=[]
for i in column_name:
i=i.strip("[]")
i=eval(i)
lista.append(i)
for m in lista:
if m == tuple:
column_name = m[0] + ',' + m[1]
else:
column_name = m
return df['other']
Can anyone help me with this? Thanks in advance.
Upvotes: 0
Views: 2390
Reputation: 3730
By leveraging str.replace()
:
df.assign(Column1=df['Column1'].str.replace(r"'\[(.*)\]',", r"\1,", regex=True))
Column1 Column2
0 'jjhjh', 'adads','adsd', 'dwdwdqw'
1 'adads','adsd', 'dwdwdqw'
2 'jjhjh', 'adads','adsd', 'dwdwdqw'
3 'adads','adsd', 'dwdwdqw'
4 'adads','adsd', 'dwdwdqw'
Upvotes: 0
Reputation: 2147
According to the new final needs included by the question creator in her own answer, like a generating a final list with all values to column1
, I recommend doing it as follows:
import pandas as pd
df = pd.DataFrame(columns=['column1','column2'])
df['column1'] = ["'['jjhjh', 'adads','adsd']'","'['adads','adsd']'","'['jjhjh', 'adads','adsd']'","'['adads','adsd']'","'['adads','adsd']'"]
df['column2'] = ['dwdwdqw','dwdwdqw','dwdwdqw','dwdwdqw','dwdwdqw']
df['column1'] = df['column1'].replace('\[|\'|\"|\]| ', '', regex=True)
df['column1'] = df['column1'].str.split(',')
final_list = [x for xs in df['column1'].tolist() for x in xs]
print(final_list)
Output:
['jjhjh', 'adads', 'adsd', 'adads', 'adsd', 'jjhjh', 'adads', 'adsd', 'adads', 'adsd', 'adads', 'adsd']
But I confess that there is a risk in this method because if the string you want has a space between the words, they will be removed, so in your case I still think you should be careful with the use.
Greetings from Brazil!
Upvotes: 1
Reputation: 311
The best solution for my cases was the code below because I had in each string quotes, double quotes and squared brackets. With this code I could remove everything and after that the For loop I removed the unnecessary spaces and split each row into a list :
df['column1']=df['column1'].replace('\[|\'|\"|\]', '', regex=True)
df['column1']=df['column1'].str.split(',')
lista=[]
for s in df['column1']:
for a in s:
lista.append(a)
lista1=[b.strip() for b in lista]
Upvotes: 0
Reputation: 110
This loop worked for me.
for i, row in df.iterrows():
tmp_val = row['Column1'].replace("'", "").replace("[", "").replace("]", "").split(',')
row['Column1'] = tmp_val
This loops through each row of the DataFrame and modifies the cell in Column1 by doing a string replace to remove the characters you don't want(square brackets and single quotes). Then .split(',') creates the list and the last line of code replaces the original value with our newly created list.
Upvotes: 1