Reputation: 510
I have a pandas dataframe as shown below.
DF1 =
sid path
1 '["rome","is","in","province","lazio"]'
1 "['rome', 'is', 'in', 'province', 'naples']"
1 ['N']
1 "['rome', 'is', 'in', 'province', 'in', 'campania']"
....
I want to remove all unnecessary characters from the path column, so that the result looks like this:
DF2 =
sid path
1 rome is in province lazio
1 rome is in province naples
1 N
1 rome is in province in campania
....
I tried replacing all the unnecessary characters like this:
DF1["path"].replace("[","").replace("]","").replace('"',"").replace(","," ").replace("'","")
But it didn't work. I suspect it's because of entries like ['N'].
How can I do this? Any help is appreciated!
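A minimal sketch of why the attempt above leaves the column unchanged, assuming a recent pandas version and using two rows taken from the example data. Series.replace compares whole cell values, so "[" is only replaced in a cell that is exactly "[". Series.str.replace works on substrings, but it yields NaN for cells that hold real lists such as ['N'], because those cells are not strings.

```python
import pandas as pd

s = pd.Series(['["rome","is","in","province","lazio"]', ['N']])

# Series.replace matches entire cell values, so no cell equal to "[" means no change.
unchanged = s.replace("[", "")

# Series.str.replace matches substrings; regex=False treats "[" as a literal character.
# The genuine list ['N'] is not a string, so the .str accessor returns NaN for it.
cleaned = s.str.replace("[", "", regex=False)

print(unchanged.tolist())
print(cleaned.tolist())
```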
Upvotes: 1
Views: 244
Reputation: 82795
Using ast.literal_eval and str.join.
Demo:
import pandas as pd
import ast
df = pd.DataFrame({"path": ['["rome","is","in","province","lazio"]', "['rome', 'is', 'in', 'province', 'naples']", ['N']]})
df['path'] = df['path'].astype(str).apply(ast.literal_eval).apply(lambda x: " ".join(x))
print(df)
Output:
path
0 rome is in province lazio
1 rome is in province naples
2 N
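The same approach can also be written without the lambda. This is a sketch assuming pandas' Series.str.join, which joins the elements of list-valued cells with the given separator:

```python
import ast
import pandas as pd

df = pd.DataFrame({"path": ['["rome","is","in","province","lazio"]',
                            "['rome', 'is', 'in', 'province', 'naples']",
                            ['N']]})

# astype(str) turns the genuine list ['N'] into the string "['N']",
# so every cell can be parsed by ast.literal_eval; Series.str.join then
# joins the words of each resulting list with a single space.
df["path"] = df["path"].astype(str).map(ast.literal_eval).str.join(" ")
print(df)
```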
Upvotes: 1
Reputation: 164823
You can use ast.literal_eval to safely read lists output as strings. One way to account for genuine lists is to catch ValueError.
Note that, if at all possible, you should try to sort out these issues upstream, before the data reaches your dataframe.
import pandas as pd
from ast import literal_eval

df = pd.DataFrame({'sid': [1, 1, 1, 1],
                   'path': ['["rome","is","in","province","lazio"]',
                            "['rome', 'is', 'in', 'province', 'naples']",
                            ['N'],
                            "['rome', 'is', 'in', 'province', 'in', 'campania']"]})

def converter(x):
    try:
        # Strings such as "['rome', 'is', ...]" are parsed into real lists.
        return ' '.join(literal_eval(x))
    except ValueError:
        # literal_eval raises ValueError for values it cannot evaluate, such as the genuine list ['N'].
        return ' '.join(x)

df['path'] = df['path'].apply(converter)
print(df)
path sid
0 rome is in province lazio 1
1 rome is in province naples 1
2 N 1
3 rome is in province in campania 1
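An alternative to relying on the ValueError is to check for genuine lists explicitly. The sketch below uses a hypothetical helper named to_sentence. It also catches SyntaxError, which literal_eval raises for plain text such as "rome is", and it leaves non-list strings unchanged. By contrast, the fallback ' '.join(x) in converter would turn a plain string like "rome" into "r o m e".

```python
import ast
import pandas as pd

def to_sentence(value):
    """Return the words of a list, or of a string holding a list literal, joined by spaces."""
    # Genuine Python lists (such as ['N']) need no parsing.
    if isinstance(value, (list, tuple)):
        return " ".join(map(str, value))
    try:
        parsed = ast.literal_eval(value)
    except (ValueError, SyntaxError):
        # Not a Python literal (e.g. plain text): leave the cell unchanged.
        return value
    if isinstance(parsed, (list, tuple)):
        return " ".join(map(str, parsed))
    # A literal that is not a list (a number, a quoted string, ...): leave it unchanged.
    return value

df = pd.DataFrame({'sid': [1, 1, 1, 1],
                   'path': ['["rome","is","in","province","lazio"]',
                            "['rome', 'is', 'in', 'province', 'naples']",
                            ['N'],
                            "['rome', 'is', 'in', 'province', 'in', 'campania']"]})
df['path'] = df['path'].map(to_sentence)
print(df)
```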
Upvotes: 1