Mi.

Reputation: 510

Removing Strings from a Pandas DataFrame Column

I have a pandas dataframe as shown below.

DF1 =

sid                 path
 1    '["rome","is","in","province","lazio"]'   
 1    "['rome', 'is', 'in', 'province', 'naples']"
 1     ['N']
 1    "['rome', 'is', 'in', 'province', 'in', 'campania']"
 ....

I want to remove all the unnecessary characters from the path column so that the result looks like this:

DF2 =

    sid                  path
     1         rome is in province lazio
     1         rome is in province naples
     1                    N
     1         rome is in province in campania
 ....

I tried replacing all the unnecessary characters like this:

 DF1["path"].replace("[","").replace("]","").replace('"',"").replace(","," ").replace("'","")

But it didn't work. I suppose that's because of entries like ['N'].

How can I do this? Any help is appreciated!

Upvotes: 1

Views: 244

Answers (2)

Rakesh

Reputation: 82795

You can use ast.literal_eval and str.join.

Demo:

import ast
import pandas as pd

df = pd.DataFrame({"path": ['["rome","is","in","province","lazio"]',
                            "['rome', 'is', 'in', 'province', 'naples']",
                            ['N']]})
# Cast every entry to str, parse each string into a list, then join the words with spaces.
df['path'] = df['path'].astype(str).apply(ast.literal_eval).apply(lambda x: " ".join(x))
print(df)

Output:

                         path
0   rome is in province lazio
1  rome is in province naples
2                           N
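
To see why the cast to str matters, here is a minimal sketch using only the standard library. It assumes the column mixes strings with genuine lists such as ['N']; the names genuine_list, raised and parsed are just for illustration.

```python
import ast

genuine_list = ['N']  # an entry that is already a list, not a string

try:
    # literal_eval only accepts a string (or an AST node), so a list is rejected.
    ast.literal_eval(genuine_list)
    raised = False
except ValueError:
    raised = True

# After str(), the list is the text "['N']", which parses back into a list.
parsed = ast.literal_eval(str(genuine_list))
print(raised, parsed)
```

Casting the whole column first means every row goes through the same parse step, with no special case for rows that were already lists.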

Upvotes: 1

jpp

Reputation: 164823

You can use ast.literal_eval to safely parse lists that are stored as strings. One way to handle entries that are already genuine lists is to catch the ValueError that literal_eval raises for them.

Note that, if at all possible, you should try to resolve these inconsistencies upstream, before they reach your dataframe.

from ast import literal_eval
import pandas as pd

df = pd.DataFrame({'sid': [1, 1, 1, 1],
                   'path': ['["rome","is","in","province","lazio"]',
                            "['rome', 'is', 'in', 'province', 'naples']",
                            ['N'],
                            "['rome', 'is', 'in', 'province', 'in', 'campania']"]})

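def converter(x):
    try:
        # Strings such as "['rome', 'is']" are parsed into real lists.
        return ' '.join(literal_eval(x))
    except ValueError:
        # Entries that are already lists make literal_eval raise ValueError.
        return ' '.join(x)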

df['path'] = df['path'].apply(converter)

print(df)

                              path  sid
0        rome is in province lazio    1
1       rome is in province naples    1
2                                N    1
3  rome is in province in campania    1
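
If you would rather not rely on an exception, a type check is another option. This is only a sketch, assuming every entry is either a string holding a list literal or a real list of words; join_path is a hypothetical helper name, not part of pandas or ast.

```python
from ast import literal_eval

def join_path(x):
    """Hypothetical helper: join the words of one 'path' entry with spaces."""
    if isinstance(x, str):
        # A list stored as text, e.g. "['rome', 'is']": parse it first.
        x = literal_eval(x)
    # x is now a real list of words.
    return ' '.join(x)

# Sample entries mirroring the question: one stringified list, one genuine list.
entries = ['["rome","is","in","province","lazio"]', ['N']]
result = [join_path(e) for e in entries]
print(result)
```

With a dataframe, the same function could be passed to df['path'].apply(join_path) in place of converter.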

Upvotes: 1
