Replacing string with special characters in Pandas column

I have a large pandas dataframe where one of the columns has weird formatting. I am trying to replace the string, but I keep getting error messages saying: 'error: unterminated character set at position 0'. A small example dataframe is below:

col 1    col 2
Amy      [{'?username': 'usr/AMY16548'}]
Jack     [{'?username': 'usr/JACK15822'}]
Sarah    [{'?username': 'usr/SARAH00001'}]

Desired result

col 1    col 2
Amy      usr/AMY16548
Jack     usr/JACK15822
Sarah    usr/SARAH00001

Upvotes: 0

Answers (3)

BeRT2me

Reputation: 13242

Given:

   col_1                              col_2
0    Amy    [{'?username': 'usr/AMY16548'}]
1   Jack   [{'?username': 'usr/JACK15822'}]
2  Sarah  [{'?username': 'usr/SARAH00001'}]

Make col_2 into a Python object:

Note: this step may be unnecessary if the values are already Python objects rather than strings.

from ast import literal_eval

df.col_2 = df.col_2.apply(literal_eval)

Now we can access those easily:

df.col_2 = df.col_2.str[0].str['?username']
print(df)

Output:

   col_1           col_2
0    Amy    usr/AMY16548
1   Jack   usr/JACK15822
2  Sarah  usr/SARAH00001

If you would rather hard-code the slice positions, just do:

df['col_2'] = df['col_2'].str[16:-3]
# Works on the original string column; same output as above.
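
If it is unclear whether the column holds strings or already-parsed lists, the parsing step can be made conditional. The following is a minimal sketch, assuming each cell is either the string form or a one-element list of dicts; `extract_username` is a hypothetical helper name, not something from pandas:

```python
from ast import literal_eval

import pandas as pd

df = pd.DataFrame({
    "col_1": ["Amy", "Jack", "Sarah"],
    "col_2": [
        "[{'?username': 'usr/AMY16548'}]",
        "[{'?username': 'usr/JACK15822'}]",
        "[{'?username': 'usr/SARAH00001'}]",
    ],
})


def extract_username(value):
    """Hypothetical helper: return the '?username' entry of one cell."""
    # Parse only when the cell is still a string; leave real lists alone.
    parsed = literal_eval(value) if isinstance(value, str) else value
    return parsed[0]["?username"]


df["col_2"] = df["col_2"].apply(extract_username)
print(df)
```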

Upvotes: 1

sitting_duck

Reputation: 3720

That is because whatever regex pattern you are using has an opening [ with no matching closing ]. Unless you really are trying to use a regex character set (see https://blog.finxter.com/python-character-set-regex-tutorial/), you probably just want to escape the first [ in your pattern, like \[. Seeing your code would help.
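
Since the original code was not posted, here is a small sketch of how this error can arise and one way around it, using only the standard `re` module. The `prefix` string is an assumption about what the pattern might have looked like:

```python
import re

raw = "[{'?username': 'usr/AMY16548'}]"
prefix = "[{'?username': '"  # assumed pattern; the asker's code was not shown

# Unescaped, the leading "[" opens a character set that is never closed.
try:
    re.sub(prefix, "", raw)
except re.error as exc:
    error_message = str(exc)
    print(error_message)

# re.escape backslash-escapes the regex metacharacters in the literal text.
cleaned = re.sub(re.escape(prefix), "", raw)
cleaned = re.sub(re.escape("'}]"), "", cleaned)
print(cleaned)
```

In pandas, `Series.str.replace` also accepts `regex=False`, which treats the pattern as a plain string and avoids the need for escaping.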

Upvotes: 2

Jakub

Reputation: 184

This is a somewhat crude approach. I'm sure there is a pandas way to do this, but I don't know it.

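# Walk the second column row by row and keep only the "usr/..." part.
for row, value in enumerate(df.iloc[:, 1]):
    text = str(value)
    start = text.find('usr')
    # Drop the trailing "'}]" (three characters, not two).
    df.iloc[row, 1] = text[start:-3]
print(df)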
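
One possible vectorized alternative to the loop is `Series.str.extract` with a capture group. This is a sketch, assuming every value contains a `usr/...` token terminated by a single quote; the `astype(str)` call is there in case the cells are lists rather than strings:

```python
import pandas as pd

df = pd.DataFrame({
    "col 1": ["Amy", "Jack", "Sarah"],
    "col 2": [
        "[{'?username': 'usr/AMY16548'}]",
        "[{'?username': 'usr/JACK15822'}]",
        "[{'?username': 'usr/SARAH00001'}]",
    ],
})

# Capture everything from "usr/" up to (not including) the next single quote.
df["col 2"] = df["col 2"].astype(str).str.extract(r"(usr/[^']+)", expand=False)
print(df)
```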

Upvotes: -1
