Reputation: 59
I have a large pandas dataframe where one of the columns has weird formatting. I am trying to replace the string, but I keep getting the error message 'error: unterminated character set at position 0'. A small example dataframe is below:
col 1 col 2
Amy [{'?username': 'usr/AMY16548'}]
Jack [{'?username': 'usr/JACK15822'}]
Sarah [{'?username': 'usr/SARAH00001'}]
Desired result
col 1 col 2
Amy usr/AMY16548
Jack usr/JACK15822
Sarah usr/SARAH00001
Upvotes: 0
Views: 590
Reputation: 13242
Given:
col_1 col_2
0 Amy [{'?username': 'usr/AMY16548'}]
1 Jack [{'?username': 'usr/JACK15822'}]
2 Sarah [{'?username': 'usr/SARAH00001'}]
Parse col_2 into a Python object, then pull out the value:
from ast import literal_eval
df.col_2 = df.col_2.apply(literal_eval)
df.col_2 = df.col_2.str[0].str['?username']
print(df)
Output:
col_1 col_2
0 Amy usr/AMY16548
1 Jack usr/JACK15822
2 Sarah usr/SARAH00001
If you're going to do it the hard-coded way, just slice the original string column:
df['col_2'] = df['col_2'].str[16:-3]
# Same output as above.
Upvotes: 1
Reputation: 3720
That is because whatever regex pattern you are using has an opening [ with no associated closing ]. Unless you really are trying to use a regex character set (see https://blog.finxter.com/python-character-set-regex-tutorial/), you probably just want to escape the first [ in your pattern, like \[. Seeing your code would help.
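Since the asker's code isn't shown, here is only a guess at what may have happened: a minimal sketch assuming the failing call passed the literal prefix as a regex. The frame and the `prefix` and `cleaned` names are made up for illustration. It shows the pattern failing to compile, then two ways around it: `regex=False` for a literal replacement, or `re.escape` to escape the special characters.

```python
import re
import pandas as pd

# Hypothetical frame mirroring the question's example data.
df = pd.DataFrame({
    "col_1": ["Amy", "Jack"],
    "col_2": ["[{'?username': 'usr/AMY16548'}]",
              "[{'?username': 'usr/JACK15822'}]"],
})

prefix = "[{'?username': '"

# The unescaped "[" opens a character set that is never closed,
# so the pattern does not compile as a regex.
try:
    re.compile(prefix)
    compiled = True
except re.error:
    compiled = False

# Option 1: treat both pieces as literal text, not as regexes.
cleaned = (
    df["col_2"]
    .str.replace(prefix, "", regex=False)
    .str.replace("'}]", "", regex=False)
)

# Option 2: keep regex mode, but escape the special characters first.
cleaned_escaped = (
    df["col_2"]
    .str.replace(re.escape(prefix), "", regex=True)
    .str.replace(re.escape("'}]"), "", regex=True)
)
```

Either option leaves just the `usr/...` part; `regex=False` is probably the simpler choice when no pattern matching is needed.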
Upvotes: 2
Reputation: 184
This is a slightly barbaric way; I'm sure there is a pandas way to do this, but I don't know it.
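# Walk the second column row by row and keep only the 'usr/...' part.
for y, x in enumerate(df.iloc[:, 1]):
    text = str(x)
    index = text.find('usr')
    # Drop the trailing "'}]" (3 characters), not just "}]".
    df.iloc[y, 1] = text[index:-3]
print(df)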
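One possible vectorized equivalent of the loop, as a sketch only: it assumes every value contains a `usr/...` token ending at a single quote, as in the question's sample. The frame below is rebuilt from that sample for illustration.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "col 1": ["Amy", "Jack", "Sarah"],
    "col 2": ["[{'?username': 'usr/AMY16548'}]",
              "[{'?username': 'usr/JACK15822'}]",
              "[{'?username': 'usr/SARAH00001'}]"],
})

# Capture everything from "usr/" up to (but not including) the closing quote.
# astype(str) mirrors the str(x) call in the loop version.
df["col 2"] = df["col 2"].astype(str).str.extract(r"(usr/[^']+)", expand=False)
print(df)
```

Rows that do not match the pattern would come back as NaN, so it may be worth checking for missing values afterwards.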
Upvotes: -1