Shilpa S Jadhav

Reputation: 59

Removing characters from the dataframe python

I want to strip a prefix from the values in some columns of a DataFrame. For example, I want to remove b"SET and b"MULTISET (and, as the required output shows, b"ROW) from the column values. How can I achieve that? The details and the required output are below.

import pandas as pd

# t holds the row data (not shown here)
columns = ['cust_id', 'cust_name', 'vehicle', 'details', 'bill']
df = pd.DataFrame(data=t, columns=columns)
df
    
    cust_id     cust_name                   vehicle                             details                                                 bill
0   101         b"SET{'Tom','C'}"           b"MULTISET{'Toyota','Cruiser'}"     b"ROW('Street 1','12345678','NewYork, US')"             1200.00
1   102         b"SET{'Rachel','Green'}"    b"MULTISET{'Ford','se'}"            b"ROW('Street 2','12344444','Florida, US')"             2400.00
2   103         b"SET{'Chandler','Bing'}"   b"MULTISET{'Dodge','mpv'}"          b"ROW('Street 1','12345555','Georgia, US')"             601.10 

Required Output:

    cust_id     cust_name                   vehicle                             details                                         bill
0   101         {'Tom','C'}                 {'Toyota','Cruiser'}            ('Street 1','12345678','NewYork, US')               1200.00
1   102         {'Rachel','Green'}          {'Ford','se'}                   ('Street 2','12344444','Florida, US')               2400.00
2   103         {'Chandler','Bing'}         {'Dodge','mpv'}                 ('Street 1','12345555','Georgia, US')               601.10 

Upvotes: 1

Views: 74

Answers (1)

sushanth

Reputation: 8302

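Here is a possible solution:

  • Define the columns of interest:
columns = ['cust_name', 'vehicle', 'details']
  • Use a regular expression to extract the text from the first { or ( through the last } or ):
regex_ = r"([{(].*[})])"
  • Putting it together: str.decode('ascii') converts the column values from bytes to strings, and str.extract keeps only the part matched by the capture group.
columns = ['cust_name', 'vehicle', 'details']

# No '|' inside the character classes: there it would match a literal '|'
regex_ = r"([{(].*[})])"

for col in columns:
    # expand=False returns a Series for the single capture group
    df[col] = df[col].str.decode('ascii').str.extract(regex_, expand=False)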

   cust_id            cust_name  ...                                details    bill
0      101          {'Tom','C'}  ...  ('Street 1','12345678','NewYork, US')  1200.0
1      102   {'Rachel','Green'}  ...  ('Street 2','12344444','Florida, US')  2400.0
2      103  {'Chandler','Bing'}  ...  ('Street 1','12345555','Georgia, US')   601.1
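
If the goal is to drop the known prefixes rather than extract the bracketed part, a variant using str.replace may be easier to read. The sketch below is self-contained; the rows in `t` are an assumption rebuilt from the table in the question (the original `t` is not shown), and the names `byte_cols` and `prefix` are hypothetical.

```python
import pandas as pd

# Assumed sample data, reconstructed from the table in the question.
t = [
    (101, b"SET{'Tom','C'}", b"MULTISET{'Toyota','Cruiser'}",
     b"ROW('Street 1','12345678','NewYork, US')", 1200.00),
    (102, b"SET{'Rachel','Green'}", b"MULTISET{'Ford','se'}",
     b"ROW('Street 2','12344444','Florida, US')", 2400.00),
    (103, b"SET{'Chandler','Bing'}", b"MULTISET{'Dodge','mpv'}",
     b"ROW('Street 1','12345555','Georgia, US')", 601.10),
]
columns = ['cust_id', 'cust_name', 'vehicle', 'details', 'bill']
df = pd.DataFrame(data=t, columns=columns)

# Columns that hold bytes values with a type prefix.
byte_cols = ['cust_name', 'vehicle', 'details']

# Match one of the known prefixes, only at the start of the string.
prefix = r"^(?:MULTISET|SET|ROW)"

for col in byte_cols:
    # Decode bytes to str, then delete the leading prefix.
    df[col] = df[col].str.decode('ascii').str.replace(prefix, '', regex=True)

print(df)
```

Unlike the extract approach, this leaves a value unchanged when it has no recognised prefix instead of turning it into NaN, which may or may not be what you want.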

Upvotes: 1
