Student

Reputation: 1197

How to efficiently replace partial strings in pandas?

Objective: reformat the string contents of a pandas DataFrame into the style I have been given.

I have the following dataframe (shown as an image in the original post):

I am looking to reformat each column into the following style (shown as an image in the original post), i.e. lower < Name ≤ upper:

I am using the following code to produce the style I need, but it only handles one cell at a time and is not efficient:

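lt = []
# first cell of the Components column; assumed to yield 'Name__(low, high]' entries
for i in patterns['Components'][0]:
    for x in i.split('__'):
        lt.append(x)  # e.g. ['Quantity', '(0.0, 16199.0]']
# '(0.0, 16199.0]' -> '0.0 < Quantity ≤ 16199.0'
lt[1].replace('(', '').replace(', ', ' < ' + str(lt[0]) + ' ≤ ').replace(']', '')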
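The same chain of replacements can be wrapped in a function and applied to every row of a column with `Series.map`. This is a sketch that assumes each cell is a single string of the form `Name__(low, high]`; the function name `format_interval` and the sample values are hypothetical. It still calls a Python function once per cell, so it is not vectorized, but it covers the whole column rather than only row 0.

```python
import pandas as pd


def format_interval(component: str) -> str:
    # hypothetical helper: 'Quantity__(0.0, 16199.0]' -> 'Quantity', '(0.0, 16199.0]'
    name, interval = component.split('__')
    # same chain of replacements as the loop above
    return (interval.replace('(', '')
                    .replace(', ', ' < ' + name + ' ≤ ')
                    .replace(']', ''))


# hypothetical sample column
s = pd.Series(['Quantity__(0.0, 16199.0]', 'UnitPrice__(-1055.648, 3947.558]'])
print(s.map(format_interval).tolist())
# ['0.0 < Quantity ≤ 16199.0', '-1055.648 < UnitPrice ≤ 3947.558']
```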

I have tried pandas' DataFrame.replace without success: it raises no errors, but the values appear unchanged.
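One likely explanation (an assumption, since the exact call is not shown): by default `DataFrame.replace` only replaces cells whose entire value equals the search value, so a substring such as `'('` is silently left alone. Passing `regex=True` switches to regex substitution inside each string. A minimal sketch on a hypothetical one-cell frame:

```python
import pandas as pd

# hypothetical one-cell frame in the interval format from the question
df = pd.DataFrame({'Components': ['Quantity__(0.0, 16199.0]']})

# without regex=True, '(' must equal the whole cell, so nothing changes
unchanged = df.replace('(', '')

# with regex=True the pattern is substituted inside the string
# ('(' is escaped because it is a regex metacharacter)
stripped = df.replace(r'\(', '', regex=True)

print(unchanged.loc[0, 'Components'])  # Quantity__(0.0, 16199.0]
print(stripped.loc[0, 'Components'])   # Quantity__0.0, 16199.0]
```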

Upvotes: 0

Views: 129

Answers (2)

Srivasan Sridharan

Reputation: 136

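import re

import pandas as pd

data = pd.DataFrame({
    'components': ['(quantity__(0.0,16199.0])', '(unitprice__(-1055.648,8494.557])'],
    'outcome': ['(unitprice__(-1055.648,8494.557])', 'quantity__(0.0,16199.0])'],
})


def func(x):
    # split '(name__(low,high])' into the name and the interval
    name, interval = str(x).split('__')
    name = name.replace('(', '')
    # raw-string pattern that keeps the minus sign of negative bounds
    low, high = re.findall(r'-?\d*\.?\d+', interval)
    # order as: lower < name <= upper
    return '{} < {} <= {}'.format(low, name, high)


# DataFrame.map requires pandas >= 2.1; use data.applymap(func) on older versions
df = data.map(func)
print(df)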
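A vectorized alternative to calling func on every cell is `Series.str.extract`, which pulls the name and both bounds into separate columns in one regex pass; the pieces are then joined with ordinary string concatenation. This is a sketch on the same sample data; the helper name `reformat_column` and the pattern are illustrative, and the optional `\(?` and `\)?` let it accept values with or without the outer parentheses.

```python
import pandas as pd

data = pd.DataFrame({
    'components': ['(quantity__(0.0,16199.0])', '(unitprice__(-1055.648,8494.557])'],
    'outcome': ['(unitprice__(-1055.648,8494.557])', 'quantity__(0.0,16199.0])'],
})

# capture groups: 0 = name, 1 = lower bound, 2 = upper bound
pat = r'\(?([^_(]+)__\(([^,\s]+),\s*([^\]]+)\]\)?'


def reformat_column(s: pd.Series) -> pd.Series:
    parts = s.str.extract(pat)  # DataFrame with integer columns 0, 1, 2
    return parts[1] + ' < ' + parts[0] + ' <= ' + parts[2]


# DataFrame.apply passes each column as a Series
df = data.apply(reformat_column)
print(df)
```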

Upvotes: 0

MaxU - stand with Ukraine

Reputation: 210882

Source DF:

In [37]: df
Out[37]:
                           Components                             Outcome
0          (Quantity__(0.0, 16199.0])  (UnitPrice__(-1055.648, 3947.558])
1  (UnitPrice__(-1055.648, 3947.558])          (Quantity__(0.0, 16199.0])

Solution:

In [38]: cols = ['Components','Outcome']
    ...: df[cols] = df[cols].replace(r'\(([^_]*)__\(([^,\s]+),\s*([^\]]+)\]\).*',
    ...:                             r'\2 < \1 <= \3',
    ...:                             regex=True)

Result:

In [39]: df
Out[39]:
                          Components                            Outcome
0          0.0 < Quantity <= 16199.0  -1055.648 < UnitPrice <= 3947.558
1  -1055.648 < UnitPrice <= 3947.558          0.0 < Quantity <= 16199.0
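With `regex=True`, `replace` performs a regex substitution on each string cell, so the pattern and the `\2 < \1 <= \3` template can be checked on single values with `re.sub`: group 1 is the column name, group 2 the lower bound and group 3 the upper bound, and the template reorders them. A quick check using the sample values above:

```python
import re

pat = r'\(([^_]*)__\(([^,\s]+),\s*([^\]]+)\]\).*'
repl = r'\2 < \1 <= \3'  # lower < name <= upper

# the three captured pieces of one value
print(re.match(pat, '(Quantity__(0.0, 16199.0])').groups())
# ('Quantity', '0.0', '16199.0')

values = ['(Quantity__(0.0, 16199.0])', '(UnitPrice__(-1055.648, 3947.558])']
results = [re.sub(pat, repl, v) for v in values]
print(results)
# ['0.0 < Quantity <= 16199.0', '-1055.648 < UnitPrice <= 3947.558']
```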

UPDATE:

In [113]: df
Out[113]:
                                Components                               Outcome
0             (Quantity__(0.0, 16199.0])     (UnitPrice__(-1055.648, 3947.558])
1    (UnitPrice__(-1055.648, 3947.558])             (Quantity__(0.0, 16199.0])

In [114]: cols = ['Components','Outcome']

In [115]: pat = r'\s*\(([^_]*)__\(([^,\s]+),\s*([^\]]+)\]\)\s*'

In [116]: df[cols] = df[cols].replace(pat, r'\2 < \1 <= \3', regex=True)

In [117]: df
Out[117]:
                          Components                            Outcome
0          0.0 < Quantity <= 16199.0  -1055.648 < UnitPrice <= 3947.558
1  -1055.648 < UnitPrice <= 3947.558          0.0 < Quantity <= 16199.0
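The difference between the two patterns shows up when a value carries leading or trailing whitespace (a hypothetical padded value, for illustration): the first pattern starts matching at the `(`, so a leading space survives, while the `\s*` at both ends of the updated pattern consumes it.

```python
import re

old_pat = r'\(([^_]*)__\(([^,\s]+),\s*([^\]]+)\]\).*'
new_pat = r'\s*\(([^_]*)__\(([^,\s]+),\s*([^\]]+)\]\)\s*'
repl = r'\2 < \1 <= \3'

value = ' (Quantity__(0.0, 16199.0]) '  # hypothetical padded cell
print(repr(re.sub(old_pat, repl, value)))  # ' 0.0 < Quantity <= 16199.0'
print(repr(re.sub(new_pat, repl, value)))  # '0.0 < Quantity <= 16199.0'
```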

Or, without the surrounding parentheses:

In [119]: df
Out[119]:
                         Components                           Outcome
0         Quantity__(0.0, 16199.0])  UnitPrice__(-1055.648, 3947.558]
1  UnitPrice__(-1055.648, 3947.558]          Quantity__(0.0, 16199.0]

In [120]: pat = r'([^_]*)__\(([^,\s]+),\s*([^\]]+)\]'

In [121]: df[cols] = df[cols].replace(pat, r'\2 < \1 <= \3', regex=True)

In [122]: df
Out[122]:
                          Components                            Outcome
0         0.0 < Quantity <= 16199.0)  -1055.648 < UnitPrice <= 3947.558
1  -1055.648 < UnitPrice <= 3947.558          0.0 < Quantity <= 16199.0

Upvotes: 1
