Reputation: 115
I'm having trouble getting a string in pandas to be replaced in the correct manner. I'm not sure if I'm limited to using pandas and there might not be a way to do this with solely using pandas.
This is how my dataframe looks:
(ID: 10) 247333605 0.0
(ID: 20) 36738870 0.0
(ID: 40) 4668036427 0.0
(ID: 50) 1918647972 0.0
(ID: 60) 4323165902 44125.0
(ID: 80) 145512255 0.0
Assigned (ID: 30) 42050340 0.0
Assigned (ID: 40) 130880371376 0.0
Assigning (ID: 30) 1095844753 0.0
Cancelled (ID: 40) 937280 0.0
Cancelled (ID: 80) 16857720813 0.0
Planned (ID: 20) 9060392597 0.0
Planning (ID: 10) 108484297031 0.0
Processed (ID: 70) 133289880880 0.0
Revoked (ID: 50) 2411903072 0.0
Writing (ID: 50) 146408550024 0.0
Written (ID: 60) 139458227923 1018230.0
For each (ID: x), it should be matched to the assigned (ID: x), cancelled (ID: x), etc with the correct ID.
Using lines similar to this line:
input_data['last_status'] = input_data.last_status.str.replace('(ID: 10)', 'Planning (ID: 10)')
My output is:
(Assigned (ID: 40)) 0.0
(Cancelled (ID: 80)) 0.0
(Planned (ID: 20)) 0.0
(Planning (ID: 10)) 0.0
(Writing (ID: 50)) 0.0
(Written (ID: 60)) 44125.0
Assigned (Assigned (ID: 40)) 0.0
Assigned (ID: 30) 0.0
Assigning (ID: 30) 0.0
Cancelled (Assigned (ID: 40)) 0.0
Cancelled (Cancelled (ID: 80)) 0.0
Planned (Planned (ID: 20)) 0.0
Planning (Planning (ID: 10)) 0.0
Processed (ID: 70) 0.0
Revoked (Writing (ID: 50)) 0.0
Writing (Writing (ID: 50)) 0.0
Written (Written (ID: 60)) 1018230.0
As you can see, all the (ID: x) got replaced and it still doesn't match the correct term.
My ideal dataframe would look like this:
Assigned (ID: 30) 42050340 0.0
Assigned (ID: 40) 130880371376 0.0
Assigning (ID: 30) 1095844753 0.0
Cancelled (ID: 40) 937280 0.0
Cancelled (ID: 80) 16857720813 0.0
Planned (ID: 20) 9060392597 0.0
Planning (ID: 10) 108484297031 0.0
Processed (ID: 70) 133289880880 0.0
Revoked (ID: 50) 2411903072 0.0
Writing (ID: 50) 146408550024 0.0
Written (ID: 60) 139458227923 1018230.0
I'm bound to using pandas because the dataset is huge, I have a different implementation but they take me days to run. Is there a way to do this right in pandas?
I've never asked something before on stackoverflow. I hope my question is clear.
Upvotes: 3
Views: 5063
Reputation: 402922
If you want to generalise, you can use str.replace
with SOL/EOL anchors.
df['last_status'].str.replace(r'^(\(ID: \d+\))$', r'Planning: \1')
0 Planning: (ID: 10)
1 Planning: (ID: 20)
2 Planning: (ID: 40)
3 Planning: (ID: 50)
4 Planning: (ID: 60)
5 Planning: (ID: 80)
6 Assigned (ID: 30)
7 Assigned (ID: 40)
8 Assigning (ID: 30)
9 Cancelled (ID: 40)
10 Cancelled (ID: 80)
11 Planned (ID: 20)
12 Planning (ID: 10)
13 Processed (ID: 70)
14 Revoked (ID: 50)
15 Writing (ID: 50)
16 Written (ID: 60)
Name: last_status, dtype: object
If you want to replace a specific ID only, change your regex to -
r'^(\(ID: 10\))$'
Or,
r'^(\(ID: {}\))$'.format(number)
Where number
is a variable that holds the ID value to perform replacement on.
Upvotes: 3