Reputation: 123
I have a dataframe with 2 columns level1 and level2. Each account number in level1 is linked to a ParentID found in column level2. For accounts ending in "8409" in column "level1" some of them are mapped to the wrong ParentID in level2. To find its correct ParentID, you need to search in level1 where you replace all accounts that end in"8409" with "8400". This will then find its equivalent account in the same column. Where a match is found, copy what is in column "level2" and replace it under the column for the accounts ending in "8409".
import pandas as pd
import numpy as np
df = pd.DataFrame([[7854568400,489],
[9632588400,126],
[3699633691,189],
[9876543697,987],
[1111118409,987],
[7854568409,396],
[7854567893,897],
[9632588409,147]],
columns = ['level1','level2'])
df
The solution below allowed to create a new column "new_level2" to solve the above problem.
maps = df.set_index('level1')['level2']
s = df['level1'].astype(str).str.replace('8409$', '8400', regex=True).astype('int64')
df['new_level2'] = s.map(maps).combine_first(df['level2']).convert_dtypes()
In the output below, account "7854568409" had its level2 changed from 396 to 489 (taken from row 0), and account "9632588409" had its level2 changed from 147 to 126 (taken from row 1).
level1 level2 new_level2
0 7854568400 489 489
1 9632588400 126 126
2 3699633691 189 189
3 9876543697 987 987
4 1111118409 987 987
5 7854568409 396 489
6 7854567893 897 897
7 9632588409 147 126
However, when I apply the above solution with to additional variables this is where I run into problems mainly when I add the currency to the dataframe. The replacing of level2 values ONLY applies to USd only, all other currencies need to retain its current value in column level2.
Old DF
df = pd.DataFrame([['USD',7854568400,489],
['USD',9632588400,126],
['USD',3699633691,189],
['USD',9876543697,987],
['EUR',1111118409,987],
['USD',1111118409,987],
['USD',7854568409,396],
['USD',7854567893,897],
['USD',9632588409,147]],
columns = ['cur','level1','level2'])
df
Revised DF
df = pd.DataFrame([['USD',7854568400,489],
['USD',9632588400,126],
['USD',3699633691,189],
['USD',9876543697,987],
['EUR',1111118400,120],
['EUR',1111118409,987],
['USD',1111118409,987],
['USD',7854568409,396],
['USD',7854567893,897],
['USD',9632588409,147]],
columns = ['cur','level1','level2'])
I get the following error when i attempt to apply the above solution to the new dataframe that contains the currency.
InvalidIndexError: Reindexing only valid with uniquely valued Index objects
Its worth noting that you can have the same account number across different currencies.
Desired output is below. Only 2 accounts (9632588409 & 7854568409) had their level2 changed. Index 4 and 5 should retain its original level2 value because they are euro and not in scope and Index 6 because there was no corresponding match found for this account, therefore it retains it original value.
cur level1 level2 new_level2
0 USD 7854568400 489 489
1 USD 9632588400 126 126
2 USD 3699633691 189 189
3 USD 9876543697 987 987
4 EUR 1111118400 120 120
5 EUR 1111118409 987 987
6 USD 1111118409 987 987
7 USD 7854568409 396 489
8 USD 7854567893 897 897
9 USD 9632588409 147 126
Any help is greatly appreciated.
Upvotes: 0
Views: 104
Reputation: 14113
df = pd.DataFrame([['USD',7854568400,489],
['USD',9632588400,126],
['USD',3699633691,189],
['USD',9876543697,987],
['EUR',1111118400,120],
['EUR',1111118409,987],
['USD',1111118409,987],
['USD',7854568409,396],
['USD',7854567893,897],
['USD',9632588409,147]],
columns = ['cur','level1','level2'])
df['level1'] = df['level1'].astype(str).str.replace('8409$', '8400', regex=True).astype('int64')
df['new_col'] = df.where(df['cur'] == 'USD').groupby(['level1', 'cur'])['level2']\
.transform('first').fillna(df['level2']).astype(int)
print(df)
cur level1 level2 new_col
0 USD 7854568400 489 489
1 USD 9632588400 126 126
2 USD 3699633691 189 189
3 USD 9876543697 987 987
4 EUR 1111118400 120 120
5 EUR 1111118400 987 987
6 USD 1111118400 987 987
7 USD 7854568400 396 489
8 USD 7854567893 897 897
9 USD 9632588400 147 126
Upvotes: 1