Reputation: 53
Hello guys I need your wisdom,
I'm still new to python and pandas and I'm looking to achieve the following thing.
df = pd.DataFrame({'code': [125, 265, 128,368,4682,12,26,12,36,46,1,2,1,3,6], 'parent': [12,26,12,36,46,1,2,1,3,6,'a','b','a','c','f'], 'name':['unknow','unknow','unknow','unknow','unknow','unknow','unknow','unknow','unknow','unknow','g1','g2','g1','g3','g6']})
ds = pd.DataFrame({'code': [125, 265, 128,368,4682], 'name': ['Eagle','Cat','Koala','Panther','Dophin']})
I would like to add a new column in the ds
dataframe with the name of the highest parent.
as an example for the first row :
code | name | category
125 | Eagle | a
"a" is the result of a loop between df.code
and df.parent
125 > 12 > 1 > a
Since the last parent is not a number but a letter i think I must use a regex and than .merge from pandas to populate the ds['category']
column. Also maybe use an apply function but it seems a little bit above my current knowledge.
Could anyone help me with this?
Regards,
Upvotes: 0
Views: 66
Reputation: 30579
The following is certainly not the fastest solution but it works if your dataframes are not too big. First create a dictionary from the parent codes of df
and then apply this dict recursively until you come to an end.
p = df[['code','parent']].set_index('code').to_dict()['parent']
def get_parent(code):
while par := p.get(code):
code = par
return code
ds['category'] = ds.code.apply(get_parent)
Result:
code name category
0 125 Eagle a
1 265 Cat b
2 128 Koala a
3 368 Panther c
4 4682 Dophin f
PS: get_parent
uses an assignment expression (Python >= 3.8), for older versions of Python you could use:
def get_parent(code):
while True:
par = p.get(code)
if par:
code = par
else:
return code
Upvotes: 1