Reputation: 386
I have the following dataframe:
import pandas as pd

p = {'parentId': ['071cb2c2-d1be-4154-b6c7-a29728357ef3', 'a061e7d7-95d2-4812-87c1-24ec24fc2dd2', 'Highest Level', '071cb2c2-d1be-4154-b6c7-a29728357ef3'],
     'id_x': ['a061e7d7-95d2-4812-87c1-24ec24fc2dd2', 'd2b62e36-b243-43ac-8e45-ed3f269d50b2', '071cb2c2-d1be-4154-b6c7-a29728357ef3', 'a0e97b37-b9a1-4304-9769-b8c48cd9f184'],
     'type': ['Department', 'Department', 'Department', 'Function'],
     'name': ['Sales', 'Finances', 'Management', 'Manager']}
df = pd.DataFrame(data=p)
df
| parentId | id_x | type | name |
| ------------------------------------ | ------------------------------------ | ---------- | ---------- |
| 071cb2c2-d1be-4154-b6c7-a29728357ef3 | a061e7d7-95d2-4812-87c1-24ec24fc2dd2 | Department | Sales |
| a061e7d7-95d2-4812-87c1-24ec24fc2dd2 | d2b62e36-b243-43ac-8e45-ed3f269d50b2 | Department | Finances |
| Highest Level | 071cb2c2-d1be-4154-b6c7-a29728357ef3 | Department | Management |
| 071cb2c2-d1be-4154-b6c7-a29728357ef3 | a0e97b37-b9a1-4304-9769-b8c48cd9f184 | Function | Manager |
I tried to write a function that returns the `name` of the entry whose `id_x` equals the current row's `parentId` and puts it in a new column. With the function below I get the following result:
def allocator(id_x, parent_ID, name):
    d = "no sub-dependency"
    for node in id_x:
        if node == parent_ID:
            d = name
    return d
df['Parent_name'] = df.apply(lambda x: allocator(df['id_x'], x['parentId'], x['name']), axis=1)
df
| parentId | id_x | type | name | Parent_name |
| ------------------------------------ | ------------------------------------ | ---------- | ---------- | ----------------- |
| 071cb2c2-d1be-4154-b6c7-a29728357ef3 | a061e7d7-95d2-4812-87c1-24ec24fc2dd2 | Department | Sales | Sales |
| a061e7d7-95d2-4812-87c1-24ec24fc2dd2 | d2b62e36-b243-43ac-8e45-ed3f269d50b2 | Department | Finances | Finances |
| Highest Level | 071cb2c2-d1be-4154-b6c7-a29728357ef3 | Department | Management | no sub-dependency |
| 071cb2c2-d1be-4154-b6c7-a29728357ef3 | a0e97b37-b9a1-4304-9769-b8c48cd9f184 | Function | Manager | Manager |
So far the function only inserts the row's own `name`. Instead, it should take the `name` of the entry whose `id_x` matches the row's `parentId`. The expected result is:
| parentId | id_x | type | name | Parent_name |
| ------------------------------------ | ------------------------------------ | ---------- | ---------- | ----------------- |
| 071cb2c2-d1be-4154-b6c7-a29728357ef3 | a061e7d7-95d2-4812-87c1-24ec24fc2dd2 | Department | Sales | Management |
| a061e7d7-95d2-4812-87c1-24ec24fc2dd2 | d2b62e36-b243-43ac-8e45-ed3f269d50b2 | Department | Finances | Sales |
| Highest Level | 071cb2c2-d1be-4154-b6c7-a29728357ef3 | Department | Management | no sub-dependency |
| 071cb2c2-d1be-4154-b6c7-a29728357ef3 | a0e97b37-b9a1-4304-9769-b8c48cd9f184 | Function | Manager | Management |
How do I have to change the function so that it takes the `name` of the related parent entry?
Upvotes: 1
Views: 195
Reputation: 195573
You can use `.map()`:
mapping = dict(zip(df["id_x"], df["name"]))
df["Parent_name"] = df["parentId"].map(mapping).fillna("no sub-dependency")
print(df)
Prints:
parentId id_x type name Parent_name
0 071cb2c2-d1be-4154-b6c7-a29728357ef3 a061e7d7-95d2-4812-87c1-24ec24fc2dd2 Department Sales Management
1 a061e7d7-95d2-4812-87c1-24ec24fc2dd2 d2b62e36-b243-43ac-8e45-ed3f269d50b2 Department Finances Sales
2 Highest Level 071cb2c2-d1be-4154-b6c7-a29728357ef3 Department Management no sub-dependency
3 071cb2c2-d1be-4154-b6c7-a29728357ef3 a0e97b37-b9a1-4304-9769-b8c48cd9f184 Function Manager Management
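If you would rather keep the `apply`-based approach from the question, here is a minimal sketch of how the function could be changed. The original version returns the `name` of the current row whenever its `parentId` is found somewhere in `id_x`; the version below instead iterates over `id_x` and `name` together and returns the name of the matching row. The changed signature (passing the whole frame plus a `default` argument) is my own assumption, not something from the question.

```python
import pandas as pd

p = {
    "parentId": [
        "071cb2c2-d1be-4154-b6c7-a29728357ef3",
        "a061e7d7-95d2-4812-87c1-24ec24fc2dd2",
        "Highest Level",
        "071cb2c2-d1be-4154-b6c7-a29728357ef3",
    ],
    "id_x": [
        "a061e7d7-95d2-4812-87c1-24ec24fc2dd2",
        "d2b62e36-b243-43ac-8e45-ed3f269d50b2",
        "071cb2c2-d1be-4154-b6c7-a29728357ef3",
        "a0e97b37-b9a1-4304-9769-b8c48cd9f184",
    ],
    "type": ["Department", "Department", "Department", "Function"],
    "name": ["Sales", "Finances", "Management", "Manager"],
}
df = pd.DataFrame(data=p)


def allocator(frame, parent_id, default="no sub-dependency"):
    # Walk over every (id_x, name) pair and look for the row
    # whose id_x equals the parentId of the current row.
    for node, node_name in zip(frame["id_x"], frame["name"]):
        if node == parent_id:
            return node_name  # the parent's name, not the current row's
    return default  # no row has this parentId as its id_x


df["Parent_name"] = df.apply(lambda x: allocator(df, x["parentId"]), axis=1)
print(df["Parent_name"].tolist())
# ['Management', 'Sales', 'no sub-dependency', 'Management']
```

This scans the whole frame once per row, so on larger frames the `.map()` version above should be noticeably faster.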
Upvotes: 1