Join/Merge for finding specific values with Panda in python

Question

I want to define parent and child relationship between 10,000 rows of csv files with Panda in python. I have below values and I just want to add a new columns as Parent and Child:

If the Name consists of Captain, it will be the parent. ( the code is perfectly work in this regard.)
Now for defining child I use join, If the Team of parent is equal to any other row that row will be the child . I update my code as below: and from the first code I have got the error which has been determined. and for the second code the rows are duplicated and wrong.

CSV File :

Name                TEAM
Captain                A
replace-Captain1       A
replace-Captain2       -
replace-Captain3       -
Captain                B
replace-Captain4       B
replace-Captain5       B
 ...                   …

Expected CSV:

Parent          ParentTEAM        Child                 ChildTEAM
Captain            A           replace-Captain1             A
Captain            B           replace-Captain4             B
Captain            B           replace-Captain5             B

First Code:

*

import pandas as pd
import numpy as np
df = pd.read_csv("C:\Users\Desktop\FamilyMembers.csv")
df["Parent"] = np.where(df['Name'].str.contains('Captain'), df['Name'], "-")
if pd.isnull(df["TEAM"]):
    df = df.join(df.set_index(['TEAM'])['Name'].rename('Child'), on=['TEAM'])
print(df)
df.to_csv('C:\Users\Desktop\FamilyMembers-Final.csv')

Error: Exception has occurred: ValueError The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

Second Code:

import pandas as pd
import numpy as np

df = pd.read_csv("C:\Users\Desktop\FamilyMembers.csv")
df["Parent"] = np.where(df['Name'].str.contains('Captain'), df['Name'], "-")

df = df.join(df.set_index(['TEAM'])['Name'].rename('Child'), on=['TEAM'])
print(df)
df.to_csv('C:\Users\Desktop\FamilyMembers-Final.csv')

Result of second code:

    Parent                 ParentTEAM      Child                 ChildTEAM
    Captain                   A           Captain                   A
    Captain                   A           replace-Captain1          A
replace-Captain1              A           replace-Captain1          A
replace-Captain1              A           replace-Captain1          A
    Captain                   B           Captain                   B

    Captain                   B           replace-Captain4          B

replace-Captain4              B           replace-Captain1          B

replace-Captain4              B           replace-Captain4          B

avloss · Accepted Answer

It's actually not very clear what you are trying to do, perhaps this:


>>> parent = df[df.Name=='Captain']
>>> parent.columns = ['Parent','ParentTEAM']
>>> parent
    Parent ParentTEAM
0  Captain          A
4  Captain          B

>>> child = df[df.Name.str.contains('-') & ~(df.TEAM=='-')]
>>> child.columns = ['Child', 'ChildTEAM']
>>> child
              Child ChildTEAM
1  replace-Captain1         A
5  replace-Captain4         B
6  replace-Captain5         B

>>> pd.merge(parent, child, left_on='ParentTEAM', right_on='ChildTEAM')
    Parent ParentTEAM             Child ChildTEAM
0  Captain          A  replace-Captain1         A
1  Captain          B  replace-Captain4         B
2  Captain          B  replace-Captain5         B

Join/Merge for finding specific values with Panda in python

Answers (2)

Related Questions