Sara Daniel
Sara Daniel

Reputation: 165

Join/Merge for finding specific values with Panda in python

I want to define parent and child relationship between 10,000 rows of csv files with Panda in python. I have below values and I just want to add a new columns as Parent and Child:

CSV File :

Name                TEAM
Captain                A
replace-Captain1       A
replace-Captain2       -
replace-Captain3       -
Captain                B
replace-Captain4       B
replace-Captain5       B
 ...                   …

Expected CSV:

Parent          ParentTEAM        Child                 ChildTEAM
Captain            A           replace-Captain1             A
Captain            B           replace-Captain4             B
Captain            B           replace-Captain5             B

First Code:

*

import pandas as pd
import numpy as np
df = pd.read_csv("C:\\Users\\Desktop\\FamilyMembers.csv")
df["Parent"] = np.where(df['Name'].str.contains('Captain'), df['Name'], "-")
if pd.isnull(df["TEAM"]):
    df = df.join(df.set_index(['TEAM'])['Name'].rename('Child'), on=['TEAM'])
print(df)
df.to_csv('C:\\Users\\Desktop\\FamilyMembers-Final.csv')
Error: Exception has occurred: ValueError The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

Second Code:

import pandas as pd
import numpy as np

df = pd.read_csv("C:\\Users\\Desktop\\FamilyMembers.csv")
df["Parent"] = np.where(df['Name'].str.contains('Captain'), df['Name'], "-")

df = df.join(df.set_index(['TEAM'])['Name'].rename('Child'), on=['TEAM'])
print(df)
df.to_csv('C:\\Users\\Desktop\\FamilyMembers-Final.csv')

Result of second code:

    Parent                 ParentTEAM      Child                 ChildTEAM
    Captain                   A           Captain                   A
    Captain                   A           replace-Captain1          A
replace-Captain1              A           replace-Captain1          A
replace-Captain1              A           replace-Captain1          A
    Captain                   B           Captain                   B

    Captain                   B           replace-Captain4          B

replace-Captain4              B           replace-Captain1          B

replace-Captain4              B           replace-Captain4          B

Upvotes: 1

Views: 75

Answers (2)

Sara Daniel
Sara Daniel

Reputation: 165

Ok , I found out the answer:

Parent= df[df['Name'].str.contains('Captain')]
print(Parent)
Child= df[df['Name'].str.contains('Captain')]
print(Child)
result = pd.merge(Parent, Child, left_on='TEAM', right_on='TEAM')

Upvotes: 0

avloss
avloss

Reputation: 2646

It's actually not very clear what you are trying to do, perhaps this:


>>> parent = df[df.Name=='Captain']
>>> parent.columns = ['Parent','ParentTEAM']
>>> parent
    Parent ParentTEAM
0  Captain          A
4  Captain          B

>>> child = df[df.Name.str.contains('-') & ~(df.TEAM=='-')]
>>> child.columns = ['Child', 'ChildTEAM']
>>> child
              Child ChildTEAM
1  replace-Captain1         A
5  replace-Captain4         B
6  replace-Captain5         B

>>> pd.merge(parent, child, left_on='ParentTEAM', right_on='ChildTEAM')
    Parent ParentTEAM             Child ChildTEAM
0  Captain          A  replace-Captain1         A
1  Captain          B  replace-Captain4         B
2  Captain          B  replace-Captain5         B

Upvotes: 1

Related Questions