Reputation: 165
I want to define parent and child relationship between 10,000 rows of csv files with Panda in python. I have below values and I just want to add a new columns as Parent and Child:
CSV File :
Name TEAM
Captain A
replace-Captain1 A
replace-Captain2 -
replace-Captain3 -
Captain B
replace-Captain4 B
replace-Captain5 B
... …
Expected CSV:
Parent ParentTEAM Child ChildTEAM
Captain A replace-Captain1 A
Captain B replace-Captain4 B
Captain B replace-Captain5 B
First Code:
*
import pandas as pd
import numpy as np
df = pd.read_csv("C:\\Users\\Desktop\\FamilyMembers.csv")
df["Parent"] = np.where(df['Name'].str.contains('Captain'), df['Name'], "-")
if pd.isnull(df["TEAM"]):
df = df.join(df.set_index(['TEAM'])['Name'].rename('Child'), on=['TEAM'])
print(df)
df.to_csv('C:\\Users\\Desktop\\FamilyMembers-Final.csv')
Error: Exception has occurred: ValueError The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
Second Code:
import pandas as pd
import numpy as np
df = pd.read_csv("C:\\Users\\Desktop\\FamilyMembers.csv")
df["Parent"] = np.where(df['Name'].str.contains('Captain'), df['Name'], "-")
df = df.join(df.set_index(['TEAM'])['Name'].rename('Child'), on=['TEAM'])
print(df)
df.to_csv('C:\\Users\\Desktop\\FamilyMembers-Final.csv')
Result of second code:
Parent ParentTEAM Child ChildTEAM
Captain A Captain A
Captain A replace-Captain1 A
replace-Captain1 A replace-Captain1 A
replace-Captain1 A replace-Captain1 A
Captain B Captain B
Captain B replace-Captain4 B
replace-Captain4 B replace-Captain1 B
replace-Captain4 B replace-Captain4 B
Upvotes: 1
Views: 75
Reputation: 165
Ok , I found out the answer:
Parent= df[df['Name'].str.contains('Captain')]
print(Parent)
Child= df[df['Name'].str.contains('Captain')]
print(Child)
result = pd.merge(Parent, Child, left_on='TEAM', right_on='TEAM')
Upvotes: 0
Reputation: 2646
It's actually not very clear what you are trying to do, perhaps this:
>>> parent = df[df.Name=='Captain']
>>> parent.columns = ['Parent','ParentTEAM']
>>> parent
Parent ParentTEAM
0 Captain A
4 Captain B
>>> child = df[df.Name.str.contains('-') & ~(df.TEAM=='-')]
>>> child.columns = ['Child', 'ChildTEAM']
>>> child
Child ChildTEAM
1 replace-Captain1 A
5 replace-Captain4 B
6 replace-Captain5 B
>>> pd.merge(parent, child, left_on='ParentTEAM', right_on='ChildTEAM')
Parent ParentTEAM Child ChildTEAM
0 Captain A replace-Captain1 A
1 Captain B replace-Captain4 B
2 Captain B replace-Captain5 B
Upvotes: 1