Reputation: 366
I have the following dataframe:
df = pd.DataFrame({'ISIN': ['A1kT23', 'A1kT23', 'B333', '49O33'],
'Name': ['Example A', 'Name Xy', 'Example B', 'Test123'],
'Sector': ['Energy', 'Energy', 'Utilities', 'Real Estate'],
'Value': [100, 50, 7, 23]})
ISIN Name Sector Value
0 A1kT23 Example A Energy 100
1 A1kT23 Name Xy Energy 50
2 B333 Example B Utilities 7
3 49O33 Test123 Real Estate 23
I would love to sum up by ISIN. The problem is the name is not unique. I would love to use the first name (Example A) instead of the second (Name Xy), when I sum up, so that I get the following result:
df = pd.DataFrame({'ISIN': ['A1kT23', 'B333', '49O33'],
'Name': ['Example A', 'Example B', 'Test123'],
'Sector': ['Energy', 'Utilities', 'Real Estate'],
'Value': [150, 7, 23]})
ISIN Name Sector Value
0 A1kT23 Example A Energy 150
1 B333 Example B Utilities 7
2 49O33 Test123 Real Estate 23
How can I solve this?
Upvotes: 0
Views: 37
Reputation: 35636
Let's try with groupby aggregate
and keep the 'first'
value for Name
and Sector
and 'sum'
Value
:
df = df.groupby('ISIN', as_index=False, sort=False).agg({
'Name': 'first', 'Sector': 'first', 'Value': 'sum'
})
df
:
ISIN Name Sector Value
0 A1kT23 Example A Energy 150
1 B333 Example B Utilities 7
2 49O33 Test123 Real Estate 23
Upvotes: 2