Reputation: 1190
Please note this is just an example: the real data has more columns and the resulting list ends up being very big, hence I don't want to iterate over it twice.
Having:
import pandas as pd
import numpy as np

data = pd.DataFrame({'Name': ['Peter', 'Peter', 'Anna', 'Anna', 'Anna'],
                     'Country1': ['Italy', np.nan, np.nan, 'Sweden', np.nan],
                     'Country2': [np.nan, 'Venezuela', np.nan, 'Peru', 'Iceland'],
                     'Price': [12, 33, 45, 6, 9]})
I do:
data_g_name = data.groupby('Name')
country_cols = ['Country1', 'Country2']
g_stats = pd.DataFrame({
    'Countries': data_g_name[country_cols].apply(lambda x: x.values.flatten().tolist()),
    'TotalCost': data_g_name['Price'].sum()
})
And obtain:
    Name                               Countries  TotalCost
0   Anna  [nan, nan, Sweden, Peru, nan, Iceland]         60
1  Peter            [Italy, nan, nan, Venezuela]         45
I would like to get (without having to iterate through the lists if possible; the real-case lists are big):
    Name              Countries  TotalCost
0   Anna  [Sweden,Peru,Iceland]         60
1  Peter      [Italy,Venezuela]         45
Upvotes: 0
Views: 105
Reputation: 120391
Use melt to unpivot the dataframe, drop all rows with NaN in the 'Country' column, group by 'Name' and convert each group to a list, then join the sum of 'Price':
>>> data.melt(['Name', 'Price'], value_name='Country') \
        .dropna(subset=['Country']).groupby('Name')['Country'] \
        .apply(list).to_frame() \
        .join(data.groupby('Name')['Price'].sum().rename('TotalCost'))
                       Country  TotalCost
Name
Anna   [Sweden, Peru, Iceland]         60
Peter       [Italy, Venezuela]         45
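If 'Name' should end up as a regular column and the list column should be called 'Countries', as in the desired output, a rename and reset_index at the end would get there (a minimal sketch building on the chain above):

out = (data.melt(['Name', 'Price'], value_name='Country')
           .dropna(subset=['Country'])
           .groupby('Name')['Country']
           .apply(list).to_frame()
           .join(data.groupby('Name')['Price'].sum().rename('TotalCost'))
           .rename(columns={'Country': 'Countries'})  # match the requested column name
           .reset_index())                            # turn the 'Name' index back into a column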
Upvotes: 1
Reputation: 214917
Aggregate the countries and the cost separately and then combine the results:
cost = data.Price.groupby(data.Name).sum().rename('TotalCost')

countries = (
    data.melt('Name', ['Country1', 'Country2'], value_name='Countries')
        .dropna()
        .groupby('Name')
        .Countries
        .agg(list))

pd.concat([countries, cost], axis=1).reset_index()
# Name Countries TotalCost
#0 Anna [Sweden, Peru, Iceland] 60
#1 Peter [Italy, Venezuela] 45
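Since the question mentions that the real data has more country columns, the value_vars could also be picked up by name pattern instead of being listed by hand (a sketch, assuming all relevant columns share the 'Country' prefix):

country_cols = data.filter(like='Country').columns   # every column whose name contains 'Country'
countries = (
    data.melt('Name', country_cols, value_name='Countries')
        .dropna(subset=['Countries'])
        .groupby('Name')
        .Countries
        .agg(list))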
Upvotes: 1