Reputation: 51
I am learning marketing analytics and got stuck on the following snippet, specifically the operations variable. How can the strings 'sum' and 'first' give the sum of a column and the first unique value from a column?
operations = {'revenue':'sum',
'InvoiceDate':'first',
'CustomerID':'first'}
df = df.groupby('InvoiceNo').agg(operations)
I tried to relate it to pandas.Series.first and pandas.Series.sum, but could not find examples.
Book explanation: In the preceding code snippet, we first specified the aggregation functions that we will use for each column, and then performed groupby and applied those functions. InvoiceDate and CustomerID will be the same for all rows for the same invoice, so we can just take the first entry for them. For revenue, we sum the revenue across all items for the same invoice to get the total revenue for that invoice.
Result:
           revenue         InvoiceDate  CustomerID
InvoiceNo
581583      124.60 2011-12-09 12:23:00     13777.0
581584      140.64 2011-12-09 12:25:00     13777.0
581585      329.05 2011-12-09 12:31:00     15804.0
581586      339.20 2011-12-09 12:49:00     13113.0
581587      249.45 2011-12-09 12:50:00     12680.0
Upvotes: 1
Views: 41
Reputation: 862511
The function used here is GroupBy.first, not Series.first. It returns the first value in each group. The dictionary maps column names to the aggregate functions that are applied to them.
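As a rough illustration of the point above, here is a minimal sketch on made-up data. The column names follow the question, but the values are invented. It compares the dictionary form of agg with calling the group-by methods sum() and first() directly on each column, which should give the same frame.

```python
import pandas as pd

# Toy data: column names follow the question, the values are made up.
df = pd.DataFrame({
    "InvoiceNo": [1, 1, 2],
    "revenue": [10.0, 5.0, 7.5],
    "InvoiceDate": pd.to_datetime(
        ["2011-12-09 12:23", "2011-12-09 12:23", "2011-12-09 12:25"]
    ),
    "CustomerID": [13777.0, 13777.0, 15804.0],
})

# Dictionary form: each string names a group-by aggregation method.
operations = {"revenue": "sum", "InvoiceDate": "first", "CustomerID": "first"}
by_agg = df.groupby("InvoiceNo").agg(operations)

# Same thing spelled out: call the group-by methods column by column.
grouped = df.groupby("InvoiceNo")
by_methods = pd.concat(
    [
        grouped["revenue"].sum(),
        grouped["InvoiceDate"].first(),
        grouped["CustomerID"].first(),
    ],
    axis=1,
)

print(by_agg)
print(by_agg.equals(by_methods))
```

Note that, as far as I know, GroupBy.first returns the first non-null value in each group by default, so it can differ from simply taking the first row when a column contains missing values.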
Upvotes: 1