Reputation: 632
How do I get the average of Price for each combination of two columns?
My DataFrame:
relevantTable = df.select(df['Price'], df['B'], df['A'])
looks like:
+-------+------------+------------------+
|  Price|           B|                 A|
+-------+------------+------------------+
| 0.2947|   i3.xlarge|                 x|
|  0.105|    c4.large|                 x|
| 0.2179|   m4.xlarge|                 x|
| 2.2534| m4.10xlarge|                 x|
| 2.1801| m4.10xlarge|                 x|
|  0.108|    r4.large|                 x|
|  0.108|    r4.large|                 x|
| 0.0213|    i3.large|                 y|
| 0.5572|  i2.4xlarge|                 y|
| 0.1542|  c4.4xlarge|                 y|
| 0.3624| m4.10xlarge|                 y|
| 0.3596| m4.10xlarge|                 y|
|   0.11|    m4.large|                 x|
| 0.4436|  m4.2xlarge|                 x|
| 0.1458|  m4.2xlarge|                 y|
... and so on; the real dataset is huge.
What would be a simple and scalable solution to get the average for all combinations of A and B?
Upvotes: 0
Views: 1245
Reputation: 36
How about:
df.groupBy("A", "B").avg("Price")
Or, if you also want the aggregates for each single column (plus the overall average), use cube:
df.cube("A", "B").avg("Price")
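To sanity-check the Spark result on a small sample, something like the sketch below might help. It is only an illustration: the function names and the `avg_price` alias are made up here and do not come from the posts above. The first function is a plain-Python reference for the per-pair mean, and the second is the same groupBy written with an explicit column name.

```python
from collections import defaultdict


def avg_price_by_pair_local(rows):
    """Plain-Python reference: mean Price for each (A, B) pair.

    `rows` is an iterable of (Price, B, A) tuples, in the same column
    order as relevantTable in the question.
    """
    totals = defaultdict(lambda: [0.0, 0])  # (A, B) -> [sum, count]
    for price, b, a in rows:
        acc = totals[(a, b)]
        acc[0] += price
        acc[1] += 1
    return {pair: total / count for pair, (total, count) in totals.items()}


def avg_price_by_pair_spark(df):
    """Same aggregation in PySpark; assumes df has columns Price, B and A."""
    # Imported here so the reference function above runs without Spark installed.
    from pyspark.sql import functions as F

    # alias() gives the result column a readable name (hypothetical name).
    return df.groupBy("A", "B").agg(F.avg("Price").alias("avg_price"))


# A few rows copied from the question, as (Price, B, A).
sample = [
    (2.2534, "m4.10xlarge", "x"),
    (2.1801, "m4.10xlarge", "x"),
    (0.3624, "m4.10xlarge", "y"),
    (0.3596, "m4.10xlarge", "y"),
    (0.108, "r4.large", "x"),
]
local_result = avg_price_by_pair_local(sample)
```

The local version is only meant for small samples; on the full dataset the groupBy does the same work distributed across the cluster.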
Upvotes: 2