Reputation: 1232
I hope you can help.
I have this dataframe, and I want to select, for example, the count for prediction == 4.
Code:
the_counts=df.select('prediction').groupby('prediction').count()
the_counts.show()
+----------+-----+
|prediction|count|
+----------+-----+
| 1| 8|
| 6| 14|
| 5| 5|
| 4| 8|
| 8| 5|
| 0| 6|
+----------+-----+
I want to assign that value to a variable, since this will run inside a loop with many iterations.
I managed this, but only by creating a different dataframe and then converting that dataframe to a number.
dfva = the_counts.select('count').filter(the_counts.prediction ==6)
dfva.show()
+-----+
|count|
+-----+
| 14|
+-----+
Is there a way to access the number directly, without so many steps? What is the most efficient way?
This is Python 3.x and Spark 2.1.
Thank you very much
Upvotes: 0
Views: 215
Reputation: 5870
You can use the first() method to take the value directly:
>>> dfva = the_counts.filter(the_counts['prediction'] == 6).first()['count']
>>> type(dfva)
<type 'int'>
>>> print(dfva)
14
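Since the question mentions a loop with many iterations, it may be cheaper to bring the whole grouped result to the driver once and look values up in a plain dict, rather than filtering the dataframe for every prediction. The sketch below assumes the grouped result is small enough to collect. The helper name counts_as_dict is hypothetical, and plain dicts stand in for Spark Row objects (which also support access by column name) so that it runs without a Spark session.

```python
def counts_as_dict(rows):
    """Map each prediction value to its count.

    `rows` is any iterable of records indexable by column name,
    for example the list of Row objects from the_counts.collect().
    """
    return {row['prediction']: row['count'] for row in rows}


# With Spark this would be: rows = the_counts.collect()
# Plain dicts are used here as stand-ins for Row objects (an assumption
# made only so the example is runnable on its own).
rows = [
    {'prediction': 1, 'count': 8},
    {'prediction': 6, 'count': 14},
    {'prediction': 4, 'count': 8},
]

lookup = counts_as_dict(rows)
print(lookup[6])         # 14
print(lookup.get(7, 0))  # 0, because there is no row for prediction 7
```

Using .get() with a default also covers predictions that have no rows. With first(), that case returns None, so indexing it with ['count'] would raise an error.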
Upvotes: 2