Reputation: 5701
I'd like to see only n digits (e.g., 3) of the floating point numbers in PySpark. Is there a way to set the default? Note that I don't want to round the actual data.
The following shows what I have, and it is too much info:
>>> from pyspark.sql.functions import rand
>>> df = sc.parallelize([('a', 1), ('b', 2)]).toDF()
>>> df.withColumn("x", rand()).show()
+---+---+------------------+
| _1| _2| x|
+---+---+------------------+
| a| 1|0.7468471761178085|
| b| 2|0.6189219219244186|
+---+---+------------------+
Thanks!
Upvotes: 0
Views: 211
Reputation: 10086
If it's only for display, you can convert the DataFrame to pandas and specify a float format:
import pandas as pd
from pyspark.sql.functions import rand

pd.options.display.float_format = '{:,.2f}'.format
df = sc.parallelize([('a', 1), ('b', 2)]).toDF()
df.withColumn("x", rand()).limit(20).toPandas()
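Not part of the original answer, but as a sketch of a Spark-side alternative: pyspark.sql.functions.format_number renders a numeric column as a string with a fixed number of decimal places, so show() prints a truncated view while the underlying numeric column stays unrounded.
from pyspark.sql.functions import format_number, rand

df = sc.parallelize([('a', 1), ('b', 2)]).toDF().withColumn("x", rand())
# Select a 3-decimal string rendering of x just for display;
# the original numeric column "x" in df is left untouched.
df.select("_1", "_2", format_number("x", 3).alias("x")).show()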
Upvotes: 1