Reputation: 483
Suppose that I have a dataframe in pyspark as follows:
+---------+--------+
|     col1|    col2|
+---------+--------+
|3.34567e4|45876549|
| 4.4781e8| 7856549|
+---------+--------+
I want to keep col1 in scientific notation but show the number with two decimal places. I also want to change col2 to scientific format. So the result should be as follows:
+-------+-------+
|   col1|   col2|
+-------+-------+
| 3.35e4| 4.59e7|
| 4.48e8| 7.86e6|
+-------+-------+
I have searched a lot but haven't found an answer.
Upvotes: 0
Views: 2392
Reputation: 43544
You can use pyspark.sql.functions.format_string, which lets you apply a printf-style format when displaying results. In this case, the format string "%.2e" formats a floating point number in exponential (scientific) notation with two decimal places.
For example:
from pyspark.sql.functions import col, format_string
df.select(*[format_string("%.2e", col(c).cast("float")).alias(c) for c in df.columns]).show()
#+--------+--------+
#| col1| col2|
#+--------+--------+
#|3.35e+04|4.59e+07|
#|4.48e+08|7.86e+06|
#+--------+--------+
Be aware that the resulting column is a string (and not a number).
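As a side note, the "%.2e" conversion follows the usual printf conventions, so you can preview what format_string will produce for a given value with plain Python's % formatting, no Spark session needed (a quick sketch; the sample values are taken from the question):

```python
# "%.2e": exponential (scientific) notation with 2 digits after the
# decimal point. These mirror the per-row conversions format_string applies.
print("%.2e" % 33456.7)    # 3.35e+04  (col1, first row)
print("%.2e" % 45876549)   # 4.59e+07  (col2, first row)
```

Note the exponent is rendered as e+04 rather than the bare e4 shown in the desired output; that is standard printf behavior.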
Upvotes: 1