Monirrad

Reputation: 483

How to show a PySpark DataFrame column in scientific notation with a fixed number of decimal places

Suppose I have a PySpark DataFrame as follows:

+---------+---------+
|   col1  |  col2   |
+---------+---------+
|3.34567e4| 45876549| 
+---------+---------+
|4.4781e8 | 7856549 |
+---------+---------+

I want to keep col1 in scientific notation but show the number with 2 decimal places. I also want to convert col2 to scientific notation. So the result should be as follows:

+---------+---------+
|   col1  |  col2   |
+---------+---------+
|  3.35e4 |  4.59e7 | 
+---------+---------+
|  4.48e8 |  7.86e6 |
+---------+---------+

I searched a lot but I haven't found any answer.

Upvotes: 0

Views: 2392

Answers (1)

pault

Reputation: 43544

You can use pyspark.sql.functions.format_string, which lets you apply a printf-style format to a column when displaying the results.

In this case, you can use the format string "%.2e" to format a floating-point number in exponential (scientific) notation with 2 digits after the decimal point.

For example:

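from pyspark.sql.functions import col, format_string

# Cast every column to double (so integer columns like col2 are accepted by "%e"),
# then format each value as a string with 2 digits after the decimal point.
df.select(
    *[format_string("%.2e", col(c).cast("double")).alias(c) for c in df.columns]
).show()
#+--------+--------+
#|    col1|    col2|
#+--------+--------+
#|3.35e+04|4.59e+07|
#|4.48e+08|7.86e+06|
#+--------+--------+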

Be aware that the resulting column is a string (and not a number).
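Note that "%.2e" writes the exponent as e+04, while the question asks for e4. Here is a minimal plain-Python sketch of how the formatted string could be trimmed to that compact form. The helper name to_compact_sci is made up for illustration and is not part of PySpark. In Spark you would have to apply the same idea to the string column yourself (for example through a UDF), which I have not tested here.

```python
def to_compact_sci(value, digits=2):
    """Format `value` in scientific notation, then drop the '+' sign and
    the leading zeros of the exponent (hypothetical helper, not a PySpark API)."""
    formatted = f"{float(value):.{digits}e}"   # e.g. '3.35e+04'
    mantissa, exponent = formatted.split("e")  # '3.35', '+04'
    return f"{mantissa}e{int(exponent)}"       # int('+04') == 4


for v in (3.34567e4, 4.4781e8, 45876549, 7856549):
    print(to_compact_sci(v))
# 3.35e4
# 4.48e8
# 4.59e7
# 7.86e6
```

The result is still a string, so the same caveat applies as above.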

Upvotes: 1
