Reputation: 29
I'd like to ask for your help with transforming strings in one column of a PySpark DataFrame.
For example, I have a DataFrame named "df" with the following structure:
df = spark.createDataFrame([('David','5K'),('William','6M'),('Sam','1B'),('Ashely','342 1'),('Chloe','240.5 4')], ['Name','Numbers'])
where K = thousands, M = millions, and B = billions.
What I want to do here is (1) convert the suffixes from K to thousands, M to millions, and B to billions, (2) remove the spaces between the digits in the "Numbers" column, and then (3) change its datatype to double. Regular expressions are fine, but I don't want to use pandas in this case.
This would be the desired output after transformation:
Name    |Numbers
--------|----------
David   |5000
William |6000000
Sam     |1000000000
Ashely  |3421
Chloe   |240.54
I'd appreciate any help from you guys!
Upvotes: 0
Views: 628
Reputation: 3419
You can use regexp_replace to strip the suffix and then multiply the 'Numbers' column by the appropriate power of 10:
from pyspark.sql.functions import col, regexp_replace, when
df.withColumn("Numbers", when(col('Numbers').like("%K"), (regexp_replace('Numbers', 'K', '').cast('double')*1000))\
.when(col('Numbers').like("%M"), (regexp_replace('Numbers', 'M', '').cast('double')*1000000))\
.when(col('Numbers').like("%B"), (regexp_replace('Numbers', 'B', '').cast('double')*1000000000))\
.otherwise((regexp_replace('Numbers', ' ', '').cast('double'))))\
.show()
Output:
+-------+---------+
| Name| Numbers|
+-------+---------+
| David| 5000.0|
|William|6000000.0|
| Sam| 1.0E9|
| Ashely| 3421.0|
| Chloe| 240.54|
+-------+---------+
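Note that .show() only displays the result. To keep the transformed column (requirement (3) in the question), assign the result back to a variable; since every branch casts to double, the column ends up as DoubleType. A minimal sketch (df2 is just an illustrative name):
# Keep the transformed DataFrame; every branch casts to double,
# so the Numbers column ends up as DoubleType
df2 = df.withColumn(
    "Numbers",
    when(col('Numbers').like("%K"), regexp_replace('Numbers', 'K', '').cast('double') * 1000)
    .when(col('Numbers').like("%M"), regexp_replace('Numbers', 'M', '').cast('double') * 1000000)
    .when(col('Numbers').like("%B"), regexp_replace('Numbers', 'B', '').cast('double') * 1000000000)
    .otherwise(regexp_replace('Numbers', ' ', '').cast('double'))
)
df2.printSchema()
# root
#  |-- Name: string (nullable = true)
#  |-- Numbers: double (nullable = true)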
If we have 5.5K, then the output will be 5500.0.
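As a quick check (a minimal sketch; the one-row test DataFrame below is only for illustration), you can run the same K branch on a '5.5K' value:
# Hypothetical one-row DataFrame to verify the 5.5K case: 5.5 * 1000 = 5500.0
test_df = spark.createDataFrame([('Test', '5.5K')], ['Name', 'Numbers'])
test_df.withColumn(
    "Numbers",
    when(col('Numbers').like("%K"), regexp_replace('Numbers', 'K', '').cast('double') * 1000)
    .otherwise(regexp_replace('Numbers', ' ', '').cast('double'))
).show()
# Shows 5500.0 in the Numbers column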
Upvotes: 1