Arunanshu P

Reputation: 171

Change the Datatype of columns in PySpark dataframe

I have an input dataframe (ip_df); the data in this dataframe looks like this:

id            col_value
1               10
2               11
3               12

The data type of both id and col_value is String.
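
For reference, the input can be reproduced with a minimal sketch like this (both columns created as strings):

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Sample data matching the table above; all values are strings
ip_df = spark.createDataFrame(
    [("1", "10"), ("2", "11"), ("3", "12")],
    ["id", "col_value"],
)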

I need to get another dataframe (output_df) with the datatype of id as string and of col_value as decimal(15,4). There is no data transformation, just datatype conversion. Can I do it using PySpark? Any help would be appreciated.

Upvotes: 10

Views: 32387

Answers (3)

Amit Pathak

Reputation: 1377

You can change multiple column types at once:

  • Using withColumn():

from pyspark.sql.types import DecimalType, StringType

output_df = ip_df \
  .withColumn("col_value", ip_df["col_value"].cast(DecimalType(15, 4))) \
  .withColumn("id", ip_df["id"].cast(StringType()))

  • Using select():

from pyspark.sql.types import DecimalType, StringType

output_df = ip_df.select(
    ip_df.id.cast(StringType()).alias('id'),
    ip_df.col_value.cast(DecimalType(15, 4)).alias('col_value')
)

  • Using spark.sql():

ip_df.createOrReplaceTempView("ip_df_view")

output_df = spark.sql('''
SELECT
    CAST(id AS STRING) AS id,
    CAST(col_value AS DECIMAL(15, 4)) AS col_value
FROM ip_df_view
''')
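
Whichever variant you use, you can confirm the conversion with printSchema():

output_df.printSchema()
# root
#  |-- id: string (nullable = true)
#  |-- col_value: decimal(15,4) (nullable = true)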

Upvotes: 1

Neeraj Bhadani

Reputation: 3110

Try the statement below:

output_df = ip_df.withColumn("col_value", ip_df["col_value"].cast('float'))
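
Note that cast() also accepts a DDL-formatted type string, so the exact type requested in the question works the same way:

# 'decimal(15,4)' is the string form of DecimalType(15, 4)
output_df = ip_df.withColumn("col_value", ip_df["col_value"].cast('decimal(15,4)'))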

Upvotes: 5

aclowkay

Reputation: 3897

Try using the cast method:

from pyspark.sql.types import DecimalType
<your code>
output_df = ip_df.withColumn("col_value", ip_df["col_value"].cast(DecimalType(15, 4)))
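
On the sample data this produces (column widths approximate):

output_df.show()
# +---+---------+
# | id|col_value|
# +---+---------+
# |  1|  10.0000|
# |  2|  11.0000|
# |  3|  12.0000|
# +---+---------+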

Upvotes: 12
