Tilo

Reputation: 429

Casting a column of a PySpark DataFrame to string throws an error

I have a PySpark DataFrame with two columns, with the following datatypes:

[('area', 'int'), ('customer_play_id', 'int')]

+----+----------------+
|area|customer_play_id|
+----+----------------+
| 100|         8606738|
| 110|         8601843|
| 130|         8602984|
+----+----------------+

I want to cast the column area to str using PySpark commands, but I am getting errors, as below.

I tried the following:

  1. str(df['area']): this didn't change the datatype to str
  2. df.area.astype(str): gave "TypeError: unexpected type: "
  3. df['area'].cast(str): gave the same error as above

Any help will be appreciated. I want the datatype of area to be string, using a PySpark DataFrame operation.

Upvotes: 0

Views: 6885

Answers (3)

Anuj Gupta

Reputation: 25

You can use a UDF, or simply cast() the column:

from pyspark.sql.functions import udf
from pyspark.sql.types import FloatType

# a UDF can wrap custom conversion logic, but cast() is usually simpler
tofloatfunc = udf(lambda x: x, FloatType())
changedTypedf = df.withColumn("Column_name", df["Column_name"].cast(FloatType()))
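For the question's actual goal (casting area from int to string), here is a minimal sketch of both routes, assuming the asker's df; the to_str, df_udf, and df_cast names are only for illustration:

from pyspark.sql.functions import udf
from pyspark.sql.types import StringType

# UDF route: convert each value to str in Python (None-safe)
to_str = udf(lambda x: str(x) if x is not None else None, StringType())
df_udf = df.withColumn("area", to_str(df["area"]))

# cast() route: let Spark convert natively (generally faster than a Python UDF)
df_cast = df.withColumn("area", df["area"].cast(StringType()))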

Upvotes: 0

Shantanu Sharma

Reputation: 4099

You can simply do any of these:

Option 1:

df1 = df.select('*', df.area.cast("string").alias("new_area"))

select - all the columns you want in df1 should be mentioned in the select; the alias gives the cast column a distinct name

Option 2:

df1 = df.selectExpr("*","cast(area as string) AS new_area")

selectExpr - all the columns you want in df1 should be mentioned in the selectExpr

Option 3:

df1 = df.withColumn("new_area", df.area.cast("string"))

withColumn adds a new column (in addition to the existing columns of df).

"*" in select and selectExpr represents all the columns.

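All three options produce the same new column type; a quick sanity check (a sketch, assuming the question's df):

df1 = df.withColumn("new_area", df.area.cast("string"))

# expect: [('area', 'int'), ('customer_play_id', 'int'), ('new_area', 'string')]
print(df1.dtypes)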
Upvotes: 1

Ankit Kumar Namdeo

Reputation: 1464

Use the withColumn function to change the data type or values of a field in Spark; an example is shown below:

import pyspark.sql.functions as F

# overwrite the existing 'area' column with its string-cast version
df = df.withColumn("area", F.col("area").cast("string"))
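Unlike the new_area variants above, this overwrites area in place, keeping the original column name. A quick check (a sketch, assuming the question's df; nullability may differ depending on the source):

df.printSchema()
# root
#  |-- area: string (nullable = true)
#  |-- customer_play_id: integer (nullable = true)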

Upvotes: 1
