Reputation: 11
Hi all, thanks for taking the time to help me with this.
Right now I have loaded a CSV into Spark, and the type of the DataFrame is pyspark.sql.dataframe.DataFrame.
I have a column of numbers (which are strings in this case), formatted like 6,000,
and I just want to remove all the commas from these numbers. I have tried df.select("col").replace(',', '')
and df.withColumn('col', regexp_replace('col', ',', '')),
but I seem to be getting an error that "'DataFrame' object does not support item assignment".
Any ideas? I'm fairly new to Spark.
Upvotes: 1
Views: 1523
Reputation: 536
You should really be casting it:
from pyspark.sql.types import IntegerType
df = df.withColumn("col", df["col"].cast(IntegerType()))
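
Note that a direct cast of a string containing commas (e.g. "6,000") will typically come back as null, so you likely need to strip the commas with regexp_replace first and then cast. A minimal sketch, assuming a toy DataFrame and the column name "col" from the question:

from pyspark.sql import SparkSession
from pyspark.sql.functions import regexp_replace
from pyspark.sql.types import IntegerType

spark = SparkSession.builder.getOrCreate()

# Hypothetical sample data standing in for the CSV column of comma-formatted numbers
df = spark.createDataFrame([("6,000",), ("12,345",)], ["col"])

# Remove every comma, then cast the cleaned string to an integer
df = df.withColumn("col", regexp_replace("col", ",", "").cast(IntegerType()))
df.show()  # "col" is now an integer column: 6000, 12345

Reassigning the result of withColumn back to df (rather than using df['col'] = ...) is what avoids the "does not support item assignment" error, since Spark DataFrames are immutable.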
Upvotes: 1