Reputation: 562
I'm currently working on a problem that involves changing the types of several columns in a DataFrame, but I'm not sure how to do it with a udf: the function I have created takes a dictionary as an argument, and I don't know how to pass that function into a udf.
All of the columns are currently of type String, but as I mentioned, I need to change them to other types such as Integer and Date.
My function looks something like this:
def columns_types_transformer(df, reformating_dict):
    for column, new_type in reformating_dict.items():
        df = df.withColumn(column, df[column].cast(new_type))
    return df
The dictionary I want to pass looks like this:
dictionary = {'date1': DateType(), 'date2': DateType(), 'date3': DateType(), 'date4': DateType(), 'date5': DateType(), 'date6': DateType(), 'integer1': IntegerType()}
My question is how to pass the dictionary with the correct types into a udf. Another approach I was considering is to use SQLTransformer, but I'm not sure how that would be done either.
Any help would be appreciated.
Upvotes: 0
Views: 769
Reputation: 562
I managed to solve this using SQLTransformer.
This is what I did:
sqlTrans_formatter = SQLTransformer(statement="SELECT CAST(date1 AS date), CAST(date2 AS date), CAST(date3 AS date), CAST(date4 AS date), CAST(date5 AS date), CAST(date6 AS date), CAST(integer1 AS int) FROM __THIS__")
df = sqlTrans_formatter.transform(ddf)
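If the list of columns grows, the statement above could also be generated from a dictionary rather than typed by hand. The following is only a minimal sketch under a few assumptions: build_cast_statement and type_map are hypothetical names, the types are given as SQL type-name strings instead of DateType() / IntegerType() objects, and the explicit "AS <column>" alias is my addition to keep the original column names. Column names are interpolated without escaping, so this assumes they come from a trusted source.

```python
def build_cast_statement(type_map):
    """Build a SELECT statement that casts each column in type_map to its SQL type.

    type_map is a hypothetical dict of column name -> SQL type name (as a string).
    """
    casts = ", ".join(
        # The alias keeps the original column name on the cast result.
        f"CAST({column} AS {sql_type}) AS {column}"
        for column, sql_type in type_map.items()
    )
    # __THIS__ is the placeholder SQLTransformer replaces with the input DataFrame.
    return f"SELECT {casts} FROM __THIS__"


# Shortened example mapping that mirrors the columns used above.
type_map = {"date1": "date", "date2": "date", "integer1": "int"}
statement = build_cast_statement(type_map)
print(statement)

# With pyspark available, the string would then be used as before:
# from pyspark.ml.feature import SQLTransformer
# df = SQLTransformer(statement=statement).transform(ddf)
```

As with the hand-written statement, only the columns listed in the dictionary end up in the result; any other columns would need to be added to the SELECT list as well.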
Hopefully this will be helpful to others as well.
Upvotes: 1