user8676770

Importing data with Pyspark : Wrong datatype

I have a problem with PySpark: when I import my dataset with PySpark, all of my columns are read as strings, even the ones that are numeric.

I don't have this problem when I import the data with Pandas.

I'm using the Dataiku platform for development. The data are already on the platform, and I import them with this code:

import dataiku
import dataiku.spark as dkuspark

# Example: Read the descriptor of a Dataiku dataset
mydataset = dataiku.Dataset("Extracts___Retail_Master_Data___Product_Hierarchy_HDFS")
# And read it as a Spark DataFrame
df = dkuspark.get_dataframe(sqlContext, mydataset)

I can't find a way to import my data with the correct types.

Thanks.

Upvotes: 0

Views: 274

Answers (1)

andreybavt
andreybavt

Reputation: 1321

In Dataiku there are two concepts: a storage type and a meaning. When you explore your dataset you'll see both below each column name (the type in grey, the meaning in blue).


A meaning is the type that Dataiku thinks best fits the values stored in that column.

In your case, go to the Extracts___Retail_Master_Data___Product_Hierarchy_HDFS dataset's settings -> Schema -> set the correct column types -> Save.

For more details, see the documentation page on schemas:

https://doc.dataiku.com/dss/latest/schemas/index.html

Upvotes: 1
