Pyspark DataframeType error a: DoubleType can not accept object 'a' in type <class 'str'>

Question

I have this code:

import csv
from pyspark.sql.types import StructType, StructField, DoubleType

customSchema = StructType([
    StructField("a", DoubleType(), True),
    StructField("b", DoubleType(), True),
    StructField("c", DoubleType(), True),
    StructField("d", DoubleType(), True)])

n_1 = sc.textFile("/path/*.txt")\
        .mapPartitions(lambda partition: csv.reader([line.replace('\0','') for line in partition], delimiter=';', quotechar='"'))\
        .filter(lambda line: len(line) > 1)\
        .toDF(customSchema)

This should create a DataFrame. The problem is that '.mapPartitions' with csv.reader yields every field as a string (the default type), and I need to cast the values to DoubleType before converting the result into a DataFrame. Any ideas?

Sample data

[['0,01', '344,01', '0,00', '0,00']]

Or just work with:

n_1 = sc.textFile("/path/*.txt")\
        .mapPartitions(lambda partition: csv.reader([line.replace('\0','') for line in partition], delimiter=';', quotechar='"'))\
        .filter(lambda line: len(line) > 1)
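
One possible approach, as a minimal sketch: convert each parsed row to Python floats before calling toDF, since DoubleType only accepts float values. The helper name to_doubles is hypothetical, and the sketch assumes every field uses a comma as the decimal separator with no thousands separators, as in the sample row. The Spark part is left as a comment because it needs a running SparkContext.

```python
import csv

def to_doubles(fields):
    """Convert one parsed CSV row of comma-decimal strings to a tuple of floats."""
    # '344,01' -> 344.01; assumes there are no thousands separators in the data
    return tuple(float(f.replace(',', '.')) for f in fields)

# Local check using the sample row from the question (no Spark required)
lines = ['0,01;344,01;0,00;0,00']
rows = [to_doubles(r)
        for r in csv.reader(lines, delimiter=';', quotechar='"')
        if len(r) > 1]
print(rows)  # [(0.01, 344.01, 0.0, 0.0)]

# In the pipeline from the question, the helper would be mapped over the RDD
# just before toDF (sketch only, not executed here):
#
# n_1 = (sc.textFile("/path/*.txt")
#          .mapPartitions(lambda p: csv.reader(
#              [line.replace('\0', '') for line in p],
#              delimiter=';', quotechar='"'))
#          .filter(lambda row: len(row) > 1)
#          .map(to_doubles)
#          .toDF(customSchema))
```

The object 'a' in the error message suggests the files may also contain a header line with the column names. If so, that line would have to be filtered out before the conversion, because float('a') raises ValueError.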
