Reputation: 93
I have a collection named "Vendite", with several fields, in my MongoDB database. To keep things simple, I'll take one field (named "ven_lordo") as an example, but I have this problem with many of them.
If I run the following command in mongosh, I get "double" as the fieldType for every document:
db.Vendite.aggregate([{ "$project": { "fieldType": { "$type": "$ven_lordo" }}}])
When I read the data from MongoDB with PySpark, I use the following code:
df = sparkSession.read.format("mongodb") \
    .option("spark.mongodb.read.database", db) \
    .option("spark.mongodb.read.collection", collection) \
    .option("spark.mongodb.read.connection.uri", uri) \
    .load()
print(df.schema["ven_lordo"].dataType)  # before the temp view
df.createOrReplaceTempView("df")
print(df.schema["ven_lordo"].dataType)  # after the temp view
Both prints show "StringType", before and after creating the temp view.
How can I read the data with the same types they have in MongoDB?
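For reference, I know I could cast the affected columns after the load (a sketch using the standard PySpark cast API; I'd prefer not to do this for every column):

from pyspark.sql.functions import col

# Workaround sketch: force the column back to double after loading;
# every affected column would need the same treatment
df = df.withColumn("ven_lordo", col("ven_lordo").cast("double"))
print(df.schema["ven_lordo"].dataType)  # now DoubleType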
One additional note: if I sum two columns affected by this problem, .show() displays a number (I can't tell whether it's a string or a double), and the result is numerically correct. Is it normal for a sum to work on two strings?
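A minimal repro of this behavior with string literals (hypothetical values, just to illustrate):

# Spark SQL implicitly casts numeric-looking strings to double for arithmetic,
# which is why the sum "works" even on StringType columns
sparkSession.sql("SELECT '1.5' + '2.5' AS total").show()
# +-----+
# |total|
# +-----+
# |  4.0|
# +-----+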
Upvotes: 0
Views: 26