Reputation: 333
I am a beginner in Spark and I am stuck on how to run a SQL query against DataFrames.
I have the following two DataFrames.
dataframe_1
+-----------------+-----------------+----------------------+---------------------+
|id |geometry_tyme |geometry |rayon |
+-----------------+-----------------+----------------------+---------------------+
|50 |Polygon |[00 00 00 00 01 0...] |200 |
|54 |Point |[00 00 00 00 01 0.. ] |320179 |
+-----------------+-----------------+----------------------+---------------------+
dataframe_2
+-----------------+-----------------+----------------------+
|id2 |long |lat |
+-----------------+-----------------+----------------------+
|[70,50,600,] | -9.6198783 |44.5942549 |
|[20,140,39,] |-6.6198783 |44.5942549 |
+-----------------+-----------------+----------------------+
I want to execute the following request.
"SELECT dataframe_1.* FROM dataframe_1 WHERE dataframe_1.id IN ("
+ id2
+ ") AND ((dataframe_1.geometry_tyme='Polygon' AND (ST_WITHIN(ST_GeomFromText(CONCAT('POINT(',"
+ long
+ ",' ',"
+ lat
+ ",')'),4326),dataframe_1.geometry))) OR ( (dataframe_1.geometry_tyme='LineString' OR dataframe_1.geomType='Point') AND ST_Intersects(ST_buffer(dataframe_1.geom,(dataframe_1.rayon/100000)),ST_GeomFromText(CONCAT('POINT(',"
+ long
+ ",' ',"
+ lat
+ ",')'),4326)))) "
I'm really stuck. Should I join the two DataFrames, or is there another way? I tried to join them on id and id2 like this:
dataframe_2.select(explode(col("id2")).as("id2")).join(dataframe_1, col("id2").equalTo(dataframe_1.col("id")));
but it seems to me that making a join is not the right choice.
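For what it's worth, explode followed by an inner join is exactly the semantics the query needs: each element of the id2 array becomes its own row, and only ids that also exist in dataframe_1 survive the join. A plain-Java sketch of that logic (no Spark; the in-memory stand-ins Row2, df1, and df2 are made up to mirror the tables above):

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class ExplodeJoinSketch {
    // Stand-in for a dataframe_2 row: an array of ids plus a coordinate
    record Row2(List<Integer> id2, double lon, double lat) {}

    public static void main(String[] args) {
        // dataframe_1 stand-in: id -> geometry_tyme
        Map<Integer, String> df1 = Map.of(50, "Polygon", 54, "Point");
        List<Row2> df2 = List.of(
                new Row2(List.of(70, 50, 600), -9.6198783, 44.5942549),
                new Row2(List.of(20, 140, 39), -6.6198783, 44.5942549));

        // "explode" each id2 array into one element per row, then keep
        // only ids that also exist in df1 -- an inner join on id
        List<Integer> joinedIds = df2.stream()
                .flatMap(r -> r.id2().stream())
                .filter(df1::containsKey)
                .collect(Collectors.toList());

        System.out.println(joinedIds); // [50]
    }
}
```

Only id 50 appears both inside an id2 array and in dataframe_1, so it is the only row the join keeps; the same happens with Spark's explode plus join.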
I need your help.
Thank you
Upvotes: 2
Views: 563
Reputation: 780
1. Create temp views from your DataFrames:
dataframe_1.createOrReplaceTempView("dataframe_1")
dataframe_2.createOrReplaceTempView("dataframe_2")
2.Run your SQL as final_df = spark.sql("your SQL here")
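With both views registered, the whole condition can be expressed in one SQL statement. A sketch, assuming the spatial functions (ST_Within, ST_Intersects, ST_Buffer, ST_GeomFromText) from GeoSpark/Sedona are registered in the session; the id2 array is unnested with explode in a subquery, and column names follow the schema shown in the question (geometry_tyme, geometry, rayon):

SELECT df1.*
FROM dataframe_1 df1
JOIN (SELECT explode(id2) AS id, long, lat FROM dataframe_2) df2
  ON df1.id = df2.id
WHERE (df1.geometry_tyme = 'Polygon'
       AND ST_Within(ST_GeomFromText(CONCAT('POINT(', df2.long, ' ', df2.lat, ')'), 4326),
                     df1.geometry))
   OR (df1.geometry_tyme IN ('LineString', 'Point')
       AND ST_Intersects(ST_Buffer(df1.geometry, df1.rayon / 100000),
                         ST_GeomFromText(CONCAT('POINT(', df2.long, ' ', df2.lat, ')'), 4326)))

This replaces the string concatenation of id2, long, and lat in Java: the join and the CONCAT calls pull those values per row instead.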
Upvotes: 1