Reputation: 129
I have a dataframe like:
Name_Index City_Index
2.0 1.0
0.0 2.0
1.0 0.0
I have a new list of values:
List(1.0, 1.0)
I want to drop all the existing rows in the DataFrame and insert these values as the only row.
My code:
val spark = SparkSession.builder
  .master("local[*]")
  .config("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
  .getOrCreate()
import spark.implicits._

var data = spark.read.option("header", "true")
  .option("inferSchema", "true")
  .csv("src/main/resources/student.csv")

val someDF = Seq(
  (1.0, 1.0)
).toDF("Name_Index", "City_Index")

data = data.union(someDF)
data.show()
It shows output like:
Name_Index City_Index
2.0 1.0
0.0 2.0
1.0 0.0
1.0 1.0
But the output should look like this, with all the previous rows dropped and only the new values present:
Name_Index City_Index
1.0 1.0
Upvotes: 2
Views: 1884
Reputation: 606
As far as I can see, you only need the list of column names from the source DataFrame.
If your sequence has the same column order as the source DataFrame, you can reuse its column names without actually querying the source DataFrame. Performance-wise, this is faster.
val srcDf = Seq((2.0,1.0),(0.0,2.0),(1.0,0.0)).toDF("name_index","city_index")
val dstDf = Seq((1.0, 1.0)).toDF(srcDf.columns:_*)
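To see why this avoids touching the source data: `srcDf.columns` is pure metadata, so reading it triggers no Spark job. A standalone sketch (column names taken from the question):

```scala
import org.apache.spark.sql.SparkSession

object ColumnReuse {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder.master("local[*]").getOrCreate()
    import spark.implicits._

    val srcDf = Seq((2.0, 1.0), (0.0, 2.0), (1.0, 0.0)).toDF("name_index", "city_index")
    // srcDf.columns is metadata only; no job runs against srcDf here.
    // toDF(srcDf.columns: _*) names the new DataFrame's columns to match.
    val dstDf = Seq((1.0, 1.0)).toDF(srcDf.columns: _*)

    println(dstDf.columns.mkString(", ")) // name_index, city_index
    dstDf.show(false)

    spark.stop()
  }
}
```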
Upvotes: 0
Reputation: 2838
You could try this approach:
data = data.filter(_ => false).union(someDF)
output
+----------+----------+
|Name_Index|City_Index|
+----------+----------+
|1.0 |1.0 |
+----------+----------+
I hope this gives you some insight.
Regards.
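The always-false predicate removes every row but keeps the original schema, so the subsequent union is guaranteed to be column-compatible. A minimal self-contained sketch (DataFrames built inline instead of from the CSV in the question):

```scala
import org.apache.spark.sql.SparkSession

object ReplaceAllRows {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder.master("local[*]").getOrCreate()
    import spark.implicits._

    var data = Seq((2.0, 1.0), (0.0, 2.0), (1.0, 0.0)).toDF("Name_Index", "City_Index")
    val someDF = Seq((1.0, 1.0)).toDF("Name_Index", "City_Index")

    // filter(_ => false) yields an empty DataFrame with data's schema,
    // so the union contains only the rows of someDF.
    data = data.filter(_ => false).union(someDF)
    data.show(false)

    spark.stop()
  }
}
```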
Upvotes: 0
Reputation: 10382
You can achieve this using the limit and union functions. Check below.
scala> val df = Seq((2.0,1.0),(0.0,2.0),(1.0,0.0)).toDF("name_index","city_index")
df: org.apache.spark.sql.DataFrame = [name_index: double, city_index: double]
scala> df.show(false)
+----------+----------+
|name_index|city_index|
+----------+----------+
|2.0 |1.0 |
|0.0 |2.0 |
|1.0 |0.0 |
+----------+----------+
scala> val ndf = Seq((1.0,1.0)).toDF("name_index","city_index")
ndf: org.apache.spark.sql.DataFrame = [name_index: double, city_index: double]
scala> ndf.show
+----------+----------+
|name_index|city_index|
+----------+----------+
| 1.0| 1.0|
+----------+----------+
scala> df.limit(0).union(ndf).show(false) // not a great approach here; you could just call ndf.show directly
+----------+----------+
|name_index|city_index|
+----------+----------+
|1.0 |1.0 |
+----------+----------+
Upvotes: 1
Reputation: 4045
Change the last line to:
data = data.except(data).union(someDF)
data.show()
(show() returns Unit, so assign the union result before calling it.)
Upvotes: 0