Advika

Reputation: 595

How to update a value in a DataFrame and drop a row based on a given value in Scala

I need to update a value and, if the value becomes zero, drop that row. Here is a snippet:

    val net = sc.accumulator(0.0)
    df1.foreach(x => { net += calculate(df2, x) })

    def calculate(df2: DataFrame, x: Row): Double = {
      var pro: Double = 0.0

      df2.foreach(y => {
        if (xxx) {
          // do some stuff and update the y.getLong(2) value
        } else if (yyy) {
          // do some stuff and update the y.getLong(2) value
        }
        if (y.getLong(2) == 0) {
          // drop this row from df2
        }
      })
      pro
    }

Any suggestions? Thanks.

Upvotes: 0

Views: 701

Answers (2)

rishabh.bhardwaj

Reputation: 378

DataFrames are immutable: you cannot update a value in place, but you can create a new DataFrame every time.

Can you reframe your use case? It's not very clear what you are trying to achieve with the snippet above (I don't understand the use of the accumulator). Instead, you could try df2.withColumn(...) and apply your UDF there.
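A minimal sketch of that suggestion, assuming Spark 2.x or later and a hypothetical df2 with columns id, kind and amount, where amount is the Long the question reads with y.getLong(2). The conditions standing in for xxx / yyy (kind == "debit" / kind == "credit") and the +5 / -5 rule inside the UDF are invented for illustration. Instead of mutating rows, withColumn returns a new DataFrame with the recomputed column, and filter drops the rows where it is zero:

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, udf}

object UdfSketch {
  // Hypothetical update rule standing in for the question's xxx / yyy branches.
  val adjust = udf((kind: String, amount: Long) =>
    if (kind == "debit") amount - 5L
    else if (kind == "credit") amount + 5L
    else amount
  )

  // Returns a NEW DataFrame: amount recomputed by the UDF, rows with amount == 0 removed.
  // The input DataFrame itself is never modified.
  def updateAndDrop(df2: DataFrame): DataFrame =
    df2
      .withColumn("amount", adjust(col("kind"), col("amount")))
      .filter(col("amount") =!= 0)

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("udf-sketch").getOrCreate()
    import spark.implicits._

    // Hypothetical sample data: (id, kind, amount)
    val df2 = Seq((1L, "debit", 5L), (2L, "credit", 3L), (3L, "other", 0L))
      .toDF("id", "kind", "amount")

    updateAndDrop(df2).show()
    spark.stop()
  }
}
```

With these sample rows, id 1 drops from 5 to 0 and is removed, id 3 is already 0 and is removed, and only id 2 remains, with amount 8.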

Upvotes: 1

M.Rez

Reputation: 1872

You cannot change a DataFrame or RDD; they are read-only for a reason. But you can create a new one by applying transformations. So when you want to change, for example, the contents of a column in a DataFrame, just add a new column with the updated contents using functions like this:

    df.withColumn(...)
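For example, here is a sketch assuming Spark 2.x or later and a hypothetical DataFrame with columns id, kind and amount (amount standing in for the value the question reads with y.getLong(2); the debit/credit rule is invented). The built-in when/otherwise functions express the conditional update without a UDF, filter drops the zero rows, and the original df is left untouched because each step returns a new DataFrame:

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, when}

object WithColumnSketch {
  // Hypothetical rule: debits subtract 5, credits add 5, anything else is unchanged.
  // Reusing the name "amount" makes withColumn replace that column in the new DataFrame.
  def updateAndDrop(df: DataFrame): DataFrame =
    df.withColumn(
        "amount",
        when(col("kind") === "debit", col("amount") - 5)
          .when(col("kind") === "credit", col("amount") + 5)
          .otherwise(col("amount"))
      )
      .filter(col("amount") =!= 0) // drop rows whose updated value is zero

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("withcolumn-sketch").getOrCreate()
    import spark.implicits._

    // Hypothetical sample data: (id, kind, amount)
    val df = Seq((1L, "debit", 5L), (2L, "credit", 3L), (3L, "other", 0L))
      .toDF("id", "kind", "amount")
    val updated = updateAndDrop(df)

    df.show()      // still three rows with the original amounts
    updated.show() // only id 2 remains, with amount 8
    spark.stop()
  }
}
```

Where built-in functions are enough, they are usually preferable to a UDF, since the Catalyst optimizer can reason about them while a UDF is opaque to it.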

Upvotes: 1
