Lechucico

Reputation: 2102

Modify spark DataFrame column

I would like to change the following dataframe:

--id--rating--timestamp--
-------------------------
| 0 | 5.0  |  231312231 |
| 1 | 3.0  |  192312311 | #Epoch time (seconds since Thursday, 1 January 1970)
-------------------------

to the following dataframe:

--id--rating--timestamp--
--------------------------
| 0 |  5.0  |  05        |
| 1 |  3.0  |  04        | #Month of year
--------------------------

How can I do that?

Upvotes: 6

Views: 8979

Answers (2)

Haroun Mohammedi

Reputation: 2424

If you are coming from Scala, you can use the methods in sql.functions inside DataFrame.select or DataFrame.withColumn. For your case, I think the method month(e: Column): Column can perform the change you want. It will be something like this:

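import org.apache.spark.sql.functions.{col, from_unixtime, month}
// month expects a Column holding a date/timestamp, so convert the epoch seconds first
df.withColumn("timestamp", month(from_unixtime(col("timestamp"))))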

I believe there is an equivalent way in Java, Python and R.

Upvotes: 1

T. Gawęda

Reputation: 16076

It's easy using built-in functions:

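import org.apache.spark.sql.functions._
import spark.implicits._
// from_unixtime turns epoch seconds into a timestamp string; month extracts the month number from it
val newDF = dataset.withColumn("timestamp", month(from_unixtime($"timestamp")))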

Note that DataFrames are immutable, so you can create a new DataFrame but cannot modify an existing one. Of course, you can assign the new Dataset to the same variable.

Note number 2: DataFrame = Dataset[Row], which is why I use both names.
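
To sanity-check what month(from_unixtime(...)) should give for the two sample rows, here is a minimal sketch of the same conversion in plain Scala, without Spark. The helper name monthOfEpochSeconds is hypothetical, and the sketch assumes UTC; Spark's from_unixtime uses the session time zone instead, so a value close to a month boundary may come out differently there.

```scala
import java.time.{Instant, ZoneOffset}

// Hypothetical helper (not part of Spark): month of year (1-12)
// for a Unix timestamp given in seconds, interpreted in UTC (an assumption).
def monthOfEpochSeconds(epochSeconds: Long): Int =
  Instant.ofEpochSecond(epochSeconds).atZone(ZoneOffset.UTC).getMonthValue

// The two timestamps from the question's sample rows.
val months = Seq(231312231L, 192312311L).map(monthOfEpochSeconds)
```

Like Spark's month, this yields an integer (5, not "05"), so zero-padding would need a separate formatting step if the two-digit form from the question is required.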

Upvotes: 6
