Reputation: 2102
I would like to change the following dataframe:
--id--rating--timestamp--
-------------------------
| 0 | 5.0 | 231312231 |
| 1 | 3.0 | 192312311 | #Epoch time (seconds since Thursday, 1 January 1970)
-------------------------
to the following dataframe:
--id--rating--timestamp--
--------------------------
| 0 | 5.0 | 05 |
| 1 | 3.0 | 04 | #Month of year
--------------------------
How can I do that?
Upvotes: 6
Views: 8979
Reputation: 2424
If you are coming from Scala, you can use the methods in org.apache.spark.sql.functions inside DataFrame.select or DataFrame.withColumn. For your case, I think month(e: Column): Column can perform the change you want. Since the column holds epoch seconds, convert it with from_unixtime first. It would be something like this:
import org.apache.spark.sql.functions.{col, from_unixtime, month}
df.withColumn("timestamp", month(from_unixtime(col("timestamp"))))
I believe there is an equivalent way in Java, Python, and R.
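To sanity-check which month a given epoch value should map to, independent of Spark, here is a minimal sketch using java.time. The helper names monthOfEpoch and paddedMonth are hypothetical, introduced only for illustration. It also shows that the result depends on the time zone, which matters for values close to a month boundary.

```scala
import java.time.{Instant, ZoneId, ZoneOffset}

// Hypothetical helper: month of year (1-12) for an epoch-seconds value in a given zone.
def monthOfEpoch(epochSeconds: Long, zone: ZoneId): Int =
  Instant.ofEpochSecond(epochSeconds).atZone(zone).getMonthValue

// Hypothetical helper: zero-padded form, matching the "05" style in the question.
def paddedMonth(epochSeconds: Long, zone: ZoneId): String =
  f"${monthOfEpoch(epochSeconds, zone)}%02d"

// First sample value from the question, evaluated in two zones.
val utcMonth  = monthOfEpoch(231312231L, ZoneOffset.UTC)          // 1977-05-01T05:23:51Z
val westMonth = monthOfEpoch(231312231L, ZoneOffset.ofHours(-8))  // 1977-04-30T21:23:51-08:00
```

The same instant falls in May in UTC and in April at a UTC-8 offset, so it is worth knowing which time zone your Spark session uses before comparing results.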
Upvotes: 1
Reputation: 16076
It's easy using built-in functions:
import org.apache.spark.sql.functions._
import spark.implicits._
val newDF = dataset.withColumn("timestamp", month(from_unixtime($"timestamp")))
Note that DataFrames are immutable: withColumn returns a new DataFrame rather than modifying the existing one. Of course, you can assign the result to the same variable (if it is declared as a var).
Note number 2: in Scala, DataFrame is a type alias for Dataset[Row], which is why I use both names.
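For completeness, here is a self-contained sketch of the approach above, run against the two sample rows from the question. The local SparkSession, the app name, and the pinned UTC session time zone are assumptions made for reproducibility, not part of the original answer. The second variant uses the two-argument from_unixtime with the "MM" pattern to get a zero-padded string such as "05", which is closer to the layout the question asks for.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, from_unixtime, month}

// Local session for illustration only (assumption); the time zone is pinned
// to UTC so the month does not depend on the machine's settings.
val spark = SparkSession.builder()
  .master("local[1]")
  .appName("epoch-to-month")
  .config("spark.sql.session.timeZone", "UTC")
  .getOrCreate()
import spark.implicits._

// Sample rows copied from the question.
val df = Seq((0, 5.0, 231312231L), (1, 3.0, 192312311L))
  .toDF("id", "rating", "timestamp")

// Integer month (1-12). df itself is left unchanged.
val withMonth = df.withColumn("timestamp", month(from_unixtime(col("timestamp"))))

// Zero-padded string month ("01" to "12").
val withPaddedMonth = df.withColumn("timestamp", from_unixtime(col("timestamp"), "MM"))

withMonth.show()
withPaddedMonth.show()
```

After this runs, df still holds the original epoch values, while withMonth and withPaddedMonth are separate DataFrames with the replaced column.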
Upvotes: 6