data_person
data_person

Reputation: 4490

Crosstab by date and hour in spark

Sample DF:

var someDF = Seq(
(1, "2017-12-02 03:04:00"),
(1, "2017-12-02 03:45:00"),
(1, "2017-12-02 04:04:00"),
(2, "2017-12-02 04:14:00"),
(2, "2017-12-02 04:54:00"),
(3, "2017-10-01 11:45:20"),
(4, "2017-10-01 02:45:20")
).toDF("number", "date")

OP:

+------+-------------------+
|number|               date|
+------+-------------------+
|     1|2017-12-02 03:04:00|
|     1|2017-12-02 03:45:00|
|     1|2017-12-02 04:04:00|
|     2|2017-12-02 04:14:00|
|     2|2017-12-02 04:54:00|
|     3|2017-10-01 11:45:20|
|     4|2017-10-01 02:45:20|
+------+-------------------+

When I try to use crosstab:

var temp = someDF.stat.crosstab("date","number")
temp.show()

OP:

+-------------------+---+---+---+---+
|        date_number|  1|  2|  3|  4|
+-------------------+---+---+---+---+
|2017-10-01 11:45:20|  0|  0|  1|  0|
|2017-12-02 03:04:00|  1|  0|  0|  0|
|2017-12-02 04:54:00|  0|  1|  0|  0|
|2017-12-02 04:14:00|  0|  1|  0|  0|
|2017-12-02 03:45:00|  1|  0|  0|  0|
|2017-12-02 04:04:00|  1|  0|  0|  0|
|2017-10-01 02:45:20|  0|  0|  0|  1|
+-------------------+---+---+---+---+

I would like to apply the same crosstab, but with date_and_hour alone, ex: 2017-12-02 03:

Expected OP:

+-------------------+---+---+---+---+
|   date_Hour_number|  1|  2|  3|  4|
+-------------------+---+---+---+---+
|2017-10-01 11      |  0|  0|  1|  0|
|2017-12-02 03 .    |  1|  0|  0|  0|
|2017-12-02 04 .    |  0|  2|  0|  0|

Any suggestion would be helpful

Upvotes: 0

Views: 916

Answers (1)

Leo C
Leo C

Reputation: 22449

Since your date column is of string type, you can simply use substring to trim the date down to hour before applying crosstab:

someDF.
  withColumn("datehour", substring($"date", 0, 13)).
  stat.crosstab("datehour", "number").
  show
// +---------------+---+---+---+---+
// |datehour_number|  1|  2|  3|  4|
// +---------------+---+---+---+---+
// |  2017-10-01 02|  0|  0|  0|  1|
// |  2017-10-01 11|  0|  0|  1|  0|
// |  2017-12-02 04|  1|  2|  0|  0|
// |  2017-12-02 03|  2|  0|  0|  0|
// +---------------+---+---+---+---+

Upvotes: 1

Related Questions