Reputation: 45
I have a data frame that looks like this:
+--------------+-------+-------+-------+-------+-------+-------+-------+
|Country/Region| 3/7/20| 3/8/20| 3/9/20|3/10/20|3/11/20|3/12/20|3/13/20|
+--------------+-------+-------+-------+-------+-------+-------+-------+
| Senegal| 0| 4| 10| 18| 27| 31| 35|
+--------------+-------+-------+-------+-------+-------+-------+-------+
| Tunisia| 1| 8| 15| 21| 37| 42| 59|
+--------------+-------+-------+-------+-------+-------+-------+-------+
For each country, I have a unique row, but I have many columns representing days. I would like to go through each column and subtract from it the corresponding value in the previous column such as the resulting df should be as follows:
+--------------+-------+-------+-------+-------+-------+-------+-------+
|Country/Region| 3/7/20| 3/8/20| 3/9/20|3/10/20|3/11/20|3/12/20|3/13/20|
+--------------+-------+-------+-------+-------+-------+-------+-------+
| Senegal| 0| 4| 6| 8| 9| 4| 4|
+--------------+-------+-------+-------+-------+-------+-------+-------+
| Tunisia| 1| 7| 7| 6| 16| 5| 17|
+--------------+-------+-------+-------+-------+-------+-------+-------+
Upvotes: 1
Views: 538
Reputation: 6338
Perhaps this is helpful-
df2.show(false)
df2.printSchema()
/**
* +--------------+------+------+------+-------+-------+-------+-------+
* |Country/Region|3/7/20|3/8/20|3/9/20|3/10/20|3/11/20|3/12/20|3/13/20|
* +--------------+------+------+------+-------+-------+-------+-------+
* |Senegal |0 |4 |10 |18 |27 |31 |35 |
* |Tunisia |1 |8 |15 |21 |37 |42 |59 |
* +--------------+------+------+------+-------+-------+-------+-------+
*
* root
* |-- Country/Region: string (nullable = true)
* |-- 3/7/20: integer (nullable = true)
* |-- 3/8/20: integer (nullable = true)
* |-- 3/9/20: integer (nullable = true)
* |-- 3/10/20: integer (nullable = true)
* |-- 3/11/20: integer (nullable = true)
* |-- 3/12/20: integer (nullable = true)
* |-- 3/13/20: integer (nullable = true)
*/
val new_df = df2.withColumn("01/01/70", lit(0))
val tuples = new_df.schema.filter(_.dataType.isInstanceOf[NumericType])
.map(_.name)
.map(c => {
val sdf = new SimpleDateFormat("MM/dd/yy")
(sdf.parse(c), c)
}).sortBy(_._1)
.map(_._2)
.sliding(2, 1)
.map(seq => (col(seq.last) - col(seq.head)).as(seq.last))
new_df.select(col("Country/Region") +: tuples.toSeq: _* )
.show(false)
/**
* +--------------+------+------+------+-------+-------+-------+-------+
* |Country/Region|3/7/20|3/8/20|3/9/20|3/10/20|3/11/20|3/12/20|3/13/20|
* +--------------+------+------+------+-------+-------+-------+-------+
* |Senegal |0 |4 |6 |8 |9 |4 |4 |
* |Tunisia |1 |7 |7 |6 |16 |5 |17 |
* +--------------+------+------+------+-------+-------+-------+-------+
*/
Upvotes: 1