yokielove
yokielove

Reputation: 213

Forward fill row with first non-missing value

I have a table and after the first number shows up, the number needs to be repeated fill in all the column after

here is a table example:

student_id    2016Q1   2016Q2   2016Q3 2016Q4  2017Q1 2017Q2 2017Q3
1                        1
2                 1
3                                 1
4                                                1
5                                                1
6                                                        1
7                                                 1

I am using pyspark and Python. I try to do it manually in excel, but that is too much manual editing. I wonder if there any function in pyspark or pyspark.sql are able to generate the following table. Thanks!

I expect the result to look like the following table

student_id    2016Q1   2016Q2   2016Q3 2016Q4  2017Q1 2017Q2 2017Q3
1                        1        1       1       1     1      1
2                 1      1        1       1       1     1      1
3                                 1       1       1     1      1
4                                                 1     1      1
5                                                 1     1      1
6                                                       1      1
7                                                 1     1      1

Upvotes: 2

Views: 241

Answers (1)

pault
pault

Reputation: 43494

Assuming that those blanks are actually nulls*, you can use pyspark.sql.functions.coalesce.

Enumerate your columns and coalesce the values of every column before and including the current column. This will return the first non-null value in that row.

from pyspark.sql.functions import coalesce

fill_cols = df.columns[1:]  # exclude the student_id

df.select(
    "student_id",
    *[coalesce(*fill_cols[:i+1]).alias(c) for i, c in enumerate(fill_cols)]
).show()
#+----------+------+------+------+------+------+------+------+
#|student_id|2016Q1|2016Q2|2016Q3|2016Q4|2017Q1|2017Q2|2017Q3|
#+----------+------+------+------+------+------+------+------+
#|         1|  null|     1|     1|     1|     1|     1|     1|
#|         2|     1|     1|     1|     1|     1|     1|     1|
#|         3|  null|  null|     1|     1|     1|     1|     1|
#|         4|  null|  null|  null|  null|     1|     1|     1|
#|         5|  null|  null|  null|  null|     1|     1|     1|
#|         6|  null|  null|  null|  null|  null|     1|     1|
#|         7|  null|  null|  null|  null|     1|     1|     1|
#+----------+------+------+------+------+------+------+------+

*If not, first replace blanks with null.

Upvotes: 1

Related Questions