shenflow
shenflow

Reputation: 335

Fill up columns in dataframe based on condition

I have a dataframe that looks as follows:

id  cyear  month     datadate  fyear
1    1988      3          nan    nan
1    1988      4          nan    nan
1    1988      5   1988-05-31   1988
1    1988      6          nan    nan
1    1988      7          nan    nan
1    1988      8          nan    nan
1    1988      9          nan    nan
1    1988     12          nan    nan
1    1989      1          nan    nan
1    1989      2          nan    nan
1    1989      3          nan    nan
1    1989      4          nan    nan 
1    1989      5   1989-05-31   1989
1    1989      6          nan    nan
1    1989      7          nan    nan
1    1989      8          nan    nan
1    1990      8          nan    nan
4    2000      1          nan    nan 
4    2000      2          nan    nan
4    2000      3          nan    nan 
4    2000      4          nan    nan 
4    2000      5          nan    nan
4    2000      6          nan    nan 
4    2000      7          nan    nan 
4    2000      8          nan    nan
4    2000      9          nan    nan
4    2000     10          nan    nan
4    2000     11          nan    nan
4    2000     12   2000-12-31   2000
5    2000     11          nan    nan     

More specifically, I have a dataframe consisting of monthly (month) data on firms (id) per calendar year (cyear). If the respective row, i.e. month, represents the end of a fiscal year of the firm, the datadate column will denote the respective months end as a date variable and the fyear column will denote the respective fiscal year that just ended.

I now want the fyear value to indicate the respective fiscal year not just in the last month of the respective companies fiscal year, but in every month within the respective fiscal year:

id  cyear  month     datadate  fyear
1    1988      3          nan   1988
1    1988      4          nan   1988
1    1988      5   1988-05-31   1988
1    1988      6          nan   1989
1    1988      7          nan   1989
1    1988      8          nan   1989
1    1988      9          nan   1989
1    1988     12          nan   1989
1    1989      1          nan   1989
1    1989      2          nan   1989
1    1989      3          nan   1989
1    1989      4          nan   1989 
1    1989      5   1989-05-31   1989
1    1989      6          nan   1990
1    1989      7          nan   1990
1    1989      8          nan   1990
1    1990      8          nan   1991
4    2000      1          nan   2000 
4    2000      2          nan   2000
4    2000      3          nan   2000
4    2000      4          nan   2000 
4    2000      5          nan   2000
4    2000      6          nan   2000 
4    2000      7          nan   2000 
4    2000      8          nan   2000
4    2000      9          nan   2000
4    2000     10          nan   2000
4    2000     11          nan   2000
4    2000     12   2000-12-31   2000
5    2000     11          nan    nan   

Note that months may be missing, as evident in case of id 1, and fiscal years may end on different months in fyear=cyear or fyear=cyear+1 (I have included only the former example, one could construct the latter example by adding 1 to the current fyear values of e.g. id 1). Also, the last row(s) of a given firm may not necessarily be its fiscal year end month, as evident in case of id 1. Lastly, there may exist firms for which no information on fiscal years is available.

I appreciate any help on this.

Upvotes: 1

Views: 93

Answers (1)

Nk03
Nk03

Reputation: 14949

Do you want this?

def backword_fill(x):
    x = x.bfill()
    x = x.ffill() + x.isna().astype(int)
    return x
    
df.fyear = df.groupby('id')['fyear'].transform(backword_fill)

Output

    id  cyear  month    datadate  fyear
0    1   1988      3        <NA>   1988
1    1   1988      4        <NA>   1988
2    1   1988      5  1988-05-31   1988
3    1   1988      6        <NA>   1989
4    1   1988      7        <NA>   1989
5    1   1988      8        <NA>   1989
6    1   1988      9        <NA>   1989
7    1   1988     12        <NA>   1989
8    1   1989      1        <NA>   1989
9    1   1989      2        <NA>   1989
10   1   1989      3        <NA>   1989
11   1   1989      4        <NA>   1989
12   1   1989      5  1989-05-31   1989
13   1   1989      6        <NA>   1990
14   4   2000      1        <NA>   2000
15   4   2000      2        <NA>   2000
16   4   2000      3        <NA>   2000
17   4   2000      4        <NA>   2000
18   4   2000      5        <NA>   2000
19   4   2000      6        <NA>   2000
20   4   2000      7        <NA>   2000
21   4   2000      8        <NA>   2000
22   4   2000      9        <NA>   2000
23   4   2000     10        <NA>   2000
24   4   2000     11        <NA>   2000
25   4   2000     12  2000-12-31   2000

Upvotes: 2

Related Questions