Reputation: 335
I have a dataframe that looks as follows:
id cyear month datadate fyear
1 1988 3 nan nan
1 1988 4 nan nan
1 1988 5 1988-05-31 1988
1 1988 6 nan nan
1 1988 7 nan nan
1 1988 8 nan nan
1 1988 9 nan nan
1 1988 12 nan nan
1 1989 1 nan nan
1 1989 2 nan nan
1 1989 3 nan nan
1 1989 4 nan nan
1 1989 5 1989-05-31 1989
1 1989 6 nan nan
1 1989 7 nan nan
1 1989 8 nan nan
1 1990 8 nan nan
4 2000 1 nan nan
4 2000 2 nan nan
4 2000 3 nan nan
4 2000 4 nan nan
4 2000 5 nan nan
4 2000 6 nan nan
4 2000 7 nan nan
4 2000 8 nan nan
4 2000 9 nan nan
4 2000 10 nan nan
4 2000 11 nan nan
4 2000 12 2000-12-31 2000
5 2000 11 nan nan
More specifically, I have a dataframe consisting of monthly (month) data on firms (id) per calendar year (cyear). If the respective row, i.e. month, represents the end of a fiscal year of the firm, the datadate column will denote the respective months end as a date variable and the fyear column will denote the respective fiscal year that just ended.
I now want the fyear value to indicate the respective fiscal year not just in the last month of the respective companies fiscal year, but in every month within the respective fiscal year:
id cyear month datadate fyear
1 1988 3 nan 1988
1 1988 4 nan 1988
1 1988 5 1988-05-31 1988
1 1988 6 nan 1989
1 1988 7 nan 1989
1 1988 8 nan 1989
1 1988 9 nan 1989
1 1988 12 nan 1989
1 1989 1 nan 1989
1 1989 2 nan 1989
1 1989 3 nan 1989
1 1989 4 nan 1989
1 1989 5 1989-05-31 1989
1 1989 6 nan 1990
1 1989 7 nan 1990
1 1989 8 nan 1990
1 1990 8 nan 1991
4 2000 1 nan 2000
4 2000 2 nan 2000
4 2000 3 nan 2000
4 2000 4 nan 2000
4 2000 5 nan 2000
4 2000 6 nan 2000
4 2000 7 nan 2000
4 2000 8 nan 2000
4 2000 9 nan 2000
4 2000 10 nan 2000
4 2000 11 nan 2000
4 2000 12 2000-12-31 2000
5 2000 11 nan nan
Note that months may be missing, as evident in case of id 1, and fiscal years may end on different months in fyear=cyear or fyear=cyear+1 (I have included only the former example, one could construct the latter example by adding 1 to the current fyear values of e.g. id 1). Also, the last row(s) of a given firm may not necessarily be its fiscal year end month, as evident in case of id 1. Lastly, there may exist firms for which no information on fiscal years is available.
I appreciate any help on this.
Upvotes: 1
Views: 93
Reputation: 14949
Do you want this?
def backword_fill(x):
x = x.bfill()
x = x.ffill() + x.isna().astype(int)
return x
df.fyear = df.groupby('id')['fyear'].transform(backword_fill)
Output
id cyear month datadate fyear
0 1 1988 3 <NA> 1988
1 1 1988 4 <NA> 1988
2 1 1988 5 1988-05-31 1988
3 1 1988 6 <NA> 1989
4 1 1988 7 <NA> 1989
5 1 1988 8 <NA> 1989
6 1 1988 9 <NA> 1989
7 1 1988 12 <NA> 1989
8 1 1989 1 <NA> 1989
9 1 1989 2 <NA> 1989
10 1 1989 3 <NA> 1989
11 1 1989 4 <NA> 1989
12 1 1989 5 1989-05-31 1989
13 1 1989 6 <NA> 1990
14 4 2000 1 <NA> 2000
15 4 2000 2 <NA> 2000
16 4 2000 3 <NA> 2000
17 4 2000 4 <NA> 2000
18 4 2000 5 <NA> 2000
19 4 2000 6 <NA> 2000
20 4 2000 7 <NA> 2000
21 4 2000 8 <NA> 2000
22 4 2000 9 <NA> 2000
23 4 2000 10 <NA> 2000
24 4 2000 11 <NA> 2000
25 4 2000 12 2000-12-31 2000
Upvotes: 2