Reputation: 2117
My data is broken up into 4 columns and looks like:
State Year Month Value
AK 2010 1 10
AK 2010 3 20
AK 2011 1 28
AK 2011 5 29
AK 2011 12 31
.
.
TX 2010 2 10
TX 2010 3 11
TX 2010 4 20
TX 2010 12 22
TX 2011 4 30
TX 2011 7 33
.
.
I want to fill in the missing Months by repeating the previous Value from the same Year, because the Values are cumulative sums that I have already computed.
The months do not always start at Month 1, and sometimes a full year is missing, so I need to handle both cases.
For example, TX can start at Month 4 in 2011.
The desired output looks like:
State Year Month Value
AK 2010 1 10
AK 2010 2 10
AK 2010 3 20
AK 2010 4 20
AK 2010 5 20
.
.
AK 2010 12 20
AK 2011 1 28
AK 2011 2 28
.
.
TX 2010 1 9
TX 2010 2 10
TX 2010 3 11
TX 2010 4 20
TX 2010 5 20
.
.
TX 2010 12 22
Upvotes: 1
Views: 413
Reputation: 164613
One solution is to use Categorical Data:
import pandas as pd

# convert Month to a categorical covering the full 1-12 range
df['Month'] = pd.Categorical(df['Month'], categories=range(1, 13))
# group with observed=False so unobserved months are kept, giving the
# Cartesian product of State, Year and Month (missing rows become NaN)
df = df.groupby(['State', 'Year', 'Month'], observed=False).first().reset_index()
# forward fill Value within each State
df['Value'] = df.groupby('State')['Value'].ffill()
This solution assumes that, for a given state, the Dec-2010 value may carry over into a missing Jan-2011 value, because the forward fill is grouped by State only.
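If values should not cross a year boundary, as the question's "same Year" wording suggests, one possible variant is to build the full State/Year/Month grid explicitly and forward fill per State and Year. This is only a sketch, and it swaps the categorical approach for a plain reindex. The sample frame below is a made-up subset of the question's data, and the names `full_index` and `filled` are illustrative.

```python
import pandas as pd

# Small sample modelled on the question's data (a few rows only, for illustration).
df = pd.DataFrame({
    "State": ["AK", "AK", "AK", "TX", "TX"],
    "Year":  [2010, 2010, 2011, 2010, 2010],
    "Month": [1, 3, 1, 2, 4],
    "Value": [10, 20, 28, 10, 20],
})

# Every (State, Year, Month) combination for the states and years present.
full_index = pd.MultiIndex.from_product(
    [df["State"].unique(), df["Year"].unique(), range(1, 13)],
    names=["State", "Year", "Month"],
)

# Reindex so that missing months appear as rows with NaN in Value.
filled = (
    df.set_index(["State", "Year", "Month"])
      .reindex(full_index)
      .reset_index()
)

# Forward fill within each State and Year, so a value never carries into the next year.
filled["Value"] = filled.groupby(["State", "Year"])["Value"].ffill()

print(filled[filled["State"] == "AK"].head(14))
```

With this grouping, months before the first observation of a year stay NaN (for example TX 2010 Month 1 in the sample), and a year with no data for a state stays entirely NaN. Grouping the final `ffill` by `State` alone instead would reproduce the carry-over behaviour described above.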
Upvotes: 1