HelloToEarth

Reputation: 2117

Filling missing values in a DataFrame column in Python

My data is split into 4 columns and looks like this:

State       Year        Month        Value
AK          2010         1             10
AK          2010         3             20
AK          2011         1             28
AK          2011         5             29
AK          2011         12            31
.
.
TX          2010         2             10
TX          2010         3             11
TX          2010         4             20
TX          2010         12            22
TX          2011         4             30
TX          2011         7             33
.
.

I want to fill in the missing Months by repeating the most recent earlier Value from the same Year, because the values are cumulative sums that I've added together.

The months do not always start at Month 1, and sometimes entire years are missing, so I need to handle both cases.

For example, TX can start at Month 4 in 2011.

The desired output looks like:

State       Year        Month        Value
AK          2010         1             10
AK          2010         2             10
AK          2010         3             20
AK          2010         4             20
AK          2010         5             20
.
.
AK          2010         12            20
AK          2011         1             28
AK          2011         2             28
.
.
TX          2010         1             9
TX          2010         2             10
TX          2010         3             11
TX          2010         4             20
TX          2010         5             20
.
.
TX          2010         12            22
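A minimal sketch of the fill rule described above, assuming pandas and using only the sample rows shown (the elided `.` rows are not reproduced). It builds the full State x Year x Month grid with `pd.MultiIndex.from_product`, taking every year from the earliest to the latest so that entirely missing years still get rows, and then forward-fills only within each State and Year. The names `grid` and `out` are illustrative. Months before the first observation in a year (for example TX, Jan 2010) stay NaN, because there is no earlier value in that year to repeat; the 9 shown for that row in the desired output cannot be derived from the sample rows.

```python
import pandas as pd

# Sample rows from the question (the elided rows are omitted).
df = pd.DataFrame(
    {
        "State": ["AK"] * 5 + ["TX"] * 6,
        "Year": [2010, 2010, 2011, 2011, 2011, 2010, 2010, 2010, 2010, 2011, 2011],
        "Month": [1, 3, 1, 5, 12, 2, 3, 4, 12, 4, 7],
        "Value": [10, 20, 28, 29, 31, 10, 11, 20, 22, 30, 33],
    }
)

# Full grid: every state x every year in the observed span x months 1-12,
# so a year with no rows at all still gets 12 (empty) months.
years = range(df["Year"].min(), df["Year"].max() + 1)
grid = pd.MultiIndex.from_product(
    [df["State"].unique(), years, range(1, 13)],
    names=["State", "Year", "Month"],
)

# Align the data to the grid; missing combinations become NaN.
out = df.set_index(["State", "Year", "Month"]).reindex(grid).reset_index()

# Repeat the previous Value only within the same State and Year;
# months before a year's first observation remain NaN.
out["Value"] = out.groupby(["State", "Year"])["Value"].ffill()

print(out.head(14))
```

Grouping the forward fill by `['State', 'Year']` rather than by `'State'` alone is what keeps a December value from leaking into the following January.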

Upvotes: 1

Views: 413

Answers (1)

jpp

Reputation: 164613

One solution is to use Categorical Data:

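import pandas as pd

# convert Month to categorical with 1-12 range
df['Month'] = pd.Categorical(df['Month'], categories=range(1, 13))

# groupby to give Cartesian product for categorical columns;
# observed=False keeps the empty State/Year/Month combinations
# (recent pandas versions are moving the default to observed=True)
df = df.groupby(['State', 'Year', 'Month'], observed=False).first().reset_index()

# forward fill by group
df['Value'] = df.groupby('State')['Value'].ffill()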

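This solution assumes that, for a particular state, Dec-2010 data can spill over into missing values for Jan-2011, because the forward fill runs across year boundaries within each state. Months before a state's first recorded value stay NaN.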
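To see the spill-over assumption concretely, here is a sketch that runs the steps above on the sample rows from the question (elided rows omitted) and places a per-State fill next to a per-State-and-Year fill. The column names `Value_spill` and `Value_same_year` are hypothetical, added only for the comparison.

```python
import pandas as pd

# Sample rows from the question (the elided rows are omitted).
df = pd.DataFrame(
    {
        "State": ["AK"] * 5 + ["TX"] * 6,
        "Year": [2010, 2010, 2011, 2011, 2011, 2010, 2010, 2010, 2010, 2011, 2011],
        "Month": [1, 3, 1, 5, 12, 2, 3, 4, 12, 4, 7],
        "Value": [10, 20, 28, 29, 31, 10, 11, 20, 22, 30, 33],
    }
)

# Categorical Month plus observed=False: every State x Year x Month
# combination appears in the result, with NaN for the empty ones.
df["Month"] = pd.Categorical(df["Month"], categories=range(1, 13))
out = df.groupby(["State", "Year", "Month"], observed=False).first().reset_index()

# Per-State fill can carry a December value into the next January;
# per-State-and-Year fill cannot.
out["Value_spill"] = out.groupby("State")["Value"].ffill()
out["Value_same_year"] = out.groupby(["State", "Year"])["Value"].ffill()

print(out[out["State"] == "TX"])
```

With the per-State fill, TX Jan to Mar 2011 pick up 22 from TX Dec 2010; with the per-State-and-Year fill those months stay NaN. The second variant is the one that matches "previous Values of the same Year" in the question.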

Upvotes: 1
