Outcast

Fill in values of dataframes per groups of rows

Let's suppose that I have the following dataset:

Time    Geography           Sex     Population
1990    Northern Ireland    Male    NA
1990    Northern Ireland    Female  NA
1990    Northern Ireland    Total   NA
1991    Northern Ireland    Male    NA
1991    Northern Ireland    Female  NA
1991    Northern Ireland    Total   NA
1992    Northern Ireland    Male    792100
1992    Northern Ireland    Female  831100
1992    Northern Ireland    Total   1623300
1993    Northern Ireland    Male    812100
1993    Northern Ireland    Female  851100
1993    Northern Ireland    Total   1663200

and finally I want to have the following:

Time    Geography           Sex     Population
1990    Northern Ireland    Male    792100
1990    Northern Ireland    Female  831100
1990    Northern Ireland    Total   1623300
1991    Northern Ireland    Male    792100
1991    Northern Ireland    Female  831100
1991    Northern Ireland    Total   1623300
1992    Northern Ireland    Male    792100
1992    Northern Ireland    Female  831100
1992    Northern Ireland    Total   1623300
1993    Northern Ireland    Male    812100
1993    Northern Ireland    Female  851100
1993    Northern Ireland    Total   1663200

In other words, for each Sex I want to fill the missing values of the earlier years with the values from the first year that has no NAs.

How do I do this?

Upvotes: 2

Answers (3)

BENY

I would use groupby with ffill and bfill (bfill does the work here; I chain both as a safeguard so that gaps on either side of the known values get filled):

df['Population'] = df.groupby(['Geography', 'Sex'])['Population'].transform(lambda x: x.ffill().bfill())
df
   Time        Geography     Sex  Population
0  1990  NorthernIreland    Male    792100.0
1  1990  NorthernIreland  Female    831100.0
2  1990  NorthernIreland   Total   1623300.0
3  1991  NorthernIreland    Male    792100.0
4  1991  NorthernIreland  Female    831100.0
5  1991  NorthernIreland   Total   1623300.0
6  1992  NorthernIreland    Male    792100.0
7  1992  NorthernIreland  Female    831100.0
8  1992  NorthernIreland   Total   1623300.0
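
For anyone who wants to run this end to end, here is a minimal sketch that rebuilds the question's data and applies the grouped fill. The way the frame is constructed and the name `filled` are my own choices for illustration; the values are copied from the question.

```python
import numpy as np
import pandas as pd

# Rebuild the question's data: 4 years x 3 sexes for one geography.
df = pd.DataFrame({
    "Time": np.repeat([1990, 1991, 1992, 1993], 3),
    "Geography": "Northern Ireland",
    "Sex": ["Male", "Female", "Total"] * 4,
    "Population": [np.nan] * 6
    + [792100, 831100, 1623300, 812100, 851100, 1663200],
})

# Fill each (Geography, Sex) series on its own, so values never
# cross from one group into another. transform keeps the original index.
filled = df.copy()
filled["Population"] = (
    filled.groupby(["Geography", "Sex"])["Population"]
    .transform(lambda s: s.ffill().bfill())
)

print(filled)
```

This relies on the rows already being in chronological order within each group, as they are in the question; otherwise sort by Time first.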

Upvotes: 1

Erfan

You can chain pandas.DataFrame.sort_values (by Sex, then Time, so each group is contiguous and in chronological order), pandas.DataFrame.bfill, and then pandas.DataFrame.sort_index to get your original row order back:

df = df.sort_values(['Sex', 'Time']).bfill().sort_index()

print(df)
   Time         Geography     Sex  Population
0  1990  Northern Ireland    Male    792100.0
1  1990  Northern Ireland  Female    831100.0
2  1990  Northern Ireland   Total   1623300.0
3  1991  Northern Ireland    Male    792100.0
4  1991  Northern Ireland  Female    831100.0
5  1991  Northern Ireland   Total   1623300.0
6  1992  Northern Ireland    Male    792100.0
7  1992  Northern Ireland  Female    831100.0
8  1992  Northern Ireland   Total   1623300.0
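
One caveat worth checking on your own data: a backfill over the whole sorted frame does not know where one group ends and the next begins. The sketch below uses a small made-up frame (not the question's data) in which the Female series has a gap in its last year, and compares the sort-based fill with a grouped one.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: Female is missing its most recent value.
df = pd.DataFrame({
    "Time": [1990, 1990, 1991, 1991],
    "Sex": ["Female", "Male", "Female", "Male"],
    "Population": [831100, 792100, np.nan, 812100],
})

# Backfill over the sorted frame: Female 1991 is directly followed by
# Male 1990, so the gap picks up a Male value.
leaky = df.sort_values(["Sex", "Time"]).bfill().sort_index()

# Grouped backfill: the gap stays NaN because Female has no later value.
grouped = df.groupby("Sex")["Population"].bfill()

print(leaky.loc[2, "Population"])
print(grouped.loc[2])
```

With the question's data this does not happen, because every group ends with known values, so the one-liner gives the expected result there.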

Upvotes: 3

Scott Boston

You can try this:

df.set_index(['Time','Geography','Sex']).unstack().bfill().stack().reset_index()

Output:

   Time         Geography     Sex  Population
0  1990  Northern Ireland  Female    831100.0
1  1990  Northern Ireland    Male    792100.0
2  1990  Northern Ireland   Total   1623300.0
3  1991  Northern Ireland  Female    831100.0
4  1991  Northern Ireland    Male    792100.0
5  1991  Northern Ireland   Total   1623300.0
6  1992  Northern Ireland  Female    831100.0
7  1992  Northern Ireland    Male    792100.0
8  1992  Northern Ireland   Total   1623300.0
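
To see why the reshape works, it can help to look at the intermediate wide table: one row per year and one column per series, so a backfill down the rows fills each series independently. The following is a sketch of that idea rather than the exact one-liner above; it moves both Geography and Sex into the columns (my assumption, so that several geographies would not share a column), and the names `wide` and `tidy` are mine.

```python
import numpy as np
import pandas as pd

# Rebuild the question's data.
df = pd.DataFrame({
    "Time": np.repeat([1990, 1991, 1992, 1993], 3),
    "Geography": "Northern Ireland",
    "Sex": ["Male", "Female", "Total"] * 4,
    "Population": [np.nan] * 6
    + [792100, 831100, 1623300, 812100, 851100, 1663200],
})

# Wide layout: index = Time, columns = (Geography, Sex).
wide = (
    df.set_index(["Time", "Geography", "Sex"])["Population"]
    .unstack(["Geography", "Sex"])
)

# Backfill runs down each column, i.e. along Time within one series.
wide = wide.bfill()

# Back to long format, one row per (Time, Geography, Sex).
tidy = wide.stack(["Geography", "Sex"]).rename("Population").reset_index()

print(tidy)
```

Note that the row order of the result can differ from the input, as in the output above where Female comes before Male.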

Upvotes: 3
