Reputation: 13
I need help with a pandas problem:
I am currently extracting data via APIs, and the data contains gaps in its ranks.
I need to account for these gaps in the dataset by replacing the missing values with an average value.
To do that, I need to insert rows into my dataframe for the missing ranks.
Here's what my data looks like:
   rank timestamp  value
0     1     21:50   3450
1     4     21:40   3442
2     5     21:41   5964
3     6     14:27   5258
4     7     13:10   3001
5     8     14:02   2782
Ranks 2 and 3 are missing.
So here's what I'm trying to get:
   rank timestamp  value
0     1     21:50   3450
1     2       NaN    avg
2     3       NaN    avg
3     4     21:40   3442
4     5     21:41   5964
5     6     14:27   5258
6     7     13:10   3001
7     8     14:02   2782
I know roughly how to work with columns, but I have no idea how to insert rows.
Do you have any ideas?
I have already tried using "append", but then I struggle to reindex my dataframe :/
Upvotes: 1
Views: 70
Reputation: 38415
You can use reindex to add missing ranks and fillna to fill missing values.
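import numpy as np
# Reindex on every rank from min to max (missing ranks become NaN rows), then fill with the rounded mean
df = df.set_index('rank').reindex(np.arange(df['rank'].min(), df['rank'].max()+1)).reset_index()
df['value'] = df['value'].fillna(df['value'].mean()).round()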
   rank timestamp  value
0     1     21:50   3450
1     2       NaN   3983
2     3       NaN   3983
3     4     21:40   3442
4     5     21:41   5964
5     6     14:27   5258
6     7     13:10   3001
7     8     14:02   2782
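For anyone who wants to try this end to end, here is a minimal self-contained sketch using the sample data from the question. The names `full_ranks` and `filled` are only for illustration, and the final `astype(int)` is my own addition, assuming you want integer values back (reindexing turns the column into floats because of the NaN rows).

```python
import numpy as np
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "rank": [1, 4, 5, 6, 7, 8],
    "timestamp": ["21:50", "21:40", "21:41", "14:27", "13:10", "14:02"],
    "value": [3450, 3442, 5964, 5258, 3001, 2782],
})

# Every rank from the smallest to the largest, gaps included.
# Naming the index explicitly keeps the "rank" column name after reset_index().
full_ranks = pd.Index(
    np.arange(df["rank"].min(), df["rank"].max() + 1), name="rank"
)

# Reindexing on the full range inserts NaN rows for the missing ranks (2 and 3)
filled = df.set_index("rank").reindex(full_ranks).reset_index()

# Replace missing values with the mean of the known values,
# then round and convert back to integers (illustrative choice)
mean_value = filled["value"].mean()
filled["value"] = filled["value"].fillna(mean_value).round().astype(int)

print(filled)
```

Note that this fills every gap with the mean of the whole column. If "avg" should instead mean the average of the neighbouring rows, `filled["value"].interpolate()` before the rounding step would be one option to look at.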
Upvotes: 2