soulwreckedyouth

Reputation: 585

Imputation of missing values and division of those

Imagine a dataset like the following:

df = pd.DataFrame({'Contacts 6M':[4,7,20,5,6,0,1,19], 'Contacts 3M':[2,3,9,np.nan,np.nan,0,np.nan,9]})

Column 'Contacts 6M' holds the number of contacts in the last 6 months, while 'Contacts 3M' holds the number of contacts in the last 3 months. So 'Contacts 3M' covers a subset of the information in 'Contacts 6M'.

I impute the missing values with the method forward fill:

df.ffill(axis = 1, inplace=True)
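For illustration, a minimal sketch of what the row-wise forward fill does with this data. The name `filled` is only used in this example, and the non-inplace call is used so that a new DataFrame is returned. Each missing 'Contacts 3M' value is taken from 'Contacts 6M' in the same row, because that column sits to its left:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Contacts 6M': [4, 7, 20, 5, 6, 0, 1, 19],
    'Contacts 3M': [2, 3, 9, np.nan, np.nan, 0, np.nan, 9],
})

# axis=1 fills along each row, i.e. from the left column into the right one
filled = df.ffill(axis=1)
print(filled)
```

Rows 3, 4 and 6 now carry the full 6-month count in 'Contacts 3M', which is why those values still need to be halved.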

My question: How do I divide the imputed values by 2 and round them to integers (no floats, please) while iterating over the dataset?

Upvotes: 0

Views: 137

Answers (2)

Sandeep Kothari

Reputation: 415

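It can easily be done like this, applied to the original DataFrame in place of the forward fill:

mask = df['Contacts 3M'].isna()  # rows where the 3-month count is missing
df.loc[mask, 'Contacts 3M'] = df.loc[mask, 'Contacts 6M'] / 2

df['Contacts 3M'] = df['Contacts 3M'].astype('int')  # truncates, e.g. 2.5 -> 2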
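Since the question asks for rounding, it may be worth deciding explicitly how halves such as 2.5 are handled. The following is only a sketch under that assumption; the names `half`, `floored` and `rounded_up` are made up for this example. It compares rounding down with rounding halves up:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Contacts 6M': [4, 7, 20, 5, 6, 0, 1, 19],
    'Contacts 3M': [2, 3, 9, np.nan, np.nan, 0, np.nan, 9],
})

mask = df['Contacts 3M'].isna()          # rows that need imputing
half = df.loc[mask, 'Contacts 6M'] / 2   # 2.5, 3.0 and 0.5 for this data

# Variant 1: round down
floored = df.copy()
floored.loc[mask, 'Contacts 3M'] = np.floor(half)
floored['Contacts 3M'] = floored['Contacts 3M'].astype(int)

# Variant 2: round halves up
rounded_up = df.copy()
rounded_up.loc[mask, 'Contacts 3M'] = np.floor(half + 0.5)
rounded_up['Contacts 3M'] = rounded_up['Contacts 3M'].astype(int)

print(floored['Contacts 3M'].tolist())
print(rounded_up['Contacts 3M'].tolist())
```

For non-negative counts, rounding down gives the same result as the `astype('int')` truncation. Rounding halves up turns the imputed 2.5 into 3 and 0.5 into 1 instead.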

Upvotes: 1

sai

Reputation: 1784

You could keep track of the indices where you had np.nan and later use them to do any arithmetic you want:

import pandas as pd
import numpy as np

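df = pd.DataFrame({'Contacts 6M': [4, 7, 20, 5, 6, 0, 1, 19], 'Contacts 3M': [2, 3, 9, np.nan, np.nan, 0, np.nan, 9]})
mask = df['Contacts 3M'].isna()  # remember which rows were missing before filling

df = df.ffill(axis=1)  # for some weird reason, inplace=True was throwing 'NotImplementedError'
df.loc[mask, 'Contacts 3M'] //= 2  # halve only the imputed values (floor division)
df = df.astype(int)  # the NaNs made the values float; cast back to integers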

print(df)
Output
   Contacts 6M  Contacts 3M
0            4            2
1            7            3
2           20            9
3            5            2
4            6            3
5            0            0
6            1            0
7           19            9

Upvotes: 1
