Reputation: 24500
I have one basic df - a monthly report of loans , it's 1700 contract ids of loans given per some date(monthly) e.g. report per 01.01.2019:
contract_id loan_sum expiry_days_01_01_2019
1 1000 20
2 2300 3
3 500 17
4 1400 6
5 890 5
another for 01.02.2019
contract_id loan_sum expiry_days_01_02_2019
1 7000 20
2 3000 39
3 500 130
4 9400 89
5 909 7
etc.
I have 5 reports with same number of ids. I have combined them under another.
Now i have many nan values for expiry-date column, like in a matrix with diagonal filled properly with ones.
id contract_id loan_sum expiry_days_01_01_2019 expiry_01_02_2019
0 1 7000 20 nan
1 2 3000 39 nan
2 3 500 130 nan
3 4 9400 89 nan
4 5 909 7 nan
5 1 7000 nan 20
6 2 3000 nan 39
7 3 500 nan 130
8 4 9400 nan 89
9 5 909 nan 7
... ... ...
When the contract id repeats it should pull the same date, but instead they become NaNs.
How to fill nan values based on same , repeating contract id?
Upvotes: 1
Views: 195
Reputation: 29635
you could use groupby.transform
with first. if the altogether dataframe is called dfm
, then:
# I assumed you put the dataframes together like this
dfm = pd.concat([df1, df2], axis=0)
col_exp = dfm.filter(like='expiry').columns
dfm[col_exp] = dfm.groupby('contract_id')[col_exp].transform('first')
print (dfm)
contract_id loan_sum expiry_days_01_01_2019 expiry_days_01_02_2019
0 1 1000 20.0 20.0
1 2 2300 3.0 39.0
2 3 500 17.0 130.0
3 4 1400 6.0 89.0
4 5 890 5.0 7.0
0 1 7000 20.0 20.0
1 2 3000 3.0 39.0
2 3 500 17.0 130.0
3 4 9400 6.0 89.0
4 5 909 5.0 7.0
But I think changing the way the altogether dataframe is build could be another solution and create multiindex columns, something like:
list_dfs = [df1, df2]
dfm = pd.concat([df_.set_index('contract_id')
.rename(columns=lambda x: x.split('_')[0])
for df_ in list_dfs],
keys=[df_.filter(like='expiry').columns[0][-10:]
for df_ in list_dfs],
axis=1)
print (dfm)
01_01_2019 01_02_2019
loan expiry loan expiry
contract_id
1 1000 20 7000 20
2 2300 3 3000 39
3 500 17 500 130
4 1400 6 9400 89
5 890 5 909 7
Upvotes: 2