Reputation: 6574
I have a dataframe with many Nulls. I want to fill the Nulls with data of the same user from the same columns but on later dates. Here is the dataframe:
import pandas as pd
import numpy as np
array = {'user': ['Trevor', 'John', 'Trevor', 'John', 'Trevor', 'Trevor', 'John'], 'date': ['2020-10-11 08:00:00', '2020-10-15 08:00:00', '2020-10-17 08:00:00', '2020-10-19 08:00:00', '2020-10-10 08:00:00'
, '2020-11-11 12:34:00', '2020-11-16 09:12:00'], 'test1': [5,np.nan,np.nan,np.nan,np.nan,8,4],
'test2': [np.nan,8,3,np.nan,1,8,6], 'test3': [np.nan,np.nan,3,5,np.nan,8,np.nan]}
df = pd.DataFrame(array)
df.sort_values(by=['user', 'date'], ascending = True)
user date test1 test2 test3
1 John 2020-10-15 08:00:00 NaN 8.0 NaN
3 John 2020-10-19 08:00:00 NaN NaN 5.0
6 John 2020-11-16 09:12:00 4.0 6.0 NaN
4 Trevor 2020-10-10 08:00:00 NaN 1.0 NaN
0 Trevor 2020-10-11 08:00:00 5.0 NaN NaN
2 Trevor 2020-10-17 08:00:00 NaN 3.0 3.0
5 Trevor 2020-11-11 12:34:00 8.0 8.0 8.0
And here is the desired output:
user test1 test2 test3
0 John 4 8 5
1 Trevor 5 1 3
Upvotes: 1
Views: 50
Reputation: 13841
I don't fully comprehend how "fill the nulls with data of the same user from the same columns but on later dates" relates to your posted desired output, but you can get what you need with pivot_table
:
# Added inplace=True
df.sort_values(by=['user', 'date'], ascending = True,inplace=True)
# Pivot table using 'first'
df.pivot_table(index='user',
aggfunc='first'). \
drop('date',axis=1)
test1 test2 test3
user
John 4.0 8.0 5.0
Trevor 5.0 1.0 3.0
Please correct me if I am not understanding you.
Upvotes: 3