gtomer
gtomer

Reputation: 6574

Filling nulls with data from other rows

I have a dataframe with many Nulls. I want to fill the Nulls with data of the same user from the same columns but on later dates. Here is the dataframe:

import pandas as pd
import numpy as np
array = {'user': ['Trevor', 'John', 'Trevor', 'John', 'Trevor', 'Trevor', 'John'], 'date': ['2020-10-11 08:00:00', '2020-10-15 08:00:00', '2020-10-17 08:00:00', '2020-10-19 08:00:00', '2020-10-10 08:00:00'
        , '2020-11-11 12:34:00', '2020-11-16 09:12:00'], 'test1': [5,np.nan,np.nan,np.nan,np.nan,8,4],
         'test2': [np.nan,8,3,np.nan,1,8,6], 'test3': [np.nan,np.nan,3,5,np.nan,8,np.nan]}
df = pd.DataFrame(array)
df.sort_values(by=['user', 'date'], ascending = True)

   user    date               test1 test2 test3
1   John    2020-10-15 08:00:00 NaN 8.0 NaN
3   John    2020-10-19 08:00:00 NaN NaN 5.0
6   John    2020-11-16 09:12:00 4.0 6.0 NaN
4   Trevor  2020-10-10 08:00:00 NaN 1.0 NaN
0   Trevor  2020-10-11 08:00:00 5.0 NaN NaN
2   Trevor  2020-10-17 08:00:00 NaN 3.0 3.0
5   Trevor  2020-11-11 12:34:00 8.0 8.0 8.0

And here is the desired output:

        user    test1   test2   test3
0       John    4        8        5
1       Trevor  5        1        3

Upvotes: 1

Views: 50

Answers (1)

sophocles
sophocles

Reputation: 13841

I don't fully comprehend how "fill the nulls with data of the same user from the same columns but on later dates" relates to your posted desired output, but you can get what you need with pivot_table:

# Added inplace=True 
df.sort_values(by=['user', 'date'], ascending = True,inplace=True)

# Pivot table using 'first'
df.pivot_table(index='user',
               aggfunc='first'). \
    drop('date',axis=1)

        test1  test2  test3
user                       
John      4.0    8.0    5.0
Trevor    5.0    1.0    3.0

Please correct me if I am not understanding you.

Upvotes: 3

Related Questions