Reputation: 887
I have this df:
ID Date Time Lat Lon
A 07/16/2019 08:00 29.39291 -98.50925
A 07/16/2019 09:00 29.39923 -98.51256
A 07/16/2019 10:00 29.40147 -98.51123
A 07/18/2019 08:30 29.38752 -98.52372
A 07/18/2019 09:30 29.39291 -98.50925
B 07/16/2019 08:00 29.39537 -98.50402
B 07/18/2019 11:00 29.39343 -98.49707
B 07/18/2019 12:00 29.39291 -98.50925
B 07/19/2019 10:00 29.39556 -98.53148
I want to group the df by ID
and Date
, shift the rows one step backward, and fill the NaN values with forward filling.
Note: (ID, Date)
with only one row, should be filled by the row itself.
For example: B 07/16/2019 08:00 29.39537 -98.50402
Expected result:
ID Date Time Lat Lon Time.1 Lat.1 Lon.1
A 07/16/2019 08:00 29.39291 -98.50925 09:00 29.39923 -98.51256
A 07/16/2019 09:00 29.39923 -98.51256 10:00 29.40147 -98.51123
A 07/16/2019 10:00 29.40147 -98.51123 10:00 29.40147 -98.51123
A 07/18/2019 08:30 29.38752 -98.52372 09:30 29.39291 -98.50925
A 07/18/2019 09:30 29.39291 -98.50925 09:30 29.39291 -98.50925
B 07/16/2019 08:00 29.39537 -98.50402 08:00 29.39537 -98.50402
B 07/18/2019 11:00 29.39343 -98.49707 12:00 29.39291 -98.50925
B 07/18/2019 12:00 29.39291 -98.50925 12:00 29.39291 -98.50925
B 07/19/2019 10:00 29.39556 -98.53148 10:00 29.39556 -98.53148
The code I'm using (which doesn't meet the expected result):
pd.concat([df, df.groupby(['ID','Date']).shift(-1).ffill()], axis=1)
Upvotes: 2
Views: 1130
Reputation: 862841
Solution if no missing values in original data - first replace rows with one element groups by original values and then forward filling missing values:
m = ~df.duplicated(['ID','Date']) & ~df.duplicated(['ID','Date'], keep=False)
df1 = df.groupby(['ID','Date']).shift(-1).mask(m, df).ffill()
df = pd.concat([df, df1.add_suffix('.1')], axis=1)
print (df)
ID Date Time Lat Lon Time.1 Lat.1 Lon.1
0 A 07/16/2019 08:00 29.39291 -98.50925 09:00 29.39923 -98.51256
1 A 07/16/2019 09:00 29.39923 -98.51256 10:00 29.40147 -98.51123
2 A 07/16/2019 10:00 29.40147 -98.51123 10:00 29.40147 -98.51123
3 A 07/18/2019 08:30 29.38752 -98.52372 09:30 29.39291 -98.50925
4 A 07/18/2019 09:30 29.39291 -98.50925 09:30 29.39291 -98.50925
5 B 07/16/2019 08:00 29.39537 -98.50402 08:00 29.39537 -98.50402
6 B 07/18/2019 11:00 29.39343 -98.49707 12:00 29.39291 -98.50925
7 B 07/18/2019 12:00 29.39291 -98.50925 12:00 29.39291 -98.50925
8 B 07/19/2019 10:00 29.39556 -98.53148 10:00 29.39556 -98.53148
If not necessary custom function then double groupby
are necessary, because forward filling is necessary per groups:
df1 = df.groupby(['ID','Date']).shift(-1).groupby([df['ID'],df['Date']]).ffill().fillna(df)
df = pd.concat([df, df1.add_suffix('.1')], axis=1)
print (df)
ID Date Time Lat Lon Time.1 Lat.1 Lon.1
0 A 07/16/2019 08:00 29.39291 -98.50925 09:00 29.39923 -98.51256
1 A 07/16/2019 09:00 29.39923 -98.51256 10:00 29.40147 -98.51123
2 A 07/16/2019 10:00 29.40147 -98.51123 10:00 29.40147 -98.51123
3 A 07/18/2019 08:30 29.38752 -98.52372 09:30 29.39291 -98.50925
4 A 07/18/2019 09:30 29.39291 -98.50925 09:30 29.39291 -98.50925
5 B 07/16/2019 08:00 29.39537 -98.50402 08:00 29.39537 -98.50402
6 B 07/18/2019 11:00 29.39343 -98.49707 12:00 29.39291 -98.50925
7 B 07/18/2019 12:00 29.39291 -98.50925 12:00 29.39291 -98.50925
8 B 07/19/2019 10:00 29.39556 -98.53148 10:00 29.39556 -98.53148
With lambda function should be solution like:
c = ['Time','Lat','Lon']
df1 = df.groupby(['ID','Date'])[c].apply(lambda x: x.shift(-1).ffill()).fillna(df)
df = pd.concat([df, df1.add_suffix('.1')], axis=1)
print (df)
ID Date Time Lat Lon Time.1 Lat.1 Lon.1
0 A 07/16/2019 08:00 29.39291 -98.50925 09:00 29.39923 -98.51256
1 A 07/16/2019 09:00 29.39923 -98.51256 10:00 29.40147 -98.51123
2 A 07/16/2019 10:00 29.40147 -98.51123 10:00 29.40147 -98.51123
3 A 07/18/2019 08:30 29.38752 -98.52372 09:30 29.39291 -98.50925
4 A 07/18/2019 09:30 29.39291 -98.50925 09:30 29.39291 -98.50925
5 B 07/16/2019 08:00 29.39537 -98.50402 08:00 29.39537 -98.50402
6 B 07/18/2019 11:00 29.39343 -98.49707 12:00 29.39291 -98.50925
7 B 07/18/2019 12:00 29.39291 -98.50925 12:00 29.39291 -98.50925
8 B 07/19/2019 10:00 29.39556 -98.53148 10:00 29.39556 -98.53148
Upvotes: 2
Reputation: 21719
Here's a way to do:
def grp_col(f):
f['Time.1'] = f['Time'].shift(-1).ffill().fillna(f['Time'].iloc[0])
f['Lat.1'] = f['Lat'].shift(-1).ffill().fillna(f['Lat'].iloc[0])
f['Lon.1'] = f['Lon'].shift(-1).ffill().fillna(f['Lon'].iloc[0])
return f
df = df.groupby(['ID','Date'], as_index=False).apply(grp_col)
Upvotes: 1