Reputation: 568
I need to fill rows by default value =1
in DataFrame based on information in Start
and Finish
columns.
So, limit for filling separated columns in row is defined based on ['Start', 'Finish']
.
DataFrame, df1
is:
ID Car Jan17 Jun18 Dec18 Apr19 Start Finish
0 Nissan 0.0 1.7 3.7 0.0 Jun18 Dec18
1 Porsche 10.0 0.0 2.8 3.5 Jan17 Apr19
2 Golf 0.0 1.7 3.0 2.0 Jun18 Apr19
3 Toyota 1.0 0.0 3.0 5.2 Jan17 Apr19
4 Mazda 0.0 0.0 3.0 4.2 Dec18 Apr19
5 Mercedes 0.0 0.0 0.0 7.2 Apr19 Apr19
6 Passat 0.0 3.0 0.0 0.0 Jun18 Jun18
For instance, if there is a row #0:
Start = Jun18
and Finish = Dec18
.
Values in row#0 should be filled by 1
for columns, started from Jun18
till to Dec18
.
I tried to use numpy.sign()
function, but there is wrong result in case, if 0.0
is between of two non-zero values.
Expected result is df2
:
ID Car Jan17 Jun18 Dec18 Apr19 Start Finish
0 Nissan 0.0 1.0 1.0 0.0 Jun18 Dec18
1 Porsche 1.0 1.0 1.0 1.0 Jan17 Apr19
2 Golf 0.0 1.0 1.0 1.0 Jun18 Apr19
3 Toyota 1.0 1.0 1.0 1.0 Jan17 Apr19
4 Mazda 0.0 0.0 1.0 1.0 Dec18 Apr19
5 Mercedes 0.0 0.0 0.0 1.0 Apr19 Apr19
6 Passat 0.0 1.0 0.0 0.0 Jun18 Jun18
Upvotes: 3
Views: 80
Reputation: 59579
get_dummies
+ interpolate
This requires your columns to be sorted in time order, and for Start and Finish to ideally always exist in the column names.
df = df.set_index(['ID', 'Car', 'Start', 'Finish'])
s1 = (pd.get_dummies(df.index.get_level_values('Start'))
.reindex(df.columns, axis=1)
.replace(0, np.NaN))
s2 = (pd.get_dummies(df.index.get_level_values('Finish'))
.reindex(df.columns, axis=1)
.replace(0, np.NaN))
res = s1.combine_first(s2).interpolate(axis=1, limit_area='inside').fillna(0, downcast='infer')
res.index = df.index
res = res.reset_index()
res
: ID Car Start Finish Jan17 Jun18 Dec18 Apr19
0 0 Nissan Jun18 Dec18 0 1 1 0
1 1 Porsche Jan17 Apr19 1 1 1 1
2 2 Golf Jun18 Apr19 0 1 1 1
3 3 Toyota Jan17 Apr19 1 1 1 1
4 4 Mazda Dec18 Apr19 0 0 1 1
5 5 Mercedes Apr19 Apr19 0 0 0 1
6 6 Passat Jun18 Jun18 0 1 0 0
In the case where Start
and Finish
were already derived from the data itself (seems to be the first and last non-zero columns), you can skip all of the dummies and use where
instead on the original DataFrame.
df = df.set_index(['ID', 'Car', 'Start', 'Finish'])
res = (df.where(df.ne(0))
.clip(1,1)
.interpolate(axis=1, limit_area='inside')
.fillna(0, downcast='infer')
.reset_index())
Upvotes: 3