Reputation: 571
I have this dataset
import pandas as pd

df = pd.DataFrame({
    'user': {0: 848, 1: 848, 2: 848, 3: 848, 4: 848, 5: 848, 6: 848, 7: 848, 8: 848, 9: 848, 10: 848, 11: 848, 12: 848, 13: 848},
    'date': {0: '2005-02-05', 1: '2006-10-25', 2: '2006-11-07', 3: '2006-11-20', 4: '2006-12-04', 5: '2006-12-21', 6: '2007-01-08', 7: '2007-02-08', 8: '2007-03-08', 9: '2007-04-10', 10: '2007-11-28', 11: '2007-12-10', 12: '2009-01-07', 13: '2009-01-12'},
    'need_data': {0: 0, 1: 0, 2: 0, 3: 1, 4: 0, 5: 0, 6: 0, 7: 0, 8: 0, 9: 0, 10: 1, 11: 0, 12: 1, 13: 0},
    'vt': {0: 34.0, 1: 49.25, 2: 49.25, 3: 0.0, 4: 49.4, 5: 0.0, 6: 0.0, 7: 49.8, 8: 0.0, 9: 50.1, 10: 0.0, 11: 0.0, 12: 0.0, 13: 0.0},
})
I need a new column (named feed2 in the desired output below) built with the following condition: if the need_data column equals 1, and therefore the vt column is 0, the new column should take the date of the most recent earlier row, within the same user, whose vt is different from 0. In all other rows it should keep the row's own date.
Desired output is as follows:
df = pd.DataFrame( {'user': {0: 848, 1: 848, 2: 848, 3: 848, 4: 848, 5: 848, 6: 848, 7: 848, 8: 848, 9: 848, 10: 848, 11: 848, 12: 848, 13: 848}, 'date': {0: '2005-02-05', 1: '2006-10-25', 2: '2006-11-07', 3: '2006-11-20', 4: '2006-12-04', 5: '2006-12-21', 6: '2007-01-08', 7: '2007-02-08', 8: '2007-03-08', 9: '2007-04-10', 10: '2007-11-28', 11: '2007-12-10', 12: '2009-01-07', 13: '2009-01-12'}, 'need_data': {0: 0, 1: 0, 2: 0, 3: 1, 4: 0, 5: 0, 6: 0, 7: 0, 8: 0, 9: 0, 10: 1, 11: 0, 12: 1, 13: 0}, 'vt': {0: 34.0, 1: 49.25, 2: 49.25, 3: 0.0, 4: 49.4, 5: 0.0, 6: 0.0, 7: 49.8, 8: 0.0, 9: 50.1, 10: 0.0, 11: 0.0, 12: 0.0, 13: 0.0}, 'feed2': {0: '2005-02-05', 1: '2006-10-25', 2: '2006-11-07', 3: '2006-11-07', 4: '2006-12-04', 5: '2006-12-21', 6: '2007-01-08', 7: '2007-02-08', 8: '2007-03-08', 9: '2007-04-10', 10: '2007-04-10', 11: '2007-12-10', 12: '2007-04-10', 13: '2009-01-12'}} )
Here is an illustration of the desired output:
Below are a couple of similar questions, but they do not cover exactly this case.
Upvotes: 1
Views: 134
Reputation: 862581
Replace date with NaN where need_data == 1, and then use GroupBy.ffill:
df['feed2'] = df['date'].mask(df['need_data'].eq(1)).groupby(df['user']).ffill()
print (df)
user date vt c1 c2 c3 need_data feed1 feed2
0 1 1995-09-01 2 1 3 5 0 1995-09-01 1995-09-01
1 1 1995-09-02 0 0 0 0 1 1995-09-01 1995-09-01
2 1 1995-10-03 0 0 0 0 1 1995-09-01 1995-09-01
3 2 1995-10-04 6 2 2 5 0 1995-10-04 1995-10-04
4 2 1995-10-05 1 3 5 6 0 1995-10-05 1995-10-05
5 2 1995-11-07 1 9 3 4 0 1995-11-07 1995-11-07
6 2 1995-11-08 0 0 0 0 1 1995-11-07 1995-11-07
7 3 1995-11-09 0 2 4 4 0 1995-11-09 1995-11-09
8 3 1995-11-10 0 0 0 0 1 1995-11-09 1995-11-09
9 3 1995-11-15 0 5 6 6 0 1995-12-15 1995-11-15
10 3 1995-12-18 0 5 2 3 0 1995-12-18 1995-12-18
11 4 1995-12-19 0 6 7 4 0 1995-12-19 1995-12-19
12 4 1995-12-20 0 4 0 3 0 1995-12-20 1995-12-20
13 4 1995-12-23 0 0 0 0 1 1995-12-20 1995-12-20
14 4 1995-12-26 0 6 8 2 0 1995-12-26 1995-12-26
15 4 1995-12-27 0 0 0 0 1 1995-12-26 1995-12-26
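To make the two steps easier to follow, here is a minimal sketch that splits the one-liner into its mask and forward-fill parts. The toy data and the intermediate name masked are made up for illustration and are not from the question.

```python
import pandas as pd

# Hypothetical toy data, only for illustration
df = pd.DataFrame({
    'user': [1, 1, 1, 2, 2],
    'date': ['2020-01-01', '2020-01-02', '2020-01-03', '2020-02-01', '2020-02-02'],
    'need_data': [0, 1, 1, 0, 1],
})

# Step 1: replace the date by a missing value on rows flagged with need_data == 1
masked = df['date'].mask(df['need_data'].eq(1))

# Step 2: forward fill inside each user, so a flagged row gets
# the last unflagged date of the same user
df['feed2'] = masked.groupby(df['user']).ffill()
print(df)
```

Because the fill runs per group, a flagged row that is the first row of its user has nothing to copy from and stays missing.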
Solution without the need_data column: here each row is tested for whether the string '0' is present in all of the listed columns:
m = df[['vt','c1','c2','c3']].eq('0').all(axis=1)
df['feed2'] = df['date'].mask(m).groupby(df['user']).ffill()
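A small self-contained sketch of this variant, assuming the value columns hold strings as in the comparison with '0' above. The sample rows are made up for illustration.

```python
import pandas as pd

# Hypothetical sample with string-typed value columns
df = pd.DataFrame({
    'user': [1, 1, 2, 2],
    'date': ['1995-09-01', '1995-09-02', '1995-10-04', '1995-10-05'],
    'vt': ['2', '0', '6', '0'],
    'c1': ['1', '0', '2', '3'],
    'c2': ['3', '0', '2', '0'],
    'c3': ['5', '0', '5', '0'],
})

# True only for rows where every listed column is the string '0'
m = df[['vt', 'c1', 'c2', 'c3']].eq('0').all(axis=1)

# Blank the date on those rows, then forward fill per user
df['feed2'] = df['date'].mask(m).groupby(df['user']).ffill()
print(df)
```

If the columns are numeric instead of strings, the comparison should be .eq(0), otherwise the mask never matches.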
EDIT: You need to test both conditions, chained by | for bitwise OR, and then use Series.where to set the original values back in rows where m1 is not True:
m1 = df['need_data'].eq(1)
m2 = df['vt'].eq(0)
df['feed2'] = df['date'].mask(m1 | m2).groupby(df['user']).ffill().where(m1, df['date'])
print (df)
user date need_data vt feed2
0 848 2005-02-05 0 34.00 2005-02-05
1 848 2006-10-25 0 49.25 2006-10-25
2 848 2006-11-07 0 49.25 2006-11-07
3 848 2006-11-20 1 0.00 2006-11-07
4 848 2006-12-04 0 49.40 2006-12-04
5 848 2006-12-21 0 0.00 2006-12-21
6 848 2007-01-08 0 0.00 2007-01-08
7 848 2007-02-08 0 49.80 2007-02-08
8 848 2007-03-08 0 0.00 2007-03-08
9 848 2007-04-10 0 50.10 2007-04-10
10 848 2007-11-28 1 0.00 2007-04-10
11 848 2007-12-10 0 0.00 2007-12-10
12 848 2009-01-07 1 0.00 2007-04-10
13 848 2009-01-12 0 0.00 2009-01-12
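For a quick check, here is a runnable sketch that rebuilds the sample from the question (written with lists instead of dicts for brevity) and compares the result with the desired feed2 column. The names filled, expected and matches are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'user': [848] * 14,
    'date': ['2005-02-05', '2006-10-25', '2006-11-07', '2006-11-20', '2006-12-04',
             '2006-12-21', '2007-01-08', '2007-02-08', '2007-03-08', '2007-04-10',
             '2007-11-28', '2007-12-10', '2009-01-07', '2009-01-12'],
    'need_data': [0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0],
    'vt': [34.0, 49.25, 49.25, 0.0, 49.4, 0.0, 0.0, 49.8, 0.0, 50.1, 0.0, 0.0, 0.0, 0.0],
})

m1 = df['need_data'].eq(1)   # rows that need a date from an earlier row
m2 = df['vt'].eq(0)          # rows that cannot serve as a source

# Keep dates only on valid source rows, then forward fill per user
filled = df['date'].mask(m1 | m2).groupby(df['user']).ffill()

# Use the filled date only where need_data == 1, otherwise the row's own date
df['feed2'] = filled.where(m1, df['date'])

# feed2 column from the desired output in the question
expected = ['2005-02-05', '2006-10-25', '2006-11-07', '2006-11-07', '2006-12-04',
            '2006-12-21', '2007-01-08', '2007-02-08', '2007-03-08', '2007-04-10',
            '2007-04-10', '2007-12-10', '2007-04-10', '2009-01-12']
matches = df['feed2'].tolist() == expected
print(matches)
```

Masking with m1 | m2 is what lets row 12 skip over rows 10 and 11, whose vt is 0, and reach back to 2007-04-10.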
Upvotes: 1