josepmaria

Reputation: 571

Fill in rows with former row values, based on conditions, groupby and certain rows only

I have this dataset

import pandas as pd

df = pd.DataFrame({'user': {0: 848, 1: 848, 2: 848, 3: 848, 4: 848, 5: 848, 6: 848, 7: 848, 8: 848, 9: 848, 10: 848, 11: 848, 12: 848, 13: 848},
                   'date': {0: '2005-02-05', 1: '2006-10-25', 2: '2006-11-07', 3: '2006-11-20', 4: '2006-12-04', 5: '2006-12-21', 6: '2007-01-08', 7: '2007-02-08', 8: '2007-03-08', 9: '2007-04-10', 10: '2007-11-28', 11: '2007-12-10', 12: '2009-01-07', 13: '2009-01-12'},
                   'need_data': {0: 0, 1: 0, 2: 0, 3: 1, 4: 0, 5: 0, 6: 0, 7: 0, 8: 0, 9: 0, 10: 1, 11: 0, 12: 1, 13: 0},
                   'vt': {0: 34.0, 1: 49.25, 2: 49.25, 3: 0.0, 4: 49.4, 5: 0.0, 6: 0.0, 7: 49.8, 8: 0.0, 9: 50.1, 10: 0.0, 11: 0.0, 12: 0.0, 13: 0.0},
                   })

I need a new column (named feed2 in the desired output below), built with the following condition: if the need_data column equals 1, and thus the vt column has value 0, then the value for the new column should be the date of the most recent former entry (within the same user) whose vt value is different from 0. All other rows keep their own date.

Desired output is as follows:

df = pd.DataFrame( {'user': {0: 848, 1: 848, 2: 848, 3: 848, 4: 848, 5: 848, 6: 848, 7: 848, 8: 848, 9: 848, 10: 848, 11: 848, 12: 848, 13: 848}, 'date': {0: '2005-02-05', 1: '2006-10-25', 2: '2006-11-07', 3: '2006-11-20', 4: '2006-12-04', 5: '2006-12-21', 6: '2007-01-08', 7: '2007-02-08', 8: '2007-03-08', 9: '2007-04-10', 10: '2007-11-28', 11: '2007-12-10', 12: '2009-01-07', 13: '2009-01-12'}, 'need_data': {0: 0, 1: 0, 2: 0, 3: 1, 4: 0, 5: 0, 6: 0, 7: 0, 8: 0, 9: 0, 10: 1, 11: 0, 12: 1, 13: 0}, 'vt': {0: 34.0, 1: 49.25, 2: 49.25, 3: 0.0, 4: 49.4, 5: 0.0, 6: 0.0, 7: 49.8, 8: 0.0, 9: 50.1, 10: 0.0, 11: 0.0, 12: 0.0, 13: 0.0}, 'feed2': {0: '2005-02-05', 1: '2006-10-25', 2: '2006-11-07', 3: '2006-11-07', 4: '2006-12-04', 5: '2006-12-21', 6: '2007-01-08', 7: '2007-02-08', 8: '2007-03-08', 9: '2007-04-10', 10: '2007-04-10', 11: '2007-12-10', 12: '2007-04-10', 13: '2009-01-12'}} )

Here is an illustration of the desired output: [image]

Below are a couple of similar questions, but they do not cover exactly this case.

question1, question2

Upvotes: 1

Views: 134

Answers (1)

jezrael

Reputation: 862581

Replace date with NaN where need_data == 1 and then use GroupBy.ffill:

df['feed2'] = df['date'].mask(df['need_data'].eq(1)).groupby(df['user']).ffill()
print (df)

    user        date vt c1 c2 c3  need_data       feed1       feed2
0      1  1995-09-01  2  1  3  5          0  1995-09-01  1995-09-01
1      1  1995-09-02  0  0  0  0          1  1995-09-01  1995-09-01
2      1  1995-10-03  0  0  0  0          1  1995-09-01  1995-09-01
3      2  1995-10-04  6  2  2  5          0  1995-10-04  1995-10-04
4      2  1995-10-05  1  3  5  6          0  1995-10-05  1995-10-05
5      2  1995-11-07  1  9  3  4          0  1995-11-07  1995-11-07
6      2  1995-11-08  0  0  0  0          1  1995-11-07  1995-11-07
7      3  1995-11-09  0  2  4  4          0  1995-11-09  1995-11-09
8      3  1995-11-10  0  0  0  0          1  1995-11-09  1995-11-09
9      3  1995-11-15  0  5  6  6          0  1995-12-15  1995-11-15
10     3  1995-12-18  0  5  2  3          0  1995-12-18  1995-12-18
11     4  1995-12-19  0  6  7  4          0  1995-12-19  1995-12-19
12     4  1995-12-20  0  4  0  3          0  1995-12-20  1995-12-20
13     4  1995-12-23  0  0  0  0          1  1995-12-20  1995-12-20
14     4  1995-12-26  0  6  8  2          0  1995-12-26  1995-12-26
15     4  1995-12-27  0  0  0  0          1  1995-12-26  1995-12-26
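
As a minimal self-contained sketch of this step, the snippet below applies the same mask and per-user forward fill to a small made-up frame. The name `sample` and its values are illustrative assumptions, not the asker's data.

```python
import pandas as pd

# Hypothetical sample: two users, need_data == 1 marks rows to be filled
sample = pd.DataFrame({
    "user": [1, 1, 1, 2, 2],
    "date": ["1995-09-01", "1995-09-02", "1995-10-03", "1995-10-04", "1995-11-08"],
    "need_data": [0, 1, 1, 0, 1],
})

# Hide the dates of flagged rows (they become missing values) ...
masked = sample["date"].mask(sample["need_data"].eq(1))

# ... then forward-fill the gaps separately within each user
sample["feed2"] = masked.groupby(sample["user"]).ffill()

feed2_values = sample["feed2"].tolist()
print(sample)
```

Because the fill runs per group, a flagged row can only receive a date from an earlier row of the same user. A flagged row that is the first row of its user would stay missing.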

Solution without the need_data column: here it is tested whether the string '0' is present in all of the columns vt, c1, c2 and c3 for each row:

m = df[['vt','c1','c2','c3']].eq('0').all(axis=1)
df['feed2'] = df['date'].mask(m).groupby(df['user']).ffill()
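
A small sketch of how the row-wise mask behaves, assuming (as in the snippet above) that the value columns hold strings. The frame `sample` is made up for illustration; with numeric columns the comparison would be against `0` rather than `'0'`.

```python
import pandas as pd

# Hypothetical sample with string-typed value columns
sample = pd.DataFrame({
    "user": [1, 1, 2, 2],
    "date": ["1995-09-01", "1995-09-02", "1995-10-04", "1995-10-05"],
    "vt": ["2", "0", "6", "0"],
    "c1": ["1", "0", "2", "3"],
    "c2": ["3", "0", "2", "0"],
    "c3": ["5", "0", "5", "0"],
})

# True only for rows where every listed column equals the string '0'
m = sample[["vt", "c1", "c2", "c3"]].eq("0").all(axis=1)

# Mask those rows and forward-fill the date within each user
sample["feed2"] = sample["date"].mask(m).groupby(sample["user"]).ffill()

mask_values = m.tolist()
feed2_values = sample["feed2"].tolist()
print(sample)
```

The last row is not masked because c1 is '3', so it keeps its own date.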

EDIT: You need to test both conditions, chained by | for bitwise OR, and then use Series.where to set the original values back where m1 is False:

m1 = df['need_data'].eq(1)
m2 = df['vt'].eq(0)
df['feed2'] = df['date'].mask(m1 | m2).groupby(df['user']).ffill().where(m1, df['date'])
print (df)
    user        date  need_data     vt       feed2
0    848  2005-02-05          0  34.00  2005-02-05
1    848  2006-10-25          0  49.25  2006-10-25
2    848  2006-11-07          0  49.25  2006-11-07
3    848  2006-11-20          1   0.00  2006-11-07
4    848  2006-12-04          0  49.40  2006-12-04
5    848  2006-12-21          0   0.00  2006-12-21
6    848  2007-01-08          0   0.00  2007-01-08
7    848  2007-02-08          0  49.80  2007-02-08
8    848  2007-03-08          0   0.00  2007-03-08
9    848  2007-04-10          0  50.10  2007-04-10
10   848  2007-11-28          1   0.00  2007-04-10
11   848  2007-12-10          0   0.00  2007-12-10
12   848  2009-01-07          1   0.00  2007-04-10
13   848  2009-01-12          0   0.00  2009-01-12
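
To see why both masks matter, the sketch below compares masking only the flagged rows with masking flagged and zero-vt rows, using rows 9 to 13 of the question's data. The variable names `only_m1` and `both` are introduced here for illustration.

```python
import pandas as pd

# Rows 9-13 of the question's data, re-indexed from 0
df = pd.DataFrame({
    "user": [848] * 5,
    "date": ["2007-04-10", "2007-11-28", "2007-12-10", "2009-01-07", "2009-01-12"],
    "need_data": [0, 1, 0, 1, 0],
    "vt": [50.1, 0.0, 0.0, 0.0, 0.0],
})

m1 = df["need_data"].eq(1)  # rows that need a value
m2 = df["vt"].eq(0)         # rows that must not act as a source

# Masking only flagged rows: the 2009-01-07 row picks up 2007-12-10,
# even though vt is 0 on that row
only_m1 = df["date"].mask(m1).groupby(df["user"]).ffill()

# Masking both, so only non-zero vt rows are sources, then restoring
# the original date on every row that was not flagged
both = df["date"].mask(m1 | m2).groupby(df["user"]).ffill().where(m1, df["date"])

print(pd.DataFrame({"only_m1": only_m1, "both": both}))
```

With `only_m1` the fourth row gets 2007-12-10, while `both` gives 2007-04-10, which matches the desired output in the question.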

Upvotes: 1
