Reputation: 47
I am rather new to pandas DataFrames and have a grouping problem: I want to group a 7-column DataFrame by the values in the first 3 columns (A, B, C), and then add a new column (H) that holds, for every row in a group, the value of the last column (G) from the row where the 4th column (D) equals 0.
So, the original dataframe looks like this:
        A         B     C  D           E    F    G
0   11018  20190102     0  0  1546387200   37   34
1   11018  20190102     0  1  1546390800   33   36
2   11018  20190102     0  2  1546394400   19   19
3   11018  20190102     0  3  1546398000   17   26
4   11018  20190102     0  4  1546401600   16   26
5   11018  20190102     0  5  1546405200   13   23
6   11018  20190102     0  6  1546408800   11   15
7   11018  20190102  1200  0  1546430400   25   24
8   11018  20190102  1200  1  1546434000   21    3
9   11018  20190102  1200  2  1546437600   13    4
10  11018  20190102  1200  3  1546441200    7    3
11  11018  20190102  1200  4  1546444800    2    1
12  11018  20190102  1200  5  1546448400   -3    6
13  11018  20190102  1200  6  1546452000   -7    2
14  11035  20190103     0  0  1546473600  -15  -14
15  11035  20190103     0  1  1546477200  -17  -11
16  11035  20190103     0  2  1546480800  -20  -12
17  11035  20190103     0  3  1546484400  -23  -16
18  11035  20190103     0  4  1546488000  -26  -11
19  11035  20190103     0  5  1546491600  -28  -11
20  11035  20190103     0  6  1546495200  -27  -12
21  11031  20190103  1100  0  1546516800    0    1
22  11031  20190103  1100  1  1546520400    4   -7
23  11031  20190103  1100  2  1546524000    5   -6
24  11031  20190103  1100  3  1546527600    2  -16
25  11031  20190103  1100  4  1546531200   -3  -14
26  11031  20190103  1100  5  1546534800   -8  -12
27  11031  20190103  1100  6  1546538400  -12  -14
.
.
.
.
etc.
And the new dataframe should be:
        A         B     C  D           E    F    G    H
0   11018  20190102     0  0  1546387200   37   34   34
1   11018  20190102     0  1  1546390800   33   36   34
2   11018  20190102     0  2  1546394400   19   19   34
3   11018  20190102     0  3  1546398000   17   26   34
4   11018  20190102     0  4  1546401600   16   26   34
5   11018  20190102     0  5  1546405200   13   23   34
6   11018  20190102     0  6  1546408800   11   15   34
7   11018  20190102  1200  0  1546430400   25   24   24
8   11018  20190102  1200  1  1546434000   21    3   24
9   11018  20190102  1200  2  1546437600   13    4   24
10  11018  20190102  1200  3  1546441200    7    3   24
11  11018  20190102  1200  4  1546444800    2    1   24
12  11018  20190102  1200  5  1546448400   -3    6   24
13  11018  20190102  1200  6  1546452000   -7    2   24
14  11035  20190103     0  0  1546473600  -15  -14  -14
15  11035  20190103     0  1  1546477200  -17  -11  -14
16  11035  20190103     0  2  1546480800  -20  -12  -14
17  11035  20190103     0  3  1546484400  -23  -16  -14
18  11035  20190103     0  4  1546488000  -26  -11  -14
19  11035  20190103     0  5  1546491600  -28  -11  -14
20  11035  20190103     0  6  1546495200  -27  -12  -14
21  11031  20190103  1100  0  1546516800    0    1    1
22  11031  20190103  1100  1  1546520400    4   -7    1
23  11031  20190103  1100  2  1546524000    5   -6    1
24  11031  20190103  1100  3  1546527600    2  -16    1
25  11031  20190103  1100  4  1546531200   -3  -14    1
26  11031  20190103  1100  5  1546534800   -8  -12    1
27  11031  20190103  1100  6  1546538400  -12  -14    1
.
.
.
.
etc.
Is there an easy solution to this problem? Note that the rows in the original DataFrame may also be out of order. Thanks for your help!
Upvotes: 1
Views: 1138
Reputation: 11657
An alternative solution:
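def col_6(group):
    # Take G from the row where D == 0 and broadcast it to the whole group
    group = group.copy()
    group['H'] = group.loc[group['D'] == 0, 'G'].iloc[0]
    return group

df = df.groupby(['A', 'B', 'C'], group_keys=False).apply(col_6)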
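If you would rather avoid `apply`, here is a rough vectorized sketch of the same idea. It assumes every (A, B, C) group has exactly one row with D == 0, which is my reading of the question rather than something it states. The sample frame is a shuffled subset of the question's rows with columns E and F left out for brevity, and the names `keys` and `g_at_zero` are just for illustration.

```python
import pandas as pd

# Shuffled subset of the question's rows (E and F omitted for brevity)
df = pd.DataFrame(
    {
        "A": [11018, 11018, 11018, 11018, 11035, 11035],
        "B": [20190102, 20190102, 20190102, 20190102, 20190103, 20190103],
        "C": [0, 1200, 0, 1200, 0, 0],
        "D": [1, 0, 0, 1, 2, 0],
        "G": [36, 24, 34, 3, -12, -14],
    }
)

keys = ["A", "B", "C"]  # grouping columns

# Blank out G wherever D != 0, leaving one non-null value per group
# (assumption: each group has exactly one D == 0 row)
g_at_zero = df["G"].where(df["D"] == 0)

# "first" skips NaN, so transform broadcasts that value to every row of the group
df["H"] = g_at_zero.groupby([df[k] for k in keys]).transform("first")

print(df)
```

Because `where` introduces NaN, H comes back as float; if every group really does have a D == 0 row, casting with `df["H"].astype(df["G"].dtype)` should restore the integer type. Row order and the original index are left untouched, so it does not matter whether the input is sorted.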
Upvotes: 1