krizz
krizz

Reputation: 125

Python pandas - append last column if values in other column matches

I have created this pandas table

index        p1           p2  p3  p4
0    29.02.2017  10.00-11.00   1   2
1    29.02.2017  10.00-11.00   1   3
2    28.02.2017  08.00-09.00  10  11
3    28.02.2017  08.00-09.00  10  12
4    29.02.2017  08.00-09.00  10  13
5    29.02.2017  08.00-09.00  10   9
6    28.02.2017  08.00-09.00  11  10
7    28.02.2017  08.00-09.00  11  12
8    28.02.2017  10.00-11.00  11  13

values are sorted by p3>p1>p2>p4

what I want to do now, is to append p4, if p1, p2, p3 matches, like e.g.

index        p1           p2  p3  p4
0    29.02.2017  10.00-11.00   1   2, 3 
2    28.02.2017  08.00-09.00  10  11, 12 
4    29.02.2017  08.00-09.00  10  13, 9
6    28.02.2017  08.00-09.00  11  10, 12
8    28.02.2017  10.00-11.00  11  13

and I dont really know what to do. I know how to append more rows, or how to sum up values, but I want to keep them as in e.g.

Upvotes: 2

Views: 497

Answers (2)

jezrael
jezrael

Reputation: 862851

You need groupby with apply and join, but need convert numeric column to string first:

df = df.groupby(['p1','p2','p3'], sort=False)['p4']
       .apply(lambda x: ', '.join(x.astype(str)))
       .reset_index()
print (df)
           p1           p2  p3      p4
0  29.02.2017  10.00-11.00   1    2, 3
1  28.02.2017  08.00-09.00  10  11, 12
2  29.02.2017  08.00-09.00  10   13, 9
3  28.02.2017  08.00-09.00  11  10, 12
4  28.02.2017  10.00-11.00  11      13

If need output as list faster is use:

print (df.groupby(['p1','p2','p3']).p4.apply(lambda x: x.tolist()))

Timings:

df = pd.concat([df]*10000).reset_index(drop=True)

In [306]: %timeit (df.groupby(['p1','p2','p3']).p4.apply(list))
10 loops, best of 3: 22.6 ms per loop

In [307]: %timeit (df.groupby(['p1','p2','p3']).p4.apply(lambda x: x.tolist()))
100 loops, best of 3: 18.4 ms per loop

Upvotes: 4

B. M.
B. M.

Reputation: 18638

If you want values as list s,

Here is a solution :

In [39]: df.groupby(['p3','p1','p2']).p4.apply(list)
Out[39]: 
p3  p1          p2         
1   29.02.2017  10.00-11.00      [2, 3]
10  28.02.2017  08.00-09.00    [11, 12]
    29.02.2017  08.00-09.00     [13, 9]
11  28.02.2017  08.00-09.00    [10, 12]
                10.00-11.00        [13]

Upvotes: 2

Related Questions