Reputation: 452

pandas: merge rows on timestamp

my data looks like this:

 date, cola, colb, colc
 1,10,,
 2,11,,
 3,12,,
 4,13,,
 1,,14,
 2,,15,
 3,,16,
 4,,17,
 1,,,17
 2,,,18
 3,,,19
 4,13,,20

I'd like to merge the rows based on the first column and have the output look like this:

 date, cola, colb, colc
 1,10,14,17
 2,11,15,18
 3,12,16,19
 4,13,17,20

I can't guarantee there won't be any conflicts, so I'd like to be able to choose the max or mean.

Upvotes: 0

Answers (1)

DSM

Reputation: 353499

You can use groupby. Start from a csv with duplicates:

>>> !cat tomerge.csv
date, cola, colb, colc
1,10,,
2,11,,
1,,14,
2,,15,
1,,24,
2,,40,
1,,,17
2,,,18

Read it in:

>>> df = pd.read_csv("tomerge.csv")
>>> df
   date   cola   colb   colc
0     1     10    NaN    NaN
1     2     11    NaN    NaN
2     1    NaN     14    NaN
3     2    NaN     15    NaN
4     1    NaN     24    NaN
5     2    NaN     40    NaN
6     1    NaN    NaN     17
7     2    NaN    NaN     18

And then the magic happens:

>>> df.groupby("date").mean()
       cola   colb   colc
date                     
1        10   19.0     17
2        11   27.5     18
>>> df.groupby("date").max()
       cola   colb   colc
date                     
1        10     24     17
2        11     40     18

Upvotes: 1

pandas: merge rows on timestamp

Answers (1)

Related Questions