Reputation: 452
my data looks like this:
date, cola, colb, colc
1,10,,
2,11,,
3,12,,
4,13,,
1,,14,
2,,15,
3,,16,
4,,17,
1,,,17
2,,,18
3,,,19
4,13,,20
I'd like to merge the rows based on the first column and have the output look like this:
date, cola, colb, colc
1,10,14,17
2,11,15,18
3,12,16,19
4,13,17,20
I can't guarantee there won't be any conflicts, so I'd like to be able to choose the max or mean.
Upvotes: 0
Views: 1198
Reputation: 353499
You can use groupby
. Start from a csv
with duplicates:
>>> !cat tomerge.csv
date, cola, colb, colc
1,10,,
2,11,,
1,,14,
2,,15,
1,,24,
2,,40,
1,,,17
2,,,18
Read it in:
>>> df = pd.read_csv("tomerge.csv")
>>> df
date cola colb colc
0 1 10 NaN NaN
1 2 11 NaN NaN
2 1 NaN 14 NaN
3 2 NaN 15 NaN
4 1 NaN 24 NaN
5 2 NaN 40 NaN
6 1 NaN NaN 17
7 2 NaN NaN 18
And then the magic happens:
>>> df.groupby("date").mean()
cola colb colc
date
1 10 19.0 17
2 11 27.5 18
>>> df.groupby("date").max()
cola colb colc
date
1 10 24 17
2 11 40 18
Upvotes: 1