Reputation: 1212
I'm trying to combine all rows of a DataFrame that share the same timestamp into a single row. The DataFrame is 5k rows by 20 columns.
             A    B  ...
timestamp
11:00      NaN   10  ...
11:00        5  NaN  ...
12:00       15   20  ...
...        ...  ...
I want to group the two 11:00 rows as follows:
             A    B  ...
timestamp
11:00        5   10  ...
12:00       15   20  ...
...        ...  ...
Any help would be appreciated. Thank you.
I have tried:
df.groupby( df.index ).sum()
Upvotes: 3
Views: 3210
Reputation: 2497
Use groupby after replacing the NaN values with 0s:
df.fillna(0, inplace=True)
df.groupby(df.index).sum()
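For anyone who wants to try this end to end, here is a minimal sketch on a toy frame shaped like the one in the question. The three sample rows and the name `combined` are my own assumptions for illustration, not the asker's real data.

```python
import numpy as np
import pandas as pd

# Toy frame mirroring the question: duplicate timestamps in the index.
df = pd.DataFrame(
    {"A": [np.nan, 5, 15], "B": [10, np.nan, 20]},
    index=pd.Index(["11:00", "11:00", "12:00"], name="timestamp"),
)

# Fill missing values with 0, then sum the rows that share an index label.
# fillna() without inplace returns a new frame, so df itself is unchanged.
combined = df.fillna(0).groupby(level=0).sum()
print(combined)
```

Note that sum() skips NaN by default, so the fillna step mostly makes the intent explicit. It should give the same totals as calling groupby(...).sum() directly.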
Upvotes: 2
Reputation: 2497
You could melt ('unpivot') the DataFrame to convert it from wide form to long form, remove the null values, then aggregate via groupby.
import pandas as pd
df = pd.DataFrame({'timestamp': ['11:00', '11:00', '12:00'],
                   'A': [None, 5, 15],
                   'B': [10, None, 20]})
     A    B timestamp
0  NaN   10     11:00
1    5  NaN     11:00
2   15   20     12:00
df2 = pd.melt(df, id_vars = 'timestamp') # specify the value_vars if needed
  timestamp variable  value
0     11:00        A    NaN
1     11:00        A      5
2     12:00        A     15
3     11:00        B     10
4     11:00        B    NaN
5     12:00        B     20
df2.dropna(inplace=True)
df3 = df2.groupby(['timestamp', 'variable']).sum()
                    value
timestamp variable
11:00     A             5
          B            10
12:00     A            15
          B            20
df3.unstack()
          value
variable      A   B
timestamp
11:00         5  10
12:00        15  20
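The steps above can also be chained into one expression. This is only a sketch under my own assumptions: the names `long_form` and `wide` are made up for illustration, and I select the value column before summing so that unstack() gives plain A/B columns instead of a two-level column index.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "timestamp": ["11:00", "11:00", "12:00"],
    "A": [np.nan, 5, 15],
    "B": [10, np.nan, 20],
})

# Wide -> long, then drop the rows whose value is missing.
long_form = pd.melt(df, id_vars="timestamp").dropna()

# Sum per (timestamp, variable) pair, then pivot the variable level
# back into columns. Selecting "value" first keeps the columns flat.
wide = long_form.groupby(["timestamp", "variable"])["value"].sum().unstack()
print(wide)
```

If you need a frame with 20 columns, pass value_vars to melt only when some columns should be left out of the reshaping.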
Upvotes: 2
Reputation: 109546
Try using resample (the index must be a DatetimeIndex):
>>> df.resample('60min').sum()
                      A   B
2015-05-28 11:00:00   5  10
2015-05-28 12:00:00  15  20
More examples can be found in the Pandas Documentation.
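Here is a small self-contained sketch of the resample approach. The dates and the names `stamps` and `hourly` are assumptions for illustration, since the question only shows times of day.

```python
import numpy as np
import pandas as pd

# resample needs a datetime-like index, so build one explicitly.
stamps = pd.to_datetime(
    ["2015-05-28 11:00", "2015-05-28 11:00", "2015-05-28 12:00"]
)
df = pd.DataFrame({"A": [np.nan, 5, 15], "B": [10, np.nan, 20]}, index=stamps)

# Bucket rows into 60-minute bins and sum within each bin.
hourly = df.resample("60min").sum()
print(hourly)
```

One thing to keep in mind: resample produces a regular grid, so hours with no rows at all still show up in the result (with a sum of 0), whereas groupby on the index only returns timestamps that actually occur.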
Upvotes: 1
Reputation: 3607
In plain Python, adding a number and NaN yields NaN, not the number. You probably need to use .aggregate() :)
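To see how NaN behaves in each case, here is a quick sketch. The column C and the names `plain`, `agg` and `strict` are hypothetical, added only so that one group contains nothing but NaN.

```python
import math

import numpy as np
import pandas as pd

# Plain Python float arithmetic: NaN propagates through addition.
plain = float("nan") + 5
print(math.isnan(plain))

df = pd.DataFrame(
    {"A": [np.nan, 5, 15], "C": [np.nan, np.nan, 1]},  # C is hypothetical
    index=pd.Index(["11:00", "11:00", "12:00"], name="timestamp"),
)

# pandas skips NaN when summing, whether via .sum() or .aggregate("sum").
agg = df.groupby(level=0).aggregate("sum")

# min_count=1 keeps a group as NaN when it has no valid values at all.
strict = df.groupby(level=0).sum(min_count=1)
print(agg)
print(strict)
```

So as far as I can tell, .aggregate("sum") and .sum() behave the same here. The difference only shows up for a group that is entirely NaN, which sums to 0 by default and stays NaN with min_count=1.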
Upvotes: 0