Reputation: 521
I have a dataframe of the following structure which is simplified for this question.
A B C D E
0 2014/01/01 nan nan 0.2 nan
1 2014/01/01 0.1 nan nan nan
2 2014/01/01 nan 0.3 nan 0.7
3 2014/01/02 nan 0.4 nan nan
4 2014/01/02 0.5 nan 0.6 0.8
What I have here is a series of readings across several timestamps on single days. The columns B,C,D and E represent different locations. The data I am reading in is set up such that at a specified timestamp it takes data from certain locations and fills in nan values for the other locations.
What I wish to do is group the data by timestamp which I can easily do with a .GroupBy()
command. From there I wish to have the nan values in the grouped data be overwritten with the valid values taken in later rows such that this is the following result is obtained.
A B C D E
0 2014/01/01 0.1 0.3 0.2 0.7
1 2014/01/02 0.5 0.4 0.6 0.8
How do I go about achieving this?
Upvotes: 1
Views: 456
Reputation: 323226
you can try this by using pandas
first
df.groupby('A', as_index=False).first()
A B C D E
0 1/1/2014 0.1 0.3 0.2 0.7
1 1/2/2014 0.5 0.4 0.6 0.8
Upvotes: 2
Reputation: 402283
Try df.groupby
with DataFrameGroupBy.agg
:
In [528]: df.groupby('A', as_index=False, sort=False).agg(np.nansum)
Out[528]:
A B C D E
0 2014/01/01 0.1 0.3 0.2 0.7
1 2014/01/02 0.5 0.4 0.6 0.8
A shorter version with DataFrameGroupBy.sum
(thanks MaxU!):
In [537]: df.groupby('A', as_index=False, sort=False).sum()
Out[537]:
A B C D E
0 2014/01/01 0.1 0.3 0.2 0.7
1 2014/01/02 0.5 0.4 0.6 0.8
Upvotes: 7