Reputation: 6263
One of the biggest problems I have experienced in Python's Pandas is the continual defaulting to pandas.core.series.Series type. e.g.
import numpy as np
import pandas as pd
a = pd.DataFrame( np.random.randn(5,5),columns=list('ABCDE') )
b = a.mean(axis=0)
>>> b
A 0.399677
B 0.080594
C 0.060423
D -1.206630
E 0.153359
dtype: float64
>>> type(b)
<class 'pandas.core.series.Series'>
So, if I try to insert into a new data frame I get all sorts of errors (i.e. dimension mismatch, etc). It seems to me that when I perform an operation on a data frame the output should be a data frame, not a Series. Does anyone have a recommendation on how to use, e.g. df.mean(), and have a data frame returned?
BEGIN EDIT
Sorry, I should have given more detail.
I want to selectively average slices of my original data frame, and insert these averaged values into a separate data frame.
# This is how I've been trying to do it
# Using <a> from above
b = pd.DataFrame()
# Select out data from original data frame
tmp = a(a.A>5).mean() # Just an example, this is not really my selection criteria
# Now I want to store these averaged values in my aggregated data frame.
b = pd.concat( [b,tmp] )
I guess my real question is: How can I average data in one data frame and pass it into another for storage? END EDIT
EDIT Take 2 I have two data sets (both stored as data frames), both of which are time series. Both time series have irregular time stamps: one has a time stamp every ~90s (between hours of 0700 - 2000), the other has one or two time stamps per day (satellite overpass data)). None of the time stamps are regular (i.e. they rarely occur at the same time, and they are very rarely centered on the hour, or half hour, etc.). My goal is to take my high-frequency data and average it centered on the satellite's time stamp (+/- 30min) then store the averaged data in a new data frame. Here is the actual code I have written so far:
# OMI is the satellite data, ~daily resolution
# Pan is surface data, with 90s resolution
# Example data:
>>> pan.head()
hcho h2o so2 o3 no2
2010-06-24 14:01:20 0.87784 2.9947 NaN NaN 0.671104
2010-06-24 14:03:52 0.68877 3.0102 NaN NaN 0.684615
2010-06-24 14:04:35 NaN NaN 0.58119 285.76 NaN
2010-06-24 14:05:19 0.75813 3.0218 NaN NaN 0.693880
2010-06-24 14:06:02 NaN NaN 0.40973 286.00 NaN
>>> omi.head()
ctp dist no2 no2std cf
2010-06-24 17:51:43 7 23.8 5.179200e+15 1.034600e+15 0.001
2010-06-26 17:39:34 3 7.0 7.355800e+15 1.158100e+15 0.113
2010-07-01 17:57:40 9 8.4 5.348300e+15 9.286100e+14 0.040
2010-07-03 17:45:30 5 32.2 5.285300e+15 8.877800e+14 0.000
# Code
out = pd.DataFrame()
width = 30 # Defined earlier, input of function
for r in omi.index:
# Define datetime limits
d1 = r - dt.timedelta(minutes=width)
d2 = r + dt.timedelta(minutes=width)
tmp = pan.truncate(d1,d2).mean(axis=0,skipna=True)
if tmp.nunique()<>0: # Ensuring there is something in <tmp>
tmp = pd.DataFrame(tmp,index=[r],columns=pan.columns)
out = pd.concat([out,tmp],axis=0,ignore_index=False)
Upvotes: 1
Views: 2587
Reputation: 394459
You can just construct a DataFrame from the series easily like so:
c = DataFrame(a.mean(axis=0), columns=['mean'])
c
Out[91]:
mean
A -0.210582
B -0.742551
C 0.347408
D 0.276034
E 0.399468
Still I don't see what this really achieves for you that is better than the original returned Series?
Upvotes: 2