Reputation: 10677
Thanks to the response to my initial question, I now have a multi-indexed DataFrame the way that I want it. Now that I have the data in the data structure, I'm trying to get it out and wonder if there is a better way to do this. My two problems are related, but may have separate "ideal" solutions:
Sample DataFrame (truncated)
Experiment IWWGCW IWWGDW
Lead Time 24 48 24 48
2010-11-27 12:00:00 0.997 0.991 0.998 0.990
2010-11-28 12:00:00 0.998 0.987 0.997 0.990
2010-11-29 12:00:00 0.997 0.992 0.997 0.992
2010-11-30 12:00:00 0.997 0.987 0.997 0.987
2010-12-01 12:00:00 0.996 0.986 0.996 0.986
Iteration
I'd like to be able to loop over this DataFrame so that each iteration takes me down only one index dimension, i.e. an iteritems-like
behavior that would return [('IWWGCW', df['IWWGCW']), ('IWWGDW', df['IWWGDW'])]
and yield two DataFrames with Lead Time columns. My brute-force solution is a wrapper routine that basically does [(key, df[key]) for key in df.columns.levels[0]]
. Is there a better way to do this?
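The brute-force wrapper mentioned above can be sketched as a small generator; the DataFrame here is reconstructed from the sample's column structure, with arbitrary values:

```python
import numpy as np
import pandas as pd

# Rebuild a frame with the same two-level columns as the sample data.
cols = pd.MultiIndex.from_product(
    [["IWWGCW", "IWWGDW"], [24, 48]], names=["Experiment", "Lead Time"]
)
idx = pd.date_range("2010-11-27 12:00", periods=5, freq="D")
df = pd.DataFrame(np.random.rand(5, 4).round(3), index=idx, columns=cols)

def iter_top_level(frame):
    """Yield (experiment, sub-DataFrame) pairs, one per top-level column."""
    for key in frame.columns.levels[0]:
        yield key, frame[key]

for exp, sub in iter_top_level(df):
    print(exp)
    print(sub)  # sub has only the 'Lead Time' columns
```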
Apply
I'd also like to do things like "subtract the IWWGDW entries from everybody else" to compute paired differences. I tried df.apply(lambda f: f - df['IWWGDW'])
but get a KeyError: ('IWWGDW', 'occurred at index 2010-11-26 12:00:00')
regardless of whether I use axis=1
or axis=0
. I've tried rebuilding a new DataFrame using the iteration workaround identified above, but I always worry when I brute-force things. Is there a more "pandasic" way to do this sort of computation?
Upvotes: 6
Views: 2216
Reputation: 17152
I know this is old, but following @WesMcKinney's answer, the best hack I found to drop the top level inside the loop is just to select it right away, like so:
for exp, group in df.groupby(level=0, axis=1):
print(group[exp])
Lead Time 24 48
2010-11-27 12:00:00 0.997 0.991
2010-11-28 12:00:00 0.998 0.987
2010-11-29 12:00:00 0.997 0.992
2010-11-30 12:00:00 0.997 0.987
2010-12-01 12:00:00 0.996 0.986
This returns a DataFrame
with only the underlying level's columns, as wanted.
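As a side note, on recent pandas releases (where groupby with axis=1 is deprecated) the same per-experiment sub-frames can be obtained without a groupby at all, e.g. with .xs. A sketch, with the sample's column structure reconstructed and arbitrary values:

```python
import numpy as np
import pandas as pd

# Rebuild a frame with the same two-level columns as the sample data.
cols = pd.MultiIndex.from_product(
    [["IWWGCW", "IWWGDW"], [24, 48]], names=["Experiment", "Lead Time"]
)
df = pd.DataFrame(np.random.rand(5, 4), columns=cols)

# .xs selects one experiment and drops the top level, leaving only the
# 'Lead Time' columns -- the same frame as group[exp] inside the loop.
for exp in df.columns.levels[0]:
    sub = df.xs(exp, axis=1, level="Experiment")
    print(exp, sub.shape)
```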
Upvotes: 0
Reputation: 105581
I would suggest using groupby for iteration:
In [25]: for exp, group in df.groupby(level=0, axis=1):
....: print exp, group
....:
IWWGCW Experiment IWWGCW
Lead Time 24 48
2010-11-27 12:00:00 0.997 0.991
2010-11-28 12:00:00 0.998 0.987
2010-11-29 12:00:00 0.997 0.992
2010-11-30 12:00:00 0.997 0.987
2010-12-01 12:00:00 0.996 0.986
IWWGDW Experiment IWWGDW
Lead Time 24 48
2010-11-27 12:00:00 0.998 0.990
2010-11-28 12:00:00 0.997 0.990
2010-11-29 12:00:00 0.997 0.992
2010-11-30 12:00:00 0.997 0.987
2010-12-01 12:00:00 0.996 0.986
However, I see that this doesn't drop the top level as you're looking for. Ideally you would be able to write something like:
df.groupby(level=0, axis=1).sub(df['IWWGCW'])
and have that do the pair-wise subtraction, but because df['IWWGCW']
drops the level, the column names don't line up. This works, though:
In [29]: df.groupby(level=0, axis=1).sub(df['IWWGCW'].values)
Out[29]:
Experiment IWWGCW IWWGDW
Lead Time 24 48 24 48
2010-11-27 12:00:00 0 0 0.001 -0.001
2010-11-28 12:00:00 0 0 -0.001 0.003
2010-11-29 12:00:00 0 0 0.000 0.000
2010-11-30 12:00:00 0 0 0.000 0.000
2010-12-01 12:00:00 0 0 0.000 0.000
I'll think a bit more about this.
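A label-aligned alternative to the .values trick is the level argument of DataFrame.sub, which broadcasts the single-experiment frame across the 'Lead Time' level so the column labels line up. A sketch on reconstructed sample-shaped data:

```python
import numpy as np
import pandas as pd

# Rebuild a frame with the same two-level columns as the sample data.
cols = pd.MultiIndex.from_product(
    [["IWWGCW", "IWWGDW"], [24, 48]], names=["Experiment", "Lead Time"]
)
df = pd.DataFrame(np.random.rand(5, 4), columns=cols)

# Align on the 'Lead Time' level: df['IWWGCW'] has columns (24, 48),
# which are matched against each experiment's columns in turn.
diff = df.sub(df["IWWGCW"], axis=1, level="Lead Time")
print(diff)
```

The IWWGCW block of the result is all zeros, and the IWWGDW block holds the paired differences, with the full MultiIndex preserved.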
Upvotes: 6