Reputation: 16240
Suppose I have a DataFrame that looks like:
import pandas as pd
import numpy as np
df = pd.DataFrame({'Week': [1, 2, 1, 2, 1, 2, 1, 2],
                   'Rabbits': np.random.randn(8),
                   'Donkeys': np.random.randn(8) * 4,
                   'Mice': np.random.randn(8) * 4})
Which makes df:
Then I want to group by week and perform a basic corr test on each week:
week_group = df.groupby('Week')
week_group = week_group[df.columns.difference(["Week"])]
week_cor = week_group.corr()
Which makes week_cor a DataFrame with a MultiIndex containing Week 1 and Week 2:
So, now I want to do the following: I want to create a DataFrame based off the "two" DataFrames. To elaborate: let's treat Week 1 as df1 and Week 2 as df2. Now consider an entry in df1, entry1, and an entry in df2, entry2. The resulting DataFrame is constructed as follows:
def collapse(entry1, entry2):
    if abs(entry1) >= 0.6 and abs(entry2) >= 0.6:
        return 1
    else:
        return 0
So in this case I would want something like:
          Donkeys      Mice   Rabbits
Donkeys  1.000000  0.000000  0.000000
Mice     0.000000  1.000000  0.000000
Rabbits  0.000000  0.000000  1.000000
In Python I would normally perform a reduce on a nested list, but it doesn't work here:
from functools import reduce

def collapse(entry1, entry2):
    if abs(entry1) >= 0.6 and abs(entry2) >= 0.6:
        return 1
    else:
        return 0

reduce(collapse, week_cor)
Which gives:
TypeError: bad operand type for abs(): 'str'
Which makes sense, since iterating over a DataFrame yields its column labels (strings), not its values.
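A quick way to see this (a throwaway frame, not the one above):

```python
import pandas as pd

# Iterating over a DataFrame yields its column labels, not its rows,
# which is why reduce() ends up calling abs() on strings.
toy = pd.DataFrame({"a": [1.0], "b": [2.0]})
print(list(toy))  # → ['a', 'b']
```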
I could be misunderstanding the purpose of pandas, but I feel like this idea of performing a reduce-like operation along a MultiIndex would be somewhat common, and that pandas would have a way to do it. Please correct me if I am wrong about this assumption; if not, what is the standard way of reducing along a MultiIndex?
In general: I am taking a single DataFrame and grouping the data by some time point. Then I am performing an operation (in this example corr) to get a MultiIndex based off of time. I want to "collapse" or reduce the MultiIndex, much like reducing a list in Python, so that I end up with a single DataFrame.
Upvotes: 1
Views: 596
Reputation: 59579
In this case, I think you can just do another groupby on the animal level (level=1) of week_cor, checking whether all absolute values are greater than or equal to 0.6:
print(week_cor)
              Donkeys      Mice   Rabbits
Week
1    Donkeys  1.000000 -0.118953 -0.235307
     Mice    -0.118953  1.000000  0.803987
     Rabbits -0.235307  0.803987  1.000000
2    Donkeys  1.000000  0.229929 -0.593603
     Mice     0.229929  1.000000 -0.645369
     Rabbits -0.593603 -0.645369  1.000000
week_cor.groupby(level=1).apply(lambda x: x.abs().ge(0.6).all())
         Donkeys   Mice  Rabbits
Donkeys     True  False    False
Mice       False   True     True
Rabbits    False   True     True
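If you want the 0/1 frame from the question rather than booleans, you can cast the result. A self-contained sketch (random data stands in for the original, so the exact pattern off the diagonal will differ):

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the original frame (assumed layout, not the OP's numbers).
rng = np.random.default_rng(1)
cols = ["Donkeys", "Mice", "Rabbits"]
df = pd.DataFrame(rng.standard_normal((8, 3)), columns=cols)
df["Week"] = [1, 2, 1, 2, 1, 2, 1, 2]
week_cor = df.groupby("Week")[cols].corr()

# Same groupby as above, cast to 0/1 to match the desired output.
out = week_cor.groupby(level=1).apply(lambda x: x.abs().ge(0.6).all()).astype(int)
print(out)
```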
Upvotes: 3
Reputation: 16240
Note: I posted this answer before I saw the comment by Ben.T; his way is more concise and should probably be used.
I am extending Dascienz's answer to make it more general:
As Dascienz said:
So I think the simplest solution for what you want is to drop the MultiIndex using pandas.DataFrame.reset_index
Thus, from:
animal_group = week_cor.reset_index()
We get:
This can then be grouped again by "level_1". To illustrate, here is a slice of what this looks like:
animal_group = week_cor.reset_index().groupby("level_1")
animal_group.get_group("Donkeys")
gives:
This can be reduced using agg
(although, I'm not sure if this is the best) and the "Week"
column can just be dropped in the end:
from math import floor

def collapse(x):
    x = x.map(lambda elem: 1 if abs(elem) >= 0.6 else 0)
    # A little bit of a math trick here: the sum of the two 0/1 flags
    # is 2 only when both weeks pass, so floor(sum / 2) is their AND.
    return floor(x.sum() / 2)

animal_group.agg(collapse).drop("Week", axis=1)
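For what it's worth, the floor trick is just a roundabout minimum: taking the min of the per-week 0/1 flags in each group gives the same AND. A self-contained sketch with synthetic data (the grouping mirrors animal_group above):

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the original frame (assumed layout).
rng = np.random.default_rng(7)
cols = ["Donkeys", "Mice", "Rabbits"]
df = pd.DataFrame(rng.standard_normal((8, 3)), columns=cols)
df["Week"] = [1, 2, 1, 2, 1, 2, 1, 2]
week_cor = df.groupby("Week")[cols].corr()

animal_group = week_cor.reset_index().groupby("level_1")

# min over each group's 0/1 flags is 1 only if every week passes,
# which is exactly what floor(sum / 2) computes for two weeks.
result = animal_group.agg(lambda x: x.abs().ge(0.6).astype(int).min()).drop("Week", axis=1)
print(result)
```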
Still seems a little bit verbose (or maybe I am expecting too much from Python). But in the end:
As desired.
Upvotes: 0
Reputation: 1071
So I think the simplest solution for what you want is to drop the MultiIndex using pandas.DataFrame.reset_index, like so:
week_cor = week_cor.reset_index()
Now you can select the correlation subset you like by the Week column; this makes it easy to perform further operations on the two of them. Here's a numpy solution that you might be able to build off of.
cols = ['Donkeys', 'Mice', 'Rabbits']
df1 = week_cor[week_cor['Week'] == 1][cols].values  # ndarray
df2 = week_cor[week_cor['Week'] == 2][cols].values  # ndarray

def collapse(A, B):
    # np.abs so that strong negative correlations count too,
    # matching the question's collapse.
    return np.where((np.abs(A) >= 0.6) & (np.abs(B) >= 0.6), 1, 0)

new_df = pd.DataFrame(collapse(df1, df2), index=cols, columns=cols)
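Put together as a runnable sketch (synthetic data stands in for the original frame, so the exact 0/1 pattern will vary):

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the original data (assumed layout, not the OP's numbers).
rng = np.random.default_rng(42)
cols = ["Donkeys", "Mice", "Rabbits"]
df = pd.DataFrame(rng.standard_normal((8, 3)), columns=cols)
df["Week"] = [1, 2, 1, 2, 1, 2, 1, 2]

week_cor = df.groupby("Week")[cols].corr().reset_index()

A = week_cor[week_cor["Week"] == 1][cols].values
B = week_cor[week_cor["Week"] == 2][cols].values

# 1 where both weeks' correlations are strong (|r| >= 0.6), else 0.
new_df = pd.DataFrame(np.where((np.abs(A) >= 0.6) & (np.abs(B) >= 0.6), 1, 0),
                      index=cols, columns=cols)
print(new_df)
```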
Let me know if you get reduce to work, because I'd be interested in knowing.
Upvotes: 1