Dair

Reputation: 16240

Is there a function to reduce a MultiIndex?

Suppose I have a DataFrame that looks like:

import pandas as pd
import numpy as np

df = pd.DataFrame({'Week' : [1, 2, 1, 2, 1, 2, 1, 2],
                           'Rabbits' : np.random.randn(8),
                           'Donkeys' : np.random.randn(8) * 4,
                           'Mice'   :  np.random.randn(8) * 4})

Which makes df:

[image: Example df]

Then I want to group by week and compute a basic correlation for each week:

week_group = df.groupby('Week')
week_group = week_group[df.columns.difference(["Week"])]
week_cor = week_group.corr()

Which makes week_cor a DataFrame with a MultiIndex whose outer level is the week (1 and 2):

[image: Donkey MultiIndex]

So, now I want to do the following: I want to create a single DataFrame from the "two" per-week DataFrames. To elaborate: let's treat Week 1 as df1 and Week 2 as df2. Now consider an entry entry1 in df1 and the entry entry2 at the same position in df2. The resulting DataFrame is constructed as follows:

def collapse(entry1, entry2):
    if abs(entry1) >= 0.6 and abs(entry2) >= 0.6:
        return 1
    else:
        return 0

So in this case I would want something like:

         Donkeys   Mice      Rabbits                              
Donkeys  1.000000  0.000000  0.000000
Mice     0.000000  1.000000  0.000000
Rabbits  0.000000  0.000000  1.000000

In Python I would normally perform a reduce on a nested list, but that doesn't work here:

from functools import reduce

def collapse(entry1, entry2):
    if abs(entry1) >= 0.6 and abs(entry2) >= 0.6:
        return 1
    else:
        return 0

reduce(collapse, week_cor)

Which gives:

TypeError: bad operand type for abs(): 'str'

Which makes sense, since iterating over a DataFrame yields its column labels, which are strings here.
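To make the cause of that error concrete, here is a minimal sketch. It rebuilds a DataFrame like the one above with an arbitrarily chosen seed (the seed and the names rng, animal_cols, and labels are assumptions added for illustration) and shows what reduce actually iterates over.

```python
import numpy as np
import pandas as pd

# Seeded generator so the sketch is repeatable; the seed value is arbitrary.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "Week": [1, 2, 1, 2, 1, 2, 1, 2],
    "Rabbits": rng.standard_normal(8),
    "Donkeys": rng.standard_normal(8) * 4,
    "Mice": rng.standard_normal(8) * 4,
})

# Per-week correlation matrices, stacked under a (Week, animal) MultiIndex.
animal_cols = list(df.columns.difference(["Week"]))
week_cor = df.groupby("Week")[animal_cols].corr()

# Iterating a DataFrame yields its column labels, not its values,
# so reduce(collapse, week_cor) passes two strings to collapse().
labels = list(week_cor)
print(labels)  # ['Donkeys', 'Mice', 'Rabbits']
```

So the first call inside reduce is effectively collapse("Donkeys", "Mice"), which is where abs() receives a str.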

I could be misunderstanding the purpose of pandas, but I feel like performing a reduce-like operation along a MultiIndex would be fairly common, and that pandas would have a way to do it. Please correct me if I am wrong about this assumption, and if not, what is the standard way of reducing along a MultiIndex?

In general: I am taking a single DataFrame and grouping the data by some time point. Then I am performing an operation (in this example corr) to get a MultiIndexed DataFrame keyed by time. I want to "collapse" or reduce the MultiIndex in a way similar to reducing a list in Python. As a result I am reducing the MultiIndexed result to a plain DataFrame.

Upvotes: 1

Views: 596

Answers (3)

ALollz

Reputation: 59579

In this case, I think you can just do another groupby on the inner level of week_cor (level=1, the animal names), checking whether all absolute values in each group are greater than or equal to 0.6:

print(week_cor)

               Donkeys      Mice   Rabbits
Week                                      
1    Donkeys  1.000000 -0.118953 -0.235307
     Mice    -0.118953  1.000000  0.803987
     Rabbits -0.235307  0.803987  1.000000
2    Donkeys  1.000000  0.229929 -0.593603
     Mice     0.229929  1.000000 -0.645369
     Rabbits -0.593603 -0.645369  1.000000

Code:

week_cor.groupby(level=1).apply(lambda x: x.abs().ge(0.6).all())  

         Donkeys   Mice  Rabbits
Donkeys     True  False    False
Mice       False   True     True
Rabbits    False   True     True
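If the 0/1 integers from the question are wanted instead of booleans, the same check can probably be written without apply. This is only a sketch: it rebuilds week_cor by hand from the values printed above so that it runs on its own, and the names animals, strong, and collapsed are made up for illustration.

```python
import numpy as np
import pandas as pd

# Rebuild week_cor from the printed values (copied from the output above).
animals = ["Donkeys", "Mice", "Rabbits"]
index = pd.MultiIndex.from_product([[1, 2], animals], names=["Week", None])
values = np.array([
    [1.000000, -0.118953, -0.235307],
    [-0.118953, 1.000000, 0.803987],
    [-0.235307, 0.803987, 1.000000],
    [1.000000, 0.229929, -0.593603],
    [0.229929, 1.000000, -0.645369],
    [-0.593603, -0.645369, 1.000000],
])
week_cor = pd.DataFrame(values, index=index, columns=animals)

# Flag strong correlations first, then require the flag in every week.
strong = week_cor.abs().ge(0.6)
collapsed = strong.groupby(level=1).all().astype(int)
print(collapsed)
```

Because the threshold is applied before grouping, this should extend to more than two weeks without changes.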

Upvotes: 3

Dair

Reputation: 16240

Note: I posted this answer before I saw the comment by Ben.T; his way is more concise and should probably be used.

I am extending Dascienz's answer to make it more general:

As Dascienz said:

So I think the simplest solution for what you want is to drop the MultiIndex using pandas.DataFrame.reset_index

Thus, from:

animal_group = week_cor.reset_index()

We get:

[image: Reset Index]

This can then be grouped again by "level_1". To illustrate (a slice of what this looks like):

animal_group = week_cor.reset_index().groupby("level_1")
animal_group.get_group("Donkeys")

gives:

[image: Donkey Slice]

This can be reduced using agg (although, I'm not sure if this is the best) and the "Week" column can just be dropped in the end:

from math import floor

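def collapse(x):
    # 1 where the correlation is strong (|r| >= 0.6, as in the question), else 0
    x = x.map(lambda elem: 1 if abs(elem) >= 0.6 else 0)
    # A little bit of a math trick here: with exactly two weeks per group
    # the sum is 0, 1, or 2, so floor(sum / 2) is 1 only if both are flagged.
    return floor(x.sum() / 2)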

animal_group.agg(collapse).drop("Week", axis=1)

Still seems a little bit verbose (or maybe I am expecting too much from Python). But in the end:

[image: Animal Time Cor]

As desired.

Upvotes: 0

Dascienz

Reputation: 1071

So I think the simplest solution for what you want is to drop the MultiIndex using pandas.DataFrame.reset_index like so:

week_cor = week_cor.reset_index() 

Now you can select the correlation subset you like by the Week column. In this way, you can perform further operations on the two of them more easily. Here's a NumPy solution that you might be able to build on.

cols = ['Donkeys','Mice','Rabbits']
df1 = week_cor[week_cor['Week'] == 1][cols].values #ndarray
df2 = week_cor[week_cor['Week'] == 2][cols].values #ndarray

def collapse(A, B):
    # Compare absolute values, matching abs(entry) >= 0.6 in the question
    return np.where((np.abs(A) >= 0.6) & (np.abs(B) >= 0.6), 1, 0)

new_df = pd.DataFrame(collapse(df1, df2), index=cols, columns=cols)

Let me know if you get reduce to work, because I'd be interested in knowing.
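For what it's worth, one way reduce might be made to work is to hand it a list of per-week frames rather than the DataFrame itself. The sketch below is an assumption-laden illustration, not a confirmed pandas idiom: it starts from the MultiIndexed week_cor (before reset_index), rebuilt here from the values printed in ALollz's answer so the block runs on its own, and the names animals, masks, and collapsed are hypothetical.

```python
from functools import reduce

import numpy as np
import pandas as pd

# Rebuild the MultiIndexed week_cor from the values printed in this thread.
animals = ["Donkeys", "Mice", "Rabbits"]
index = pd.MultiIndex.from_product([[1, 2], animals], names=["Week", None])
values = np.array([
    [1.000000, -0.118953, -0.235307],
    [-0.118953, 1.000000, 0.803987],
    [-0.235307, 0.803987, 1.000000],
    [1.000000, 0.229929, -0.593603],
    [0.229929, 1.000000, -0.645369],
    [-0.593603, -0.645369, 1.000000],
])
week_cor = pd.DataFrame(values, index=index, columns=animals)

# One boolean mask per week; dropping the Week level lets the frames align.
masks = [
    block.droplevel(0).abs().ge(0.6)
    for _, block in week_cor.groupby(level=0)
]

# Fold the masks pairwise: an entry stays 1 only if it is strong in every week.
collapsed = reduce(lambda left, right: left & right, masks).astype(int)
print(collapsed)
```

Thresholding before the fold keeps the combining step a plain elementwise "and", so the order of the weeks does not matter and any number of weeks can be folded.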

Upvotes: 1
