Flatten (an irregular) list of lists in Python respecting Pandas Dataframes

Question

This is a recurring question here on Stack Overflow, yet the solution usually given is still not perfect. Generators (yield) are still, for me, one of the most complex things to use in Python, so I don't know how to fix it myself.

When an item within any of the lists given to the function is a Pandas DataFrame, the flatten function yields its column labels instead of the DataFrame itself. You can test this by running the following code:

import pandas
import collections
import numpy as np  # needed for np.random.randn below
df = pandas.DataFrame(np.random.randn(100, 4), columns=list('ABCD'))

def flatten(l):
    for el in l:
        if isinstance(el, collections.Iterable) and not isinstance(el, basestring):
            for sub in flatten(el):
                yield sub
        else:
            yield el

Then, if you call the function from the referenced post, you get the column labels:

list(flatten([df]))   #['A', 'B', 'C', 'D']

I expected a list with the DataFrame inside instead. How can I make the flatten function respect DataFrames?

DSM · Accepted Answer

That flatten function recurses into an element if it is an instance of collections.Iterable and is not a string (a string is iterable, but we usually want to treat it as a scalar, something we're not going to look inside).
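To make that test concrete, here is a small sketch of the recursion check on its own. It assumes Python 3, where collections.Iterable lives in collections.abc and str replaces basestring; the helper name should_recurse and the extra bytes check are my own additions for illustration, not part of the original function.

```python
from collections.abc import Iterable  # Python 3 location of collections.Iterable


def should_recurse(el):
    """Hypothetical helper mirroring flatten's test: iterable, but not a string."""
    return isinstance(el, Iterable) and not isinstance(el, (str, bytes))


print(should_recurse([1, 2]))  # True: a list is a container to descend into
print(should_recurse("abc"))   # False: iterable, but treated as a scalar
print(should_recurse(42))      # False: not iterable at all
print(list("a"))               # ['a']: a one-character string iterates to itself
```

The last line shows why strings have to be excluded: a one-character string yields itself when iterated, so recursing into strings would never bottom out.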

Even though DataFrames are instances of collections.Iterable, it sounds like you want them to be terminal too. In that case:

    if (isinstance(el, collections.Iterable) and 
        not isinstance(el, (basestring, pandas.DataFrame))):

After which:

>>> list(flatten([[1,2], "2", df]))
[1, 2, '2', <class 'pandas.core.frame.DataFrame'>
Int64Index: 100 entries, 0 to 99
Data columns (total 4 columns):
A    100  non-null values
B    100  non-null values
C    100  non-null values
D    100  non-null values
dtypes: float64(4)]
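
For anyone on Python 3, the same idea can be sketched with the terminal types passed in as a parameter. This is only an illustration under some assumptions: the atomic parameter is my own generalization, and FakeFrame is a hypothetical stand-in that iterates over its column labels the way a DataFrame does, used here so the example runs without pandas installed.

```python
from collections.abc import Iterable


def flatten(items, atomic=(str, bytes)):
    """Yield the leaves of a nested iterable; instances of `atomic` count as leaves."""
    for el in items:
        if isinstance(el, Iterable) and not isinstance(el, atomic):
            # Descend into containers that are not marked as terminal.
            yield from flatten(el, atomic)
        else:
            yield el


class FakeFrame:
    """Hypothetical stand-in for pandas.DataFrame: iterating yields column labels."""

    def __init__(self, columns):
        self.columns = list(columns)

    def __iter__(self):
        return iter(self.columns)


frame = FakeFrame("ABCD")

# Without marking the frame as terminal, flatten descends into it.
print(list(flatten([frame])))  # ['A', 'B', 'C', 'D']

# With the frame type listed as atomic, the object itself comes back.
result = list(flatten([[1, 2], "2", frame], atomic=(str, bytes, FakeFrame)))
print(result[:3])          # [1, 2, '2']
print(result[3] is frame)  # True
```

With real data you would pass atomic=(str, bytes, pandas.DataFrame) instead of the stand-in class.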
