Reputation: 6505
This is a recurring question here on Stack Overflow, yet the solution given here is still not perfect. Yielding is still (for me) one of the most complex things to use in Python, so I don't know how to fix it myself.
When an item within any of the lists given to the function is a pandas DataFrame, the flatten function yields its column names instead of the DataFrame itself. You can test this by running the following code:
    import collections
    import numpy as np
    import pandas

    df = pandas.DataFrame(np.random.randn(100, 4), columns=list('ABCD'))

    def flatten(l):
        for el in l:
            # Recurse into any iterable except strings
            if isinstance(el, collections.Iterable) and not isinstance(el, basestring):
                for sub in flatten(el):
                    yield sub
            else:
                yield el
Then, if you call the function given on the referenced post:
list(flatten([df])) #['A', 'B', 'C', 'D']
This returns the column names instead of a list containing the DataFrame. How can I make flatten treat DataFrames as single items?
Upvotes: 1
Views: 860
Reputation: 353139
That flatten function will recurse down if the element is an instance of collections.Iterable and it's not a string (which is iterable, but we usually want to treat it as a scalar, something we're not going to look inside).
Even though DataFrames are instances of collections.Iterable, it sounds like you want them to be terminal too. In that case:
    if (isinstance(el, collections.Iterable) and
            not isinstance(el, (basestring, pandas.DataFrame))):
After which:
>>> list(flatten([[1,2], "2", df]))
[1, 2, '2', <class 'pandas.core.frame.DataFrame'>
Int64Index: 100 entries, 0 to 99
Data columns (total 4 columns):
A 100 non-null values
B 100 non-null values
C 100 non-null values
D 100 non-null values
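For anyone on Python 3, here is a minimal sketch of the same idea, assuming collections.abc.Iterable and str/bytes stand in for collections.Iterable and basestring. The atomic parameter and the Table class are hypothetical names added for illustration: Table only mimics the relevant DataFrame behaviour (iterating over it yields column names) so the snippet runs without pandas.

```python
from collections.abc import Iterable


def flatten(items, atomic=(str, bytes)):
    """Yield the leaves of a nested iterable.

    Instances of the types in `atomic` (a hypothetical parameter) are
    yielded as-is instead of being iterated over.
    """
    for el in items:
        if isinstance(el, Iterable) and not isinstance(el, atomic):
            yield from flatten(el, atomic)
        else:
            yield el


class Table:
    """Hypothetical stand-in for a DataFrame: iterating yields column names."""

    def __init__(self, columns):
        self.columns = list(columns)

    def __iter__(self):
        return iter(self.columns)


t = Table("ABCD")

# Without listing Table as atomic, flatten looks inside it.
default_result = list(flatten([[1, 2], "2", t]))

# Listing Table as atomic keeps the object itself in the output.
kept_result = list(flatten([[1, 2], "2", t], atomic=(str, bytes, Table)))
```

With pandas installed, passing atomic=(str, bytes, pandas.DataFrame) should keep DataFrames intact in the same way.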
Upvotes: 3