Reputation: 4439
I have a df like so:
import pandas
a=[['1/2/2014', 'a', '6', 'z1'],
['1/2/2014', 'a', '3', 'z1'],
['1/3/2014', 'c', '1', 'x3'],
]
df = pandas.DataFrame.from_records(a[1:],columns=a[0])
I want to flatten the df so it is one continuous list like so:
['1/2/2014', 'a', '6', 'z1', '1/2/2014', 'a', '3', 'z1','1/3/2014', 'c', '1', 'x3']
I can loop through the rows and extend a list, but is there a much easier way to do it?
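For reference, the loop-and-extend approach I mean looks roughly like the sketch below. It assumes all three rows are loaded as data with default integer column labels (unlike the from_records call above, which uses the first row as headers); the names rows and flat are only for illustration.

```python
import pandas as pd

# Assumption: every row is data, so no row is consumed as column labels.
rows = [['1/2/2014', 'a', '6', 'z1'],
        ['1/2/2014', 'a', '3', 'z1'],
        ['1/3/2014', 'c', '1', 'x3']]
df = pd.DataFrame(rows)

flat = []
# itertuples(index=False) yields one tuple per row, without the index value.
for row in df.itertuples(index=False):
    flat.extend(row)
```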
Upvotes: 83
Views: 184930
Reputation: 345
The previously mentioned df.values.flatten().tolist() and df.to_numpy().flatten().tolist() are concise and effective, but I spent a very long time trying to learn how to 'do the work myself' via a list comprehension, without resorting to built-in functions.
For anyone else who is interested, try the following (note that it walks the frame column by column, so the values come out in column order rather than row order):
[ row for col in df for row in df[col] ]
Turns out that this solution to flattening a df via list comprehension (which I haven't found elsewhere on SO) is just a small modification to the solution for flattening nested lists (that can be found all over SO):
[ val for sublst in lst for val in sublst ]
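One thing worth checking is the ordering. Iterating over a DataFrame yields its column labels, so the comprehension above goes column by column. The sketch below compares that with a row-by-row variant; it assumes a frame built from all three rows with default labels, and the names by_column and by_row are made up for the example.

```python
import pandas as pd

# Assumption: all three rows are data, with default integer column labels.
df = pd.DataFrame([['1/2/2014', 'a', '6', 'z1'],
                   ['1/2/2014', 'a', '3', 'z1'],
                   ['1/3/2014', 'c', '1', 'x3']])

# Iterating a DataFrame yields column labels, so this goes column by column.
by_column = [val for col in df for val in df[col]]

# Iterating the underlying 2-D array yields rows, so this goes row by row.
by_row = [val for row in df.to_numpy() for val in row]
```

Only by_row matches the order asked for in the question; by_column groups all dates first, then all letters, and so on.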
Upvotes: 0
Reputation: 58915
You can use .flatten() on the DataFrame converted to a NumPy array:
df.to_numpy().flatten()
and you can also add .tolist() if you want the result to be a Python list.
In previous versions of pandas, the values attribute was used instead of the .to_numpy() method, as mentioned in the comments below.
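Putting the pieces together, a minimal end-to-end sketch might look like this. It assumes a pandas version that has .to_numpy() and a frame built from all three rows of the question's data with default column labels; flat_array and flat_list are illustrative names.

```python
import pandas as pd

# Assumption: all three rows are data, with default integer column labels.
df = pd.DataFrame([['1/2/2014', 'a', '6', 'z1'],
                   ['1/2/2014', 'a', '3', 'z1'],
                   ['1/3/2014', 'c', '1', 'x3']])

# flatten() returns a 1-D copy of the 2-D array in row-major order.
flat_array = df.to_numpy().flatten()

# tolist() converts the NumPy array into a plain Python list.
flat_list = flat_array.tolist()
```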
Upvotes: 137
Reputation: 2192
Maybe use stack?
df.stack().values
array(['1/2/2014', 'a', '3', 'z1', '1/3/2014', 'c', '1', 'x3'], dtype=object)
(Edit: Incidentally, the DF in the Q uses the first row as labels, which is why they're not in the output here.)
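To make the header effect concrete, here is a small sketch comparing the question's construction with one that keeps every row as data. The second frame is an assumption about what was intended, and the variable names are invented for the example.

```python
import pandas as pd

a = [['1/2/2014', 'a', '6', 'z1'],
     ['1/2/2014', 'a', '3', 'z1'],
     ['1/3/2014', 'c', '1', 'x3']]

# As in the question: the first row becomes the column labels.
df_q = pd.DataFrame.from_records(a[1:], columns=a[0])
without_first_row = df_q.stack().tolist()

# Assumption: the first row is meant to be data, so default labels are used.
df_all = pd.DataFrame(a)
with_first_row = df_all.stack().tolist()
```

Here without_first_row holds 8 values and with_first_row holds all 12, both in row order.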
Upvotes: 20
Reputation: 1726
You can try reshaping with NumPy, which returns a 2-D array with a single row:
import numpy as np
np.reshape(df.values, (1,df.shape[0]*df.shape[1]))
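If a truly one-dimensional result is wanted, reshaping to -1 lets NumPy infer the length. The sketch below shows both shapes side by side, assuming a frame built from all three rows of the question's data; two_d and one_d are illustrative names.

```python
import numpy as np
import pandas as pd

# Assumption: all three rows are data, with default integer column labels.
df = pd.DataFrame([['1/2/2014', 'a', '6', 'z1'],
                   ['1/2/2014', 'a', '3', 'z1'],
                   ['1/3/2014', 'c', '1', 'x3']])

# The explicit (1, rows * cols) target shape keeps two dimensions.
two_d = np.reshape(df.to_numpy(), (1, df.shape[0] * df.shape[1]))

# A target shape of -1 asks NumPy to infer the length of a 1-D result.
one_d = df.to_numpy().reshape(-1)
```

Calling one_d.tolist() (or two_d[0].tolist()) then gives the flat Python list.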
Upvotes: 4