Reputation: 2127
I have a dataframe that looks like this (where the first row is the index number):
0 1 2 3 4 5 6 7 8
66 56 34 40 41 55 80 None None
90 91 12 44 89 60 11 45 60
10 20 49 90 33 40 67 55 None
.
.
What I need as an output is to take every 3rd value, ignore any None values and restack it into a single row like this:
0 1 2
66 56 34
40 41 55
80 90 91
12 44 89
60 11 45
60 10 20
49 90 33
40 67 55
Is there an easy way to pivot and stack it this way?
Upvotes: 1
Views: 175
Reputation: 294278
pd.DataFrame([*zip(*[iter(df.stack())] * 3)])
0 1 2
0 66.0 56.0 34.0
1 40.0 41.0 55.0
2 80.0 90.0 91.0
3 12.0 44.0 89.0
4 60.0 11.0 45.0
5 60.0 10.0 20.0
6 49.0 90.0 33.0
7 40.0 67.0 55.0
df.stack()
turns 2D df
into 1D
series while dropping nulls[iter(df.stack())] * 3
creates an iter object (that can and will be exhausted) inside a list []
then multiplies it by three. This creates a list of length 3 in which all 3 elements point to the same iter
object. This has the consequence that when I iterate through that object while in the first position, it exhausts the object for the the rest of the positions.zip
these 3 elements together it has the effect of grouping by 3Upvotes: 1
Reputation: 12684
Rather long method but still works. 1) convert all columns into list 2) remove NaN values 3) create a list of 3 items then convert to dtaframe
import itertools
import math
lst = list(itertools.chain(*df.values))
lst = list(filter(lambda v: not math.isnan(v), lst))
df = pd.DataFrame([[lst[i], lst[i+1], lst[i+2]] for i in range(0,len(lst),3)])
df
Upvotes: 0
Reputation: 9941
First, we can convert all values to a flat array with ravel
, and then filter out None
s and reshape
:
x = df.values.ravel()
x[x != None].reshape(-1,3) # pd.DataFrame(x[x != None].reshape(-1,3))
# if you want to have a DataFrame instead of
# numpy array
Output:
array([[66., 56., 34.],
[40., 41., 55.],
[80., 90., 91.],
[12., 44., 89.],
[60., 11., 45.],
[60., 10., 20.],
[49., 90., 33.],
[40., 67., 55.]])
Upvotes: 3
Reputation: 323266
Using reshape
from numpy
s=np.concatenate(df.values)
pd.DataFrame(s[s!='None'].reshape(-1,3))
Out[1345]:
0 1 2
0 66 56 34
1 40 41 55
2 80 90 91
3 12 44 89
4 60 11 45
5 60 10 20
6 49 90 33
7 40 67 55
Upvotes: 2