HelloToEarth
HelloToEarth

Reputation: 2127

Restacking n-columns into 3 columns Python

I have a dataframe that looks like this (where the first row is the index number):

0      1      2      3       4       5       6        7       8

66     56     34     40     41      55       80      None     None
90     91     12     44     89      60       11      45       60
10     20     49     90     33      40       67      55       None
.
.

What I need as an output is to take every 3rd value, ignore any None values and restack it into a single row like this:

0       1         2 

66      56        34
40      41        55
80      90        91
12      44        89
60      11        45
60      10        20
49      90        33
40      67        55

Is there an easy way to pivot and stack it this way?

Upvotes: 1

Views: 175

Answers (4)

piRSquared
piRSquared

Reputation: 294278

Python Fairy Dust

pd.DataFrame([*zip(*[iter(df.stack())] * 3)])

      0     1     2
0  66.0  56.0  34.0
1  40.0  41.0  55.0
2  80.0  90.0  91.0
3  12.0  44.0  89.0
4  60.0  11.0  45.0
5  60.0  10.0  20.0
6  49.0  90.0  33.0
7  40.0  67.0  55.0

What the ... was that?

  • df.stack() turns 2D df into 1D series while dropping nulls
  • [iter(df.stack())] * 3 creates an iter object (that can and will be exhausted) inside a list [] then multiplies it by three. This creates a list of length 3 in which all 3 elements point to the same iter object. This has the consequence that when I iterate through that object while in the first position, it exhausts the object for the the rest of the positions.
  • When I zip these 3 elements together it has the effect of grouping by 3
  • What's left is to drop this into a dataframe constructor

Upvotes: 1

jose_bacoy
jose_bacoy

Reputation: 12684

Rather long method but still works. 1) convert all columns into list 2) remove NaN values 3) create a list of 3 items then convert to dtaframe

import itertools
import math 

lst = list(itertools.chain(*df.values))
lst = list(filter(lambda v: not math.isnan(v), lst))
df = pd.DataFrame([[lst[i], lst[i+1], lst[i+2]] for i in range(0,len(lst),3)])
df

Upvotes: 0

perl
perl

Reputation: 9941

First, we can convert all values to a flat array with ravel, and then filter out Nones and reshape:

x = df.values.ravel()
x[x != None].reshape(-1,3)  # pd.DataFrame(x[x != None].reshape(-1,3))
                            # if you want to have a DataFrame instead of
                            # numpy array

Output:

array([[66., 56., 34.],
       [40., 41., 55.],
       [80., 90., 91.],
       [12., 44., 89.],
       [60., 11., 45.],
       [60., 10., 20.],
       [49., 90., 33.],
       [40., 67., 55.]])

Upvotes: 3

BENY
BENY

Reputation: 323266

Using reshape from numpy

s=np.concatenate(df.values)
pd.DataFrame(s[s!='None'].reshape(-1,3))
Out[1345]: 
    0   1   2
0  66  56  34
1  40  41  55
2  80  90  91
3  12  44  89
4  60  11  45
5  60  10  20
6  49  90  33
7  40  67  55

Upvotes: 2

Related Questions