Wiliam
Wiliam

Reputation: 1088

Stacking data frames on top of each other

Suppose I have the following dataframes:

f1 = pd.DataFrame({
    'feature1': ['a', 'b','c'],
    'col2': [1,2,3],
    'col3': [0.1,0.2,0.3]
})

f2 = pd.DataFrame({
    'feature2': ['x', 'y','z'],
    'col2': [4,5,6],
    'col3': [0.4,0.5,0.6]
})

f3 = pd.DataFrame({
    'feature2': ['i', 'j','k'],
    'col2': [7,8,9],
    'col3': [0.7,0.8,0.9]
})

I want to make a new dataframe to stack the three dataframes on top of each other such that I get:

  Feature   Col1  Col2
0   feature1  col2  col3
1          a     1   0.1
2          b     2   0.2
3          c     3   0.3
4   feature2  col2  col3
5          x     4   0.4
6          y     5   0.5
7          z     6   0.6
8   feature2  col2  col3
9          i     7   0.7
10         j     8   0.8
11         k     9   0.9

So far I have been doing this by exporting each dataframe into excel and then copy and paste them into a new sheet manually (hence when I import the final excel file into Python I can get the desired result). But I am sure there should be a way to do this efficiently within Python itself?

Upvotes: 0

Views: 788

Answers (2)

Sayandip Dutta
Sayandip Dutta

Reputation: 15872

You can use a function, using pandas.DataFrame.T, pandas.DataFrame.reset_index and pandas.DataFrame.set_axis to preprocess the data, and then pandas.concat:

def preprocess(df, new_cols=['Feature', 'Col1', 'Col2']):
    """
    Make columns the first row of the dataframe.
    And replace column names with `new_cols`.
    """
    return df.T.reset_index().T.set_axis(new_cols, axis='columns')

>>> pd.concat(map(preprocess, [f1, f2, f3]), ignore_index=True)

     Feature  Col1  Col2
0   feature1  col2  col3
1          a     1   0.1
2          b     2   0.2
3          c     3   0.3
4   feature2  col2  col3
5          x     4   0.4
6          y     5   0.5
7          z     6   0.6
8   feature2  col2  col3
9          i     7   0.7
10         j     8   0.8
11         k     9   0.9

Here, preprocess(f1) gives:

>>> preprocess(f1)
        Feature  Col1  Col2
index  feature1  col2  col3
0             a     1   0.1
1             b     2   0.2
2             c     3   0.3

The index looks different, thus we pass ignore_index=True to pandas.concat argument, which converts the resulting index to a pandas.RangeIndex starting from 0.

Upvotes: 2

BENY
BENY

Reputation: 323226

Do small adjust with self-def function

def ff(x):
    x = x.T.reset_index().T
    x.columns = ['feature','col1','col2']
    return x
out = pd.concat([ff(f1),ff(f2),ff(f3)]).reset_index(drop=True)

out
Out[96]: 
     feature  col1  col2
0   feature1  col2  col3
1          a     1   0.1
2          b     2   0.2
3          c     3   0.3
4   feature2  col2  col3
5          x     4   0.4
6          y     5   0.5
7          z     6   0.6
8   feature2  col2  col3
9          i     7   0.7
10         j     8   0.8
11         k     9   0.9

Upvotes: 2

Related Questions