Ignacio Vergara Kausel
Ignacio Vergara Kausel

Reputation: 5996

Pandas pivoting/stacking/reshaping

I'm trying to import data to a pandas DataFrame with columns being date string, label, value. My data looks like the following (just with 4 dates and 5 labels)

from numpy import random
import numpy as np
import pandas as pd

# Creating the data
dates = ("2015-01-01", "2015-01-02", "2015-01-03", "2015-01-04")
values = [random.rand(5) for _ in range(4)]

data = dict(zip(dates,values))

So, the data is a dictionary where the keys are dates, the keys a list of values where the index is the label.

Loading this data structure into a DataFrame

df1 = pd.DataFrame(data)

gives me the dates as columns, the label as index, and the value as the value.

An alternative loading would be

df2 = pd.DataFrame()
df2.from_dict(data, orient='index')

where the dates are index, and columns are labels.

In either of both cases do I manage to do pivoting or stacking to my preferred view.

How should I approach the pivoting/stacking to get the view I want? Or should I change my data structure before loading it into a DataFrame? In particular I'd like to avoid of having to create all the rows of the table beforehand by using a bunch of calls to zip.

Upvotes: 1

Views: 81

Answers (1)

piRSquared
piRSquared

Reputation: 294278

IIUC:

Option 1
pd.DataFrame.stack

pd.DataFrame(data).stack() \
    .rename('value').rename_axis(['label', 'date']).reset_index()

    label        date     value
0       0  2015-01-01  0.345109
1       0  2015-01-02  0.815948
2       0  2015-01-03  0.758709
3       0  2015-01-04  0.461838
4       1  2015-01-01  0.584527
5       1  2015-01-02  0.823529
6       1  2015-01-03  0.714700
7       1  2015-01-04  0.160735
8       2  2015-01-01  0.779006
9       2  2015-01-02  0.721576
10      2  2015-01-03  0.246975
11      2  2015-01-04  0.270491
12      3  2015-01-01  0.465495
13      3  2015-01-02  0.622024
14      3  2015-01-03  0.227865
15      3  2015-01-04  0.638772
16      4  2015-01-01  0.266322
17      4  2015-01-02  0.575298
18      4  2015-01-03  0.335095
19      4  2015-01-04  0.761181

Option 2
comprehension

pd.DataFrame(
    [[i, d, v] for d, l in data.items() for i, v in enumerate(l)],
    columns=['label', 'date', 'value']
)

    label        date     value
0       0  2015-01-01  0.345109
1       1  2015-01-01  0.584527
2       2  2015-01-01  0.779006
3       3  2015-01-01  0.465495
4       4  2015-01-01  0.266322
5       0  2015-01-02  0.815948
6       1  2015-01-02  0.823529
7       2  2015-01-02  0.721576
8       3  2015-01-02  0.622024
9       4  2015-01-02  0.575298
10      0  2015-01-03  0.758709
11      1  2015-01-03  0.714700
12      2  2015-01-03  0.246975
13      3  2015-01-03  0.227865
14      4  2015-01-03  0.335095
15      0  2015-01-04  0.461838
16      1  2015-01-04  0.160735
17      2  2015-01-04  0.270491
18      3  2015-01-04  0.638772
19      4  2015-01-04  0.761181

Upvotes: 2

Related Questions