Reputation: 35
I have the following dataframe (sim_2005):
Date ELEM1 ELEM2 ... ELEM1133
2005-01-01 0.021 2.455 ... 345.2
2005-01-02 0.321 2.331 ... 355.1
... ... ... ... ...
2005-12-31 0.789 3.456 ... 459.9
[365 rows x 1133 columns]
I need to import this into a (postgresql) Database with the following format:
ID Date ELEM Value
1 2005-01-01 ELEM1 0.021
2 2005-01-02 ELEM1 0.321
...
366 2005-01-01 ELEM2 2.455
367 2005-01-02 ELEM2 2.331
...
402983 2005-01-01 ELEM1133 345.2
402984 2005-01-02 ELEM1133 355.1
...
403348 2005-12-31 ELEM1133 459.9
I'm trying my best with iterating over the columns and concatenating it step by step,
for column in sim_2005:
sim_concat = pd.concat([pd.DataFrame(sim_2005.columns.values), sim_2005.ix[:, column]], ignore_index=True)
which produces by far not what I'm looking for.
I also tried pandas pivot function, groupby, etc... but I guess my understanding of pandas and python in general is not sufficient for this transformation. Any help?
Edit:
Thanks, pd.melt was the answer!
sim_2005_melted = pd.melt(sim_2005, id_vars=sim_2005.index.name, value_vars=list(sim_2005.columns.values), var_name='ELEM', value_name='Q_sim').sort(columns='Date')
Results in:
ID Date ELEM Q_sim
1 NaN ELEM1 0.021
2 NaN ELEM1 0.321
...
366 NaN ELEM2 2.455
367 NaN ELEM2 2.331
...
402983 NaN ELEM1133 345.2
402984 NaN ELEM1133 355.1
For some reason the datetime index is not transported over, but the overall structure is exactly how I wanted it!
Upvotes: 0
Views: 1300
Reputation: 76297
It looks like you're doing a "wide-to-narrow" transformation, in pandas
terminology. One way to do this is with melt
.
In this case, you could do something like:
pd.melt(df, id_vars=['Date'], value_vars=['ELEM1', 'ELEM2']).sort(columns='Date')
for a dataframe with two "wide" columns ['ELEM1', 'ELEM2']
. For your ~1K columns in the example, you'd obviously not write them by hand, and rather create value_vars
using some sort of list comprehension.
Example
df = pd.DataFrame({
'Date': range(4),
'ELEM1': range(1, 5),
'ELEM2': range(11, 15),
'ELEM2': range(-5, -1)
})
>> df
Date ELEM1 ELEM2
0 0 1 -5
1 1 2 -4
2 2 3 -3
3 3 4 -2
>> pd.melt(df, id_vars=['Date'], value_vars=['ELEM1', 'ELEM2']).sort(columns='Date')
Date variable value
0 0 ELEM1 1
4 0 ELEM2 -5
1 1 ELEM1 2
5 1 ELEM2 -4
2 2 ELEM1 3
6 2 ELEM2 -3
3 3 ELEM1 4
7 3 ELEM2 -2
Upvotes: 1