Reputation: 732
I have a pandas DataFrame like this
year id1 id2 jan jan1 jan2 feb feb1 feb2 mar mar1 mar2 ....
2018 01 10 3 30 31 2 23 25 7 52 53 ....
2018 01 20 ....
2018 02 10 ....
2018 02 20 ....
and I need this format
year month id1 id2 val val1 val2
2018 01 01 10 3 30 31
2018 02 01 10 2 23 25
2018 03 01 10 7 52 53
..........
As you can see, I have 3 values for each month, and I only add one column assigned to the month with 3 columns for the values. If it were only one column, I think I could use stack
.
I wouldn't have any problem renaming the month columns to 01 01-1 01-2
(for january) or something like that to make it easier.
I'm also thinking on separating the info on 3 different DataFrames to stack
them separately and then merge
the results, or should I melt
it?
Any ideas for achieving this easily?
Upvotes: 1
Views: 2259
Reputation: 732
So I renamed the header columns this way
01 01 01 02 02 02 03 03 03 ...
year id1 id2 val val1 val2 val val1 val2 val val1 val2 ....
2018 01 10 3 30 31 2 23 25 7 52 53 ....
2018 01 20 ....
2018 02 10 ....
2018 02 20 ....
on a file, and opened it this way
df = pd.read_csv('my_file.csv',header=[0, 1], index_col=[0,1,2], skipinitialspace=True, tupleize_cols=True)
df.columns = pd.MultiIndex.from_tuples(df.columns)
then, I actually only needed to stack
it on level 0
df = df.stack(level=0)
and add the titles
df.index.names = ['year','id1','id2','month']
df = df.reset_index()
Upvotes: 0
Reputation: 323266
using reshape
and stack
pd.DataFrame(df.set_index(['year','id1','id2']).values.reshape(4,3,3).tolist(),
index=df.set_index(['year','id1','id2']).index,
columns=[1,2,3])\
.stack().apply(pd.Series).reset_index().rename(columns={'level_3':'month'})
Out[261]:
year id1 id2 month 0 1 2
0 2018 1 10 1 3 30 31
1 2018 1 10 2 2 23 25
2 2018 1 10 3 7 52 53
3 2018 1 20 1 3 30 31
4 2018 1 20 2 2 23 25
5 2018 1 20 3 7 52 53
6 2018 2 10 1 3 30 31
7 2018 2 10 2 2 23 25
8 2018 2 10 3 7 52 53
9 2018 2 20 1 3 30 31
10 2018 2 20 2 2 23 25
11 2018 2 20 3 7 52 53
Upvotes: 1