Reputation: 7411
I have a dataframe that has the shape like this:
PX_LAST PX_OPEN PX_CLOSE ticker source timestamp
0 1 2 3 A LSE 20180101
1 4 5 6 A LSE 20180102
1 7 8 9 B LSE 20180101
1 10 11 12 B LSE 20180102
....
I want to massage it to the following format:
A B
LSE LSE
PX_LAST, PX_CLOSE, PX_OPEN PX_LAST, PX_CLOSE, PX_OPEN
timestamp
20180101 1 2 3 7 8 9
20180102 4 5 6 10 11 12
....
I tried to first use set_index
to set the ticker and source columns to the row index and use unstack
to push them on to the column axis which does seem to work
df.set_index(['timestamp', 'ticker', 'source'])
.unstack(level=[1,2])
.swaplevel(0,1,axis=1)
.swaplevel(1,2,axis=1)
This does the trick, but has two problems: 1) it is very verbose, with all the swaplevel call we need to do to get the columns into the right shape. 2) It seems it does not do the grouping I wish it does, i.e the result I get is like this:
LSE LSE LSE LSE ...
PX_LAST PX_LAST PX_CLOSE PX_CLOSE ...
timestamp
20180101 1 7 2 8 ...
20180102 4 8 5 11 ...
Is there a cleaner way to do this so I can get the format I want?
Upvotes: 2
Views: 1121
Reputation: 31011
My proposition is to change your solution the following way:
Step 1: df.set_index(['timestamp', 'ticker', 'source']).unstack([1, 2])
,
just as you did.
It leaves columns in the following shape:
PX_LAST PX_OPEN PX_CLOSE
ticker A B A B A B
source LSE LSE LSE LSE LSE LSE
(and timestamp
as the index).
Step 2: reorder_levels([1, 2, 0], axis=1)
, instead of your 2
swaplevel
instructions.
It leaves columns as:
ticker A B A B A B
source LSE LSE LSE LSE LSE LSE
PX_LAST PX_LAST PX_OPEN PX_OPEN PX_CLOSE PX_CLOSE
And the last step is sort_index(axis=1, level=[0,1], sort_remaining=False)
Note that you sort only level 0 and 1, so the order in the last level remains unchanged (PX_LAST, PX_OPEN, PX_CLOSE).
So the whole script (i.e. a single instruction) is:
df2 = df.set_index(['timestamp', 'ticker', 'source']).unstack([1, 2])\
.reorder_levels([1, 2, 0], axis=1)\
.sort_index(axis=1, level=[0,1], sort_remaining=False)
When you print the result, you will get:
ticker A B
source LSE LSE
PX_LAST PX_OPEN PX_CLOSE PX_LAST PX_OPEN PX_CLOSE
timestamp
20180101 1 2 3 7 8 9
20180102 4 5 6 10 11 12
Upvotes: 0
Reputation: 863291
Your solution should be a bit changed - with order of columns in set_index
, omit second swaplevel
and added sort_index
:
df = (df.set_index(['timestamp', 'source', 'ticker'])
.unstack(level=[1,2])
.swaplevel(0,2,axis=1)
.sort_index(axis=1)
)
print (df)
ticker A B
source LSE LSE
PX_CLOSE PX_LAST PX_OPEN PX_CLOSE PX_LAST PX_OPEN
timestamp
20180101 3 1 2 9 7 8
20180102 6 4 5 12 10 11
Upvotes: 1
Reputation: 402862
One option is melt
, set_index
and unstack
:
u = df.melt(['ticker', 'source', 'timestamp'])
(u.set_index(u.columns.difference({'value'}).tolist())['value']
.unstack([1, 0, -1])
.sort_index(axis=1))
ticker A B
source LSE LSE
variable PX_CLOSE PX_LAST PX_OPEN PX_CLOSE PX_LAST PX_OPEN
timestamp
20180101 3 1 2 9 7 8
20180102 6 4 5 12 10 11
Or melt
, and pivot_table
:
u = df.melt(['ticker', 'source', 'timestamp'])
u.pivot_table(index='timestamp',
columns=['ticker','source','variable'],
values='value')
ticker A B
source LSE LSE
variable PX_CLOSE PX_LAST PX_OPEN PX_CLOSE PX_LAST PX_OPEN
timestamp
20180101 3 1 2 9 7 8
20180102 6 4 5 12 10 11
Upvotes: 3