Reputation: 261
For an analysis I start of with a dataframe, that looks like this (but is much larger):
ID1 ID2 type Number
0 IE345 E90 I 38
1 IE345 E92 E 26
2 IE345 E93 E 21
3 IE345 E95 R 9
4 IE346 E94 I 41
7 IE346 BLK E 1
I would like to unpivot the following dataframe:
Df1 = pd.pivot_table(loads, values=['Number'], index = ['ID1', 'ID2'], columns=['Type'], margins=True, aggfunc=[sum] , fill_value=0)
Df1:
sum
Number
type B D E I L R All
ID1 ID2
IE345 E90 0 0 0 38 0 0 38
E92 0 0 26 0 0 0 26
E93 0 0 21 0 0 0 21
E95 0 0 0 0 0 9 9
IE346 E94 0 0 0 41 0 0 41
BLK 0 0 1 0 0 0 1
Into:
Df1:
ID1 ID2 B D E I L R All
IE345 E90 0 0 0 38 0 0 38
IE345 E92 0 0 26 0 0 0 26
IE345 E93 0 0 21 0 0 0 21
IE345 E95 0 0 0 0 0 9 9
IE346 E94 0 0 0 41 0 0 41
IE346 BLK 0 0 1 0 0 0 1
It seems that pandas.melt is what i am looking for, but cant manage to make it work.
Afterwards I would like to add the column name of highest value of a row, but without the margins of course:
ID1 ID2 B D E I L R All Max
IE345 E90 0 0 0 38 0 0 38 I
IE345 E92 0 0 26 0 0 0 26 E
IE345 E93 0 0 21 0 6 0 27 E
IE345 E95 0 0 0 0 0 9 9 R
IE345 E94 0 0 0 41 0 0 41 I
IE345 BLK 0 0 1 0 1 0 2 E
For the max value, I used:
df['Max'] = df.idxmax(axis=1, skipna=True)
but this takes the All with it unfortunately. Any ideas on how I can achieve my goal the most efficient way?
!!!EDIT!!!!
For the first part, I constructed a solution which returns exacte a unpivot pivot table! See here the code:
df.columns = df.columns.get_level_values('Type')
df.reset_index(inplace=True)
Now I tried the solution of Vmg for the Max value but unfortunately it returns:
ValueError: could not convert string to float:
Any ideas on how to fix this?
Upvotes: 2
Views: 3785
Reputation: 210852
As @vmg has already said idxmax
will do the trick:
import io
import StringIO # for Python 2.X
import pandas as pd
data = """\
id1 id2 type number
0 IE345 E90 I 38
1 IE345 E92 E 26
2 IE345 E93 E 21
3 IE345 E95 R 9
4 IE346 E94 I 41
7 IE346 BLK E 1
"""
#loads = pd.read_csv(io.StringIO(data), sep='\s+', index_col=0) # for Python 3.X
loads = pd.read_csv(StringIO.StringIO(data), sep='\s+', index_col=0) # for Python 2.X
# **** interesting part starts here ****
# save all unique types, we will use it later
types = loads.type.unique()
df = pd.pivot_table(loads, values=['number'],
index = ['id1', 'id2'],
columns=['type'], margins=True,
aggfunc='sum', fill_value=0) \
.reset_index()
# reset column names
df.columns = [c[1] if c[1] else c[0] for c in df.columns.tolist()]
df['max'] = df[types].idxmax(axis=1)
Output:
In [266]: df
Out[266]:
id1 id2 E I R All max
0 IE345 E90 0.0 38.0 0.0 38.0 I
1 IE345 E92 26.0 0.0 0.0 26.0 E
2 IE345 E93 21.0 0.0 0.0 21.0 E
3 IE345 E95 0.0 0.0 9.0 9.0 R
4 IE346 BLK 1.0 0.0 0.0 1.0 E
5 IE346 E94 0.0 41.0 0.0 41.0 I
6 All 48.0 79.0 9.0 136.0 I
Upvotes: 2
Reputation: 4326
The first transformation you describe doesn't seem to be necessary, as it seems you still want ID1
and ID2
to be indexes. The first representation is just how pandas shows multi indexes, without repeating the first index for every entry in the second.
Your problem at hand, of applying idxmax
without all can be achieved with:
proef['Dominant'] = proef.iloc[:,:-1].idxmax(axis=1, skipna=True)
where iloc[:,:-1]
just means you're disregarding the rightmost column.
Upvotes: 0