Reputation: 30424
This is a real question, though it may seem to be splitting hairs at first glance. Basically I want to treat a series as a column rather than a row, which I think makes intuitive sense even if a series cannot technically be divided into rows and columns (?), whereas a 1d numpy array can be reshaped into either. The example:
import pandas as pd

df = pd.DataFrame({'a': [5, 3, 1],
                   'b': [4, 6, 2],
                   'c': [2, 4, 9]})
df['rowsum'] = df.sum(axis=1)  # row totals
In [31]: df
Out[31]:
   a  b  c  rowsum
0  5  4  2      11
1  3  6  4      13
2  1  2  9      12
I just want to get percentages by row (so rows sum to 1). I would like to do this:
df.iloc[:,0:3] / df.rowsum
which works fine in numpy (with reshape), since there you can make rowsum either a column or a row vector. But here I can't reshape the series, and using T on df.rowsum has no effect. It seems a dataframe can be transposed but not a series. It can also be coded naturally in numpy, but that involves converting to arrays and then back to a dataframe. The following works (along with several other solutions):
In [32]: ( df.iloc[:,0:3].T / df.rowsum ).T
Out[32]:
          a         b         c
0  0.454545  0.363636  0.181818
1  0.230769  0.461538  0.307692
2  0.083333  0.166667  0.750000
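For comparison, here is a minimal sketch of the numpy route mentioned above, assuming the same example frame. The names `values`, `rowsum_col` and `pct` are only illustrative. The reshape to (3, 1) is what turns the row sums into a column vector, so broadcasting divides each row by its own total.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [5, 3, 1],
                   'b': [4, 6, 2],
                   'c': [2, 4, 9]})
df['rowsum'] = df.sum(axis=1)

# Drop down to plain arrays: a (3, 3) block and a (3, 1) column vector.
values = df.iloc[:, 0:3].to_numpy()
rowsum_col = df['rowsum'].to_numpy().reshape(-1, 1)

# Broadcasting stretches the (3, 1) column across the three columns of values.
# The index and column labels have to be reattached by hand afterwards.
pct = pd.DataFrame(values / rowsum_col,
                   index=df.index,
                   columns=df.columns[0:3])
print(pct)
```

The numbers match the double-transpose result, but the round trip through arrays is exactly the part I would like to avoid.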
I'm sorry if this seems trivial but it's valuable to be able to code in terms of rows and columns in an intuitive way. So the question is merely: can I make a series act like a column vector rather than a row vector?
Also it seems inconsistent that this will work fine on a column.
df.iloc[:,0] / df.rowsum
pandas appears in this case to be dividing (elementwise) two column arrays (judging by the display, even if the row/column distinction is artificial). But when the first part of that expression is changed from a series to a dataframe, rowsum seems to effectively go from being a 3x1 to a 1x3. Is it as if going from a series to a dataframe implies a transpose operation?
Maybe a better way to think about it:
all( dist.iloc[:,:10].index == dist.rowsum.index )
Out[1526]: True
The indexes line up here, so why does pandas seem to employ the index differently for series/series broadcasting than for dataframe/series broadcasting? Or am I just thinking about this completely wrong?!?
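To make the difference concrete, here is a small sketch on the example data, with `rowsum` kept as a separate series rather than a column. As far as the pandas documentation describes it, arithmetic between a dataframe and a series matches the series index against the dataframe's column labels by default, whereas series/series arithmetic matches index against index. The names `default` and `col_ratio` are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({'a': [5, 3, 1],
                   'b': [4, 6, 2],
                   'c': [2, 4, 9]})
rowsum = df.sum(axis=1)  # index is 0, 1, 2

# Series / Series: both sides are aligned on their index (0, 1, 2).
col_ratio = df['a'] / rowsum

# DataFrame / Series: the series index (0, 1, 2) is matched against the
# column labels ('a', 'b', 'c'). Nothing matches, so every cell is NaN.
# pandas may also warn that the mixed labels cannot be sorted.
default = df / rowsum

print(col_ratio)
print(default)
```

So the row indexes lining up does not help in the dataframe case, because by default they are never compared with each other.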
Upvotes: 0
Views: 1197
Reputation: 1726
Try this:
df.apply(lambda x: x / x['rowsum'], axis=1)
          a         b         c  rowsum
0  0.454545  0.363636  0.181818     1.0
1  0.230769  0.461538  0.307692     1.0
2  0.083333  0.166667  0.750000     1.0
If you don't need the rowsum column, you can use
df.apply(lambda x: x / x.sum(), axis=1)  # with your original DataFrame
Upvotes: 2
Reputation: 1356
Try
df.iloc[:, 0:3].div(df.rowsum, axis=0)
to see if it's what you want.
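A minimal, self-contained sketch of that suggestion on the example frame from the question. Passing `axis=0` asks `div` to match the series against the row index instead of the column labels. The names `pct` and `transposed` are only for illustration, and the second one reproduces the double-transpose workaround for comparison.

```python
import pandas as pd

df = pd.DataFrame({'a': [5, 3, 1],
                   'b': [4, 6, 2],
                   'c': [2, 4, 9]})
df['rowsum'] = df.sum(axis=1)

# Divide each row by its own total; axis=0 aligns rowsum on the row index.
pct = df.iloc[:, 0:3].div(df['rowsum'], axis=0)

# The workaround from the question, kept only to check both agree.
transposed = (df.iloc[:, 0:3].T / df['rowsum']).T

print(pct)
print(pct.sum(axis=1))
```

Each row of `pct` should sum to 1, and the labels stay attached, so there is no need to go through numpy arrays.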
Upvotes: 1