Reputation: 575
I have a DataFrame and want to get divisions of pairs of columns like below:
df = pd.DataFrame({
'a1': np.random.randint(1, 1000, 1000),
'a2': np.random.randint(1, 1000, 1000),
'b1': np.random.randint(1, 1000, 1000),
'b2': np.random.randint(1, 1000, 1000),
'c1': np.random.randint(1, 1000, 1000),
'c2': np.random.randint(1, 1000, 1000),
})
df['a'] = df['a2'] / df['a1']
df['b'] = df['b2'] / df['b1']
df['c'] = df['c2'] / df['c1']
I want to combine the last three lines into one like:
df[['a', 'b', 'c']] = df[['a2', 'b2', 'c2']] / df[['a1', 'b1', 'c1']]
but I only get an error of ValueError: Columns must be same length as key
. If I just simply print(df[['a2', 'b2', 'c2']] / df[['a1', 'b1', 'c1']])
, I will only get a DataFrame with NaN
s of shape (1000, 6).
==== Edit
Now I know why my original one-line code doesn't work. Actually, the arithmetic operations of two DataFrames will be conducted between the columns with same labels, while those columns without same label in another DataFrame will generate NaNs. The result DataFrame will have the union() of the columns of the two operating DataFrames. That's why my original solution will give an ValueError
and the div will generate NaN
s.
Following example will be helpful to explain:
df1 = pd.DataFrame(data={'A':[1,2], 'B':[3,4], 'C':[5,6], 'D':[8,9]})
df2 = pd.DataFrame(data={'A':[11,12], 'B':[13,14], 'C':[15,16], 'D':[18,19]})
df1[['A', 'B']] / df2[['A', 'B']]
Out[130]:
A B
0 0.090909 0.230769
1 0.166667 0.285714
df1[['A', 'B']] / df2[['C', 'D']]
Out[131]:
A B C D
0 NaN NaN NaN NaN
1 NaN NaN NaN NaN
df1[['A', 'B']] + df2[['A', 'C']]
Out[132]:
A B C
0 12 NaN NaN
1 14 NaN NaN
Upvotes: 6
Views: 319
Reputation: 28644
MultiIndex comes in handy here, since Pandas is always aligned first on the index(columns is an index as well) before any computations.
Using sample data below:
df = pd.DataFrame({'a_1':range(2,10,2),
'a_2': range(4, 20, 4),
'b_1': range(3, 15,3),
'b_2': range(6,30,6),
'c_1': range(5, 25, 5),
'c_2': range(10, 50, 10)})
df
a_1 a_2 b_1 b_2 c_1 c_2
0 2 4 3 6 5 10
1 4 8 6 12 10 20
2 6 12 9 18 15 30
3 8 16 12 24 20 40
split the columns into a MultiIndex:
temp = df.copy()
temp.columns = temp.columns.str.split('_', expand = True).swaplevel()
temp
1 2 1 2 1 2
a a b b c c
0 2 4 3 6 5 10
1 4 8 6 12 10 20
2 6 12 9 18 15 30
3 8 16 12 24 20 40
In this form, you can simply select 2
divided by 1
:
df['a','b','c']] = temp['2'] / temp['1']
This gives the same values as :
df[['a_2', 'b_2', 'c_2']].values / df[['a_1', 'b_1', 'c_1']].values
Imagine however, that you have lots of columns, you do not need to worry about the pairing, as the MultiIndex form takes care of that, with Pandas doing the alignment before computation.
Numpy is going to be faster - @MuhammadHassan's answer fits nicely; this is just to show how MultiIndex has its place and its uses in the right circumstances.
Upvotes: 1
Reputation: 200
You could do the following as simple workaround:
df['a'], df['b'], df['c'] = (df['a2'] / df['a1'], df['b2'] / df['b1'], df['c2'] / df['c1'])
Although I think that using the assign
method would make your code much more readable:
df.assign(a=lambda x: x['a2'] / x['a1'],
b=lambda x: x['b2'] / x['b1'],
c=lambda x: x['c2'] / x['c1'])
Upvotes: 1
Reputation: 4229
You can use:
df[['a', 'b', 'c']] = df[['a2', 'b2', 'c2']].values / df[['a1', 'b1', 'c1']].values
OUTPUT
a1 a2 b1 b2 c1 c2 a b c
0 864 214 551 761 174 111 0.247685 1.381125 0.637931
1 820 971 379 79 190 587 1.184146 0.208443 3.089474
2 305 154 519 378 567 186 0.504918 0.728324 0.328042
3 51 505 303 417 959 326 9.901961 1.376238 0.339937
4 84 531 625 899 248 905 6.321429 1.438400 3.649194
.. ... ... ... ... ... ... ... ... ...
995 302 695 790 777 896 975 2.301325 0.983544 1.088170
996 24 308 462 316 388 784 12.833333 0.683983 2.020619
997 135 286 359 752 282 283 2.118519 2.094708 1.003546
998 695 45 832 936 811 404 0.064748 1.125000 0.498150
999 809 454 971 335 366 884 0.561187 0.345005 2.415301
Upvotes: 3