Reputation: 575

DataFrame pairs of columns division

I have a DataFrame and want to get divisions of pairs of columns like below:

df = pd.DataFrame({
    'a1': np.random.randint(1, 1000, 1000),
    'a2': np.random.randint(1, 1000, 1000),
    'b1': np.random.randint(1, 1000, 1000),
    'b2': np.random.randint(1, 1000, 1000),
    'c1': np.random.randint(1, 1000, 1000),
    'c2': np.random.randint(1, 1000, 1000),
})
df['a'] = df['a2'] / df['a1']
df['b'] = df['b2'] / df['b1']
df['c'] = df['c2'] / df['c1']

I want to combine the last three lines into one like:

df[['a', 'b', 'c']] = df[['a2', 'b2', 'c2']] / df[['a1', 'b1', 'c1']]

but I only get an error of ValueError: Columns must be same length as key. If I just simply print(df[['a2', 'b2', 'c2']] / df[['a1', 'b1', 'c1']]), I will only get a DataFrame with NaNs of shape (1000, 6).

==== Edit

Now I know why my original one-line code doesn't work. Actually, the arithmetic operations of two DataFrames will be conducted between the columns with same labels, while those columns without same label in another DataFrame will generate NaNs. The result DataFrame will have the union() of the columns of the two operating DataFrames. That's why my original solution will give an ValueError and the div will generate NaNs.

Following example will be helpful to explain:

df1 = pd.DataFrame(data={'A':[1,2], 'B':[3,4], 'C':[5,6], 'D':[8,9]})
df2 = pd.DataFrame(data={'A':[11,12], 'B':[13,14], 'C':[15,16], 'D':[18,19]})
df1[['A', 'B']] / df2[['A', 'B']]
Out[130]: 
          A         B
0  0.090909  0.230769
1  0.166667  0.285714

df1[['A', 'B']] / df2[['C', 'D']]
Out[131]: 
    A   B   C   D
0 NaN NaN NaN NaN
1 NaN NaN NaN NaN

df1[['A', 'B']] + df2[['A', 'C']]
Out[132]: 
    A   B   C
0  12 NaN NaN
1  14 NaN NaN

Upvotes: 6

Answers (3)

sammywemmy

Reputation: 28644

MultiIndex comes in handy here, since Pandas is always aligned first on the index(columns is an index as well) before any computations.

Using sample data below:

df = pd.DataFrame({'a_1':range(2,10,2),
            'a_2': range(4, 20, 4),
           'b_1': range(3, 15,3),
           'b_2': range(6,30,6),
           'c_1': range(5, 25, 5),
           'c_2': range(10, 50, 10)})

df
   a_1  a_2  b_1  b_2  c_1  c_2
0    2    4    3    6    5   10
1    4    8    6   12   10   20
2    6   12    9   18   15   30
3    8   16   12   24   20   40

split the columns into a MultiIndex:

temp = df.copy()
temp.columns = temp.columns.str.split('_', expand = True).swaplevel()

temp

 1   2   1   2   1   2
   a   a   b   b   c   c
0  2   4   3   6   5  10
1  4   8   6  12  10  20
2  6  12   9  18  15  30
3  8  16  12  24  20  40

In this form, you can simply select 2 divided by 1:

df['a','b','c']] = temp['2'] / temp['1']

This gives the same values as :

df[['a_2', 'b_2', 'c_2']].values / df[['a_1', 'b_1', 'c_1']].values

Imagine however, that you have lots of columns, you do not need to worry about the pairing, as the MultiIndex form takes care of that, with Pandas doing the alignment before computation.

Numpy is going to be faster - @MuhammadHassan's answer fits nicely; this is just to show how MultiIndex has its place and its uses in the right circumstances.

Upvotes: 1

jacob

Reputation: 200

You could do the following as simple workaround:

df['a'], df['b'], df['c'] = (df['a2'] / df['a1'], df['b2'] / df['b1'], df['c2'] / df['c1'])

Although I think that using the assign method would make your code much more readable:

df.assign(a=lambda x: x['a2'] / x['a1'], 
          b=lambda x: x['b2'] / x['b1'], 
          c=lambda x: x['c2'] / x['c1'])

Upvotes: 1

Muhammad Hassan

Reputation: 4229

You can use:

df[['a', 'b', 'c']] = df[['a2', 'b2', 'c2']].values / df[['a1', 'b1', 'c1']].values

OUTPUT

      a1   a2   b1   b2   c1   c2          a         b         c
0    864  214  551  761  174  111   0.247685  1.381125  0.637931
1    820  971  379   79  190  587   1.184146  0.208443  3.089474
2    305  154  519  378  567  186   0.504918  0.728324  0.328042
3     51  505  303  417  959  326   9.901961  1.376238  0.339937
4     84  531  625  899  248  905   6.321429  1.438400  3.649194
..   ...  ...  ...  ...  ...  ...        ...       ...       ...
995  302  695  790  777  896  975   2.301325  0.983544  1.088170
996   24  308  462  316  388  784  12.833333  0.683983  2.020619
997  135  286  359  752  282  283   2.118519  2.094708  1.003546
998  695   45  832  936  811  404   0.064748  1.125000  0.498150
999  809  454  971  335  366  884   0.561187  0.345005  2.415301

Upvotes: 3

DataFrame pairs of columns division

Answers (3)

Related Questions