user1179317
user1179317

Reputation: 2903

Divide certain columns by another column in pandas

Was wondering if there is a more efficient way of dividing multiple columns a certain column. For example say I have:

prev    open    close   volume
20.77   20.87   19.87   962816
19.87   19.89   19.56   668076
19.56   19.96   20.1    578987
20.1    20.4    20.53   418597

And i would like to get:

prev    open    close   volume
20.77   1.0048  0.9567  962816
19.87   1.0010  0.9844  668076
19.56   1.0204  1.0276  578987
20.1    1.0149  1.0214  418597

Basically, columns 'open' and 'close' have been divided by the value from column 'prev.'

I was able to do this by

df['open'] = list(map(lambda x,y: x/y, df['open'],df['prev']))
df['close'] = list(map(lambda x,y: x/y, df['close'],df['prev']))

I was wondering if there is a simpler way? Especially if there are like 10 columns to be divided by the same value anyways?

Upvotes: 3

Views: 28337

Answers (3)

Divakar
Divakar

Reputation: 221534

For performance, I would suggest using the underlying array data and array-slicing as the two columns to be modified come in sequence to use view into it -

a = df.values
df.iloc[:,1:3] = a[:,1:3]/a[:,0,None]

To eloborate a bit more on the array-slicing part, with a[:,[1,2]] would have forced a copy there and would have slowed it down. a[:,[1,2]] on the dataframe side is equivalent to df[['open','close']] and that I am guessing is slowing things down too. df.iloc[:,1:3] is thus improving upon it.

Sample run -

In [64]: df
Out[64]: 
    prev   open  close  volume
0  20.77  20.87  19.87  962816
1  19.87  19.89  19.56  668076
2  19.56  19.96  20.10  578987
3  20.10  20.40  20.53  418597

In [65]: a = df.values
    ...: df.iloc[:,1:3] = a[:,1:3]/a[:,0,None]
    ...: 

In [66]: df
Out[66]: 
    prev      open     close  volume
0  20.77  1.004815  0.956668  962816
1  19.87  1.001007  0.984399  668076
2  19.56  1.020450  1.027607  578987
3  20.10  1.014925  1.021393  418597

Runtime test

Approaches -

def numpy_app(df): # Proposed in this post
    a = df.values
    df.iloc[:,1:3] = a[:,1:3]/a[:,0,None]
    return df

def pandas_app1(df): # @Scott Boston's soln
    df[['open','close']] = df[['open','close']].div(df['prev'].values,axis=0)
    return df

Timings -

In [44]: data = np.random.randint(15, 25, (100000,4)).astype(float)
    ...: df1 = pd.DataFrame(data, columns=(('prev','open','close','volume')))
    ...: df2 = df1.copy()
    ...: 

In [45]: %timeit pandas_app1(df1)
    ...: %timeit numpy_app(df2)
    ...: 
100 loops, best of 3: 2.68 ms per loop
1000 loops, best of 3: 885 µs per loop

Upvotes: 5

Scott Boston
Scott Boston

Reputation: 153460

df2[['open','close']] = df2[['open','close']].div(df2['prev'].values,axis=0)

Output:

    prev      open     close  volume
0  20.77  1.004815  0.956668  962816
1  19.87  1.001007  0.984399  668076
2  19.56  1.020450  1.027607  578987
3  20.10  1.014925  1.021393  418597

Upvotes: 9

DYZ
DYZ

Reputation: 57033

columns_to_divide = ['open', 'close']
df[columns_to_divide] = df[columns_to_divide] / df['prev']

Upvotes: 6

Related Questions