Reputation: 2903
Was wondering if there is a more efficient way of dividing multiple columns a certain column. For example say I have:
prev open close volume
20.77 20.87 19.87 962816
19.87 19.89 19.56 668076
19.56 19.96 20.1 578987
20.1 20.4 20.53 418597
And i would like to get:
prev open close volume
20.77 1.0048 0.9567 962816
19.87 1.0010 0.9844 668076
19.56 1.0204 1.0276 578987
20.1 1.0149 1.0214 418597
Basically, columns 'open' and 'close' have been divided by the value from column 'prev.'
I was able to do this by
df['open'] = list(map(lambda x,y: x/y, df['open'],df['prev']))
df['close'] = list(map(lambda x,y: x/y, df['close'],df['prev']))
I was wondering if there is a simpler way? Especially if there are like 10 columns to be divided by the same value anyways?
Upvotes: 3
Views: 28337
Reputation: 221534
For performance, I would suggest using the underlying array data and array-slicing
as the two columns to be modified come in sequence to use view into it -
a = df.values
df.iloc[:,1:3] = a[:,1:3]/a[:,0,None]
To eloborate a bit more on the array-slicing part, with a[:,[1,2]]
would have forced a copy there and would have slowed it down. a[:,[1,2]]
on the dataframe side is equivalent to df[['open','close']]
and that I am guessing is slowing things down too. df.iloc[:,1:3]
is thus improving upon it.
Sample run -
In [64]: df
Out[64]:
prev open close volume
0 20.77 20.87 19.87 962816
1 19.87 19.89 19.56 668076
2 19.56 19.96 20.10 578987
3 20.10 20.40 20.53 418597
In [65]: a = df.values
...: df.iloc[:,1:3] = a[:,1:3]/a[:,0,None]
...:
In [66]: df
Out[66]:
prev open close volume
0 20.77 1.004815 0.956668 962816
1 19.87 1.001007 0.984399 668076
2 19.56 1.020450 1.027607 578987
3 20.10 1.014925 1.021393 418597
Runtime test
Approaches -
def numpy_app(df): # Proposed in this post
a = df.values
df.iloc[:,1:3] = a[:,1:3]/a[:,0,None]
return df
def pandas_app1(df): # @Scott Boston's soln
df[['open','close']] = df[['open','close']].div(df['prev'].values,axis=0)
return df
Timings -
In [44]: data = np.random.randint(15, 25, (100000,4)).astype(float)
...: df1 = pd.DataFrame(data, columns=(('prev','open','close','volume')))
...: df2 = df1.copy()
...:
In [45]: %timeit pandas_app1(df1)
...: %timeit numpy_app(df2)
...:
100 loops, best of 3: 2.68 ms per loop
1000 loops, best of 3: 885 µs per loop
Upvotes: 5
Reputation: 153460
df2[['open','close']] = df2[['open','close']].div(df2['prev'].values,axis=0)
Output:
prev open close volume
0 20.77 1.004815 0.956668 962816
1 19.87 1.001007 0.984399 668076
2 19.56 1.020450 1.027607 578987
3 20.10 1.014925 1.021393 418597
Upvotes: 9
Reputation: 57033
columns_to_divide = ['open', 'close']
df[columns_to_divide] = df[columns_to_divide] / df['prev']
Upvotes: 6