Reputation: 581
I have a list of throughput values in one column of a pandas DataFrame. I want to calculate a value's variation from a threshold as a percent of that threshold.
So if I have thresholds at 2 and 7. I'd want to calculate the minimum of the following functions.
(df.throughput - 2)/2
(df.throughput - 7)/7
x throughtput
1 3
4 4
7 9
I tried using the following to create a new column, but I keep getting an error. I fell like there's something really obvious i'm missing here.
df['pct'] = np.min( (df.throughput-2)/2, (df.throughput - 7)/7 )
df['pct'] = np.min( (df['throughput']-2)/2, (df['throughput'] - 7)/7 )
'Series' objects are mutable, thus they cannot be hashed
Upvotes: 1
Views: 824
Reputation: 863731
You can create new Series
, compare them and use numpy.where
for new column:
a = (df['throughtput'] - 2)/2
b = (df['throughtput'] - 7)/7
df['pct'] = np.where(a < b, a, b)
print (df)
x throughtput pct
0 1 3 -0.571429
1 4 4 -0.428571
2 7 9 0.285714
Solution with concat
and DataFrame.min
:
a = (df['throughtput'] - 2)/2
b = (df['throughtput'] - 7)/7
df['pct'] = pd.concat([a,b], axis=1).min(axis=1)
print (df)
x throughtput pct
0 1 3 -0.571429
1 4 4 -0.428571
2 7 9 0.285714
Or create 2d array by numpy.column_stack
and get min by numpy.amin
:
a = (df['throughtput'] - 2)/2
b = (df['throughtput'] - 7)/7
df['pct'] = np.amin(np.column_stack([a,b]), axis=1)
print (df)
x throughtput pct
0 1 3 -0.571429
1 4 4 -0.428571
2 7 9 0.285714
a = (df['throughtput'].values - 2)/2
b = (df['throughtput'].values - 7)/7
df['pct'] = np.amin(np.column_stack([a,b]), axis=1)
print (df)
x throughtput pct
0 1 3 -0.571429
1 4 4 -0.428571
2 7 9 0.285714
Timings:
N = 1000000
#N = 10
df = pd.DataFrame({'x': np.random.randint(10,size=N),
'throughtput':np.random.randint(10,size=N)})
print (df)
In [50]: %%timeit
...: a = (df['throughtput'] - 2)/2
...: b = (df['throughtput'] - 7)/7
...: df['pct'] = np.where(a < b, a, b)
...:
10 loops, best of 3: 21.1 ms per loop
In [51]: %%timeit
...: a = (df['throughtput'] - 2)/2
...: b = (df['throughtput'] - 7)/7
...: df['pct'] = pd.concat([a,b], axis=1).min(axis=1)
...:
10 loops, best of 3: 56.4 ms per loop
In [52]: %%timeit
...: a = (df['throughtput'] - 2)/2
...: b = (df['throughtput'] - 7)/7
...: df['pct'] = np.amin(np.column_stack([a,b]), axis=1)
...:
10 loops, best of 3: 35.1 ms per loop
In [53]: %%timeit
...: a = (df['throughtput'].values - 2)/2
...: b = (df['throughtput'].values - 7)/7
...: df['pct'] = np.amin(np.column_stack([a,b]), axis=1)
...:
10 loops, best of 3: 38.5 ms per loop
Another answer by Tiny.D:
In [54]: %%timeit
...: df['cal_1'] = (df.throughtput - 2)/2
...: df['cal_2'] = (df.throughtput - 7)/7
...: df['pct'] = df[['cal_1','cal_2']].min(axis=1)
...: df[['x','throughtput','pct']]
...:
10 loops, best of 3: 73.7 ms per loop
In [55]: %%timeit
...: df['pct']=[min(i,j) for i,j in (zip((df.throughtput - 2)/2,(df.throughtput - 7)/7))]
...:
1 loop, best of 3: 435 ms per loop
Upvotes: 1
Reputation: 6556
Try this:
df['pct'] = [min(i,j) for i,j in (zip((df.throughput - 2)/2,(df.throughput - 7)/7))]
df
output will be:
throughput x pct
0 3 1 -0.571429
1 4 4 -0.428571
2 9 7 0.285714
or:
df['cal_1'] = (df.throughput - 2)/2
df['cal_2'] = (df.throughput - 7)/7
df['pct'] = df[['cal_1','cal_2']].min(axis=1)
df[['x','throughput','pct']]
output will be:
x throughput pct
0 1 3 -0.571429
1 4 4 -0.428571
2 7 9 0.285714
Upvotes: 0