snowballtrader
snowballtrader

Reputation: 43

Find optimal combination within dataframe which gives maximum overall average using python pandas

Suppose there is a data frame and statistics as following table:

Time A B C D E Combination A~E
1 10 20 5 8 -8 7.00
2 20 -3 6 -5 10 5.60
3 -5 40 5 4 20 12.80
4 6 -8 2 18 5 4.60
5 8 -1 -4 -5 4 0.40
Average 7.80 9.60 2.80 4.00 6.20 6.08
Standard Deviation 8.01 17.96 3.66 8.65 9.09 4.02
Ratio (Average/S.D) 0.97 0.53 0.77 0.46 0.68 1.51

According to the above table, combination of all A~E give the overall average of 6.08, standard deviation of 4.02 and the ratio of 1.51.

By removing some of the columns A ~ E, maybe I can maximize the average, minimize the standard deviation, or doing the both (increasing the ratio).

(*Meaning that I will remove the entire values in the column, not individual value in each row)

Is there any way I can find out the "best combination" of the columns from A to E which give the:

A) Maximum average

B) Minimum standard deviation

C) Maximum ratio of A/B?

Desired answer might be something like: A, B, C A, B, E A, B, C, D or even A, B, C, D, E

Upvotes: 1

Views: 447

Answers (1)

keramat
keramat

Reputation: 4543

For finding the best matching subset of each row you can create all possible subsets then find the max/min as you wish. For example the following code compute the best cols for the first condition:

import itertools
import numpy as np

def com(row):
    s = []
    for i in range(2, len(row)+1):
       s.extend([x for x in itertools.combinations(row.values, i)])
    vals = s[np.argmax([np.mean(x) for x in s])]
    return row[row.isin(vals)].index.values

df.apply(com, axis = 1)

for the first two rows the result is as follows:

1    [A, B]
2    [A, E]
dtype: object

Based on the comment this may helps:

string = '''Time    A   B   C   D   E
Average 7.80    9.60    2.80    4.00    6.20
Standard Deviation  8.01    17.96   3.66    8.65    9.09
Ratio (Average/S.D) 0.97    0.53    0.77    0.46    0.68'''
data = np.array([x.split('  ') for x in string.split('\n')])
df = pd.DataFrame(data[1:,1:], columns = data[0,1:], index = data[1:,0])
df = df.astype(float)

df.apply(com, axis = 1)

Output:

Average                [A, B]
Standard Deviation     [B, E]
Ratio (Average/S.D)    [A, C]
dtype: object

You can choose the best based on your goal.

Upvotes: 2

Related Questions