Akmal Soliev

Reputation: 740

Find value based on a combination of columns

Is there a way to find a value based on the combination of column values?

Example:

import numpy as np
import pandas as pd

df = pd.DataFrame({
    'One' : [np.random.randint(1, 10) for i in range(10)],
    'Two' : [np.random.randint(1, 10) for i in range(10)],
    'Three' : [np.random.randint(1, 10) for i in range(10)],
    'Four' : [np.random.randint(1, 10) for i in range(10)],
})


In [6]: df
Out[6]:
   One  Two  Three  Four
0    8    1      7     5
1    6    3      3     3
2    4    7      5     2
3    4    2      6     9
4    1    7      1     9
5    9    8      3     8
6    4    8      4     4
7    1    9      7     1
8    4    2      6     4
9    1    3      7     7

Given the df above, I want to know which columns I would need to combine (add together) to get the number 9. The combination is evaluated row by row, so each row would output the names of the columns that were combined to reach 9.

and so on...

NOTE: The DataFrame should remain unmodified.

Thanks for any future suggestions or help.

Upvotes: 0

Views: 309

Answers (2)

Tbaki

Reputation: 1003

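from itertools import combinations


def find_9(x, v=9):
    # x is one row of the DataFrame; x.index holds the column names.
    c = combinations(x.index, 2)
    columns = []
    for i, j in c:
        # Keep every pair of columns whose values in this row add up to v.
        if (x[i] + x[j]) == v:
            columns.append([i, j])
    # Rows with no matching pair return False instead of an empty list.
    return columns if columns else False


df.apply(find_9, axis=1)

Output: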

0                               [[One, Two]]
1    [[One, Two], [One, Three], [One, Four]]
2                [[One, Three], [Two, Four]]
3                                      False
4                                      False
5                                      False
6                                      False
7                                      False
8                                      False
9                                      False

Upvotes: 0

ThePyGuy

Reputation: 18416

First, get all the combinations of column names for the dataframe; you can use itertools.combinations for this. Then create a function that calculates the sum for each combination of column names in a row and stores the combination in a temporary list if the sum equals the required sum. The function returns the list of matching column combinations. Finally, apply this function to the dataframe with axis=1.

import itertools

# All non-empty combinations of column names, from single columns up to all columns.
cols = [list(combo)
        for size in range(1, df.shape[1] + 1)
        for combo in itertools.combinations(df.columns, size)]


def getSubArray(row, sum_=9):
    # Collect every column combination whose values in this row add up to sum_.
    result = []
    for col in cols:
        if row.loc[col].sum() == sum_:
            result.append(col)
    return result


result = df.apply(getSubArray, axis=1)

OUTPUT:

0                                                   [[One, Two]]
1    [[One, Two], [One, Three], [One, Four], [Two, Three, Four]]
2                                    [[One, Three], [Two, Four]]
3                                                       [[Four]]
4                                    [[Four], [One, Two, Three]]
5                                                        [[One]]
6                                                             []
7                                    [[Two], [One, Three, Four]]
8                                                             []
9                                                             []
dtype: object

You can pass any required sum to getSubArray through the .apply call, for example: df.apply(getSubArray, axis=1, sum_=24)
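To make the keyword forwarding concrete, here is a minimal self-contained sketch. It assumes the data printed in the question (typed in by hand rather than generated randomly, so the result is reproducible), and the target 21 is an arbitrary choice for illustration, not a value from the original post.

```python
import itertools

import pandas as pd

# Values copied from the DataFrame printed in the question (an assumption
# made here so the example does not depend on random numbers).
df = pd.DataFrame({
    'One':   [8, 6, 4, 4, 1, 9, 4, 1, 4, 1],
    'Two':   [1, 3, 7, 2, 7, 8, 8, 9, 2, 3],
    'Three': [7, 3, 5, 6, 1, 3, 4, 7, 6, 7],
    'Four':  [5, 3, 2, 9, 9, 8, 4, 1, 4, 7],
})

# Every non-empty combination of column names, shortest first.
cols = [list(combo)
        for size in range(1, df.shape[1] + 1)
        for combo in itertools.combinations(df.columns, size)]


def getSubArray(row, sum_=9):
    # Keep the combinations whose values in this row add up to sum_.
    return [col for col in cols if row.loc[col].sum() == sum_]


# Default target of 9.
result_9 = df.apply(getSubArray, axis=1)

# Extra keyword arguments given to DataFrame.apply are forwarded to the
# applied function, so sum_ can be changed without editing getSubArray.
result_21 = df.apply(getSubArray, axis=1, sum_=21)

print(result_9)
print(result_21)
```

Neither call modifies df; each returns a new Series of lists. Note that the number of combinations grows as 2**n - 1 with the number of columns, so this brute-force approach is only practical for a small number of columns.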

Upvotes: 3
