mezz

Reputation: 437

Subtracting two columns of lists in pandas to create a cumulative column

My DataFrame has a set x column, which holds the universal set, and a subset column, which holds some of its subsets. I want to choose the subsets with the highest ratios until I have covered the full set x.

uncovered = set x - subset

This is what my DataFrame looks like in pandas:

   ratio                  set x        subset        uncovered
2   2.00  [1, 3, 6, 8, 9, 0, 7]  [8, 3, 6, 1]        [0, 9, 7]
0   1.50  [1, 3, 6, 8, 9, 0, 7]     [1, 3, 6]     [0, 8, 9, 7]
1   1.00  [1, 3, 6, 8, 9, 0, 7]        [9, 0]  [8, 1, 3, 6, 7]
3   0.75  [1, 3, 6, 8, 9, 0, 7]     [1, 3, 7]     [0, 8, 6, 9]

I want to create another column that subtracts the subsets from set x cumulatively, row by row, until I get an empty list.

I tried the code below:

p['tt']=list(p['set x']-p['subset'])

Error message:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
/Applications/anaconda/lib/python3.5/site-packages/pandas/core/ops.py in na_op(x, y)
    581             result = expressions.evaluate(op, str_rep, x, y,
--> 582                                           raise_on_error=True, **eval_kwargs)
    583         except TypeError:

/Applications/anaconda/lib/python3.5/site-packages/pandas/computation/expressions.py in evaluate(op, op_str, a, b, raise_on_error, use_numexpr, **eval_kwargs)
    208         return _evaluate(op, op_str, a, b, raise_on_error=raise_on_error,
--> 209                          **eval_kwargs)
    210     return _evaluate_standard(op, op_str, a, b, raise_on_error=raise_on_error)

/Applications/anaconda/lib/python3.5/site-packages/pandas/computation/expressions.py in _evaluate_numexpr(op, op_str, a, b, raise_on_error, truediv, reversed, **eval_kwargs)
    119     if result is None:
--> 120         result = _evaluate_standard(op, op_str, a, b, raise_on_error)
    121

/Applications/anaconda/lib/python3.5/site-packages/pandas/computation/expressions.py in _evaluate_standard(op, op_str, a, b, raise_on_error, **eval_kwargs)
     61         _store_test_result(False)
---> 62     return op(a, b)
     63

TypeError: unsupported operand type(s) for -: 'list' and 'list'

During handling of the above exception, another exception occurred:

TypeError                                 Traceback (most recent call last)
in ()
----> 1 p['tt']=list(p['set x']-p['subset'])

/Applications/anaconda/lib/python3.5/site-packages/pandas/core/ops.py in wrapper(left, right, name, na_op)
    639             rvalues = algos.take_1d(rvalues, ridx)
    640
--> 641         arr = na_op(lvalues, rvalues)
    642
    643         return left._constructor(wrap_results(arr), index=index,

/Applications/anaconda/lib/python3.5/site-packages/pandas/core/ops.py in na_op(x, y)
    586                 result = np.empty(x.size, dtype=dtype)
    587                 mask = notnull(x) & notnull(y)
--> 588                 result[mask] = op(x[mask], _values_from_object(y[mask]))
    589             elif isinstance(x, np.ndarray):
    590                 result = np.empty(len(x), dtype=x.dtype)

TypeError: unsupported operand type(s) for -: 'list' and 'list'
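
The last line of the traceback can be reproduced without pandas, which suggests the problem lies with the Python list type rather than with the DataFrame. The sketch below uses the values from the first row of the table; the variable names set_x, subset, message and uncovered are illustrative only.

```python
set_x = [1, 3, 6, 8, 9, 0, 7]
subset = [8, 3, 6, 1]

# Plain Python lists do not define the "-" operator, so this raises TypeError.
try:
    set_x - subset
except TypeError as exc:
    message = str(exc)

# Sets do define "-" (set difference), so converting first avoids the error.
# Note that sets do not preserve the original element order.
uncovered = set(set_x) - set(subset)
```

Here message holds the same text as the traceback, and uncovered contains 0, 9 and 7.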

Upvotes: 0

Views: 1010

Answers (1)

Tammo Heeren

Reputation: 2104

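Python lists do not support subtraction, so convert each list to a set and take the set difference row by row. This should work for you: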

import pandas as pd

#    ratio                  set x        subset        uncovered
# 2   2.00  [1, 3, 6, 8, 9, 0, 7]  [8, 3, 6, 1]        [0, 9, 7]
# 0   1.50  [1, 3, 6, 8, 9, 0, 7]     [1, 3, 6]     [0, 8, 9, 7]
# 1   1.00  [1, 3, 6, 8, 9, 0, 7]        [9, 0]  [8, 1, 3, 6, 7]
# 3   0.75  [1, 3, 6, 8, 9, 0, 7]     [1, 3, 7]     [0, 8, 6, 9]

p = pd.DataFrame(
    [
        {'set x': [1, 3, 6, 8, 9, 0, 7], 'subset': [1, 3, 6]},
        {'set x': [1, 3, 6, 8, 9, 0, 7], 'subset': [9, 0]},
        {'set x': [1, 3, 6, 8, 9, 0, 7], 'subset': [8, 3, 6, 1]},
        {'set x': [1, 3, 6, 8, 9, 0, 7], 'subset': [1, 3, 7]},
    ])


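def set_operation(x):
    # x is a single row; return the elements of 'set x' that are not in 'subset'.
    # Converting to sets discards the original element order.
    return list(set(x['set x']) - set(x['subset']))

# axis=1 calls the function once per row.
p['tt'] = p.apply(set_operation, axis=1)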

Result is:

                   set x        subset               tt
0  [1, 3, 6, 8, 9, 0, 7]     [1, 3, 6]     [0, 8, 9, 7]
1  [1, 3, 6, 8, 9, 0, 7]        [9, 0]  [8, 1, 3, 6, 7]
2  [1, 3, 6, 8, 9, 0, 7]  [8, 3, 6, 1]        [0, 9, 7]
3  [1, 3, 6, 8, 9, 0, 7]     [1, 3, 7]     [0, 8, 9, 6]
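
The code above computes each row's difference independently. If the goal is the cumulative version described in the question, one possible approach is to sort by ratio and carry a running set of uncovered elements from row to row. This is only a sketch: it assumes every row holds the same universal set, it rebuilds the ratio column from the question's table, and the names remaining and cumulative_uncovered are made up for illustration.

```python
import pandas as pd

# Same data as the question's table, including the ratio column.
p = pd.DataFrame(
    {
        "ratio": [1.5, 1.0, 2.0, 0.75],
        "set x": [[1, 3, 6, 8, 9, 0, 7]] * 4,
        "subset": [[1, 3, 6], [9, 0], [8, 3, 6, 1], [1, 3, 7]],
    }
)

# Visit the subsets from the highest ratio to the lowest.
p = p.sort_values("ratio", ascending=False)

# Assumption: every row holds the same universal set, so take it from the first row.
remaining = set(p["set x"].iloc[0])

cumulative_uncovered = []
for subset in p["subset"]:
    # Remove everything this subset covers from what is still uncovered.
    remaining = remaining - set(subset)
    # sorted() gives a stable, readable order, since sets are unordered.
    cumulative_uncovered.append(sorted(remaining))

# Build a Series on the sorted index so each list lands on its own row.
p["cumulative_uncovered"] = pd.Series(cumulative_uncovered, index=p.index)

print(p[["ratio", "subset", "cumulative_uncovered"]])
```

With this data the new column reads [0, 7, 9], [0, 7, 9], [7], [] for rows 2, 0, 1, 3, so the list becomes empty on the last row. Row 0 leaves the running set unchanged because [1, 3, 6] was already covered by row 2; such rows could be filtered out if only the subsets that add coverage are wanted.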

Upvotes: 0
