My dataframe has a `set x` column holding the universal set and a `subset` column holding subsets of it. I want to choose the subsets with the highest ratios until I have covered the full set x.
uncovered = set x - subset
This is how my dataframe looks in pandas:
   ratio                  set x        subset        uncovered
2   2.00  [1, 3, 6, 8, 9, 0, 7]  [8, 3, 6, 1]        [0, 9, 7]
0   1.50  [1, 3, 6, 8, 9, 0, 7]     [1, 3, 6]     [0, 8, 9, 7]
1   1.00  [1, 3, 6, 8, 9, 0, 7]        [9, 0]  [8, 1, 3, 6, 7]
3   0.75  [1, 3, 6, 8, 9, 0, 7]     [1, 3, 7]     [0, 8, 6, 9]
I want to create another column that subtracts from set x cumulatively, row by row, until I get an empty list.
I tried the code below:
p['tt']=list(p['set x']-p['subset'])
Error message:
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
/Applications/anaconda/lib/python3.5/site-packages/pandas/core/ops.py in na_op(x, y)
    581             result = expressions.evaluate(op, str_rep, x, y,
--> 582                                           raise_on_error=True, **eval_kwargs)
    583         except TypeError:

/Applications/anaconda/lib/python3.5/site-packages/pandas/computation/expressions.py in evaluate(op, op_str, a, b, raise_on_error, use_numexpr, **eval_kwargs)
    208         return _evaluate(op, op_str, a, b, raise_on_error=raise_on_error,
--> 209                          **eval_kwargs)
    210     return _evaluate_standard(op, op_str, a, b, raise_on_error=raise_on_error)

/Applications/anaconda/lib/python3.5/site-packages/pandas/computation/expressions.py in _evaluate_numexpr(op, op_str, a, b, raise_on_error, truediv, reversed, **eval_kwargs)
    119     if result is None:
--> 120         result = _evaluate_standard(op, op_str, a, b, raise_on_error)
    121

/Applications/anaconda/lib/python3.5/site-packages/pandas/computation/expressions.py in _evaluate_standard(op, op_str, a, b, raise_on_error, **eval_kwargs)
     61         _store_test_result(False)
---> 62     return op(a, b)
     63

TypeError: unsupported operand type(s) for -: 'list' and 'list'

During handling of the above exception, another exception occurred:

TypeError                                 Traceback (most recent call last)
in ()
----> 1 p['tt']=list(p['set x']-p['subset'])

/Applications/anaconda/lib/python3.5/site-packages/pandas/core/ops.py in wrapper(left, right, name, na_op)
    639             rvalues = algos.take_1d(rvalues, ridx)
    640
--> 641         arr = na_op(lvalues, rvalues)
    642
    643         return left._constructor(wrap_results(arr), index=index,

/Applications/anaconda/lib/python3.5/site-packages/pandas/core/ops.py in na_op(x, y)
    586                 result = np.empty(x.size, dtype=dtype)
    587                 mask = notnull(x) & notnull(y)
--> 588                 result[mask] = op(x[mask], _values_from_object(y[mask]))
    589             elif isinstance(x, np.ndarray):
    590                 result = np.empty(len(x), dtype=x.dtype)

TypeError: unsupported operand type(s) for -: 'list' and 'list'
Upvotes: 0
Views: 1010
Reputation: 2104
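Python lists do not support the `-` operator, which is what the TypeError is telling you. Convert each cell to a set and take the difference row by row. This should work for you: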
import pandas as pd
# ratio set x subset uncovered
# 2 2.00 [1, 3, 6, 8, 9, 0, 7] [8, 3, 6, 1] [0, 9, 7]
# 0 1.50 [1, 3, 6, 8, 9, 0, 7] [1, 3, 6] [0, 8, 9, 7]
# 1 1.00 [1, 3, 6, 8, 9, 0, 7] [9, 0] [8, 1, 3, 6, 7]
# 3 0.75 [1, 3, 6, 8, 9, 0, 7] [1, 3, 7] [0, 8, 6, 9]
p = pd.DataFrame(
    [
        {'set x': [1, 3, 6, 8, 9, 0, 7], 'subset': [1, 3, 6]},
        {'set x': [1, 3, 6, 8, 9, 0, 7], 'subset': [9, 0]},
        {'set x': [1, 3, 6, 8, 9, 0, 7], 'subset': [8, 3, 6, 1]},
        {'set x': [1, 3, 6, 8, 9, 0, 7], 'subset': [1, 3, 7]},
    ])

def set_operation(x):
    # x is one row; lists have no `-`, so compare the cells as sets
    return list(set(x['set x']) - set(x['subset']))

p['tt'] = p.apply(set_operation, axis=1)
Result is:
                   set x        subset               tt
0  [1, 3, 6, 8, 9, 0, 7]     [1, 3, 6]     [0, 8, 9, 7]
1  [1, 3, 6, 8, 9, 0, 7]        [9, 0]  [8, 1, 3, 6, 7]
2  [1, 3, 6, 8, 9, 0, 7]  [8, 3, 6, 1]        [0, 9, 7]
3  [1, 3, 6, 8, 9, 0, 7]     [1, 3, 7]     [0, 8, 9, 6]
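The `tt` column above is a per-row difference. If what you are after is the cumulative version from the question (what is still uncovered after taking the subsets in descending ratio order), a rough sketch could look like the following. It assumes the `ratio` values from the question's table; the names `universe`, `remaining` and `cumulative_uncovered` are made up for illustration.

```python
import pandas as pd

universe = [1, 3, 6, 8, 9, 0, 7]
p = pd.DataFrame({
    'ratio': [1.50, 1.00, 2.00, 0.75],   # taken from the question's table
    'set x': [universe] * 4,
    'subset': [[1, 3, 6], [9, 0], [8, 3, 6, 1], [1, 3, 7]],
})

# Visit the subsets from highest to lowest ratio.
p = p.sort_values('ratio', ascending=False)

remaining = set(universe)    # elements of set x not covered so far
cumulative_uncovered = []    # illustrative helper: one entry per row
for subset in p['subset']:
    remaining = remaining - set(subset)
    cumulative_uncovered.append(sorted(remaining))

# A plain list is assigned positionally, so it follows the sorted row order.
p['cumulative_uncovered'] = cumulative_uncovered

print(p[['ratio', 'subset', 'cumulative_uncovered']])
# cumulative_uncovered, top to bottom: [0, 7, 9], [0, 7, 9], [7], []
```

With this data the remainder becomes empty only on the last row, so all four subsets are visited. Note that row 0 (`[1, 3, 6]`) covers nothing new after row 2; this sketch simply follows the ratio order and does not skip such rows.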
Upvotes: 0