ThatQuantDude
ThatQuantDude

Reputation: 833

cardinality constraint in portfolio optimisation

I am using cvxpy to work on some simple portfolio optimisation problem. The only constraint I can't get my head around is the cardinality constraint for the number non-zero portfolio holdings. I tried two approaches, a MIP approach and a traditional convex one.

here is some dummy code for a working traditional example.

import numpy as np
import cvxpy as cvx

np.random.seed(12345)
n = 10
k = 6
mu = np.abs(np.random.randn(n, 1))
Sigma = np.random.randn(n, n)
Sigma = Sigma.T.dot(Sigma)
w = cvx.Variable(n)

ret = mu.T*w 
risk = cvx.quad_form(w, Sigma)
objective = cvx.Maximize(ret - risk)

constraints = [cvx.sum_entries(w) == 1, w>= 0, cvx.sum_smallest(w, n-k) >= 0,  cvx.sum_largest(w, k) <=1 ]

prob = cvx.Problem(objective, constraints)  
prob.solve()

print prob.status

output = []
for i in range(len(w.value)):
    output.append(round(w[i].value,2))


print 'Number of non-zero elements : ',sum(1 for i in output if i > 0)

I had the idea to use, sum_smallest and sum_largest (cvxpy manual) my thought was to constraint the smallest n-k entries to 0 and let my target range k sum up to one, I know I can't change the direction of the inequality in order to stay convex, but maybe anyone knows about a clever way of constraining the problem while still keeping it simple.

My second idea was to make this a mixed integer problem, s.th along the lines of

import numpy as np
import cvxpy as cvx

np.random.seed(12345)
n = 10
k = 6
mu = np.abs(np.random.randn(n, 1))
Sigma = np.random.randn(n, n)
Sigma = Sigma.T.dot(Sigma)
w = cvx.Variable(n)

binary = cvx.Bool(n)
integer = cvx.Int(n)

ret = mu.T*w 
risk = cvx.quad_form(w, Sigma)
objective = cvx.Maximize(ret - risk)

constraints = [cvx.sum_entries(w) == 1, w>= 0, cvx.sum_entries(binary) == k ]

prob = cvx.Problem(objective, constraints)  
prob.solve()

print prob.status

output = []
for i in range(len(w.value)):
    output.append(round(w[i].value,2))


print sum(1 for i in output if i > 0)

for i in range(len(w.value)):
    print round(binary[i].value,2)

print output

looking at my binary vector it seems to be doing the right thing but the sum_entries constraint doesn't work, looking into the binary vector values I noticed that 0 isn't 0 it's very small e.g xxe^-20 I assume this will mess things up. Anyone can give me any guidance if this is the right way to go? I can use the standard solvers, as well as Mosek if that helps. I would prefer to have a non MIP implementation as I understand this is a combinatorial problem and will get very slow for larger problems. Ultimately I would like to either constraint on exact number of target holdings or a range e.g. 20-30.

Also the documentation in cvxpy around MIP is very short. thanks

Upvotes: 1

Views: 3622

Answers (2)

Alex
Alex

Reputation: 65

I've had a similar problem where my weights could be negative and did not need to sum to 1 (but still need to be bounded), so I've modified sascha's example to accommodate relaxing these constraints using the CVXpy absolute value function. This should allow for a more general approach to tackling cardinality constraints with MIP

import numpy as np
import cvxpy as cvx

np.random.seed(12345)
n = 10
k = 6
mu = np.abs(np.random.randn(n, 1))
Sigma = np.random.randn(n, n)
Sigma = Sigma.T.dot(Sigma)
w = cvx.Variable(n)

ret = mu.T*w
risk = cvx.quad_form(w, Sigma)
objective = cvx.Maximize(ret - risk)

binary = cvx.Variable(n,boolean=True)  # !!!
maxabsw=2
constraints = [ w>= -maxabsw,w<=maxabsw, cvx.abs(w)/maxabsw - binary <= 0., cvx.sum(binary) == k] # !!!

prob = cvx.Problem(objective, constraints)
prob.solve(verbose=True)

print(prob.status)

output = []
for i in range(len(w.value)):
    output.append(round(w[i].value,2))


print('Number of non-zero elements : ',sum(1 for i in output if i > 0))

Upvotes: 0

sascha
sascha

Reputation: 33532

A bit chaotic, this question.

So first: this kind of cardinality-constraint is NP-hard. This means, you can't express it using cvxpy without using Integer-programming (or else it would implicate P=NP)!

That beeing said, it would have been nicer, if there would be a pure version of the code without trying to formulate this constraint. I just assume it's the first code without the sum_smallest and sum_largest constraints.

So let's tackle the MIP-approach:

  • Your code trying to do this makes no sense at all
    • You introduce some binary-vars, but they have no connection to any other variable at all (so a constraint on it's sum is useless)!
    • You introduce some integer-vars, but they don't have any use at all!

So here is a MIP-approach:

import numpy as np
import cvxpy as cvx

np.random.seed(12345)
n = 10
k = 6
mu = np.abs(np.random.randn(n, 1))
Sigma = np.random.randn(n, n)
Sigma = Sigma.T.dot(Sigma)
w = cvx.Variable(n)

ret = mu.T*w
risk = cvx.quad_form(w, Sigma)
objective = cvx.Maximize(ret - risk)

binary = cvx.Bool(n)  # !!!

constraints = [cvx.sum_entries(w) == 1, w>= 0, w - binary <= 0., cvx.sum_entries(binary) == k] # !!!

prob = cvx.Problem(objective, constraints)
prob.solve(verbose=True)

print(prob.status)

output = []
for i in range(len(w.value)):
    output.append(round(w[i].value,2))


print('Number of non-zero elements : ',sum(1 for i in output if i > 0))

So we just added some binary-variables and connected them to w to indicate if w is nonzero or not.

If w is nonzero:

  • w will be > 0 because of constraint w>= 0
  • binary needs to be 1, or else constraint w - binary <= 0. is not fulfilled

So it's just introducing these binaries and this one indicator-constraint.

Now the cvx.sum_entries(binary) == k does what it should do.

Be careful with the implication-direction we used here. It might be relevant when chaging the constraint on k (like <=).

Keep in mind, that the default MIP-solver is awful. I also fear that Mosek's interface (sub-optimal within cvxpy) won't solve this, but i might be wrong.

Edit: Your in-range can easily be formulated using two more indicators for:

  • (k >= a) <= ind_0
  • (k <= b) <= ind_1

and adding a constraint which equals a logical_and:

  • ind_0 + ind_1 >= 2

Upvotes: 4

Related Questions