Theory of LP and the Simplex method¶
First, we will briefly discuss the primal-dual theory with a few examples. Consider the Factory problem that was discussed earlier:
!pip install gurobipy
from gurobipy import *
m = Model()
x1 = m.addVar(lb=0, vtype = GRB.CONTINUOUS, name='x1')
x2 = m.addVar(lb=0, vtype = GRB.CONTINUOUS, name='x2')
m.setObjective(40*x1+50*x2, GRB.MAXIMIZE)
m.addConstr(1*x1+2*x2<=40, name='c1')
m.addConstr(4*x1+3*x2<=120, name= 'c2')
m.optimize()
print('*'*100)
for var in m.getVars():  # decision variables
    print(var.varName, '=', var.x, (var.obj, var.SAObjLow, var.SAObjUp, var.RC))
print('*'*100)
print('optimal total revenue:', m.objVal)
print('*'*100)
for con in m.getConstrs():  # constraints
    print(con.ConstrName, ': slack =', con.slack, ', shadow price=',
          con.pi, ',', (con.RHS, con.SARHSLow, con.SARHSUp))
print('*'*100)
print('*'*100)
Restricted license - for non-production use only - expires 2023-10-25
Gurobi Optimizer version 9.5.0 build v9.5.0rc5 (linux64)
Thread count: 1 physical cores, 2 logical processors, using up to 2 threads
Optimize a model with 2 rows, 2 columns and 4 nonzeros
Model fingerprint: 0x3a526911
Coefficient statistics:
Matrix range [1e+00, 4e+00]
Objective range [4e+01, 5e+01]
Bounds range [0e+00, 0e+00]
RHS range [4e+01, 1e+02]
Presolve time: 0.01s
Presolved: 2 rows, 2 columns, 4 nonzeros
Iteration Objective Primal Inf. Dual Inf. Time
0 9.0000000e+31 3.250000e+30 9.000000e+01 0s
2 1.3600000e+03 0.000000e+00 0.000000e+00 0s
Solved in 2 iterations and 0.02 seconds (0.00 work units)
Optimal objective 1.360000000e+03
****************************************************************************************************
x1 = 24.0 (40.0, 25.0, 66.66666666666667, 0.0)
x2 = 8.0 (50.0, 30.0, 80.0, 0.0)
****************************************************************************************************
optimal total revenue: 1360.0
****************************************************************************************************
c1 : slack = 0.0 , shadow price= 16.0 , (40.0, 30.0, 80.0)
c2 : slack = 0.0 , shadow price= 6.0 , (120.0, 60.0, 160.0)
****************************************************************************************************
****************************************************************************************************
Here, the optimal solution is \(\mathbf x^*=(x_1^*, x_2^*)=(24, 8)\) with optimal objective \(z^*=1360\).
In addition, the Python output reports the sensitivity information. That is, the cost coefficients, \(\mathbf c = (c_1, c_2)\), have values 40 and 50, and the ranges in which each may change without affecting the optimal \(\mathbf x^*\) are \((25, 66.667)\) and \((30, 80)\). Similarly, the RHS values, \(\mathbf b = (b_1, b_2)\), are 40 and 120 and can move within \((30, 80)\) and \((60, 160)\) without changing the optimal solution mix. Also, the shadow prices of the two constraints are 16 and 6, and both constraints are binding (zero slack).
Is it possible to infer the optimal \(z\) or at least bound its value without solving the LP model?
The above objective, \(40x_1 + 50x_2\), can be bounded using the first constraint and the fact that both decision variables are non-negative:
\[40x_1 + 50x_2 \le 40x_1 + 80x_2 = 40(x_1 + 2x_2) \le 40\cdot 40 = 1600.\]
Can we do better?
Let’s use the second constraint:
\[40x_1 + 50x_2 \le \tfrac{200}{3}x_1 + 50x_2 = \tfrac{50}{3}(4x_1 + 3x_2) \le \tfrac{50}{3}\cdot 120 = 2000,\]
but, here, the value 2000 is higher than the best upper bound we have so far, which is 1600.
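As a quick sanity check, the two scaling arguments can be reproduced in a couple of lines (the multipliers 40 and 50/3 are chosen so that each scaled constraint dominates the objective coefficient-wise):

```python
# Constraint 1: x1 + 2*x2 <= 40. Scaling by 40 gives
# 40*x1 + 50*x2 <= 40*x1 + 80*x2 = 40*(x1 + 2*x2) <= 40*40.
bound1 = 40 * 40

# Constraint 2: 4*x1 + 3*x2 <= 120. Scaling by 50/3 gives
# 40*x1 + 50*x2 <= (200/3)*x1 + 50*x2 = (50/3)*(4*x1 + 3*x2) <= (50/3)*120.
bound2 = 120 * 50 / 3

print(bound1, bound2)   # 1600 2000.0
```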
Systematically, we can write that
\[40x_1 + 50x_2 \le d_1 x_1 + d_2 x_2 \le h,\]
and let \(h\) be the upper bound on the maximum of the objective. The trick is that we will use the constraint equations to infer \(d_1\), \(d_2\), and \(h\). That is, we multiply the first constraint by \(v_1\ge0\), the second by \(v_2\ge0\), and then add the two:
\[v_1(x_1 + 2x_2) + v_2(4x_1 + 3x_2) \le 40v_1 + 120v_2,\]
or
\[(v_1 + 4v_2)x_1 + (2v_1 + 3v_2)x_2 \le 40v_1 + 120v_2.\]
In the above notation:
\[d_1 = v_1 + 4v_2,\qquad d_2 = 2v_1 + 3v_2,\qquad h = 40v_1 + 120v_2.\]
How do we choose the best coefficients \(v_1\) and \(v_2\)? We must ensure that \(d_1\ge 40\) and \(d_2\ge 50\), and we want \(h\) to be as small as possible under these constraints. This is again an LP model, which is called the dual of the primal:
\[\min\ 40v_1 + 120v_2 \quad\text{s.t.}\quad v_1 + 4v_2 \ge 40,\quad 2v_1 + 3v_2 \ge 50,\quad v_1, v_2 \ge 0.\]
In general, the dual of a primal LP is another LP model derived from it in the following way:
Each variable in the primal becomes a constraint in the dual
Each constraint in the primal becomes a variable in the dual
The objective direction is reversed: maximum in the primal becomes minimum in the dual, and vice versa.
Hence, for the max primal
\[\max\ \mathbf c^T\mathbf x \quad\text{s.t.}\quad \mathbf A\mathbf x \le \mathbf b,\quad \mathbf x \ge \mathbf 0,\]
the corresponding dual is
\[\min\ \mathbf b^T\mathbf v \quad\text{s.t.}\quad \mathbf A^T\mathbf v \ge \mathbf c,\quad \mathbf v \ge \mathbf 0.\]
The interpretation is that we solve for \(\mathbf v\), the shadow prices of the primal, by constraining the shadow prices with the cost coefficients, \(\mathbf c\).
Solving for \(\mathbf v\) using Python, we find that the optimum is \(\mathbf v^* = (16, 6)\) with objective 1360:
m = Model()
v1 = m.addVar(lb=0, vtype=GRB.CONTINUOUS, name='v1')
v2 = m.addVar(lb=0, vtype=GRB.CONTINUOUS, name='v2')
m.setObjective(40*v1+120*v2, GRB.MINIMIZE)
m.addConstr(1*v1+4*v2>=40, name='c1')
m.addConstr(2*v1+3*v2>=50, name='c2')
m.optimize()
print('*'*100)
for var in m.getVars():  # decision variables
    print(var.varName, '=', var.x, (var.obj, var.SAObjLow, var.SAObjUp, var.RC))
print('*'*100)
print('optimal total revenue:', m.objVal)
print('*'*100)
for con in m.getConstrs():  # constraints
    print(con.ConstrName, ': slack =', con.slack, ', shadow price=',
          con.pi, ',', (con.RHS, con.SARHSLow, con.SARHSUp))
Gurobi Optimizer version 9.5.0 build v9.5.0rc5 (linux64)
Thread count: 1 physical cores, 2 logical processors, using up to 2 threads
Optimize a model with 2 rows, 2 columns and 4 nonzeros
Model fingerprint: 0x8c4006b8
Coefficient statistics:
Matrix range [1e+00, 4e+00]
Objective range [4e+01, 1e+02]
Bounds range [0e+00, 0e+00]
RHS range [4e+01, 5e+01]
Presolve time: 0.01s
Presolved: 2 rows, 2 columns, 4 nonzeros
Iteration Objective Primal Inf. Dual Inf. Time
0 0.0000000e+00 4.500000e+01 0.000000e+00 0s
2 1.3600000e+03 0.000000e+00 0.000000e+00 0s
Solved in 2 iterations and 0.02 seconds (0.00 work units)
Optimal objective 1.360000000e+03
****************************************************************************************************
v1 = 16.0 (40.0, 30.0, 80.0, 0.0)
v2 = 6.0 (120.0, 60.0, 160.0, 0.0)
****************************************************************************************************
optimal total revenue: 1360.0
****************************************************************************************************
c1 : slack = 0.0 , shadow price= 24.0 , (40.0, 25.0, 66.66666666666666)
c2 : slack = 0.0 , shadow price= 8.0 , (50.0, 30.0, 80.0)
In addition, as shown above, the dual’s decision variables, \(\mathbf v\), are the primal’s shadow prices, and the dual’s \(\mathbf b\) and \(\mathbf c\) swap roles with their primal counterparts. Lastly, the dual’s shadow prices, \((24, 8)\), are the primal’s optimal decision variables.
The primal-dual correspondence gives us more flexibility in solving the LP model. In cases where the dual is simpler, we can solve it instead of the primal.
A few other properties emerge from the primal-dual relationship:
Weak duality¶
Consider the difference between the dual and primal objectives:
\[\mathbf v\cdot\mathbf b - \mathbf c\cdot\mathbf x.\]
This expression can be expanded by adding and subtracting \(\mathbf v\cdot\mathbf A\mathbf x\). That is,
\[\mathbf v\cdot\mathbf b - \mathbf c\cdot\mathbf x = \mathbf v\cdot(\mathbf b - \mathbf A\mathbf x) + (\mathbf v\cdot\mathbf A - \mathbf c)\cdot\mathbf x.\]
But, for our maximize objective problem, feasibility gives \(\mathbf A\mathbf x \le \mathbf b\), \(\mathbf v\cdot\mathbf A \ge \mathbf c\), and \(\mathbf x, \mathbf v \ge \mathbf 0\), so both terms on the right are non-negative. Hence,
\[\mathbf c\cdot\mathbf x \le \mathbf v\cdot\mathbf b.\]
That is, for a maximize objective problem, any feasible dual objective provides a natural upper bound on any feasible primal objective.
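This bound is easy to verify numerically. The sketch below scans a grid of primal-feasible points against one dual-feasible point, the dual optimum \(\mathbf v=(16,6)\) found above, and confirms that no feasible primal objective exceeds the dual objective:

```python
v = (16, 6)
dual_obj = 40 * v[0] + 120 * v[1]      # v.b = 1360

best_primal = 0
for x1 in range(41):
    for x2 in range(21):
        # primal feasibility: A x <= b, x >= 0
        if x1 + 2 * x2 <= 40 and 4 * x1 + 3 * x2 <= 120:
            z = 40 * x1 + 50 * x2
            assert z <= dual_obj       # weak duality
            best_primal = max(best_primal, z)

print(best_primal, '<=', dual_obj)     # 1360 <= 1360
```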
Complementary slackness¶
In the case of zero slack (a standardized system that is binding at the optimum),
\[\mathbf v\cdot(\mathbf b - \mathbf A\mathbf x) = 0 \quad\text{and}\quad (\mathbf v\cdot\mathbf A - \mathbf c)\cdot\mathbf x = 0.\]
Thus,
\[\mathbf c\cdot\mathbf x = \mathbf v\cdot\mathbf b,\]
and the max of the optimal primal equals the min of the optimal dual.
In our example,
\[\mathbf c\cdot\mathbf x^* = 40\cdot 24 + 50\cdot 8 = 1360\]
and
\[\mathbf v^*\cdot\mathbf b = 16\cdot 40 + 6\cdot 120 = 1360.\]
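The two products can be checked directly at the optimal pair \(\mathbf x^*=(24,8)\), \(\mathbf v^*=(16,6)\):

```python
x = (24.0, 8.0)    # primal optimum
v = (16.0, 6.0)    # dual optimum

slack = (40 - (x[0] + 2 * x[1]),        # b1 - (A x)_1
         120 - (4 * x[0] + 3 * x[1]))   # b2 - (A x)_2
surplus = (v[0] + 4 * v[1] - 40,        # (A^T v)_1 - c1
           2 * v[0] + 3 * v[1] - 50)    # (A^T v)_2 - c2

# v . (b - A x) = 0 and (A^T v - c) . x = 0 ...
assert v[0] * slack[0] + v[1] * slack[1] == 0
assert surplus[0] * x[0] + surplus[1] * x[1] == 0

# ... so the two optimal objectives coincide.
print(40 * x[0] + 50 * x[1], 40 * v[0] + 120 * v[1])   # 1360.0 1360.0
```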
KKT conditions for optimality¶
The Karush–Kuhn–Tucker (KKT) conditions provide a necessary and sufficient condition for LP optimality. In short, for maximizing the objective, \(\mathbf c^T \mathbf x\), the following are required:
Primal feasibility: \(\mathbf A\mathbf x \le \mathbf b,\ \mathbf x \ge \mathbf 0\)
Dual feasibility: \(\mathbf A^T\mathbf v \ge \mathbf c,\ \mathbf v \ge \mathbf 0\)
Complementary slackness: \(\mathbf v\cdot(\mathbf b - \mathbf A\mathbf x) = 0\) and \((\mathbf A^T\mathbf v - \mathbf c)\cdot\mathbf x = 0\)
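A small checker makes the three conditions concrete for the factory LP (a sketch; the hard-coded numbers are the \(\mathbf A\), \(\mathbf b\), \(\mathbf c\) of our example, and `kkt_check` is our own helper, not a library call):

```python
def kkt_check(x, v):
    """Return True iff (x, v) satisfies all three KKT conditions for:
    max 40*x1 + 50*x2  s.t.  x1 + 2*x2 <= 40,  4*x1 + 3*x2 <= 120."""
    primal = (x[0] >= 0 and x[1] >= 0
              and x[0] + 2 * x[1] <= 40
              and 4 * x[0] + 3 * x[1] <= 120)
    dual = (v[0] >= 0 and v[1] >= 0
            and v[0] + 4 * v[1] >= 40
            and 2 * v[0] + 3 * v[1] >= 50)
    slackness = (v[0] * (40 - x[0] - 2 * x[1]) == 0
                 and v[1] * (120 - 4 * x[0] - 3 * x[1]) == 0
                 and x[0] * (v[0] + 4 * v[1] - 40) == 0
                 and x[1] * (2 * v[0] + 3 * v[1] - 50) == 0)
    return primal and dual and slackness

print(kkt_check((24, 8), (16, 6)))   # True  -> optimal pair
print(kkt_check((0, 0), (16, 6)))    # False -> slackness fails
```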
Improving search¶
If \(\mathbf x\) is feasible, the goal is to improve the solution from \(\mathbf x^{(t)}\) to \(\mathbf x^{(t+1)}\) via
\[\mathbf x^{(t+1)} = \mathbf x^{(t)} + \lambda \Delta\mathbf x,\]
where \(\lambda\) is the step size and \(\Delta \mathbf x\) is the direction.
\(\Delta \mathbf x\) is an improving direction if \(\mathbf x^{(t+1)}=\mathbf x^{(t)}+\lambda \Delta \mathbf x\) is better than \(\mathbf x^{(t)}\) for all \(\lambda>0\) sufficiently small.
\(\Delta \mathbf x\) is a feasible direction if \(\mathbf x^{(t)}+\lambda \Delta \mathbf x\) is feasible for all \(\lambda>0\) sufficiently small.
The objective \(z\) is \(\max z=\mathbf c^T\mathbf x\). So, the change in the objective along \(\Delta\mathbf x\) is \(\mathbf c^T\Delta\mathbf x\). If \(\mathbf c=\Delta\mathbf x\), then
\[\mathbf c^T\Delta\mathbf x = \mathbf c^T\mathbf c = \|\mathbf c\|^2 > 0,\]
which always improves (\(\Delta\mathbf x\ne \mathbf 0\)) a maximize objective function at any feasible point.
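A short numeric check of this identity (the starting point \((10, 5)\) is an arbitrary feasible interior point of the factory LP; the identity itself holds for any \(\mathbf x\)):

```python
c = (40, 50)
x = (10.0, 5.0)     # a feasible interior point: 10 + 2*5 <= 40, 4*10 + 3*5 <= 120
lam = 0.25          # any positive step illustrates the identity

x_new = (x[0] + lam * c[0], x[1] + lam * c[1])
gain = (40 * x_new[0] + 50 * x_new[1]) - (40 * x[0] + 50 * x[1])

# gain = lam * ||c||^2, which is positive whenever c != 0
print(gain, lam * (40**2 + 50**2))   # 1025.0 1025.0
```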
A feasible set of points is convex if any line segment between a pair of feasible points falls entirely within the feasible region. I.e., the line segment between \(\mathbf x^{(1)}\) and \(\mathbf x^{(2)}\) consists of all points along \(\mathbf x^{(1)}+\lambda (\mathbf x^{(2)}-\mathbf x^{(1)})\) with \(0\le\lambda\le 1\). Hence, discrete feasible sets are NOT convex.
If the feasible set is convex, then there is always an improving direction. I.e.,
\[\mathbf c^T\mathbf x^{(t+1)} > \mathbf c^T\mathbf x^{(t)},\]
unless \(\mathbf x^{(t+1)}=\mathbf x^{(t)}=\mathbf x^*\). In that case, \(\mathbf x^*\) is a local max which is equal to the global max, and the solution cannot improve.
If all constraints are linear, their feasible set is convex: take two feasible points,
\[\mathbf A\mathbf x^{(1)} \le \mathbf b \quad\text{and}\quad \mathbf A\mathbf x^{(2)} \le \mathbf b.\]
Then, for any \(0\le\lambda\le1\),
\[\mathbf A\left(\mathbf x^{(1)}+\lambda(\mathbf x^{(2)}-\mathbf x^{(1)})\right) = (1-\lambda)\,\mathbf A\mathbf x^{(1)} + \lambda\,\mathbf A\mathbf x^{(2)}\]
and
\[(1-\lambda)\,\mathbf A\mathbf x^{(1)} + \lambda\,\mathbf A\mathbf x^{(2)} \le (1-\lambda)\mathbf b + \lambda\mathbf b = \mathbf b,\]
which means that the points along the segment between \(\mathbf x^{(1)}\) and \(\mathbf x^{(2)}\) are feasible, so the set is convex.
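The argument can be replayed numerically: take any two feasible points of the factory LP and confirm that every convex combination stays feasible (the two points below are arbitrary choices):

```python
def feasible(x1, x2):
    return (x1 >= 0 and x2 >= 0
            and x1 + 2 * x2 <= 40
            and 4 * x1 + 3 * x2 <= 120)

a, b = (0.0, 10.0), (20.0, 5.0)        # two (arbitrary) feasible points
assert feasible(*a) and feasible(*b)

for k in range(11):                     # lam = 0.0, 0.1, ..., 1.0
    lam = k / 10
    p = (a[0] + lam * (b[0] - a[0]),
         a[1] + lam * (b[1] - a[1]))
    assert feasible(*p)                 # the whole segment is feasible

print('segment between', a, 'and', b, 'is feasible')
```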
In LP over continuous variables with linear constraints and objective, the set of feasible points is convex, and the solution stops improving only at a local \(=\) global maximum. Feasible sets of linear programs are called polyhedral and are convex.
A boundary point is defined s.t. at least one inequality becomes an equality (or active) at that point. Otherwise, it is called an interior point.
Unless the objective is constant, every optimal point to an LP will occur at a boundary point of its feasible region.
Why?
If all inequalities are strict, we can take a small step in ALL directions from the interior point without losing feasibility. Then, \(\Delta\mathbf x=\mathbf c\) will always improve the maximize problem. Hence, no interior point can be optimal.
A unique optimum must be an extreme point.
Why?
Consider an optimal \(\mathbf x^*\) for the maximize problem \(\mathbf c^T\mathbf x\). If \(\mathbf x^*\) is NOT an extreme point of the feasible set, then it must be the weighted average of two other feasible solutions \(\mathbf x^{(1)}\) and \(\mathbf x^{(2)}\). That is,
\[\mathbf x^* = (1-\lambda)\,\mathbf x^{(1)} + \lambda\,\mathbf x^{(2)}, \qquad 0 < \lambda < 1,\]
and
\[\mathbf c\cdot\mathbf x^* = (1-\lambda)\,\mathbf c\cdot\mathbf x^{(1)} + \lambda\,\mathbf c\cdot\mathbf x^{(2)}.\]
If the objectives of the two endpoints differ, their weighted average \(\mathbf c\cdot\mathbf x^*\) must be lower than the higher one, and thus \(\mathbf x^*\) is not optimal. If the two endpoint objectives are equal, there are multiple optima and \(\mathbf x^*\) is not unique. We conclude that the LP solution can be unique only if it is an extreme point of the feasible set. It follows that if an LP has any optimal solution, it has one at an extreme point of its feasible set.
A few remarks:¶
Ratio constraints such as \(x_1/x_2\le2/3\) can be “linearized” to \(3x_1-2x_2\le0\) (assuming \(x_2>0\)).
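A brute-force check that the two forms accept the same points (grid scan over \(x_2>0\) so the ratio is defined):

```python
# Compare the ratio constraint x1/x2 <= 2/3 with its linearization
# 3*x1 - 2*x2 <= 0 on an integer grid with x2 > 0.
for x1 in range(20):
    for x2 in range(1, 20):
        ratio_form = x1 / x2 <= 2 / 3
        linear_form = 3 * x1 - 2 * x2 <= 0
        assert ratio_form == linear_form, (x1, x2)

print('ratio and linearized forms agree on the grid')
```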
Decision variables of relatively large magnitude are best modeled as continuous variables even though they correspond physically to integer quantities.
We can also linearize nonlinear constraints, for example, minimax or min-deviation operators.
Minimax: \(\min_{\mathbf x} \max(f_1(\mathbf x),\dots,f_k(\mathbf x))\) becomes \(\min z\) s.t. \(z \ge f_i(\mathbf x)\) for all \(i\).
Min deviation: \(\min |f(\mathbf x)|\) becomes \(\min\ (d^+ + d^-)\) s.t. \(f(\mathbf x) = d^+ - d^-\), \(d^+, d^- \ge 0\).
The Simplex Algorithm¶
The algorithm is designed to improve the solution by moving from one extreme point to an adjacent one while retaining feasibility.
Consider the following standard LP model of the Factory problem with \(x_1\), \(x_2\) decision variables and \(x_3\), \(x_4\) slack variables:
\[\max\ 40x_1 + 50x_2 \quad\text{s.t.}\quad x_1 + 2x_2 + x_3 = 40,\quad 4x_1 + 3x_2 + x_4 = 120,\quad x_1,\dots,x_4 \ge 0.\]
Then, we insert the input parameters, \(\mathbf A\), \(\mathbf b\), and \(\mathbf c\), into a tabular format:

|             | \(x_1\) | \(x_2\) | \(x_3\) | \(x_4\) | \(\mathbf b\) |
|-------------|---------|---------|---------|---------|---------------|
| \(\mathbf c\) | 40    | 50      | 0       | 0       |               |
| \(c_1\)     | 1       | 2       | 1       | 0       | 40            |
| \(c_2\)     | 4       | 3       | 0       | 1       | 120           |
Now, to begin, we choose an initial solution which is unique and feasible. For that we set \((x_1, x_2) = (0,0)\), which leaves us with \((x_3, x_4) = (40,120)\). We call the active nonzero variables basic feasible (B) and the others nonbasic feasible (N). Hence, the initial solution at \(t=0\) is \(\mathbf{x}^{(0)}=(0,0,40,120)\), and the current objective is \(\mathbf c^T \mathbf x^{(0)}=0\). Note that basic solutions exist only if the chosen columns form a basis - the largest possible collection of linearly independent columns.
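The initial basic solution can be verified directly: the columns of \(x_3, x_4\) form an identity basis, so solving \(\mathbf B\mathbf x_B = \mathbf b\) returns \((40, 120)\). A minimal sketch using Cramer's rule (the helper `solve2` is our own, not a library call):

```python
A = [[1, 2, 1, 0],     # x1 + 2*x2 + x3        = 40
     [4, 3, 0, 1]]     # 4*x1 + 3*x2      + x4 = 120
b = [40, 120]

def solve2(B, rhs):
    """Solve the 2x2 system B x = rhs by Cramer's rule.
    A nonzero determinant is exactly linear independence of the columns."""
    det = B[0][0] * B[1][1] - B[0][1] * B[1][0]
    assert det != 0, 'columns do not form a basis'
    return ((rhs[0] * B[1][1] - B[0][1] * rhs[1]) / det,
            (B[0][0] * rhs[1] - rhs[0] * B[1][0]) / det)

basis = [2, 3]                                   # columns of x3, x4
B = [[row[j] for j in basis] for row in A]       # identity matrix here
print(solve2(B, b))                              # (40.0, 120.0)
```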
The goal is to improve the objective. For that we need to find the “best” improving direction \(\Delta \mathbf{x}\) and step size \(\lambda\) while maintaining feasibility: \(\mathbf{x}^{(t+1)}=\mathbf{x}^{(t)}+\lambda \Delta \mathbf{x}\).
The constraint equations are \(\mathbf A\mathbf{x}^{(t)}=\mathbf{b}\). So, if the improved \(\mathbf{x}^{(t+1)}\) is feasible, then \(\mathbf A\mathbf{x}^{(t+1)}=\mathbf{b}\), or \(\mathbf A(\mathbf{x}^{(t)}+\lambda \Delta \mathbf{x})=\mathbf{b}\). Subtracting the two,
\[\lambda\,\mathbf A\Delta\mathbf x = \mathbf 0.\]
It follows that
\[\mathbf A\Delta\mathbf x = \mathbf 0.\]
In our example,
\[\Delta x_1 + 2\Delta x_2 + \Delta x_3 = 0, \qquad 4\Delta x_1 + 3\Delta x_2 + \Delta x_4 = 0.\]
We want to swap one nonbasic variable with one basic variable to move to an adjacent edge without losing uniqueness and feasibility. For that, we set \((\Delta x_1,\Delta x_2)=(1,0)\) and also \((\Delta x_1,\Delta x_2)=(0,1)\) (don’t think too much about the meaning of \(\Delta x=1\), as this value gets scaled out later when multiplying by \(\lambda\)). Solving for the former,
\[\Delta \mathbf{x}^T=(1,0,-1,-4),\]
and for the latter we find that
\[\Delta \mathbf{x}^T=(0,1,-2,-3).\]
The corresponding changes to the objective, \(\mathbf c^T \Delta \mathbf{x}\), yield values of 40 and 50, which leads us to prefer the higher value of 50 that goes with \(\Delta \mathbf{x}^T=(0,1,-2,-3)\).
Now, we need to choose the step size, \(\lambda\), s.t. feasibility is maintained.
The improved solution must maintain the non-negativity constraints:
\[\mathbf x^{(t)} + \lambda\Delta\mathbf x \ge \mathbf 0.\]
Hence,
\[\lambda \le \frac{x_j^{(t)}}{-\Delta x_j} \quad\text{for every } j \text{ with } \Delta x_j < 0.\]
In addition, \(\lambda\) must be non-negative and sufficiently small not to take the improved solution outside of the feasible region. Hence,
\[\lambda = \min\left\{\frac{40}{2},\ \frac{120}{3}\right\} = 20, \qquad \mathbf x^{(1)} = \mathbf x^{(0)} + 20\,\Delta\mathbf x = (0, 20, 0, 60),\]
with the objective \(\mathbf c^T \mathbf x^{(1)}=1000\).
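The ratio test is short enough to script. Starting from \(\mathbf x^{(0)}=(0,0,40,120)\) with the direction that brings \(x_2\) into the basis, \(\Delta\mathbf x=(0,1,-2,-3)\):

```python
x = [0, 0, 40, 120]          # x^(0)
dx = [0, 1, -2, -3]          # chosen improving direction

# largest step keeping every component of x + lam*dx non-negative
lam = min(x[j] / -dx[j] for j in range(4) if dx[j] < 0)

x_new = [x[j] + lam * dx[j] for j in range(4)]
print(lam, x_new)            # 20.0 [0.0, 20.0, 0.0, 60.0]
```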
The process then continues with \(x_2\) replacing \(x_3\) as basic variable.
Now, we look for a further improvement by solving for the direction and step. We solve \(\Delta\mathbf x\) for each of the nonbasic variables \(x_1\) and \(x_3\) as candidates to enter the basis. This is done as before; the best improving direction is the \(\Delta\mathbf x\) for \(x_1\), with the largest change to the objective. The minimal step turns out to be \(\lambda=24\), and hence
\[\mathbf x^{(2)} = (24, 8, 0, 0),\]
with the objective \(\mathbf c^T \mathbf x^{(2)}=1360\).
Again, we look for an improved solution by solving for the direction and step. We solve \(\Delta\mathbf x\) for the nonbasic variables \(x_3\) and \(x_4\) as candidates to replace the current basic variables \(x_1\) and \(x_2\). This is done as before, but the change to the objective, \(\mathbf c^T\Delta\mathbf x\), turns out to be negative for both candidates, and thus the search stops. The local optimum is global because this is an LP model over continuous variables on a convex domain. \(\mathbf{x}^*=\mathbf{x}^{(2)}\) is optimal with max objective of \(1360\).
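As an independent check on the whole run, we can brute-force every basic solution of the standard form (every pair of linearly independent columns of \(\mathbf A\)), keep the feasible ones, and take the best. This sketch uses Cramer's rule and recovers the same optimum the simplex iterations reached:

```python
from itertools import combinations

A = [[1, 2, 1, 0],
     [4, 3, 0, 1]]
b = [40, 120]
c = [40, 50, 0, 0]

best = None
for i, j in combinations(range(4), 2):
    det = A[0][i] * A[1][j] - A[0][j] * A[1][i]
    if det == 0:
        continue                                   # not a basis
    xi = (b[0] * A[1][j] - A[0][j] * b[1]) / det   # Cramer's rule
    xj = (A[0][i] * b[1] - b[0] * A[1][i]) / det
    if xi < 0 or xj < 0:
        continue                                   # infeasible basic solution
    x = [0.0] * 4
    x[i], x[j] = xi, xj
    z = sum(ck * xk for ck, xk in zip(c, x))
    if best is None or z > best[0]:
        best = (z, x)

print(best)     # (1360.0, [24.0, 8.0, 0.0, 0.0])
```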