Minimum Cost Network Flow Problem (MCNFP) in R

network flows linear programming r

In this post we will walk through how to make least cost maximum flow decisions using linear programming.

Blake Conrad https://github.com/conradbm
2022-07-10

Traffic Minimum Cost Network Flow

One common extension to Maximum Flow problems is to do it as cheaply as possible. In this article we will extend the maximum flow example we wrote on in the last post and include a minimum cost component. The generic problem formation is below:

Objective \[ Minimize. \sum_{(i,j)\in A}{c_{ij}x_{ij}} \] \[ s.t. \] Node Flow Constraints \[ \sum_{j}{x_{ij}} - \sum_{i}{x_{ji}} = b_i \: \forall i \in N \] Arc Flow Constraints \[ l_{ij} \le x_{ij} \le u_{ij} \: \forall (i,j) \in A \]

The Problem

Road networks are everywhere in our society. In any given intersection there is a flow of cars, intersections with stop lights, and connections between each. Every road has a feasible limit it can support. In fact, this is often the cause of most congestion. Our goal is to minimize the total time required for all cars to travel from node 1 to node 6 in a fictitious road network.

The Data

nodes <- data.frame(id = c(1:6), color = c("green", rep("grey", 4), "red"))
edges <- data.frame(from = c(1, 1, 2, 2, 5, 4, 4, 3, 3), 
                    to = c(2, 3, 5, 4, 6, 5, 6, 5, 4),
                    lower_bound = 0,
                    upper_bound = c(800, 600, 100, 600, 600, 600, 400, 400, 300),
                    time = c(10, 50, 70, 30, 30, 30, 60, 60, 10), # in minutes
                    flow = 0, #TBD
                    color = "grey") %>% dplyr::arrange(from, to)
nodes
  id color
1  1 green
2  2  grey
3  3  grey
4  4  grey
5  5  grey
6  6   red
edges
  from to lower_bound upper_bound time flow color
1    1  2           0         800   10    0  grey
2    1  3           0         600   50    0  grey
3    2  4           0         600   30    0  grey
4    2  5           0         100   70    0  grey
5    3  4           0         300   10    0  grey
6    3  5           0         400   60    0  grey
7    4  5           0         600   30    0  grey
8    4  6           0         400   60    0  grey
9    5  6           0         600   30    0  grey

Network Visualization

We can see the upper bounds plotted on the edges of this transportation network below. The width indicates more capacity for flow. Examine the trade-off between time and space for travel between arc (1,2).

g <- igraph::graph_from_data_frame(d = edges,
                                   directed = TRUE,
                                   vertices = nodes)

# Capacity plot
plot(g, 
     main = "Vehicle Transportion Network Capacities", 
     edge.label = E(g)$upper_bound, 
     edge.width = E(g)$upper_bound/150,
     vertex.color = V(g)$color,
     edge.color = E(g)$color)
# Time plot
plot(g, 
     main = "Vehicle Transportion Network Travel Time", 
     edge.label = E(g)$time, 
     edge.width = E(g)$time/10,
     vertex.color = V(g)$color,
     edge.color = E(g)$color)

Model Data

Average In-Flow

First we assume the average number of cars that flow through this network. This is often recovered from past databases that record the flows through the network.

AVERAGE_FLOW <- 900 # per hour
AVERAGE_FLOW
[1] 900

Demand

Next we set up the demand that will be flowing through the network. This is indicated as the vector b in our model formation above. This means the initial node has a supply of 900 vehicles, while the final node has a demand of 900 nodes. The objective is to flow as many vehicles through the network, in the shortest amount of time.

demand <- c(AVERAGE_FLOW, rep(0, nrow(nodes)-2), -AVERAGE_FLOW)
demand
[1]  900    0    0    0    0 -900

Linear Programming (LP)

LP Structure

The next step is to set up the optimization. Let’s do that now. There are 3 key ingredients.

Arc Constraints

We want to build the constraint matrix, which has 2 parts.

The Arc Capacity Constraints are the first we address.

The Amat, or A matrix, is the arc matrix which contains the upper bounds. Since linear programming relies on the resource matrix, we need one row for each arc, our dimensions for variable flow selection are the number of arcs also. So this means we need an identity matrix for the rows of arcs and columns of arcs.

create_upper_arc_constraints <- function(edges){
  Amat <- matrix(0, nrow = nrow(edges), ncol = nrow(edges))
  Amat_dir <- rep("<=", nrow(Amat))
  Amat_rhs <- c()

  for(i in 1:ncol(Amat)){
    Amat[i,i] <- 1
    Amat_rhs <- c(Amat_rhs, edges$upper_bound[i])
  }
  
  list(Amat_upper = Amat,
       Amat_upper_dir = Amat_dir,
       Amat_upper_rhs = Amat_rhs)
}

# This could be higher than zero, but for standard LP this is the default configuration, so not needed.
# create_lower_arc_constraints <- function(edges){
#   Amat <- matrix(0, nrow = nrow(edges), ncol = nrow(edges))
#   Amat_dir <- rep(">=", nrow(Amat))
#   Amat_rhs <- c()
# 
#   for(i in 1:ncol(Amat)){
#     Amat[i,i] <- 1
#     Amat_rhs <- c(Amat_rhs, edges$lower_bound[i])
#   }
#   
#   list(Amat_lower = Amat,
#        Amat_lower_dir = Amat_dir,
#        Amat_lower_rhs = Amat_rhs)
# }

upper_results <- create_upper_arc_constraints(edges)
# lower_results <- create_lower_arc_constraints(edges)
# 
# Amat <- rbind(upper_results$Amat_upper, lower_results$Amat_lower)
# Amat_dir <- c(upper_results$Amat_upper_dir, lower_results$Amat_lower_dir)
# Amat_rhs <- c(upper_results$Amat_upper_rhs, lower_results$Amat_lower_rhs)

Amat <- upper_results$Amat_upper
Amat_dir <- upper_results$Amat_upper_dir
Amat_rhs <- upper_results$Amat_upper_rhs

Amat
      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9]
 [1,]    1    0    0    0    0    0    0    0    0
 [2,]    0    1    0    0    0    0    0    0    0
 [3,]    0    0    1    0    0    0    0    0    0
 [4,]    0    0    0    1    0    0    0    0    0
 [5,]    0    0    0    0    1    0    0    0    0
 [6,]    0    0    0    0    0    1    0    0    0
 [7,]    0    0    0    0    0    0    1    0    0
 [8,]    0    0    0    0    0    0    0    1    0
 [9,]    0    0    0    0    0    0    0    0    1
Amat_dir
[1] "<=" "<=" "<=" "<=" "<=" "<=" "<=" "<=" "<="
Amat_rhs
[1] 800 600 600 100 300 400 600 400 600

The Node Flow Constraints are the next to take care of.

The Bmat, or B matrix, is a matrix which contains the node balance equations codified by flows.

For each node, we need to match it’s from node and to node with the appropriate inflow and outflow. If it matches an inflow arc, this is increase, so 1 is in the arc column. If it matches an outflow arch, this is decrease, so -1 is in the arc column. Otherwise 0 remains as the placeholder. The sign here is == because they must match (i.e., supply = demand). If we require excess at certain points we can set this demand to be higher than zero, but we will not do that here.

create_node_constraints <- function(nodes, edges){
    Bmat <- matrix(0, nrow = nrow(nodes), ncol = nrow(edges))
    Bmat_dir <- rep("==", nrow(Bmat))
    Bmat_rhs <- rep(0, nrow(Bmat))
    
    for(i in 1:nrow(Bmat)){
      node_id <- nodes[i, "id"]
      for(j in 1:ncol(Bmat)){
        edge_from <- edges[j,"from"]
        edge_to <- edges[j, "to"]
        
        if(node_id == edge_from){
          # print(paste("Outbound for ", node_id, "From: ",edge_from, "to: ", edge_to))
          Bmat[i,j] = 1
        }
        else if(node_id == edge_to){
          # print(paste("Inbound for ", node_id, "From: ",edge_from, "to: ", edge_to))
          Bmat[i,j] = -1
        }
        else{
          Bmat[i,j] = 0
        }
      }
      Bmat_rhs[i] <- demand[i]
    }
    
    list(Bmat = Bmat,
         Bmat_dir = Bmat_dir,
         Bmat_rhs = Bmat_rhs
    )
}

results <- create_node_constraints(nodes, edges)
Bmat <- results$Bmat
Bmat_dir <- results$Bmat_dir
Bmat_rhs <- results$Bmat_rhs

Bmat
     [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9]
[1,]    1    1    0    0    0    0    0    0    0
[2,]   -1    0    1    1    0    0    0    0    0
[3,]    0   -1    0    0    1    1    0    0    0
[4,]    0    0   -1    0   -1    0    1    1    0
[5,]    0    0    0   -1    0   -1   -1    0    1
[6,]    0    0    0    0    0    0    0   -1   -1
Bmat_dir
[1] "==" "==" "==" "==" "==" "=="
Bmat_rhs
[1]  900    0    0    0    0 -900

Putting It All Together

Next, the objective is going to be the time. This is the cost we have to pay for assigning flow to an arc. Let’s take a look at everything all together.

f.obj <- edges %>% dplyr::pull(time)
f.cons <- rbind(Amat, Bmat)
f.rhs <- c(Amat_rhs, Bmat_rhs)
f.dir <- c(Amat_dir, Bmat_dir)

f.obj
[1] 10 50 30 70 10 60 30 60 30
f.cons
      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9]
 [1,]    1    0    0    0    0    0    0    0    0
 [2,]    0    1    0    0    0    0    0    0    0
 [3,]    0    0    1    0    0    0    0    0    0
 [4,]    0    0    0    1    0    0    0    0    0
 [5,]    0    0    0    0    1    0    0    0    0
 [6,]    0    0    0    0    0    1    0    0    0
 [7,]    0    0    0    0    0    0    1    0    0
 [8,]    0    0    0    0    0    0    0    1    0
 [9,]    0    0    0    0    0    0    0    0    1
[10,]    1    1    0    0    0    0    0    0    0
[11,]   -1    0    1    1    0    0    0    0    0
[12,]    0   -1    0    0    1    1    0    0    0
[13,]    0    0   -1    0   -1    0    1    1    0
[14,]    0    0    0   -1    0   -1   -1    0    1
[15,]    0    0    0    0    0    0    0   -1   -1
f.rhs
 [1]  800  600  600  100  300  400  600  400  600  900    0    0    0
[14]    0 -900
f.dir
 [1] "<=" "<=" "<=" "<=" "<=" "<=" "<=" "<=" "<=" "==" "==" "==" "=="
[14] "==" "=="

Solve the LP

Now we unite everything together and drop it into the solver. Remember, we are trying to minimize cost in our objective, the constraints will maximize the flow. So specify min.

results <- lp(direction = "min",  
              objective.in = f.obj, 
              const.mat = f.cons, 
              const.dir = f.dir, 
              const.rhs = f.rhs)

edges$flow <- results$solution
edges
  from to lower_bound upper_bound time flow color
1    1  2           0         800   10  700  grey
2    1  3           0         600   50  200  grey
3    2  4           0         600   30  600  grey
4    2  5           0         100   70  100  grey
5    3  4           0         300   10  200  grey
6    3  5           0         400   60    0  grey
7    4  5           0         600   30  500  grey
8    4  6           0         400   60  300  grey
9    5  6           0         600   30  600  grey

Visualize Solution

Now that we have our flow we can do some visualizing and analysis. There are two key graphics to examine; the flow vs. capacity and the flow vs. time. First, the flow vs. capacity will give us insights into stress on the network. This is because of their implicit advantage they supply to the optimizer, maximum flows, so naturally, these get flooded with traffic. Second, the flow vs. time will give us insights into shortest distance paths (i.e., assuming time is proportional to distance, which is not always the case). This is because paths with shorter times will enable more to flow through in any given time delta. Between these two visuals, a good assessment of the model output is feasible.

# Set the flow
edges$flow <- results$solution

# If the arc is flowing oil, change to black
edges$color[which(edges$flow > 0)] <- "black"

# If the arc is at capacity change it to red
edges$color[which(edges$flow == edges$upper_bound)] <- "red"

g <- igraph::graph_from_data_frame(d = edges,
                                   directed = TRUE,
                                   vertices = nodes)

# Plot the flow and capacity
plot(g,
     edge.label = paste("(",E(g)$flow, 
                        ",",
                         E(g)$upper_bound, 
                         ")", 
                        sep=""), 
     main = "Traffic Flow | (flow, capacity)", 
     vertex.color = V(g)$color,
     vertex.size = 25,
     vertex.label.cex = 1.6,
     edge.color = E(g)$color,
     edge.width = E(g)$flow/200)
# Plot the time
plot(g,
     edge.label = paste("(",E(g)$flow, 
                        ",",
                         E(g)$time, 
                         ")", 
                        sep=""), 
     main = "Traffic Flow | (flow, time)", 
     vertex.color = V(g)$color,
     vertex.size = 25,
     vertex.label.cex = 1.6,
     edge.color = E(g)$color,
     edge.width = E(g)$flow/200)

Cleaning Up The Visual

nodes
  id color
1  1 green
2  2  grey
3  3  grey
4  4  grey
5  5  grey
6  6   red
visNet_nodes <- nodes %>% 
                dplyr::mutate(label = paste("Location ", id),
                              font.size =10,
                              type = "black",
                              shape = "circle",
                              font = 12)

visNet_edges <- edges %>% 
                dplyr::mutate(label = paste("(",flow,",",upper_bound,")",sep=""),
                              length = 250,
                              width = flow/100,
                              arrows = "to") 
  
visNetwork(visNet_nodes, 
           visNet_edges,
           width = "100%",
           main = "Least Time Maximum Vehicle Flow | (flow, capacity)")

Flow vs. Capacity vs. Time

Two of the most popular roadways are 2 -> 4 and 5 -> 6. These supply the least amount of time and the most amount of capacity. Without a doubt, the model has exploited these as much as possible. The only trick is, just how the flow gets there. We see up front that 1 ships off a 200 to 700 flow split from the 900 supply with the heavier allocation toward 2. Why?

  1. 2 is connected to a popular roadway, meaning much more potential to flow (and quickly).

  2. In order to get to 4, going through 3 will cost much more time (60 oppose to 40).

  3. 3 has two outlets, but one is one of the worst routes on the network due to it’s 60 minute trek, so it doesn’t even get any flow allocated.

Shortest Path

The shortest path shows one very interesting insight to this model; sending a maximum flow through the network is not all about time. The shortest path (least time) is the sequence 1 -> 2 -> 4 -> 6. However, from the above, we see that this isn’t the most stressed path. Why? We aren’t only interested in short times for the vehicles flowing through the network. We are also interested in getting them all through it! We assumed there was a 900 average vehicle flow, and having a macro-level view of this system, sending them all down the shortest path would not solve it (that is, we would not send as much as possible, only as cheaply as possible; we could have pushed more). In order to get the most cars sent through the network, in the shortest amount of time we also must take advantage of the popular roadways that the model is straining (or add incentive to the not so popular roadways with greater capacity or shorter commute times).

shortest.paths <- igraph::shortest_paths(graph = g, from = 1, to = 6)
s_path <- shortest.paths$vpath[[1]]
s_path
+ 4/6 vertices, named, from ac6f2e8:
[1] 1 2 4 6
shortest_commute_time <- E(g, path = s_path)$time %>% sum
shortest_commute_time
[1] 100

Shortest Path Visualization

E(g)$color <- "black"
E(g, path = s_path)$color <- "blue"

plot(g,
     edge.label = paste("(",E(g)$flow, 
                        ",",
                         E(g)$time, 
                         ")", 
                        sep=""), 
     main = "Shortest Path | (flow, capacity)", 
     vertex.color = V(g)$color,
     vertex.size = 25,
     vertex.label.cex = 1.6,
     edge.color = E(g)$color,
     edge.width = E(g)$flow/200)

Reusable Functionality

We can also build a function to reuse for next time.

#
# @lp.mincost.maxflow: data.frame, integer, integer, integer or list -> list
#   function outputs an edge list with flows.
#
# @edges: data.frame, should have from, to, and capacity columns for each edge in the network. from and to columns should contain unique node ids as integers from 0 to N.
# @source.id: integer, unique node id
# @dest.id: integer, unique node id
# @flow.demand: integer or list, if integer create a demand signal to push that much from source.id to dest.id, if a list expectation is that it sums to 0 and will contain each nodes supply (if positive) and demand (if negative).


lp.mincost.maxflow <- function(edges, source.id, dest.id, flow.demand){
  if(!("from" %in% names(edges)) || 
     !("to" %in% names(edges)) ||
     !("upper_bound" %in% names(edges)) ||
     !("lower_bound" %in% names(edges))){
    print("Need from, to, and capacity columns.")
  }
  else{
    
    # Infer the nodes
    nodes <- data.frame(id = sort(unique(c(unique(edges$from), unique(edges$to)))))
    
    # Infer the demand to flow through the network
    if(length(flow.demand)>1){
      if(sum(flow.demand) == 0){
        demand <- flow.demand
      }else{
        print("Flow demand doesn't add up to 0.")
      }
    }
    else{
      demand <- c(flow.demand, rep(0, nrow(nodes)-2), -flow.demand)
    }

    # Get arc capacity constraints
    create_arc_capacity_constraints <- function(edges){
        
      # For upper
      Amat <- matrix(0, nrow = nrow(edges), ncol = nrow(edges))
      Amat_dir <- rep("<=", nrow(Amat))
      Amat_rhs <- c()
    
      for(i in 1:ncol(Amat)){
        Amat[i,i] <- 1
        Amat_rhs <- c(Amat_rhs, edges$upper_bound[i])
      }
      
      # For lower
      Bmat <- matrix(0, nrow = nrow(edges), ncol = nrow(edges))
      Bmat_dir <- rep(">=", nrow(Bmat))
      Bmat_rhs <- c()
    
      for(i in 1:ncol(Bmat)){
        Bmat[i,i] <- 1
        Bmat_rhs <- c(Bmat_rhs, edges$lower_bound[i])
      }
      
      list(Amat = rbind(Amat, Bmat),
           Amat_dir = c(Amat_dir, Bmat_dir),
           Amat_rhs = c(Amat_rhs, Bmat_rhs))
    }
    
   results <- create_arc_capacity_constraints(edges)
    Amat <- results$Amat
    Amat_dir <- results$Amat_dir
    Amat_rhs <- results$Amat_rhs
    
    # Create node flow constraints (in = out)
    create_node_constraints <- function(nodes, edges){
        Bmat <- matrix(0, nrow = nrow(nodes), ncol = nrow(edges))
        Bmat_dir <- rep("==", nrow(Bmat))
        Bmat_rhs <- rep(0, nrow(Bmat))
        
        for(i in 1:nrow(Bmat)){
          node_id <- nodes[i, "id"]
          for(j in 1:ncol(Bmat)){
            edge_from <- edges[j,"from"]
            edge_to <- edges[j, "to"]
            
            if(node_id == edge_from){
              # print(paste("Outbound for ", node_id, "From: ",edge_from, "to: ", edge_to))
              Bmat[i,j] = 1
            }
            else if(node_id == edge_to){
              # print(paste("Inbound for ", node_id, "From: ",edge_from, "to: ", edge_to))
              Bmat[i,j] = -1
            }
            else{
              Bmat[i,j] = 0
            }
          }
          Bmat_rhs[i] <- demand[i]
        }
        
        list(Bmat = Bmat,
             Bmat_dir = Bmat_dir,
             Bmat_rhs = Bmat_rhs
        )
    }
    
    results <- create_node_constraints(nodes, edges)
    Bmat <- results$Bmat
    Bmat_dir <- results$Bmat_dir
    Bmat_rhs <- results$Bmat_rhs
    
    # Bring together
    f.obj <- edges$cost
    f.cons <- rbind(Amat, Bmat)
    f.rhs <- c(Amat_rhs, Bmat_rhs)
    f.dir <- c(Amat_dir, Bmat_dir)
    results <- lp(direction = "min",  
                  objective.in = f.obj, 
                  const.mat = f.cons, 
                  const.dir = f.dir, 
                  const.rhs = f.rhs)
    
    edges$flow <- results$solution
  }
  
  list(edges = edges, results = results)
}

edges <- data.frame(from = c(1, 1, 2, 2, 5, 4, 4, 3, 3), 
                    to = c(2, 3, 5, 4, 6, 5, 6, 5, 4),
                    lower_bound = 0,
                    upper_bound = c(800, 600, 100, 600, 600, 600, 400, 400, 300),
                    cost = c(10, 50, 70, 30, 30, 30, 60, 60, 10))

lp.mincost.maxflow(edges = edges, source.id = 1, dest.id = 6, flow.demand = 900)
$edges
  from to lower_bound upper_bound cost flow
1    1  2           0         800   10  700
2    1  3           0         600   50  200
3    2  5           0         100   70  100
4    2  4           0         600   30  600
5    5  6           0         600   30  600
6    4  5           0         600   30  500
7    4  6           0         400   60  300
8    3  5           0         400   60    0
9    3  4           0         300   10  200

$results
Success: the objective function is 95000 

Sourcing

You can also source the file to make things easier for next time. The code can be found here.

source("lp.mincost.maxflow.r")

edges <- data.frame(from = c(1, 1, 2, 2, 5, 4, 4, 3, 3), 
                    to = c(2, 3, 5, 4, 6, 5, 6, 5, 4),
                    lower_bound = 0,
                    upper_bound = c(800, 600, 100, 600, 600, 600, 400, 400, 300),
                    cost = c(10, 50, 70, 30, 30, 30, 60, 60, 10))

lp.mincost.maxflow(edges = edges, source.id = 1, dest.id = 6, flow.demand = 900)
$edges
  from to lower_bound upper_bound cost flow
1    1  2           0         800   10  700
2    1  3           0         600   50  200
3    2  5           0         100   70  100
4    2  4           0         600   30  600
5    5  6           0         600   30  600
6    4  5           0         600   30  500
7    4  6           0         400   60  300
8    3  5           0         400   60    0
9    3  4           0         300   10  200

$results
Success: the objective function is 95000