VeriFlow: Verifying Network-Wide Invariants in Real Time
Ahmed Khurshid, Xuan Zou, Wenxuan Zhou, Matthew Caesar, P. Brighten Godfrey
University of Illinois at Urbana-Champaign
{khurshi1, xuanzou2, wzhou10, caesar, pbg}@illinois.edu
Abstract
Networks are complex and prone to bugs. Existing tools
that check network configuration files and the data-plane
state operate offline at timescales of seconds to hours,
and cannot detect or prevent bugs as they arise.
Is it possible to check network-wide invariants in real
time, as the network state evolves? The key challenge
here is to achieve extremely low latency during the
checks so that network performance is not affected. In
this paper, we present a design, VeriFlow, which achieves
this goal. VeriFlow is a layer between a software-
defined networking controller and network devices that
checks for network-wide invariant violations dynami-
cally as each forwarding rule is inserted, modified or
deleted. VeriFlow supports analysis over multiple header
fields, and an API for checking custom invariants. Based
on a prototype implementation integrated with the NOX
OpenFlow controller, and driven by a Mininet OpenFlow
network and Route Views trace data, we find that Veri-
Flow can perform rigorous checking within hundreds of
microseconds per rule insertion or deletion.
1 Introduction
Packet forwarding in modern networks is a complex pro-
cess, involving codependent functions running on hun-
dreds or thousands of devices, such as routers, switches,
and firewalls from different vendors. As a result, a sub-
stantial amount of effort is required to ensure networks’
correctness, security and fault tolerance. However, faults
in the network state arise commonly in practice, in-
cluding loops, suboptimal routing, black holes and ac-
cess control violations that make services unavailable or
prone to attacks (e.g., DDoS attacks). Software-Defined
Networking (SDN) promises to ease the development of
network applications through logically-centralized net-
work programmability via an open interface to the data
plane, but bugs are likely to remain problematic since the
complexity of software will increase. Moreover, SDN al-
lows multiple applications or even multiple users to pro-
gram the same physical network simultaneously, poten-
tially resulting in conflicting rules that alter the intended
behavior of one or more applications [25].
One solution is to rigorously check network software
or configuration for bugs prior to deployment. Symbolic
execution [12] can catch bugs through exploration of all
possible code paths, but is usually not tractable for large
software. Analysis of configuration files [13, 28] is use-
ful, but cannot find bugs in router software, and must be
designed for specific configuration languages and control
protocols. Moreover, using these approaches, an opera-
tor who wants to ensure the network’s correctness must
have access to the software and configuration, which may
be inconvenient in an SDN network where controllers
can be operated by other parties [25]. Another approach
is to statically analyze snapshots of the network-wide
data-plane state [9, 10, 17, 19, 27]. However, these pre-
vious approaches operate offline, and thus only find bugs
after they happen.
This paper studies the question, Is it possible to check
network-wide correctness in real time as the network
evolves? If we can check each change to forwarding be-
havior before it takes effect, we can raise alarms imme-
diately, and even prevent bugs by blocking changes that
violate important invariants. For example, we could pro-
hibit changes that violate access control policies or cause
forwarding loops.
However, existing techniques for checking networks
are inadequate for this purpose as they operate on
timescales of seconds to hours [10, 17, 19] (the average run time of reachability tests in [17] is 13 seconds, and it takes a few hundred seconds to perform reachability checks in Anteater [19]).
Delay-
ing updates for processing can harm consistency of net-
work state, and increase reaction time of protocols with
real-time requirements such as routing and fast failover;
and processing a continuous stream of updates in a large
network could introduce scaling challenges. Hence, we
need some way to perform verification at very high
speeds, i.e., within milliseconds. Moreover, checking
network-wide properties requires obtaining a view of
network-wide state.
We present a design, VeriFlow, which demonstrates
that the goal of real-time verification of network-wide
invariants is achievable. VeriFlow leverages software-
defined networking (SDN) to obtain a picture of the net-
work as it evolves by sitting as a layer between the SDN
controller and the forwarding devices, and checks va-
lidity of invariants as each rule is inserted, modified or
deleted. However, SDN alone does not make the prob-
lem easy. In order to ensure real-time response, Veri-
Flow introduces novel incremental algorithms to search
for potential violations of key network invariants, for
example, availability of a path to the destination, absence
of forwarding loops, enforcement of access control poli-
cies, or isolation between virtual networks.
Our prototype implementation supports both Open-
Flow [21] version 1.1.0 and IP forwarding rules, with the
exception that the current implementation does not sup-
port actions that modify packet headers. We microbench-
marked VeriFlow using a stream of updates from a simu-
lated IP network, constructed with Rocketfuel [7] topol-
ogy data and real BGP traces [8]. We also evaluated its
overhead relative to the NOX controller [14] in an emu-
lated OpenFlow network using Mininet [3]. We find that
VeriFlow is able to verify network-wide invariants within
hundreds of microseconds as new rules are introduced
into the network. VeriFlow’s verification phase has little
impact on network performance and inflates TCP con-
nection setup latency by a manageable amount, around
15.5% on average.
We give an overview of data plane verification and
SDN (§ 2) before presenting VeriFlow's design (§ 3), im-
plementation (§ 4), and evaluation (§ 5). We then discuss
future (§ 6) and related work (§ 7), and conclude (§ 8).
2 Overview of Approach
VeriFlow adopts the approach of data plane verifica-
tion. As argued in [19], verifying network correctness
in the data plane offers several advantages over verifying
higher-level code such as configuration files. First, it is
closely tied to the network’s actual behavior, so that it
can catch bugs that other tools miss. For example, con-
figuration analysis [13, 28] cannot find bugs that occur
in router software. Second, since data-plane state has
relatively simple formats and semantics that are com-
mon across many higher-layer protocols and implemen-
tations, it simplifies rigorous analysis of a network.
Early data plane verification algorithms were devel-
oped in [27], and systems include FlowChecker [9],
Anteater [19], and Header Space Analysis [17]. The lat-
ter two systems were applied to operational networks and
uncovered multiple real-world bugs, validating the data
plane analysis approach. However, as noted previously,
these are offline rather than real-time systems.
VeriFlow performs real-time data plane verification in
the context of software defined networks (SDNs). An
SDN comprises, at a high level, (1) a standardized and
open interface to read and write the data plane of net-
work devices such as switches and routers; (2) a con-
troller, a logically centralized device that can run custom
code and is responsible for transmitting commands (for-
warding rules) to network devices.
SDNs are a good match for data plane verification.
First, a standardized data plane interface such as Open-
Flow [6] simplifies unified analysis across all network
devices. Second, SDNs ease real-time data plane ver-
ification since the stream of updates to the network is
observable at the controller.
SDN thus simplifies VeriFlow’s design. Moreover, we
believe SDNs can benefit significantly from VeriFlow’s
data plane verification layer: the network operator can
verify that the network’s forwarding behavior is correct,
without needing to inspect (or trust) relatively complex
controller code, which may be developed by parties out-
side the network operator’s control.
3 Design of VeriFlow
Checking network-wide invariants in the presence of
complex forwarding elements can be a hard problem. For
example, packet filters alone make reachability checks
NP-Complete [19]. Aiming to perform these checks in
real-time is therefore challenging. Our design tackles
this problem as follows. First, we monitor all the network
update events in a live network as they are generated by
network control applications, the devices, or the network
operator. Second, we confine our verification activities
to only those parts of the network whose actions may be
influenced by a new update. Third, rather than check-
ing invariants with a general-purpose tool such as a SAT
or BDD solver as in [10, 19] (which are generally too
slow), we use a custom algorithm. We now discuss each
of these design decisions in detail.
VeriFlow’s first job is to track every forwarding-state
change event. For example, in an SDN such as Open-
Flow [21], a centralized controller issues forwarding
rules to the network devices to handle flows initiated by
users. VeriFlow must intercept all these rules and ver-
ify them before they reach the network. To achieve this
goal, VeriFlow is implemented as a shim layer between
the controller and the network, and monitors all commu-
nication in either direction.
For every rule insertion/deletion message, VeriFlow
must verify the effect of the rule on the network at very
high speed. VeriFlow cannot leverage techniques used by
past work [9,17, 19], because these operate at timescales
of seconds to hours. Unlike previous solutions, we do
not want to check the entire network on each change.
We solve this problem in three steps. First, using the
new rule and any overlapping existing rules, we slice the
network into a set of equivalence classes (ECs) of pack-
ets (§ 3.1). Each EC is a set of packets that experience
the same forwarding actions throughout the network. In-
tuitively, each change to the network will typically only
affect a very small number of ECs (see § 5.1). There-
fore, we find the set of ECs whose operation could be al-
tered by a rule, and verify network invariants only within
those classes. Second, VeriFlow builds individual for-
warding graphs for every modified EC, representing the
network’s forwarding behavior (§ 3.2). Third, VeriFlow
traverses these graphs (or runs custom user-defined code)
to determine the status of one or more invariants (§ 3.3).
The following subsections describe these steps in detail.
Figure 1 shows the placement and operations of VeriFlow
in an SDN.
Figure 1: VeriFlow sits between the SDN applications
and devices to intercept and check every rule entering
the network.
3.1 Slicing the network into equivalence
classes
One way to verify network properties is to prepare a
model of the entire network using its current data-plane
state, and run queries on this model [9, 19]. However,
checking the entire network’s state every time a new flow
rule is inserted is wasteful, and fails to provide real-time
response. Instead, we note that most forwarding rule
changes affect only a small subset of all possible pack-
ets. For example, inserting a longest-prefix-match rule
for the destination IP field will only affect forwarding
for packets destined to that prefix. In order to confine
our verification activities to only the affected set of pack-
ets, we slice the network into a set of equivalence classes
(ECs) based on the new rule and the existing rules that
overlap with the new rule. An equivalence class is de-
fined as follows.
Definition (Equivalence Class): An equivalence class (EC) is a set P of packets such that for any p1, p2 ∈ P and any network device R, the forwarding action is identical for p1 and p2 at R.
Separating the entire packet space into individual ECs
allows VeriFlow to pinpoint the affected set of packets if
a problem is discovered while verifying a newly inserted
forwarding rule.
Let us look at an example. Consider an OpenFlow
switch with two rules matching packets with destination
IP address prefixes 11.1.0.0/16 and 12.1.0.0/16, respec-
tively. If a new rule matching destination IP address pre-
fix 11.0.0.0/8 is added, it may affect packets belonging
to the 11.1.0.0/16 range depending on the rules’ priority
values [6] (the longer prefix may not have higher prior-
ity). However, the new rule will not affect packets out-
side the range 11.0.0.0/8, such as 12.1.0.0/16. Therefore,
VeriFlow will only consider the new rule (11.0.0.0/8) and
the existing overlapping rule (11.1.0.0/16) while analyz-
ing network properties. These two overlapping rules pro-
duce three ECs (represented using the lower and upper
bound range values of the destination IP address field):
11.0.0.0 to 11.0.255.255, 11.1.0.0 to 11.1.255.255, and
11.2.0.0 to 11.255.255.255.
VeriFlow needs an efficient data structure to quickly
store new network rules, find overlapping rules, and
compute the affected ECs. For this we utilize a multi-
dimensional prefix tree (trie) inspired by traditional
packet classification algorithms [26].
A trie is an ordered tree data structure that stores an
associative array. In our case, the trie associates the set
of packets matched by a forwarding rule with the for-
warding rule itself. Each level in the trie corresponds
to a specific bit in a forwarding rule (equivalently, a bit
in the packet header). Each node in our trie has three
branches, corresponding to three possible values that the
rule can match: 0, 1, and * (wildcard). The trie can be
seen as a composition of several sub-tries or dimensions,
each corresponding to a packet header field. We main-
tain a sub-trie in our multi-dimensional trie for each of
the mandatory match and packet header fields supported
by OpenFlow 1.1.0 (DL_SRC, DL_DST, NW_SRC, NW_DST, IN_PORT, DL_VLAN, DL_VLAN_PCP, DL_TYPE, NW_TOS, NW_PROTO, TP_SRC, TP_DST, MPLS_LABEL and MPLS_TC). (Note that an optimization in our implementation uses a condensed set of fields in the trie;
see § 4.2.) For example, the sub-trie representing the
IPv4 destination corresponds to 32 levels in the trie. One
of the sub-tries (DL_SRC in our design) appears at the
top, the next field’s sub-tries are attached to the leaves
of the first, and so on (Figure 2). A path from the trie’s
root to a leaf of one of the bottommost sub-tries thus rep-
resents the set of packets that a rule matches. Each leaf
stores the rules that match that set of packets, and the
devices at which they are located (Figure 2).
Figure 2: VeriFlow’s core algorithmic process.
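As an illustration of the trie just described, one possible node layout is sketched below in C++; the types and field names are our own illustration, not VeriFlow's actual code.

#include <array>
#include <memory>
#include <vector>

// One trie node covers one bit position of one header field. The three
// children correspond to the values a rule may specify at that bit:
// 0, 1, or * (wildcard).
struct TrieNode {
    struct RuleEntry {
        int device_id;   // switch/router where the rule is installed
        int priority;    // OpenFlow rule priority
        // actions, next hop, etc. omitted
    };
    std::array<std::unique_ptr<TrieNode>, 3> child;  // 0, 1, *
    std::vector<RuleEntry> rules;  // non-empty only at the leaves of the
                                   // bottommost sub-trie: the rules whose
                                   // match follows this root-to-leaf path
};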
When a new forwarding rule is generated by the ap-
plication, we perform a lookup in our trie, by traversing
it dimension by dimension to find all the rules that inter-
sect the new rule. At each dimension, we narrow down
the search area by only traversing those branches that fall
within the range of the new rule using the field value
of that particular dimension. The lookup procedure re-
sults in the selection of a set of leaves of the bottommost
dimension, each with a set of forwarding rules. These
rules collectively define a set of packets (in particular,
their corresponding forwarding rules) that could be af-
fected by the incoming forwarding rule. This set may
span multiple ECs. We next compute the individual ECs
as illustrated in Figure 2. For each field, we find a set
of disjoint ranges (lower and upper bound) such that no
rule splits one of the ranges. An EC is then defined by
a particular choice of one of the ranges for each of the
fields. This is not necessarily a minimal set of ECs; for
example, ECs 2 and 4 in Figure 2 could have been com-
bined into a single EC. However, this method performs
well in practice.
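The range-splitting step can be sketched for a single field as follows. This is a minimal illustration of the idea, assuming each rule's match on the field has already been converted to a [low, high] interval; it is not how VeriFlow's trie-based computation is actually coded.

#include <cstdint>
#include <iterator>
#include <set>
#include <utility>
#include <vector>

// For one header field, cut the span covered by the overlapping rules
// into disjoint ranges such that no rule's [low, high] interval splits
// a range. An EC then picks one such range per field.
std::vector<std::pair<uint64_t, uint64_t>>
DisjointRanges(const std::vector<std::pair<uint64_t, uint64_t>>& rule_ranges) {
    std::set<uint64_t> cuts;                 // candidate range boundaries
    for (const auto& r : rule_ranges) {
        cuts.insert(r.first);                // a rule starts here
        cuts.insert(r.second + 1);           // and ends just before here
    }
    std::vector<std::pair<uint64_t, uint64_t>> out;
    if (cuts.size() < 2) return out;
    for (auto it = cuts.begin(); std::next(it) != cuts.end(); ++it)
        out.push_back({*it, *std::next(it) - 1});
    return out;
}
// Applied to the two rules of the earlier example (11.0.0.0/8 and
// 11.1.0.0/16), this yields the three destination-IP ranges listed there.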
3.2 Modeling forwarding state with for-
warding graphs
For each EC computed in the previous step, VeriFlow
generates a forwarding graph. Each such graph is a
representation of how packets within an EC will be for-
warded through the network. In the graph, a node repre-
sents an EC at a particular network device, and a directed
edge represents a forwarding decision for a particular
(EC, device) pair. Specifically, an edge X → Y indicates
that according to the forwarding table at node X, packets
within this EC are forwarded to Y . To build the graph
for each EC, we traverse our trie a second time to find
the devices and rules that match packets from that EC.
The second traversal is needed to find all those rules that
were not necessary to compute the affected ECs in the
first traversal, yet can still influence their forwarding be-
havior. For example, for a new rule with 10.0.0.0/8 spec-
ified as the destination prefix, an existing 0.0.0.0/0 rule
will not contribute to the generation of the affected ECs,
but may influence their forwarding behavior depending
on its priority. Given the range values of different fields
of an EC, looking up matching rules from the trie struc-
ture can be performed very quickly. Here, VeriFlow only
has to traverse those branches of the trie having rules that
can match packets of that particular EC.
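A minimal sketch of this per-EC graph construction follows; RuleEntry and the single-next-hop assumption are simplifications of ours, not VeriFlow's internal representation.

#include <map>
#include <vector>

// A rule that matches the EC under consideration, as found by the
// second trie traversal (fields are illustrative).
struct RuleEntry {
    int device_id;         // where the rule is installed
    int priority;          // higher wins
    int next_hop_device;   // forwarding decision of the rule
};

// Forwarding graph for one EC: each device forwards packets of the EC
// to the next hop chosen by its highest-priority matching rule.
std::map<int, int> BuildForwardingGraph(const std::vector<RuleEntry>& matching) {
    std::map<int, const RuleEntry*> best;          // per-device winner
    for (const RuleEntry& r : matching) {
        const RuleEntry*& slot = best[r.device_id];
        if (slot == nullptr || r.priority > slot->priority) slot = &r;
    }
    std::map<int, int> graph;                      // device -> next hop
    for (const auto& kv : best) graph[kv.first] = kv.second->next_hop_device;
    return graph;
}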
3.3 Running queries
Above, we described how VeriFlow models the behav-
ior of the network using forwarding graphs, building for-
warding graphs only for those equivalence classes (ECs)
whose behavior may have changed. Next, we answer
queries (check invariants) using this model.
VeriFlow maintains a list of invariants to be checked.
When ECs have been modified, VeriFlow checks each
(invariant, modified EC) pair. An invariant is specified
as a verification function that takes as input the forward-
ing graph for a specific EC, performs arbitrary computa-
tion, and can trigger resulting actions. VeriFlow exposes
an API (Application Programming Interface), the imple-
mentation of which is described in § 4.3, so that new in-
variants can be written and plugged in.
Up to a certain level of detail, the forwarding graph is
an exact representation of the forwarding behavior of the
network. Therefore, invariant modules can check a large
diversity of conditions concerning network behavior. For
example:
Basic reachability: The verification function tra-
verses the directed edges in the forwarding graph
(using depth-first search in our implementation) to
determine whether packets will be delivered to the
destination address specified in the rule.
Loop-freeness: The verification function traverses
the given EC’s forwarding graph to check that it
does not contain a loop.
Consistency: Given two (pre-specified) routers R1, R2 that are intended to have identical forwarding operations, the verification function traverses the forwarding graph starting at R1 and R2 to test whether the fate of packets is the same in both cases.
(Any difference may indicate a bug.)
Further examples include detecting “black holes”
where packets are dropped, ensuring isolation of mul-
tiple VLANs, verifying access control policies, checking
whether a new rule conflicts with an existing rule, check-
ing whether an EC changes its next hop due to the in-
sertion/deletion of a rule, ensuring that packets always
traverse a firewall, and so on.
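For instance, the loop-freeness check reduces to cycle detection on the EC's forwarding graph. The sketch below assumes the simple device-to-next-hop graph representation used in the earlier sketches, not VeriFlow's actual invariant API.

#include <map>
#include <set>

// Loop-freeness: walk the EC's forwarding graph (device -> next hop)
// starting from the device where the new rule was installed; revisiting
// a device on the current path means the EC has a forwarding loop.
bool HasForwardingLoop(const std::map<int, int>& graph, int device,
                       std::set<int>& on_path) {
    if (on_path.count(device)) return true;    // came back to a device: loop
    auto it = graph.find(device);
    if (it == graph.end()) return false;       // no rule here: packet stops
    on_path.insert(device);
    bool loop = HasForwardingLoop(graph, it->second, on_path);
    on_path.erase(device);
    return loop;
}
// Usage: std::set<int> path; HasForwardingLoop(graph, ingress_device, path);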
There are two key limitations on what invariants can
be feasibly implemented. First, VeriFlow’s forwarding
graph construct must include the necessary information.
Our current implementation of VeriFlow does not, for
example, incorporate information on buffer sizes that
would be necessary for certain performance invariants.
(There is not, however, any fundamental reason that Veri-
Flow could not be augmented with such metadata.) Sec-
ond, the invariant check must be implementable in the in-
cremental manner described above where only the mod-
ified ECs are considered at each step.
If a verification function finds a violated invariant, it
can choose to trigger further actions within VeriFlow.
Two obvious actions are dropping the rule that was being
inserted into the network, or installing the rule but gener-
ating an alarm for the operator. For example, the operator
could choose to drop rules that cause a security violation
(such as packets leaking onto a protected VLAN), but
only generate an alarm for a black hole. Since verifi-
cation functions are arbitrary code, they may take other
actions as well, such as maintaining statistics (e.g., rate
of forwarding behavior change) or writing logs.
3.4 Dealing with high verification time
VeriFlow achieves real-time response by confining its
verification activities within those parts of the network
that are affected when a new forwarding rule is installed.
In general, the effectiveness of this approach will be de-
termined by numerous factors, such as the complexity of
verification functions, the size of the network, the num-
ber of rules in the network, the number of unique ECs
covered by a new rule, the number of header fields used
to match packets by a new rule, and so on.
However, perhaps the most important factor summa-
rizing verification time is the number of ECs modified.
As our later experiments will show, VeriFlow’s verifica-
tion time is roughly linear in this number. In other words,
VeriFlow has difficulty verifying invariants in real-time
when large swaths of the network’s forwarding behavior
are altered in one operation.
When such disruptive events occur, VeriFlow may
need to let new rules be installed in the network with-
out waiting for verification, and run the verification pro-
cess in parallel. We lose the ability to block problematic
rules before they enter the network, but we note several
mitigating facts. First, the most prominent example of
a disruptive event affecting many ECs is a link failure,
in which case VeriFlow anyway cannot block the modi-
fication from entering the network. Second, upon (even-
tually) detecting a problem, VeriFlow can still raise an
alarm and remove the problematic rule(s) from the net-
work. Third, the fact that the number of affected ECs is
large may itself be worthy of an immediate alarm even
before invariants are checked for each EC. Finally, our
experiments with realistic forwarding rule update traces
(§ 5) show that disruptive events (i.e., events affecting
large number of ECs) are rare: in the vast majority of
cases (around 99%), the number of affected ECs is small
(less than 10).
4 Implementation
We describe three key aspects of our implementation:
our shim layer to intercept network events (§ 4.1), an op-
timization to accelerate verification (§ 4.2), and our API
for custom invariants (§ 4.3).
4.1 Making deployment transparent
In order to ease the deployment of VeriFlow in net-
works with OpenFlow-enabled devices, and to use Veri-
Flow with unmodified OpenFlow applications, we need
a mechanism to make VeriFlow transparent so that these
existing OpenFlow entities may remain unaware of the
presence of VeriFlow. We built two versions of VeriFlow.
One is a proxy process [25] that sits between the con-
troller and the network, and is therefore independent of
the particular controller. The second version is integrated
with the NOX OpenFlow controller [14] to improve per-
formance; our performance evaluation is of this version.
We expect one could similarly integrate VeriFlow with
other controllers, such as Floodlight [2], Beacon [1] and
Maestro [11], without significant trouble.
We built our implementation within NOX version
0.9.1 (full beta single-thread version). We integrated
VeriFlow within NOX, enabling it to run as a transpar-
ent rule verifier sitting between the OpenFlow applica-
tions implemented using NOX’s API, and the switches
and routers in the network. SDN applications running
on NOX use the NOX API to manipulate the forwarding
state of the network, resulting in OFPT_FLOW_MOD (flow table modification) and other OpenFlow messages
generated by NOX. We modified NOX to intercept these
messages, and redirect them to our VeriFlow module.
This ensures that all messages are intercepted by Veri-
Flow before they are dispatched to the network. Veri-
Flow then processes and checks the forwarding rules
contained in these messages for correctness, and can
block problematic flow rules.
To integrate the VeriFlow module, we extend two
parts of NOX. First, within the core of NOX, the
send_openflow_command() interface is responsible for
adding (relaying) flow rules from OpenFlow applica-
tions to the switches. At the lower layers of NOX,
handle_flow_removed() handles events that remove rules
from switches, due to rule timeouts or commands sent
by applications. Our implementation intercepts all mes-
sages sent to these two function calls, and redirects them
to VeriFlow. To reduce memory usage and improve run-
ning time, we pass these messages via shallow copy.
There are five types of flow table modification messages that can be generated by OpenFlow applications: OFPFC_ADD, OFPFC_MODIFY_STRICT, OFPFC_DELETE_STRICT, OFPFC_MODIFY and OFPFC_DELETE. These messages differ in terms of whether they add, modify or delete a rule from the flow table. The
strict versions match all the fields bit by bit, whereas the
non-strict versions allow wildcards. Our implementation
handles all these message types appropriately.
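As a sketch, the dispatch on these message types might look like the following; the enum mirrors the OpenFlow command names and the comments describe the processing outlined above, but the handler itself is our illustration, not VeriFlow's code.

// Mirrors the OpenFlow flow-mod commands; the handler body is illustrative.
enum class FlowModCommand { Add, Modify, ModifyStrict, Delete, DeleteStrict };

void OnFlowMod(FlowModCommand cmd /*, parsed rule, source switch, ... */) {
    switch (cmd) {
    case FlowModCommand::Add:
        // insert the rule into the trie, then verify the affected ECs
        break;
    case FlowModCommand::Modify:          // non-strict: wildcards allowed
    case FlowModCommand::ModifyStrict:    // strict: bit-by-bit match
        // update every matching rule in the trie, then verify affected ECs
        break;
    case FlowModCommand::Delete:          // non-strict: wildcards allowed
    case FlowModCommand::DeleteStrict:    // strict: bit-by-bit match
        // remove every matching rule, then verify affected ECs
        break;
    }
}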
4.2 Optimizing the verification process
We use an optimization technique that exploits the way
certain match and packet header fields are handled in the
OpenFlow 1.1.0 specification. 10 out of 14 fields in this
specification do not support arbitrary wildcards (IN_PORT, DL_VLAN, DL_VLAN_PCP, DL_TYPE, NW_TOS, NW_PROTO, TP_SRC, TP_DST, MPLS_LABEL and MPLS_TC). One
can only specify an exact value or the special ANY (wild-
card) value in these fields. We do not use separate di-
mensions in our trie to represent these fields, because
we do not need to find multiple overlapping ranges for
them. Therefore, we only maintain the trie structure for
the other four fields (DL_SRC, DL_DST, NW_SRC and NW_DST). Due to this change, we generate the set of af-
fected equivalence classes (ECs) in three steps. First, we
use the trie structure to look for network-wide overlap-
ping rules, and find the set of affected packets determined
by the four fields that are represented by the trie. Each in-
dividual packet set we get from this step is actually a set
of ECs that can be distinguished by the other 10 fields.
Second, for each of these packet sets, we extract all the
rules that can match packets of that particular class from
the location/device of the newly inserted rule. We lin-
early go through all these rules to find non-overlapping
range values for the rest of the fields that are not main-
tained in the trie structure. Thus, each packet set found in
the first step breaks into multiple finer packet sets span-
ning all the 14 mandatory OpenFlow match and packet
header fields. Note that in this step we only consider
the rules present at the device of the newly inserted rule.
Therefore, in the final step, as we traverse the forward-
ing graphs, we may encounter finer rules at other devices
that will generate new packet sets with finer granularity.
We handle them by maintaining sets of excluded packets
as described in the next paragraph.
Each forwarding graph that we generate using our trie
structure represents the forwarding state of a group of
packet sets that can be distinguished using the 10 fields
that do not support arbitrary wildcards. Therefore, while
traversing the forwarding graphs, we only work on those
rules that overlap with the newly inserted rule on these 10
fields. As we move from node to node while traversing
these graphs, we keep track of the ECs that have been
served by finer rules and are no longer present in the
primary packet set that was generated in the first place.
For example, in a device, a subset of a packet set may
be served by a finer rule having higher priority than a
coarser rule that serves the rest of that packet set. We
handle this by maintaining a set of excluded packets for
each forwarding action. Therefore, whenever we reach a
node that answers a query (e.g., found a loop or reached
a destination), the primary packet set minus the set of ex-
cluded packets gives the set of packets that experiences
the result of the query.
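The excluded-packet bookkeeping can be sketched as follows; PacketSet, AnnotatedHop and NextHops are placeholders we introduce for illustration, not VeriFlow's types.

#include <vector>

// Placeholder for a set of packets described by field ranges.
struct PacketSet { /* ranges over the ten exact-match fields */ };

// One outgoing edge of the forwarding graph, annotated with the packets
// that a finer, higher-priority rule has already claimed at this device.
struct AnnotatedHop {
    int next_device;
    std::vector<PacketSet> excluded;
};

// Stub: edges leaving a device for the primary packet set (illustrative).
std::vector<AnnotatedHop> NextHops(int /*device*/) { return {}; }

// While traversing, exclusions accumulate along the path; whenever a node
// answers the query, the answer holds for the primary packet set minus
// everything excluded so far.
void Traverse(int device, std::vector<PacketSet> excluded_so_far) {
    for (const AnnotatedHop& hop : NextHops(device)) {
        std::vector<PacketSet> excluded = excluded_so_far;
        excluded.insert(excluded.end(), hop.excluded.begin(), hop.excluded.end());
        // ... evaluate the query here for (primary set minus excluded) ...
        Traverse(hop.next_device, excluded);
    }
}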
4.3 API to write general queries
We expose a set of functions that can be used to write
general queries in C++. Below is a list of these functions
along with the required parameters.
GetAffectedEquivalenceClasses: Given a new rule, this
function computes the set of affected ECs, and returns
them. It also returns a set of sub-tries from the last di-
mension of our trie structure. Each sub-trie holds the
rules that can match packets belonging to one of the af-
fected ECs. This information can be used to build the
forwarding graphs of those ECs. This function takes the
following parameters.
- Rule: A newly inserted rule.
- Returns: Affected ECs.
- Returns: Sub-tries representing the last dimension, and
holding rules that can match packets of the affected ECs.
GetForwardingGraph: This function generates and re-
turns the forwarding graph for a particular EC. It takes
the following parameters.
- EquivalenceClass: An EC whose forwarding graph will
be computed.
- TrieSet: Sub-tries representing the last dimension, and
holding rules that match the EC supplied as the first ar-
gument.
- Returns: Corresponding forwarding graph.
ProcessCurrentHop: This function allows the user to
traverse a forwarding graph in a custom manner. Given
a location and EC, it returns the corresponding next hop.
It handles the generation of multiple finer packet sets by
computing excluded packet sets that need to be main-
tained because of our optimization strategy (§ 4.2). Due
to this optimization, this function returns a set of (next
hop, excluded packet set) tuples, effectively an annotated directed edge in the forwarding graph. With re-
peated calls to this function across nodes in the forward-
ing graphs, custom invariant-checking modules can tra-
verse the forwarding graph and perform arbitrary compu-
tation on its structure. This function takes the following
parameters.
- ForwardingGraph: The forwarding graph of an EC.
- Location: The current location of the EC.
- Returns: (Next hop, excluded packet set) tuples.
Let us look at an example that shows how this API
can be used in practice. A network operator may want to
ensure that packets belonging to a certain set always pass
through a firewall device. This invariant can be violated
during addition/deletion of rules, or during link up/down
events. To check this invariant, the network operator can
extend VeriFlow using the above API to incorporate a
custom query algorithm that generates an alarm when the
packet set under scrutiny bypasses the firewall device.
In fact, the network operator can implement any query
that can be answered using the information present in the
forwarding graphs.
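A sketch of such a firewall-traversal check written against this API is shown below. The exact C++ declarations are not given above, so the types and signatures here are assumptions; only the three function names come from the list above.

#include <cstddef>
#include <vector>

// Assumed stand-ins for the API of § 4.3; the real declarations differ.
struct Rule {};
struct TrieSet {};
struct EquivalenceClass { int ingress_device; };
struct ForwardingGraph {};
struct Hop { int next_hop; /* excluded packet set omitted */ };

std::vector<EquivalenceClass> GetAffectedEquivalenceClasses(
    const Rule& rule, std::vector<TrieSet>* out_tries);
ForwardingGraph GetForwardingGraph(const EquivalenceClass& ec, const TrieSet& tries);
std::vector<Hop> ProcessCurrentHop(const ForwardingGraph& graph, int location);

// Custom invariant: every EC affected by the new rule must still pass
// through the firewall device.
bool AllEcsTraverseFirewall(const Rule& new_rule, int firewall_device) {
    std::vector<TrieSet> tries;
    std::vector<EquivalenceClass> ecs =
        GetAffectedEquivalenceClasses(new_rule, &tries);
    for (std::size_t i = 0; i < ecs.size(); ++i) {
        ForwardingGraph graph = GetForwardingGraph(ecs[i], tries[i]);
        bool seen_firewall = false;
        int location = ecs[i].ingress_device;        // assumed starting point
        for (int steps = 0; steps < 1024; ++steps) { // bound the walk defensively
            if (location == firewall_device) { seen_firewall = true; break; }
            std::vector<Hop> next = ProcessCurrentHop(graph, location);
            if (next.empty()) break;                 // dropped or delivered
            location = next.front().next_hop;        // follow one branch for brevity
        }
        if (!seen_firewall) return false;            // caller raises an alarm
    }
    return true;
}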
5 Evaluation
In this section, we present a performance evaluation of
our VeriFlow implementation. As VeriFlow intercepts
every rule insertion message whenever it is issued by an
SDN controller, it is crucial to complete the verification
process in real time so that network performance is not
affected, and to ensure scalability of the controller. We
evaluated the overhead of VeriFlow’s operations with the
help of two experiments. In the first experiment (§ 5.1),
our goal is to microbenchmark different phases of Veri-
Flow’s operations and observe their contribution to the
overall running time. The goal of the second experiment
(§ 5.2) is to assess the impact of VeriFlow on TCP con-
nection setup latency and throughput as perceived by end
users of an SDN.
In all of our experiments, we used our basic reachabil-
ity algorithms to test for loops and black holes for every
flow modification message that was sent to the network.
All of our experiments were performed on a Dell Opti-
plex 9010 machine with an Intel Core i7 3770 CPU with
4 physical cores and 8 threads at 3.4 GHz, and 32 GB of
RAM, running 64 bit Ubuntu Linux 11.10.
5.1 Per-update processing time
In this experiment, we simulated a network consisting
of 172 routers following a Rocketfuel [7] topology (AS
1755), and replayed BGP (Border Gateway Protocol)
RIB (Routing Information Base) and update traces col-
lected from the Route Views Project [8]. We built an
OSPF (Open Shortest Path First) simulator to compute
the IGP (Interior Gateway Protocol) path cost between
every pair of routers in the network. A BGP RIB snap-
shot consisting of 5 million entries was used to initialize
the routers’ FIB (Forwarding Information Base) tables.
Only the FIBs of the border routers were initialized in
this phase. We randomly mapped Route Views peers to
border routers in our network, and then replayed RIB and
update traces so that they originate according to this map-
ping. We replayed a BGP update trace containing 90,000
updates to trigger dynamic changes in the network. Upon
receiving an update from the neighboring AS, each bor-
der router sends the update to all the other routers in the
network. Using standard BGP polices, each router up-
dates its RIB using the information present in the update,
and updates its FIB based on BGP AS path length and
IGP path cost. We fed all the FIB changes into VeriFlow
to measure the time VeriFlow takes to complete its indi-
vidual steps described in § 3. We recorded the run time to
process each change individually. Note that in this first
set of experiments, only the destination IP address is used
to forward packets. Therefore, only this one field con-
tributes to the generation of equivalence classes (ECs).
We initialize the other fields with ANY (wildcards).
The results from this experiment are shown in Fig-
ure 3(a). VeriFlow is able to verify most of the updates
within 1 millisecond (ms), with mean verification time
of 0.38ms. Moreover, of this time, the query phase takes
only 0.01ms on average, demonstrating the value of
reducing the query problem to a simple graph traversal
for each EC. Therefore, VeriFlow would be able to run
multiple queries of interest to the network operator (e.g.,
black hole detection, isolation of multiple VLANs, etc.)
within a millisecond time budget.
We found that the number of ECs that are affected by a
new rule strongly influences verification time. The scat-
ter plot of Figure 3(b) shows one data point for each ob-
served number of modified ECs (showing the mean ver-
ification time across all rules, which modified that num-
ber of ECs). The largest number of ECs affected by a
single rule was 574; the largest verification latency was
159.2ms due to an update affecting 511 ECs. However,
in this experiment, we found that for most updates the
number of affected ECs is small. 94.5% of the updates
only affected a single EC, and 99.1% affected less than
10 ECs. Therefore, only a small fraction of rules (0.9%)
affected large numbers of ECs. This can be observed by
looking at the long tail of Figure 3(a).

Figure 3: Per-update processing times: (a) Microbenchmark results, using the Route Views trace. Total verification time of VeriFlow remained below 1ms for 97.8% of the updates. (b) Scatter plot showing the influence of number of equivalence classes on verification time. (c) Results from multi-field packet filter experiment using the Route Views trace. As more fields are used in forwarding rules, the running time of VeriFlow increases. The average verification latency is not significantly influenced as we increase the number of filters present in the network. (d) Results from the conflict detection test. VeriFlow is fast enough to compute all the conflicting rules within hundreds of microseconds for 99% of the updates.
In the above experiment, we assumed that the network
topology remains unchanged, i.e., there are no link or
node failures. In case of a link failure or node failure
(which can be thought of as failure of multiple links con-
nected to the failed node), the packets that were using
that link or node will experience changes in their for-
warding behavior. When this happens, VeriFlow’s job
is to verify the fate of those affected packets. In order
to evaluate VeriFlow’s performance in this scenario, we
used the above topology and traces to run a new experi-
ment. In this experiment, we fed both the BGP RIB trace
and update trace to the network. Then we removed each
of the packet-carrying links (381 in total) of the network
one by one (restoring a removed link before removing
the next), and computed the number of affected ECs and
the running time of VeriFlow to verify the behavior of
those classes. We found that most of the link removals
affected a large number of ECs. 254 out of 381 links af-
fected more than 1,000 ECs. The mean verification time
to verify a link failure event was 1.15 seconds, with a
maximum of 4.05 seconds. We can deal with such cases
by processing the forwarding graphs of different ECs in
parallel on multi-core processors. This is possible be-
cause the forwarding graphs do not depend on each other,
or on any shared data structure. However, as link or node
failures cannot be avoided once they happen, this may
not be a serious issue for network operators.
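A minimal sketch of that per-EC parallelism is shown below, with CheckEC standing in (as an assumed stub) for the loop and black-hole checks on one forwarding graph.

#include <functional>
#include <future>
#include <vector>

struct ForwardingGraph {};  // placeholder for one EC's forwarding graph

// Stub standing in for the per-EC invariant checks (loops, black holes, ...).
bool CheckEC(const ForwardingGraph& /*graph*/) { return true; }

// Because the forwarding graphs share no state, each EC can be verified
// on its own worker thread and the results joined at the end.
bool CheckAllECs(const std::vector<ForwardingGraph>& graphs) {
    std::vector<std::future<bool>> results;
    results.reserve(graphs.size());
    for (const ForwardingGraph& g : graphs)
        results.push_back(std::async(std::launch::async, CheckEC, std::cref(g)));
    bool ok = true;
    for (std::future<bool>& r : results) ok = r.get() && ok;  // join and aggregate
    return ok;
}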
In order to evaluate VeriFlow’s performance in the
presence of more fields, we changed the input data set to
add packet filters that will selectively drop packets after
matching them against multiple fields. We randomly se-
lected a subset of the existing RIB rules currently present
in the network, and inserted packet filter rules by spec-
ifying values in some of the other fields that were not
present in the original trace. We ran this experiment
with two sets of fields. In the first set we used TP_SRC and TP_DST in addition to NW_DST (3 fields in total),
which was already present in the trace. For each ran-
domly selected RIB rule, we set random values to those
two fields (TP_SRC and TP_DST), and set its priority higher than the original rule. The remaining 11 fields are set to ANY. While replaying the updates, all the 14 fields except NW_DST are set to ANY.
In the second set we used NW_SRC, IN_PORT, DL_VLAN, TP_SRC and TP_DST in addition to NW_DST (6 fields in total). For each randomly selected RIB rule, we set random values to IN_PORT, DL_VLAN, TP_SRC and TP_DST, a random /16 value in NW_SRC, and set the priority higher than the original rule. The remaining 8 fields are set to ANY. While replaying the updates, all the 14 fields except NW_SRC and NW_DST are set to ANY. In the updates, the NW_SRC is set to a random /12 value and the NW_DST is the original value
present in the trace. We ran this experiment multiple
times varying the percentage of RIB rules that are used
to generate random filter rules with higher priority.
Figure 3(c) shows the results of this experiment. Ver-
ification time is heavily affected by the number of fields
used to classify packets. This happens because as we use
more fields to classify packets at finer granularities, more
unique ECs are generated, and hence more forwarding
graphs need to be verified. We also note from Figure 3(c)
that VeriFlow’s overall performance is not affected much
by the number of filters that we install into the network.
In all our experiments thus far, we kept a fixed order of
packet header fields in our trie structure. We started with
DL_SRC (DS), followed by DL_DST (DD), NW_SRC (NS) and NW_DST (ND). In order to evaluate the per-
formance of VeriFlow with different field orderings, we
re-ran the above packet filter experiment with reordered
fields. In all the runs we used random values for the
NW_SRC field and used the NW_DST values present in
the Route Views traces. All the other fields were set to
ANY. We installed random packet filter rules for 10% of
the BGP RIB entries. As our dataset only specified val-
ues for the NW_SRC and NW_DST fields, there were a
total of 12 different orderings of the aforementioned 4
fields. Table 1 shows the results from this experiment.
Table 1: Effect of different field orderings on total run-
ning time of VeriFlow.
Order Time (ms) Order Time (ms)
DS-DD-NS-ND 1.001 DS-DD-ND-NS 0.090
DS-NS-DD-ND 1.057 DS-ND-DD-NS 0.096
NS-DS-DD-ND 1.144 ND-DS-DD-NS 0.101
NS-DS-ND-DD 1.213 ND-DS-NS-DD 0.103
NS-ND-DS-DD 1.254 ND-NS-DS-DD 0.15
DS-NS-ND-DD 1.116 DS-ND-NS-DD 0.098
From Table 1, we can see that changing the field or-
der in the trie structure greatly influences the running
time of VeriFlow. Putting the NW_DST field ahead of NW_SRC reduced the running time by an order of mag-
nitude (from around 1ms to around 0.1ms). This hap-
pens because a particular field order may produce fewer
unique ECs compared to other field orderings for the
same rule. However, it is difficult to come up with a sin-
gle field order that works best in all scenarios, because
it is highly dependent on the type of rules present in a
particular network. Changing the field order in the trie
structure dynamically and efficiently as the network state
evolves would be an interesting area for future work.
Checking non-reachability invariants: Most of our
discussion thus far focused on checking invariants as-
sociated with the inter-reachability of network devices.
To evaluate the generality of our tool, we implemented
two more invariants using our API that were not directly
related to reachability: conflict detection (whether the
newly inserted rule violates isolation of flow tables be-
tween network slices, accomplished by checking the out-
put of the EC search phase), and k-monitoring (ensuring
that all paths in the network traverse one of several de-
ployed monitoring points, done by augmenting the for-
warding graph traversal process). We found that the over-
head of these checks was minimal. For the conflict de-
tection query, we ran the above filtering experiment using
the 6-field set with 10% and 20% newly inserted random
rules. However, this time instead of checking the reacha-
bility of the affected ECs as each update is replayed, we
only computed the set of rules that overlap/conflict with
the newly inserted rule. The results from this experiment
are shown in Figure 3(d).
From this figure, we can see that conflicting rule
checking can be done quickly, taking only 0.305ms on
average. (The step in the CDF is due to the fact that
some withdrawal rules did not overlap with any existing
rule.)
For the k-monitoring query experiment, we used a
snapshot of the Stanford backbone network data-plane
state that was used in [17]. This network consists of 16
routers, where 14 of these are internal routers and the
other 2 are gateway routers used to access the outside
network. The snapshot contains 7,213 FIB table entries
in total. In this experiment, we used VeriFlow to test
whether all the ECs currently present in the network pass
through one of the two gateway routers of the network.
We observed that at each location the average latency
to perform this check for all the ECs is around 68.06ms
with a maximum of 75.39ms.
5.2 Effect on network performance
In order to evaluate the effect of VeriFlow’s operations
on user-perceived TCP connection setup latency and the
network throughput, we emulated an OpenFlow network
consisting of 172 switches following the aforementioned
Rocketfuel topology using Mininet [3]. Mininet creates
a software-defined network (SDN) with multiple nodes on a single machine. We connected one host to every switch in this emulated network.

Figure 4: Effect on network performance: (a) TCP connection setup throughput, and (b) throughput of flow modification (Flow_Mod) messages, with and without VeriFlow. For different loads, VeriFlow imposes minimal overhead. (c) Effect of the number of packet header fields on VeriFlow's verification speed. As we increase the number of fields, overhead of VeriFlow increases gradually.

We ran the NOX
OpenFlow controller along with an application that pro-
vides the functionality of a learning switch. It allows
a host to reach any other host in the network by in-
stalling flow rules in the switches using flow modification
(Flow_Mod) messages. We implemented a simple TCP
server program and a simple TCP client program to drive
the experiment. The server program accepts TCP con-
nections from clients and closes the connection immedi-
ately. The client program consists of two threads. The
primary thread continuously sends connect requests to a
random server using a non-blocking socket. To vary the
intensity of the workload, our TCP client program gener-
ates connections periodically with a parameterized sleep
interval (S). The primary thread at each client sleeps for
a random interval between 0 to S seconds (at microsec-
ond granularity) before initiating the connection request,
and iterating. The secondary thread at each client uses
the select function to look for connections that are ready
for transmission or experienced an error. A user supplied
polling interval (P) is used to control the rate at which the
select call will return. We set P inversely proportional to
the S value to avoid busy waiting and to allow the other
processes (e.g., Open vSwitch [5]) to get a good share of
the CPU. We ran the server program at each of the 172
hosts, and configured the client programs at all the hosts
to continually connect to the server of random hosts (ex-
cluding itself) over a particular duration (at least 10 min-
utes). In the switch application, we set the rule eviction
idle timeout to 1 second and hard timeout to 5 seconds.
We ran this experiment first with NOX alone, and then
with NOX and VeriFlow. We used the same seed in all
the random number generators to ensure similar loads in
both the runs. We also varied the S value to monitor the
performance of VeriFlow under a range of network loads.
Figure 4(a) shows the number of TCP connections
that were successfully completed per second for differ-
ent workloads both with and without VeriFlow. From
this figure, we can see that in all the cases VeriFlow im-
poses negligible overhead on the TCP connection setup
throughput in our emulated OpenFlow network. The
largest reduction in throughput that we observed in our
experiments was only 0.74%.
Figure 4(b) shows the number of flow modification
(Flow_Mod) messages that were processed and sent to
the network per second for different workloads both with
and without VeriFlow. From this figure, again we can
see that in all the cases VeriFlow imposes minimal over-
head on the flow modification message throughput. The
largest reduction in throughput that we observed in our
experiments was only 12.8%. This reduction in through-
put is caused by the additional processing time required
to verify the flow modification messages before they are
sent to the network.
In order to assess the impact of VeriFlow on end-to-
end TCP connection setup latency, we ran this exper-
iment with S set to 30 seconds. We found that in the
presence of VeriFlow, the average TCP connection setup
latency increases by 15.5% (45.58ms without VeriFlow
versus 52.63ms with VeriFlow). As setting up a TCP
connection between two hosts in our emulated 172 host
OpenFlow network requires installing flow rules into
more than one switch, the verification performed by Veri-
Flow after receiving each flow rule from the controller
inflates the end-to-end connection setup latency to some
extent.
Lastly, we ran this experiment after modifying Veri-
Flow to work with different numbers of OpenFlow
packet header fields. Clearly, if we restrict the number of
fields during the verification process, there will be less
work for VeriFlow, resulting in faster verification time.
In this experiment, we gradually increased the number
of OpenFlow packet header fields that were used during
the verification process (from 1 to 14). VeriFlow sim-
ply ignored the excluded fields, which reduced the num-
ber of dimensions in our trie structure. We set S to 10
seconds and ran each run for 10 minutes. During the
runs, we measured the verification latency experienced
by each flow modification message generated by NOX,
and computed their average at each run.
Figure 4(c) shows the results from this experiment.
Here, we see that with the increase in the number of
packet header fields, the verification overhead of Veri-
Flow increases gradually but always remains low enough
to ensure real-time response. The 5 fields that con-
tributed most to the verification overhead are DL_SRC, DL_DST, NW_SRC, NW_DST and DL_TYPE. This happened because these 5 fields had different values in different flow rules, and contributed most to the generation
of multiple ECs. The other fields were mostly wildcards,
and did not generate additional ECs.
Comparison with related work: Finally, we com-
pared performance of our technique with two pieces of
related work: the Hassel tool presented in [17] (provided
to us by the authors), and a BDD-based analysis tool
that we implemented from scratch following the strat-
egy presented in [10] (the original code was not avail-
able to us). The authors of [17] provided two copies of
their tool, one in Python and one in C, and we evalu-
ated using the better-performing C version. While we
note these works solve different problems from our work
(e.g., HSA performs static verification, and does it be-
tween port pairs), we present these results to put Veri-
Flow’s performance in context. First, we ran Hassel over
the snapshot of the Stanford backbone network data-
plane state that was used in [17]. We found that Has-
sel’s average time to check reachability between a pair
of ports (effectively exploring all ECs for that source-
destination pair) was 578.62ms, with a maximum of 6.24
seconds. In comparison, VeriFlow took only 68.06ms on
average (with a maximum of 75.39ms) to test the reach-
ability of all the ECs currently present at a single node
in the network. Next, in the BDD-based approach, we
used the NuSMV [4] model checker to build a BDD us-
ing a new rule and the overlapping existing rules, and
used CTL (Computation Tree Logic) to run reachability
queries [10]. Here, we used the Rocketfuel topology and
Route Views traces that we used in our earlier experi-
ments. We found that this approach is quite slow and
does not provide real-time response while inserting and
checking new forwarding rules. Checking an update took
335.71ms on average, with a maximum of 67.16 sec-
onds.
6 Discussion and Future Work
Deciding when to check: VeriFlow may not know
when an invariant violation is a true problem rather than
an intermediate state during which the violation is con-
sidered acceptable by the operator. For example, in an
SDN, applications can install rules into a set of switches
to build an end-to-end path from a source host to a des-
tination host. However, as VeriFlow is unaware of appli-
cation semantics, it may not be able to determine these
rule set boundaries. This may cause VeriFlow to report
the presence of temporary black holes while process-
ing a set of rules one by one. One possible solution is
for the SDN application to tell VeriFlow when to check.
Moreover, VeriFlow may be used with consistent update
mechanisms [20,24], where there are well-defined stages
during which the network state is consistent and can be
checked.
Handling packet transformations: We can extend our
design to handle rules that perform packet transforma-
tion such as Network Address Translation. A transfor-
mation rule has two parts: the match part determines
the set of packets that will undergo the transformation,
and the transformation part represents the set of pack-
ets into which the matched packets will get transformed.
We can handle this case by generating additional equiva-
lence classes and their corresponding forwarding graphs,
to address the changes in packet header due to the trans-
formations. In the worst case, if we have transformations
at every hop (e.g., in an MPLS network), then we may
need to traverse our trie structure multiple times to build
an end-to-end path of a particular packet set. We leave a
full design and implementation to future work.
Multiple controllers: VeriFlow assumes it has a com-
plete view of the network to be checked. In a multi-
controller scenario, obtaining this view in real time
would be difficult. Checking network-wide invariants in
real time with multiple controllers is a challenging prob-
lem for the future.
7 Related Work
Recent work on debugging general networks and SDNs
focuses on detecting network anomalies [10, 19], check-
ing OpenFlow applications [12], ensuring data-plane
consistency [20, 24], and allowing multiple applications
to run side-by-side in a non-conflicting manner [22, 23,
25]. However, unlike VeriFlow, none of the existing so-
lutions provides real-time verification of network-wide
invariants as the network experiences dynamic changes.
Checking OpenFlow applications: Several tools have
been proposed to find bugs in OpenFlow applications and
to allow multiple applications to run on the same physical
network in a non-conflicting manner. NICE [12] per-
forms symbolic execution of OpenFlow applications and
applies model checking to explore the state space of an
entire OpenFlow network. Unlike VeriFlow, NICE is a
proactive approach that tries to figure out invalid system
states by using a simplified OpenFlow switch model. It
is not designed to check network properties in real time.
FlowVisor [25] allows multiple OpenFlow applications
to run side-by-side on the same physical infrastructure
without affecting each others’ actions or performance.
Unlike VeriFlow, FlowVisor does not verify the rules that
applications send to the switches, and does not look for
violations of key network invariants.
In [22], the authors presented two algorithms to de-
tect conflicting rules in a virtualized OpenFlow network.
In another work [23], Porras et al. extended the NOX
OpenFlow controller with a live rule conflict detection
engine called FortNOX. Unlike VeriFlow, both of these
works only detect conflicting rules, and do not verify the
forwarding behavior of the affected packets. Therefore,
VeriFlow is capable of providing more useful informa-
tion compared to these previous works.
Ensuring data-plane consistency: Static analysis tech-
niques using data-plane information suffer from the chal-
lenge of working on a consistent view of the network’s
forwarding state. Although this issue is less severe in
SDNs due to their centralized controlling mechanism, in-
consistencies in data-plane information may cause tran-
sient faults in the network that go undetected during the
analysis phase. Reitblatt et al. [24] proposed a technique
that uses an idea similar to the one proposed in [15]. By tagging each rule with a version number, this technique ensures that switches forward packets using a con-
sistent view of the network. This same problem has
been addressed in [20] using a different approach. While
these works aim to tackle transient inconsistencies in an
SDN, VeriFlow tries to detect both transient and long-
term anomalies as the network state evolves. Combining these mechanisms with VeriFlow therefore ensures that whenever VeriFlow allows a set of rules to reach the switches, the switches forward packets without transient or long-term anomalies.
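To make the versioning idea concrete, here is a simplified Python sketch of a two-phase, version-tagged update (a toy model in our own notation, not the actual mechanism of [24] or [20]): internal rules match on a configuration version stamped at the ingress, so every packet is handled entirely by one configuration.

# A simplified sketch of versioned updates: internal rules match on a
# configuration version stamped at the ingress, so no packet is ever processed
# by a mix of old and new rules. All names are illustrative.

from dataclasses import dataclass

@dataclass
class Rule:
    version: int
    match_dst: str     # coarse destination match for this toy example
    action: str        # e.g. "fwd:old_port" or "drop"

class Switch:
    def __init__(self):
        self.rules = []

    def install(self, rule):
        self.rules.append(rule)

    def forward(self, pkt_version, dst):
        # Only rules tagged with the packet's version are eligible.
        for r in self.rules:
            if r.version == pkt_version and r.match_dst == dst:
                return r.action
        return "drop"

# Update procedure: (1) install version-2 rules at internal switches,
# (2) only then flip the ingress to stamp new packets with version 2,
# (3) remove version-1 rules once in-flight packets have drained.
core = Switch()
core.install(Rule(version=1, match_dst="10.0.1.0/24", action="fwd:old_port"))
core.install(Rule(version=2, match_dst="10.0.1.0/24", action="fwd:new_port"))

ingress_version = 1
assert core.forward(ingress_version, "10.0.1.0/24") == "fwd:old_port"
ingress_version = 2                      # flip after step (1) completes
assert core.forward(ingress_version, "10.0.1.0/24") == "fwd:new_port"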
Checking network invariants: The router configura-
tion checker (rcc) [13] checks configuration files to de-
tect faults that may cause undesired behavior in the net-
work. However, rcc cannot detect faults that only man-
ifest themselves in the data plane (e.g., bugs in router
software and inconsistencies between the control plane
and the data plane; see [19] for examples).
Anteater [19] uses a network's data-plane information and checks for violations of key network invariants (e.g., the absence of routing loops and black holes). Anteater
converts the data-plane information into boolean ex-
pressions, translates network invariants into instances of
boolean satisfiability (SAT) problems, and checks the
resultant SAT formulas using a SAT solver. Although
Anteater can detect violations of network invariants, it
is static in nature, and does not scale well to dynamic
changes in the network (taking up to hundreds of seconds
to check a single invariant). Header Space Analysis [17]
is a system with goals similar to Anteater's, and likewise does not operate in real time.
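For intuition, the following toy Python sketch (not Anteater's actual encoding; it assumes the third-party z3-solver package) shows how a data-plane check can be cast as a satisfiability query: header bits become boolean variables, each hop contributes a predicate over them, and a black hole exists exactly when "forwarded but never delivered" is satisfiable.

# A toy illustration of checking a data-plane invariant with a SAT/SMT solver.
from z3 import Bool, And, Not, Solver, sat

# Two destination-address bits are enough for this toy network.
d0, d1 = Bool("d0"), Bool("d1")

# Hop predicates: A forwards packets with d1 = 1 toward B; B delivers only d1 & d0.
forwarded_by_A = d1
delivered_by_B = And(d1, d0)

# Black-hole query: some packet leaves A toward B but is never delivered.
s = Solver()
s.add(And(forwarded_by_A, Not(delivered_by_B)))
if s.check() == sat:
    print("black hole witness:", s.model())   # e.g. d1 = True, d0 = False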
Concurrent with our work, NetPlumber [16] is a tool
based on Header Space Analysis (HSA) that is capable
of checking network policies in real time. NetPlumber
uses HSA in an incremental manner to ensure real-time
response. Unlike VeriFlow, which allows users to write
their own custom query procedures, NetPlumber pro-
vides a policy language for network operators to specify
network policies that need to be checked.
ConfigChecker [10] and FlowChecker [9] convert net-
work rules (configuration and forwarding rules respec-
tively) into boolean expressions in order to check net-
work invariants. They use Binary Decision Diagrams (BDDs) to model the network state, and run queries using Computation Tree Logic (CTL). VeriFlow uses graph search techniques to verify network-wide invariants, and handles dynamic changes in real time. Moreover, VeriFlow can prevent problems from reaching the forwarding plane, whereas FlowChecker finds problems after they occur and (potentially) cause dam-
age. ConfigChecker, like rcc, cannot detect problems that
only affect the data plane.
An early version of VeriFlow was presented in [18] but
only supported checking for a single header field with
a basic reachability invariant, had comparatively high
overhead, and had a limited evaluation.
8 Conclusion
In this paper, we presented VeriFlow, a network debug-
ging tool to find faulty rules issued by SDN applica-
tions, and optionally prevent them from reaching the net-
work and causing anomalous network behavior. Veri-
Flow leverages a set of efficient algorithms to check rule
modification events in real time before they are sent to
the live network. To the best of our knowledge, VeriFlow
is the first tool that can verify network-wide invariants in
a live network in real time. Through experiments with a real-world network topology, real-world traces, and an emulated OpenFlow network, we found that VeriFlow is capable of processing forwarding table updates in real time.
Acknowledgments
We thank our shepherd, Richard Mortier, and the anony-
mous reviewers for their valuable comments. We grate-
fully acknowledge the support of the NSA Illinois Sci-
ence of Security Lablet, and National Science Founda-
tion grants CNS 1040396 and CNS 1053781.
References
[1] Beacon OpenFlow Controller. https://openflow.stanford.edu/display/Beacon/Home.
[2] Floodlight Open SDN Controller. http://floodlight.openflowhub.org.
[3] Mininet: Rapid prototyping for software defined networks. http://yuba.stanford.edu/foswiki/bin/view/OpenFlow/Mininet.
[4] NuSMV: A new symbolic model checker. http://nusmv.fbk.eu.
[5] Open vSwitch. http://openvswitch.org.
[6] OpenFlow switch specification. http://www.openflow.org/documents/openflow-spec-v1.1.0.pdf.
[7] Rocketfuel: An ISP topology mapping engine. http://www.cs.washington.edu/research/networking/rocketfuel.
[8] University of Oregon Route Views Project. http://www.routeviews.org.
[9] AL-SHAER, E., AND AL-HAJ, S. FlowChecker: Configuration analysis and verification of federated OpenFlow infrastructures. In SafeConfig (2010).
[10] AL-SHAER, E., MARRERO, W., EL-ATAWY, A., AND ELBADAWI, K. Network configuration in a box: Towards end-to-end verification of network reachability and security. In ICNP (2009).
[11] CAI, Z., COX, A. L., AND NG, T. S. E. Maestro: A system for scalable OpenFlow control. http://www.cs.rice.edu/~eugeneng/papers/TR10-11.pdf.
[12] CANINI, M., VENZANO, D., PERESINI, P., KOSTIC, D., AND REXFORD, J. A NICE way to test OpenFlow applications. In NSDI (2012).
[13] FEAMSTER, N., AND BALAKRISHNAN, H. Detecting BGP configuration faults with static analysis. In NSDI (2005).
[14] GUDE, N., KOPONEN, T., PETTIT, J., PFAFF, B., CASADO, M., MCKEOWN, N., AND SHENKER, S. NOX: Towards an operating system for networks. In SIGCOMM CCR (2008).
[15] JOHN, J. P., KATZ-BASSETT, E., KRISHNAMURTHY, A., ANDERSON, T., AND VENKATARAMANI, A. Consensus routing: The Internet as a distributed system. In NSDI (2008).
[16] KAZEMIAN, P., CHANG, M., ZENG, H., VARGHESE, G., MCKEOWN, N., AND WHYTE, S. Real time network policy checking using header space analysis. In NSDI (2013).
[17] KAZEMIAN, P., VARGHESE, G., AND MCKEOWN, N. Header space analysis: Static checking for networks. In NSDI (2012).
[18] KHURSHID, A., ZHOU, W., CAESAR, M., AND GODFREY, P. B. VeriFlow: Verifying network-wide invariants in real time. In HotSDN (2012).
[19] MAI, H., KHURSHID, A., AGARWAL, R., CAESAR, M., GODFREY, P. B., AND KING, S. T. Debugging the data plane with Anteater. In SIGCOMM (2011).
[20] MCGEER, R. A safe, efficient update protocol for OpenFlow networks. In HotSDN (2012).
[21] MCKEOWN, N., ANDERSON, T., BALAKRISHNAN, H., PARULKAR, G., PETERSON, L., REXFORD, J., AND SHENKER, S. OpenFlow: Enabling innovation in campus networks. In SIGCOMM CCR (2008).
[22] NATARAJAN, S., HUANG, X., AND WOLF, T. Efficient conflict detection in flow-based virtualized networks. In ICNC (2012).
[23] PORRAS, P., SHIN, S., YEGNESWARAN, V., FONG, M., TYSON, M., AND GU, G. A security enforcement kernel for OpenFlow networks. In HotSDN (2012).
[24] REITBLATT, M., FOSTER, N., REXFORD, J., SCHLESINGER, C., AND WALKER, D. Abstractions for network update. In SIGCOMM (2012).
[25] SHERWOOD, R., GIBB, G., YAP, K.-K., APPENZELLER, G., CASADO, M., MCKEOWN, N., AND PARULKAR, G. Can the production network be the testbed? In OSDI (2010).
[26] VARGHESE, G. Network Algorithmics: An interdisciplinary approach to designing fast networked devices, 2004.
[27] XIE, G., ZHAN, J., MALTZ, D., ZHANG, H., GREENBERG, A., HJALMTYSSON, G., AND REXFORD, J. On static reachability analysis of IP networks. In INFOCOM (2005).
[28] YUAN, L., MAI, J., SU, Z., CHEN, H., CHUAH, C.-N., AND MOHAPATRA, P. FIREMAN: A toolkit for firewall modeling and analysis. In SnP (2006).