OT Network Design Best Practices: What to Fix First (and Why)
A practical approach to improving OT network architecture without disrupting production.
March 31, 2026
In many industrial environments, the network you already have is the network you have to work with.
Equipment is already installed. Production can’t stop for extended redesigns. Vendor access, legacy systems, and existing infrastructure all have to be considered before any changes are made. That’s what makes improving an OT network so challenging.
The question isn’t what a “perfect” network looks like. It’s where to start, and what to change first, so the network becomes more stable and easier to support without creating new problems along the way.
This guide is meant to help you approach OT network design (or, more likely, redesign) in a way that makes sense in a live operating environment.
Before You Begin: Know Your Existing Network
You can’t make changes to a network without a clear picture of what’s already in place. But getting to know your network is not always as simple as it sounds.

We often run into networks that have grown over time. Equipment gets added, connections get made to solve immediate needs, and documentation doesn’t always keep up. When what’s on paper and what’s actually happening aren’t exactly the same, the first step is always alignment.
What’s connected? Where does everything live? How are your systems tied together? Which devices are talking the most? When does traffic peak?
Answering those questions (among others) can help you understand what “normal” looks like before anything changes. Without that baseline, it’s hard to tell if the changes you make actually improve your network.
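As a rough illustration, here is a minimal Python sketch of that kind of first-pass inventory. It assumes the scapy library is installed and that you have a capture file taken off a SPAN or mirror port; the filename is a placeholder.

```python
from collections import defaultdict
from scapy.all import rdpcap
from scapy.layers.inet import IP
from scapy.layers.l2 import Ether

# Build a rough device inventory from a mirror-port capture:
# for each MAC address seen on the wire, record the source IPs it has used.
# "plant_floor_span.pcap" is a placeholder filename.
devices = defaultdict(set)
for pkt in rdpcap("plant_floor_span.pcap"):
    if Ether in pkt and IP in pkt:
        devices[pkt[Ether].src].add(pkt[IP].src)

for mac, ips in sorted(devices.items()):
    print(mac, sorted(ips))
```

A list like this won’t replace a proper asset inventory, but it’s a quick way to check what’s actually talking against what the documentation says should be there.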
Step-by-Step OT Network Redesign: What to Fix First
When you start making changes to an OT network, the biggest mistake is trying to do everything at once.
You might know you need better segmentation, cleaner remote access, more redundancy, better visibility—all of that’s true. But if you don’t go in the right order, you end up creating more complexity without actually improving how the network behaves.
So the focus is less on what to do and more on when to do it.
Phase 1: Document + Baseline Traffic
Start by getting a clear view of what’s there.
That means mapping the network, understanding what devices are connected, and getting a sense of how traffic actually moves through the environment. Not what it’s supposed to look like. What it actually looks like.
You also want to understand what “normal” is. Which devices are talking the most, when traffic peaks, and whether anything stands out right away. Until you have that, you’re guessing.
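To make that concrete, here is a small sketch along the same lines. It again assumes scapy and a placeholder capture file, and pulls out the top conversations by volume and the busiest hours of the day.

```python
from collections import Counter
from datetime import datetime
from scapy.all import rdpcap
from scapy.layers.inet import IP

packets = rdpcap("plant_floor_span.pcap")  # placeholder filename

talkers = Counter()   # bytes per (src, dst) conversation
hourly = Counter()    # bytes per hour of day, to spot when traffic peaks
for pkt in packets:
    if IP in pkt:
        talkers[(pkt[IP].src, pkt[IP].dst)] += len(pkt)
        hourly[datetime.fromtimestamp(float(pkt.time)).hour] += len(pkt)

print("Top talkers:")
for (src, dst), size in talkers.most_common(10):
    print(f"  {src} -> {dst}: {size} bytes")

print("Busiest hours of day:", [hour for hour, _ in hourly.most_common(3)])
```

The point isn’t the script itself; it’s having numbers you can compare against after each change, instead of relying on impressions of how the network “feels.”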
Phase 2: Containment + Failure Domains
Once you understand the network, the next step is putting some structure around it. The goal here is simple: if something goes wrong, it shouldn’t affect everything else.
This is where segmentation comes in. Systems are grouped in a way that makes sense for the operation, and boundaries are put in place so issues stay contained where they start.
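As a sketch of what those boundaries mean in practice, the Python below models zones and the inter-zone conversations that are explicitly allowed. The zone names, subnets, and rules are hypothetical, and in a real network this policy would be enforced at firewalls or switch ACLs rather than in a script; the code is only meant to show the shape of the decision.

```python
import ipaddress

# Hypothetical zones, each a failure domain with its own subnets.
ZONES = {
    "line1_plc": ["10.10.1.0/24"],
    "scada": ["10.10.20.0/24"],
    "it_dmz": ["10.20.0.0/16"],
}

# Explicitly allowed inter-zone conversations; anything not listed stays inside its zone.
ALLOWED = {
    ("line1_plc", "scada"),  # PLCs report to SCADA servers
    ("scada", "it_dmz"),     # historian data pushed out to the DMZ
}

def zone_of(ip: str):
    addr = ipaddress.ip_address(ip)
    for zone, networks in ZONES.items():
        if any(addr in ipaddress.ip_network(net) for net in networks):
            return zone
    return None

def flow_allowed(src_ip: str, dst_ip: str) -> bool:
    src, dst = zone_of(src_ip), zone_of(dst_ip)
    if src is None or dst is None:
        return False   # unknown device: flag it, don't assume it belongs
    if src == dst:
        return True    # intra-zone traffic stays inside its failure domain
    return (src, dst) in ALLOWED

print(flow_allowed("10.10.1.15", "10.10.20.5"))  # True: PLC -> SCADA
print(flow_allowed("10.20.3.9", "10.10.1.15"))   # False: DMZ host reaching a PLC directly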

Phase 3: Remote Access Design
Remote access is usually where things have gotten a little messy over time: different vendors, different tools, different paths in.
This is where that gets cleaned up. Access is brought through defined entry points, and pathways are limited to what’s actually needed.
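One way to picture “defined entry points” is as an explicit access table: each vendor gets one way in, a short list of systems they can reach, and an expiry so access doesn’t live forever. The sketch below is illustrative only; the vendor name, jump host, target, and port are made up, and real enforcement belongs in the remote access platform or firewall, not in application code.

```python
from datetime import datetime, timezone

# Hypothetical vendor access table.
VENDOR_ACCESS = {
    "drive_vendor": {
        "jump_host": "10.30.0.5",
        "targets": {("10.10.1.15", 44818)},  # one specific PLC, one port
        "expires": datetime(2026, 4, 30, tzinfo=timezone.utc),
    },
}

def session_allowed(vendor: str, target_ip: str, target_port: int) -> bool:
    entry = VENDOR_ACCESS.get(vendor)
    if entry is None:
        return False
    if datetime.now(timezone.utc) > entry["expires"]:
        return False  # access has lapsed; renewing it should be a deliberate decision
    return (target_ip, target_port) in entry["targets"]

print(session_allowed("drive_vendor", "10.10.1.15", 44818))
```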
Phase 4: Redundancy Strategy + Testing
Now you can think about redundancy in a way that actually helps. Instead of just adding backup paths, the goal is to make sure failover behaves the way you expect it to, and that it fits into the structure you’ve already put in place.
In industrial environments, simpler redundancy approaches are often more effective. Designs that are easy to understand and test tend to recover faster and are easier to maintain than more complex architectures that rely on multiple failover conditions.
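A simple way to make failover measurable during a planned test is to watch a critical endpoint and log exactly when it drops and when it comes back, rather than assuming recovery time from the design. The sketch below assumes a Linux host (the ping flags differ on Windows) and a hypothetical target address; stop it with Ctrl+C once the test is done.

```python
import subprocess
import time
from datetime import datetime

TARGET = "10.10.20.5"  # hypothetical SCADA server reached over the redundant path

def reachable(host: str) -> bool:
    # One ping with a one-second timeout (Linux flags).
    result = subprocess.run(
        ["ping", "-c", "1", "-W", "1", host],
        capture_output=True,
    )
    return result.returncode == 0

last_state = None
while True:
    state = reachable(TARGET)
    if state != last_state:
        # Log every transition so you can read off the actual recovery time.
        print(f"{datetime.now().isoformat()}  {'UP' if state else 'DOWN'}")
        last_state = state
    time.sleep(1)
```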
Phase 5: Ongoing Monitoring + Change Control
Once everything is structured, the focus shifts to keeping it that way.
You want visibility into what’s happening on the network, and a way to manage changes so things don’t slowly drift back into the same problems.
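Drift detection doesn’t have to be elaborate. As one example, the sketch below compares a current device inventory export against the documented baseline and reports anything new or missing; it assumes both files are simple JSON lists, and the field names are hypothetical.

```python
import json

# Each file is assumed to be a list of objects like
# {"mac": "...", "ip": "...", "description": "..."}.
with open("baseline_inventory.json") as f:
    baseline = {d["mac"]: d for d in json.load(f)}
with open("current_inventory.json") as f:
    current = {d["mac"]: d for d in json.load(f)}

new_devices = current.keys() - baseline.keys()
missing_devices = baseline.keys() - current.keys()

for mac in sorted(new_devices):
    print(f"NEW      {mac}  {current[mac].get('ip', '?')}")
for mac in sorted(missing_devices):
    print(f"MISSING  {mac}  {baseline[mac].get('description', '')}")
```

Running something like this on a schedule, and treating every difference as either an approved change or a problem to chase down, is most of what change control needs to look like day to day.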
Common Pitfalls (What Not To Do)
You don’t usually spot these right away. Most of them come from decisions that made sense at the time: adding access, solving a local issue, working around a constraint.
Over time, though, they start to show up in how the network behaves. Some of the more common issues we see at INS:
- Segmentation that doesn’t actually limit anything: the network is broken into VLANs, but everything can still talk to everything else
- Redundancy that makes things harder to understand: extra paths get added without a clear failover plan
- Trying to force “clean” IT design into systems that can’t be changed: some equipment just isn’t flexible, and trying to standardize everything can create more risk than working around constraints
- Adding tools without clear ownership: visibility or security tools get put in place, but no one is really responsible for using or maintaining them
- Treating documentation as something you finish once: it’s accurate at the end of a project, but slowly falls behind as changes are made
These aren’t edge cases. They’re what networks tend to look like when they’ve been evolving for a while without a clear structure guiding the changes.

FAQs
What’s the first best practice if we’re starting from chaos?
Start by figuring out what’s actually there. Get a clear picture of devices, connections, and traffic before trying to fix anything. Without that, every change is a guess.
How do you improve architecture without disrupting production?
You don’t try to fix everything at once. Start by understanding the current state, then make changes in phases: containment first, then access, then redundancy. Each step should reduce risk, not introduce it.
What’s the difference between “redundancy” and “resilience”?
Redundancy is having backup paths or systems. Resilience is how the network behaves when something fails. You can have redundancy without resilience if failover isn’t predictable or tested.
How do you decide where zones should start/end?
Start with how the process actually runs. Group systems that need to communicate, then separate what doesn’t. The goal is to limit how far issues can spread, not just create more segments.
What does INS establish first when designing an OT network?
The focus early on is getting clarity on a few key things:
- How the network is actually structured today
- What protocols are in use
- Where managed vs. unmanaged infrastructure exists
- How many devices are really on the network
- Which systems are end-of-life or higher risk
That baseline makes it possible to prioritize changes without guessing.
Are You Ready for an OT Network Assessment?
In a lot of cases, the need for an assessment shows up gradually rather than all at once.
You might start noticing things like:
- You don’t have accurate diagrams or a clear understanding of how traffic moves through the network
- Downtime or performance issues show up without a clear root cause
- You’re planning modernization or IIoT initiatives and want to avoid introducing new risk
- Vendor access and segmentation exceptions have grown over time and aren’t fully controlled
At that point, the next step isn’t guessing at fixes. It’s getting a clear picture of how the network is actually built and how it behaves.
INS approaches OT network assessments by mapping the current environment, validating traffic behavior, and identifying where structure can be improved without disrupting production. From there, changes can be prioritized and sequenced in a way that strengthens reliability over time.
Ready to get started? Request an OT Network Assessment from INS →