How to Stop Tribal Knowledge in Engineering: Why It Forms, Persists, and What to Do About It
By Norbert Wlodarczyk
Every engineering org has critical knowledge trapped in people’s heads. Understanding why it gets there is the first step to getting it out.
You’ve seen this before
A production incident hits at 2am. The on-call engineer finds the runbook, follows it step by step, and it doesn’t work. Not because the runbook is wrong, exactly. It’s missing a step that everyone on the original team just knew. A config flag that needs to be toggled first. An undocumented dependency between two services.
Three hours later, someone wakes up the engineer who built the system. She fixes it in four minutes.
That four-minute fix represented tribal knowledge: information that exists only in someone’s memory, passed on through conversation, never written down in a way that survives the original context. Every engineering team has it. Most underestimate how much of their operational capability depends on it.
How tribal knowledge forms
Tribal knowledge isn’t a documentation failure. It’s an efficiency optimization that outlives its usefulness.
In small teams, talking is faster than writing. When you sit next to the person who designed the system, a 90-second conversation replaces a 20-minute writeup. The economics are obvious. Nobody writes a doc when the answer is two desks away.
This works until it doesn’t. And the transition isn’t a clean break. It’s a slow erosion. Three things drive it:
Decisions accumulate faster than documentation. A team making five technical decisions a week generates 260 decisions a year. Even if they document half of them, 130 decisions exist only in the memories of whoever was in the room. After two years, the team is sitting on hundreds of undocumented choices, each one a potential trap for anyone who wasn’t there.
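To see how fast that adds up, here is a back-of-the-envelope sketch in Python. It assumes a constant decision rate and a constant fraction documented, which real teams won’t have. The function name `undocumented_decisions` is made up for illustration.

```python
# Back-of-the-envelope model of undocumented decisions piling up.
# Assumes a steady decision rate and a fixed documentation rate;
# both are illustrative parameters, not measurements.

WEEKS_PER_YEAR = 52

def undocumented_decisions(decisions_per_week: int, doc_rate: float, years: int) -> int:
    """Return how many decisions exist only in people's memories after `years`."""
    total = decisions_per_week * WEEKS_PER_YEAR * years
    documented = round(total * doc_rate)
    return total - documented

for years in (1, 2, 3):
    print(f"after {years} year(s): {undocumented_decisions(5, 0.5, years)} undocumented")
```

At five decisions a week and a 50% documentation rate, the backlog grows by 130 decisions a year, so two years leaves 260 that live only in memory.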
Context resists compression. The hardest knowledge to capture isn’t what was decided, but why. The tradeoffs considered. The options rejected. The constraint that made the weird approach the right one. This kind of reasoning is rich, situational, and verbose. It doesn’t compress neatly into a wiki page. So it stays in people’s heads, where it’s easy for its holder to recall but inaccessible to everyone else.
Success hides the problem. When tribal knowledge works, it’s invisible. Questions get answered, incidents get resolved, new features get built. The cost only surfaces in edge cases: someone leaves, a team gets reorganized, or a new hire spends their first months doing archaeology instead of shipping. By then, the deficit is already large.
Why it persists
Tribal knowledge has a reinforcing loop that makes it genuinely hard to displace.
Once someone becomes the known answer to a category of questions, routing through them becomes the team’s default. Why search a wiki when you can message the person who built it? This is rational behavior for the person asking. But it means the documentation, even when it exists, gets less traffic. Less traffic means less maintenance. Less maintenance means less trust. Less trust means more people message the expert instead. The cycle tightens.
There’s a subtler dynamic too. Knowledge concentration creates organizational gravity. The person who holds critical context becomes load-bearing. They get pulled into every incident, every architecture review, every onboarding session. The organization treats this as a feature, not a bug. They’re the “go-to person.” They’re valued precisely because the knowledge is concentrated.
But go-to people don’t scale. A 2019 study from the IEEE found that teams with high knowledge concentration, where fewer than 20% of members held more than 80% of domain expertise, experienced significantly longer resolution times for cross-domain issues. The knowledge was there. It just couldn’t flow.
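If you want a rough check on whether your own team fits that profile, one option is to give each person a numeric expertise score and measure the share held by the top fifth. This is a hypothetical sketch, not the study’s methodology. The scoring scheme, the `top_share` and `is_concentrated` names, and the choice to round “20% of members” up to at least one person are all assumptions.

```python
import math

def top_share(expertise: dict[str, float], top_fraction: float = 0.2) -> float:
    """Fraction of total expertise held by the top `top_fraction` of members."""
    scores = sorted(expertise.values(), reverse=True)
    if not scores or sum(scores) == 0:
        return 0.0
    # Round the group size up so small teams still count at least one person.
    k = max(1, math.ceil(len(scores) * top_fraction))
    return sum(scores[:k]) / sum(scores)

def is_concentrated(expertise: dict[str, float], threshold: float = 0.8) -> bool:
    """True when the top 20% of members hold more than `threshold` of all expertise."""
    return top_share(expertise) > threshold

# Hypothetical scores, e.g. how many incidents each person resolved last year.
team = {"ana": 40, "ben": 3, "chen": 2, "dev": 2, "eli": 1}
print(f"top 20% hold {top_share(team):.0%}; concentrated: {is_concentrated(team)}")
```

With one dominant expert on a five-person team, the top 20% (one person) holds 40 of 48 points, about 83%, which crosses the 80% line.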
And the tooling most teams rely on actively makes this worse. Wikis and doc platforms treat every document as a flat, independent artifact. They can’t represent that a Slack decision last Tuesday superseded the architecture doc from January. They can’t show that a runbook depends on context from three other documents. Without a way to model how knowledge connects and evolves, even well-maintained documentation becomes a static archive. People learn to distrust it and go back to asking the expert.
How to stop tribal knowledge from compounding
You won’t eliminate tribal knowledge. Some knowledge is genuinely transient and not worth the overhead of formalizing. The goal is reducing concentration risk: making sure critical knowledge isn’t locked in too few heads.
Run a knowledge concentration audit. For each major system or domain, ask: how many people could handle an incident here without calling someone else? If the answer is one, you have a single point of failure. If the answer is zero because that person left last quarter, you have a gap. Map these. The list of people who can cover each system will be shorter than you expect, and the gaps more alarming.
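The audit itself needs nothing more than a table of systems and the people who could handle each one solo. Here is a minimal sketch, assuming you’ve collected that table by asking people directly. The system names, people, and the `audit` function are hypothetical.

```python
from collections import defaultdict

# Hypothetical audit input: for each system, who could handle an incident
# there without calling anyone else. Collect this by asking, not guessing.
responders = {
    "billing-api": ["maria"],
    "search-indexer": ["maria", "tom", "li"],
    "auth-gateway": [],  # the only expert left last quarter
    "deploy-pipeline": ["tom"],
}

def audit(responders: dict[str, list[str]]) -> dict[str, list[str]]:
    """Group systems into gaps, single points of failure, and covered."""
    report: dict[str, list[str]] = defaultdict(list)
    for system, people in sorted(responders.items()):
        if not people:
            report["gap"].append(system)
        elif len(people) == 1:
            report["single point of failure"].append(system)
        else:
            report["covered"].append(system)
    return dict(report)

for status, systems in audit(responders).items():
    print(f"{status}: {', '.join(systems)}")
```

Sorting by system name keeps the report stable between runs, so you can compare this quarter’s audit against last quarter’s line by line.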
Capture decisions at the point of decision. The most valuable knowledge to externalize is the reasoning behind technical choices. Not after the fact. At the moment the decision is made. A lightweight decision record, two paragraphs covering what was decided and what was considered and rejected, takes five minutes and saves hours of future archaeology. The key is making this a team norm, not a heroic individual effort.
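Here is one hypothetical shape for such a record, sketched as a Python dataclass. The field names and the example decision are illustrative assumptions, not a standard template. The point is that two short fields, the decision and the rejected alternatives, are enough.

```python
from dataclasses import dataclass, field
from datetime import date

@dataclass
class DecisionRecord:
    """Two paragraphs: what was decided, and what was considered and rejected."""
    title: str
    decided_on: date
    decision: str  # what was chosen, and the constraint that made it right
    rejected: list[str] = field(default_factory=list)  # alternatives, each with its reason

    def render(self) -> str:
        """Format the record as plain text suitable for a repo or wiki page."""
        alternatives = " ".join(self.rejected) or "None recorded."
        return (
            f"{self.title} ({self.decided_on.isoformat()})\n\n"
            f"Decision: {self.decision}\n\n"
            f"Considered and rejected: {alternatives}"
        )

# Hypothetical example record.
record = DecisionRecord(
    title="Per-tenant queues for webhook delivery",
    decided_on=date(2024, 3, 12),
    decision="Each tenant gets its own queue so one slow consumer cannot stall everyone else's deliveries.",
    rejected=["A single shared queue: simpler to run, but one slow tenant blocks the rest."],
)
print(record.render())
```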
Apply the “second time” rule. If a question gets asked once, answering in Slack is fine. If it gets asked twice, the answer becomes a document. This filters out genuinely one-off questions while catching the patterns. It also distributes the documentation work across the team instead of loading it onto one person.
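The rule is simple enough to automate in a chat bot or a triage script. This sketch assumes questions arrive as plain strings and uses crude exact matching after normalization; catching rephrased questions would need something smarter. The class and method names are hypothetical.

```python
from collections import Counter

def normalize(question: str) -> str:
    """Crude normalization: lowercase, trim end punctuation, collapse whitespace."""
    return " ".join(question.lower().strip(" ?!.").split())

class SecondTimeRule:
    """Flag a question for documentation the second time it is asked."""

    def __init__(self) -> None:
        self.counts: Counter[str] = Counter()

    def ask(self, question: str) -> bool:
        """Record a question; return True only on its second occurrence."""
        key = normalize(question)
        self.counts[key] += 1
        return self.counts[key] == 2

rule = SecondTimeRule()
print(rule.ask("How do I rotate the staging DB creds?"))  # False: first time, answer in Slack
print(rule.ask("how do I rotate the staging db creds"))   # True: second time, write the doc
print(rule.ask("How do I rotate the staging DB creds?"))  # False: already flagged once
```

Returning True only on the second occurrence means the team gets one nudge per question rather than a reminder every time it resurfaces.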
Model knowledge as a network, not a library. The relationship between pieces of knowledge often matters more than the pieces themselves. A decision about your API versioning connects to the team that made it, the service it affects, the incident that prompted it, and the earlier decision it replaced. Flat document stores can’t represent these connections. When you’re thinking about how to manage knowledge at scale, consider whether your system can model relationships and supersession, not just store and search text. This is the problem we built Nexalink to address.
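To make the difference concrete, here is a toy in-memory graph with typed relations and a supersession lookup. It is a hypothetical sketch for illustration only, not how Nexalink or any other product is implemented; the node IDs and relation names are made up.

```python
from collections import defaultdict

class KnowledgeGraph:
    """Toy graph: nodes are knowledge items, edges are typed relations."""

    def __init__(self) -> None:
        self.edges: dict[str, list[tuple[str, str]]] = defaultdict(list)  # src -> [(relation, dst)]
        self.superseded_by: dict[str, str] = {}  # old item -> item that replaced it

    def relate(self, src: str, relation: str, dst: str) -> None:
        """Add a typed edge; 'supersedes' edges also feed the supersession index."""
        self.edges[src].append((relation, dst))
        if relation == "supersedes":
            self.superseded_by[dst] = src

    def related(self, item: str) -> list[tuple[str, str]]:
        """Outgoing relations of an item, without creating empty entries."""
        return list(self.edges.get(item, []))

    def current(self, item: str) -> str:
        """Follow 'supersedes' edges forward to the newest version of an item."""
        seen = {item}
        while item in self.superseded_by:
            item = self.superseded_by[item]
            if item in seen:
                raise ValueError(f"supersession cycle at {item}")
            seen.add(item)
        return item

# Hypothetical node IDs mirroring the API-versioning example in the text.
g = KnowledgeGraph()
new = "decision:api-versioning-header"
g.relate(new, "made_by", "team:platform")
g.relate(new, "affects", "service:public-api")
g.relate(new, "prompted_by", "incident:client-breakage")
g.relate(new, "supersedes", "decision:api-versioning-url")

print(g.current("decision:api-versioning-url"))  # decision:api-versioning-header
print(g.related(new))
```

The `current` lookup is the piece a flat document store lacks: asked about the old versioning decision, it returns the one that replaced it instead of silently serving stale guidance.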
The compound cost
Tribal knowledge doesn’t announce itself as a crisis. It accumulates like technical debt: invisible in the day-to-day, devastating in aggregate.
The real cost isn’t the individual incident that takes three hours instead of four minutes. It’s the hundreds of small moments where someone makes a decision without context they didn’t know existed, duplicates work because they couldn’t find the prior art, or builds on an assumption that was quietly invalidated months ago.
If you’ve noticed that your team’s ability to move fast has degraded even as you’ve added strong engineers, tribal knowledge is one of the first places to look. Start with the concentration audit. The results tend to clarify what to do next.