26 February 2026

Don’t let your system of record become a system of pain

A System of Record (SoR), the authoritative source for a domain’s data, is the beating heart of your organisation. Think civil registries for citizen data, national platforms to prove entitlement, or the contents of an ERP system.

It’s big, it’s mission critical, and without it operations and service delivery suffer. So building a reliable SoR is challenging. If done badly, it can quickly become a maligned part of the estate which users will do anything to dodge. It will be rigid, hard to change, prone to data issues, and a nightmare when it comes to future integrations and innovations.

With this in mind, one of your biggest goals is to make sure that the system of record doesn’t become a system of pain. 

Understanding what makes a great system of record

A north star is fundamental for understanding what good looks like. Before diving into delivery, you should start your journey by listing the non-negotiable parts of your SoR that must be top quality.

What this list looks like depends on your individual needs and environment, but these are the most common principles we encounter when supporting our clients in these types of projects:

  • Authoritativeness. Because the SoR is the single source of truth, the ownership of each piece of data must be established. One, and only one, system should be the “master” for each piece of information that the SoR ingests. For example, in a person registry, an individual’s name and birth date might be mastered in the civil register, while their current address could live in a separate address service elsewhere. Part of SoR design is delineating these boundaries to avoid duplication or confusion.
  • Immutability and traceability. A robust SoR doesn’t allow silent overwrites of critical data. It keeps history or uses append-only approaches for auditing. For instance, a digitally signed append-only log can ensure no record is tampered with and provide proofs of authenticity. With this in mind, GDS has previously experimented with Merkle-tree based logs for registers to guarantee that an entry was indeed written by the authority and that history hadn’t been altered. Incorporating such patterns means the system can be trusted for the long term.
  • Schema flexibility. No matter how well you model today’s requirements, data needs will evolve. A SoR must support schema evolution without major upheaval. This could mean designing with an extension mechanism for new fields, or using a schema registry and versioning for events. A rigid schema that cannot change without a full migration quickly becomes a ticking time bomb.
  • Performance and scaling. As the central source, a SoR often handles high read volumes (and moderate write volumes). Where possible, it should scale horizontally or use replication to offload queries. An increasingly common pattern is to separate the SoR (authoritative writes) from read-optimised replicas or caches so that the core is protected from read overload. You should define your availability, latency and recovery targets upfront. Uptime expectations, RTO and RPO decisions should shape the architecture, not follow incidents.
  • Security and access control. As we explained earlier, SoRs are mission-critical, organisation-wide resources. They will contain highly sensitive data, including citizen information, financial records and even organisational intelligence. This means that fine-grained access control, encryption, and monitoring of access are non-negotiable. Every query to the SoR should be auditable and traceable, so that finding out who accessed what is a simple task that can be completed in seconds.
  • Ownership matters. Assign a named product owner for the data model and a clear approval path for schema changes. Without governance discipline, drift and fragmentation are inevitable.

With this understanding of what makes a great SoR, it’s equally important to define what the SoR is not. It should hold authority, not embed workflows or business logic. Clear domain boundaries will prevent it from becoming a bloated “one system for everything” over time.
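The immutability and traceability principle above can be sketched as a hash-chained, append-only log. This is a simplified illustration (a hash chain rather than the full Merkle-tree approach GDS experimented with), and the `AppendOnlyLog` class and its methods are hypothetical names, not a real library API:

```python
import hashlib
import json


class AppendOnlyLog:
    """A minimal hash-chained audit log: each entry commits to the one
    before it, so any tampering with history breaks the chain."""

    def __init__(self):
        self.entries = []

    def append(self, record: dict) -> str:
        prev_hash = self.entries[-1]["hash"] if self.entries else "0" * 64
        payload = json.dumps(record, sort_keys=True)
        entry_hash = hashlib.sha256((prev_hash + payload).encode()).hexdigest()
        self.entries.append({"record": record, "prev": prev_hash, "hash": entry_hash})
        return entry_hash

    def verify(self) -> bool:
        """Walk the chain from the start; any edited or reordered entry
        will fail to reproduce its stored hash."""
        prev_hash = "0" * 64
        for entry in self.entries:
            payload = json.dumps(entry["record"], sort_keys=True)
            expected = hashlib.sha256((prev_hash + payload).encode()).hexdigest()
            if entry["prev"] != prev_hash or entry["hash"] != expected:
                return False
            prev_hash = entry["hash"]
        return True
```

In a production system you would also sign the log head, but even this simple chain turns silent overwrites into detectable events.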

Versioning contracts for hassle-free schema evolution 

One common source of “system pain” is handling changes in data structure. If the SoR’s schema or API contract changes, all of its consumers can break. To avoid a painful ripple effect, adopt an expand-contract pattern for changes. This approach, popularised in database and API design, means that breaking changes are never made abruptly.

  • Expand (additive changes). When new data needs to be stored or new API fields are required, add them in a backwards-compatible way. For example, instead of renaming a field, add a new field and keep the old one for a deprecation period. Clients that don’t know about the new field continue working with the old data. As a concrete example, if you have "address" as a single string and realise you need structured components (street, city, etc.), you might add a new structured object "address_components" while still populating the legacy "address" field. Document that the new structure is the future, but don’t immediately remove the old.
  • Contract versioning. Provide a means for consumers to request a specific version of the API or data contract. UK government APIs often version via the URL or Accept header, e.g. GET /api/v2/resource for the new version. The GOV.UK guidance urges that any change removing or altering fields “must be accompanied by a change in the API version” to avoid breaking client code. Designing payloads with foresight (like using arrays for fields that might one day have multiple values, or objects for future expansion) helps minimise how often you need a major version bump.
  • Deprecation policy. When you do create a new version, you should maintain the old version for a reasonable time. Announce deprecations well in advance and provide testing sandboxes for consumers to validate against the new version. A System of Record in a government context might need to support old interfaces for years, not months, due to the slower upgrade cycles of downstream agencies.
  • Schema registry & validation. If using an event-driven approach (e.g., the SoR emits events for changes), you should employ a schema registry (like Confluent Schema Registry for Kafka or similar for EventBridge, etc.). This ensures producers and consumers agree on data format. Enforce compatibility rules like only adding optional fields or new event types in a way that doesn’t break old consumers. Tools and patterns exist for evolving schemas intentionally. 

In summary, treat the SoR’s data model as a public interface and design it for extension. The initial design should consider future needs, for example using ISO standards for fields like date/time to accommodate global use, or planning for multiple name components for cultures that have different naming conventions. 

And remember the very sensible GOV.UK API guidance that “if you don’t change the version, you run the risk of breaking someone’s application.” 
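The expand phase described above can be sketched as a write path that populates both the legacy and the new field during the deprecation period. This is a minimal illustration using the address example from earlier; `save_person` and the field names are hypothetical:

```python
def save_person(record: dict) -> dict:
    """Expand phase of expand-contract: accept either the legacy flat
    'address' string or the new structured 'address_components', and
    populate both so old and new consumers keep working."""
    components = record.get("address_components")
    legacy = record.get("address")

    if components and not legacy:
        # Derive the legacy flat string from the new structure so old
        # clients keep seeing the field they expect.
        legacy = ", ".join(
            components[k] for k in ("street", "city", "postcode") if k in components
        )
    # Old clients still send only the flat string; keep it as-is and
    # leave the structured field empty until they migrate.

    return dict(record, address=legacy, address_components=components)
```

Only once every known consumer has moved to `address_components` does the contract phase begin, removing the legacy field in a new major version.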

Automating to prevent garbage

“Garbage in, garbage out,” goes the old adage. A great SoR can only be as great as the quality of data it holds. 

One huge source of pain is bad data entering the system and then propagating. To avoid this, implement data integrity gates (automated checks at every entry point of the SoR) to enforce business rules and consistency.

Modern data engineering practice encourages embedding tests and validation into pipelines. For example, analytics engineers using tools like dbt have tests for not-null, uniqueness and referential integrity that run on every change. We can bring similar rigour to operational systems:

  • Validate on write. Enforce constraints whenever a record is created or updated in the SoR. Use database constraints (unique, foreign keys) where possible for technical enforcement. Additionally, have application-level validations, such as rejecting a person’s death date that is in the future. These rules seem obvious, but coding them as gatekeepers prevents anomalies. Many robust systems funnel all writes through a common API or service so that these checks are centralised.
  • Automated data quality checks. Implement a process (maybe nightly or real-time triggers) that scans for anomalies like duplicates, missing mandatory fields, etc. In distributed systems, it’s possible an eventual consistency model leads to temporary breaches of integrity such as two systems concurrently registering the same entity. Your SoR should detect and reconcile these cases. For instance, set up a rule that no national insurance number should appear in two person records; and if it does, flag it for manual review.
  • Testing in CI/CD. As part of deployment pipelines, include tests that ensure new code doesn’t violate data integrity expectations. If your SoR uses a database, you can have a suite of queries that run in a staging environment to verify constraints hold after migrations or code changes.

Remember, when the SoR is integrated with other systems (and it always is), you should design it for idempotent operations. Idempotency means if the same action is performed twice, it has no additional effect beyond the first time. This is crucial in distributed environments where duplicate events or retries occur. AWS’s own guidelines for Step Functions and Lambdas state: “Your business logic... must be idempotent to handle potential retries. Use idempotency keys to ensure operations like payments or database writes aren’t performed twice.” 

For your SoR, this might mean that if a “CreatePerson” event is processed, the system can recognise a duplicate and not create the person again (perhaps by unique natural IDs or an idempotency token from the caller). Without this, you risk data divergence and multiple conflicting records.
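The idempotency-key pattern can be sketched as follows. This is a toy in-memory version (a real system would persist the key store with the same transaction as the write); `PersonRegistry` and its methods are hypothetical names:

```python
class PersonRegistry:
    """Illustrative idempotent handler: the first request with a given
    idempotency key creates the record; retries return the original
    result without creating a duplicate."""

    def __init__(self):
        self.people = {}
        self.processed = {}  # idempotency key -> result of the first call

    def create_person(self, idempotency_key: str, person: dict) -> dict:
        if idempotency_key in self.processed:
            # Duplicate delivery or retry: no additional side effects.
            return self.processed[idempotency_key]
        person_id = len(self.people) + 1
        self.people[person_id] = person
        result = {"id": person_id, "created": True}
        self.processed[idempotency_key] = result
        return result
```

The caller supplies the key, so a network timeout followed by a retry lands on the same stored result instead of a second person record.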

Consumer migration

When evolving a SoR, one of the biggest challenges is migrating all the consumers: the systems and services that rely on the SoR’s data. A system of record often feeds numerous downstream apps (for example, a citizen data SoR might feed benefits systems, tax systems and so on).

Migration should be phased, not a single cutover event. Techniques such as dual writes, shadow reads and reconciliation checks reduce cutover risk and will prevent temporary adapters from becoming permanent fixtures.
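A shadow read, mentioned above, serves traffic from the old SoR while silently comparing against the new one and logging divergence for reconciliation. A minimal sketch, assuming two stores with a dict-like `get` interface (`shadow_read` is an illustrative name):

```python
def shadow_read(key, old_store, new_store, mismatches: list):
    """Shadow read during migration: serve from the old SoR (still
    authoritative) while comparing against the new one, recording any
    divergence for later reconciliation instead of failing the request."""
    old_value = old_store.get(key)
    try:
        new_value = new_store.get(key)
        if new_value != old_value:
            mismatches.append({"key": key, "old": old_value, "new": new_value})
    except Exception:
        # The new system must never break live traffic during migration.
        pass
    return old_value
```

Once the mismatch log stays empty for a sustained period, you have evidence, not hope, that the new system is ready to take authoritative reads.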

To avoid turning the SoR into a source of pain for others, consider:

  • API layer and backward compatibility. As mentioned, maintain old and new APIs during transitions. It’s helpful to have an API gateway that can route to transformation logic if needed. For example, if the new SoR stores something differently, the gateway could translate old API requests to the new structure under the hood, allowing clients to transition at their own pace.
  • Plan for coexistence. Prepare to run both versions in parallel for a defined period. This isn't just about having both available; it's about giving consumers time to migrate at their own pace while you validate that v2 works correctly. Set a clear sunset date for v1 so people know this isn't indefinite.
  • Event versioning and parallel feeds. If consumers get events from the SoR, you may need to publish events in multiple formats during a migration period. A common approach would be to publish both EventTypeV1 and EventTypeV2 for a while, so that consumers can switch consumption when ready. This is similar to API versioning but in an asynchronous context.
  • Communication of changes. Establish a clear communication channel with all known consumers of the SoR. For large programmes, a fortnightly interface review or “demand review” meeting can help. This prevents surprises and keeps everyone aligned. In the context of Scrumconnect’s programmes, for example, we routinely schedule a regular cadence to review any architectural changes or new demands to help avoid last-minute shocks and ensure continuous alignment with product roadmaps.
  • Consumer-driven contracts testing. This is a practice where consumers specify their expectations in tests, and the provider (SoR) runs those to ensure compatibility. It’s advanced, but for critical SoRs with many clients, it can catch breaking changes early. If, say, a downstream system expects that “if a person has multiple addresses, the SoR will send at most three addresses,” then you could formalise that as a test in the build pipeline of the SoR.
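The event versioning point above, publishing both formats during a migration period, can be sketched like this. The event names and payload shapes are illustrative, reusing the address example from earlier:

```python
def publish_person_updated(person: dict, publish) -> None:
    """During migration, emit the change event in both the old and new
    formats so consumers can switch whenever they are ready."""
    # Legacy shape: flat address string, for consumers still on v1.
    publish("PersonUpdatedV1", {
        "id": person["id"],
        "address": ", ".join(person["address_components"].values()),
    })
    # New shape: structured address components, for migrated consumers.
    publish("PersonUpdatedV2", {
        "id": person["id"],
        "address_components": person["address_components"],
    })
```

Dual publishing costs a little duplication in the producer, but it decouples every consumer’s migration timeline from yours.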

Lastly, when it comes to replacing an old SoR with a new one, you should consider a bridging layer. For example, when the UK government replaces a legacy service, an adapter is sometimes built to make the new system look like the old one to consumers (at least temporarily). This can be done by having the new system populate the old system’s database or mimic its API, until consumers gradually repoint to the new interface. Think of it as a live integration proxy: all calls to the old SoR are transparently handled by the new SoR (or vice versa) during the transition. The goal is that no consumer breaks at the moment of cutover.

Keeping the SoR healthy in production through guardrails

Once your System of Record is live and serving as the authority, instituting operational guardrails will prevent it from picking up poor hygiene habits over time. Some common guardrails you should consider implementing are:

  • Monitoring and tracing. Enable distributed tracing on the SoR’s services, for example with AWS X-Ray. A SoR often has many integration points, so tracing helps find where latency or errors occur. X-Ray’s service map can show how requests flow through the SoR’s components and downstream, highlighting bottlenecks. It can even surface issues like cold starts in serverless components or slow queries. Other cloud vendors have similar tools. With a complex SoR, having this observability is key to quickly resolving incidents.
  • Rate limiting and throttling. Protect the core database or service from overload by applying rate limits on clients. If one rogue consumer starts hammering the SoR with requests, the guardrail should throttle it (return ‘429: Too Many Requests’ or similar). This ensures that one misbehaving integration doesn’t take down the entire source of truth. Using an API gateway or service mesh can help implement this consistently.
  • Idempotency keys and duplicate checking. As mentioned, ensure that if a client or integration repeats a request (intentionally or unintentionally), the SoR can recognise it and avoid duplicated side effects.
  • Fail-safes for risky operations. Add extra precaution for administrative or destructive actions. If the SoR has a function to, say, bulk-delete records or modify many entries, consider requiring multi-factor confirmations or time-window execution. Some organisations even build what’s akin to a “circuit breaker” for database changes so that if a script is about to modify an unusually large number of records outside of a maintenance window, it aborts or requires special approval. The hoop.dev platform, for example, provides guardrails that can stop dangerous operations like dropping a production table before they happen. 
  • Backups and immutable logs. Ensure regular backups of the SoR data (with secure storage). Consider using an immutable log (like a blockchain or simply an append-only audit log) for critical transactions so you always have a ledger of what happened. Some modern SoRs use event sourcing where every change is an event stored immutably. This inherently provides an audit trail and the ability to rebuild state if needed. But remember, authoritative does not mean permanent. You should define your retention, archiving and erasure policies early because storage and compliance risks compound quietly.
  • Continuous integrity monitoring. Even after establishing integrity gates, keep monitoring data quality metrics and thresholds like number of duplicate identities, percentage of records with all mandatory fields, etc. If something drifts (say a bug causes some records to save without a required field) then early alarms will trigger.
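The rate-limiting guardrail above is usually enforced at a gateway, but the underlying mechanics are worth seeing. A minimal sketch of a per-client token bucket (the `TokenBucket` class is illustrative, and the injectable clock exists only to make the behaviour testable):

```python
import time


class TokenBucket:
    """Per-client token bucket: allows short bursts up to `capacity`
    and refills at `rate` tokens per second; a drained bucket maps to
    an HTTP 429 response at the gateway."""

    def __init__(self, rate: float, capacity: float, clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity
        self.clock = clock
        self.last = clock()

    def allow(self) -> bool:
        # Refill in proportion to the time elapsed since the last call.
        now = self.clock()
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False  # caller should return 429 Too Many Requests
```

One bucket per consumer (keyed by API key or client ID) means a single misbehaving integration exhausts only its own allowance.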

Even strong programmes fall into predictable traps. They let workflow logic creep into the SoR, allow direct database access, introduce breaking changes without versioning, or treat migration as a single cutover event. These patterns create more long-term pain than technical complexity ever does.

So long, system of pain!

Building a System of Record is a long-term investment. If done with adaptability and governance in mind, it can serve for decades without becoming a bottleneck. The main takeaways to avoid a “system of pain” are:

  • Design for change. Expect new fields, new uses, and growth. Use proper versioning and expand-contract approaches so you can evolve without breaking everything.
  • Jealously guard its quality. Put front-line defences in place, such as validation rules and integrity tests, to block corrupted or incomplete data at all service boundaries. Don’t rely on downstream cleanup; keep bad data out from the start.
  • Automate integrity and testing. Make data integrity a CI/CD concern (like tests for unique and not-null in your pipeline). This early catch of issues prevents future pain in production.
  • Empower consumers safely. Provide stable interfaces, clear deprecation paths, and support during migrations. A SoR should be a reliable platform for others. Strive for backwards compatibility and thorough communication so that consuming systems trust your changes.
  • Strive for operational excellence. Implement observability and guardrails, everything from rate limits to admin safeguards. It’s easier to build these in at start than to retrofit after an incident. Visualise components and identify bottlenecks with tracing because you can’t fix what you can’t see.
  • Keep your eye on the horizon. The SoR often underpins strategic outcomes (citizen trust, financial accuracy, etc.). Try to measure indicators of trust: data freshness against SLA, duplicate record rate, schema change frequency, consumer breakage incidents, and anomaly resolution time. These will show drift before it becomes systemic pain.

By engineering a well-built System of Record, you create an organisational asset, not a headache. It provides one source of truth that everyone trusts, it adapts to new needs with minimal fuss, and it safeguards the precious data it holds. By applying the principles and practices we’ve discussed, everything from schema versioning rules to data integrity gates, you can ensure your SoR stands the test of time and serves real need without inflicting pain on developers or users.
