Zero-trust is not a network configuration — it's an architectural philosophy

When Google published the BeyondCorp research papers between 2014 and 2018, the security community read them as network architecture papers: move access controls from the network perimeter to the individual device and user, eliminate the trusted internal network, verify every request explicitly. That reading is accurate but incomplete. The deeper point, and the one that shapes how a GCP architecture should actually be built, is that zero-trust is not a configuration you apply to your network. It is a design philosophy that propagates through every layer of your system, from how requests are authenticated at the edge to how your infrastructure code is stored and deployed.

An architecture that deploys BeyondCorp at the presentation layer and then runs agents with broad IAM permissions, stores encryption keys in a shared key ring, and logs to a mutable database has not implemented zero-trust. It has implemented zero-trust's marketing brochure while preserving the implicit trust assumptions that zero-trust was designed to eliminate. The result is a system that claims to be secure from the outside and is vulnerable from the inside — precisely the threat model that gave us the large-scale breaches of the past decade.

This article traces the zero-trust philosophy through the four layers of a GCP-native AI architecture: how it is expressed at each layer, what specific GCP primitive implements each expression, and — critically — why the primitives must work together as a system rather than as independent features bolted onto a conventionally trusted architecture.

The central principle: trust is earned at every boundary, not assumed inside any perimeter

The traditional enterprise security model operated on a perimeter assumption: traffic inside the corporate network was trusted; traffic from outside was not. This assumption was already fragile before cloud computing became prevalent. It became untenable when workloads moved to managed services that have no meaningful "inside" — when the boundary between trusted and untrusted became a legal and organisational concept rather than a network topology concept.

Zero-trust replaces the perimeter assumption with three operating principles. First: never trust, always verify — every request must be authenticated and authorised regardless of its origin, including requests from inside the system itself. Second: least privilege — every principal (user, service account, agent) is granted exactly the permissions it needs for its defined function and no more. Third: assume breach — design every component as if the components adjacent to it may be compromised, because at production scale, some of them will be.

These three principles, applied consistently across all four layers of a GCP architecture, produce a system where the security properties are structurally enforced rather than procedurally maintained. The distinction matters for the same reason it matters in compliance: procedures can be bypassed under deadline pressure or configuration error. Structural enforcement cannot be bypassed without a code change, which requires a review, which creates a record.

The question is not whether your architecture has security features. It is whether those features make your compliance claims trustworthy in the absence of human vigilance. A system that is secure only when nobody makes a mistake is not a secure system. It is a system waiting for the right mistake.

Layer by layer: how zero-trust is expressed in GCP

The diagram below shows the four-layer GCP architecture from the previous articles in this series, annotated with the specific zero-trust enforcement mechanism at each layer boundary. The key observation is directionality: trust is re-established at every boundary crossing, and the enforcement happens at the boundary, not inside the component. No component is implicitly trusted by the components it communicates with.

Figure 1 — Zero-trust enforcement map: mechanism at every layer boundary

Zero-trust enforcement is not concentrated at a single layer. Each layer boundary has a dedicated enforcement mechanism that operates independently of the application logic above it. A compromised agent in Layer 2 cannot exfiltrate data from Layer 3 — VPC Service Controls block the API call regardless of the agent's permissions. A compromised service cannot rotate its own encryption keys — those live in Cloud KMS with Workload Identity-bound access.

BeyondCorp: why network location is not identity

The starting point for understanding BeyondCorp is recognising what it replaces. In a traditional enterprise network, being on the corporate VPN or the corporate LAN was treated as a meaningful security signal — it meant you had passed some authentication boundary to get there, so subsequent requests to internal services were trusted without further verification. This assumption has three failure modes that became increasingly consequential as enterprise infrastructure moved to the cloud.

The first is lateral movement: once an attacker is inside the network boundary (through a phished credential, a compromised endpoint, or a misconfigured firewall rule), they can move between internal services without re-authenticating. The internal network is effectively flat. The second is the insider threat: a legitimate user on the internal network can access services they have no business reason to access, because the access control is the network boundary rather than the service itself. The third is the impossible perimeter problem: in a hybrid or cloud environment, there is no coherent "inside." Data lives in GCS, compute runs on Cloud Run, APIs are served through Apigee. The network perimeter that traditional security models relied on does not have a natural location in this topology.

BeyondCorp's response is to move access controls from the network to the application layer, evaluated on every request, based on the identity of the user and the posture of the device rather than the network segment the request originates from. In GCP, this is implemented through Identity-Aware Proxy (IAP): every request to an internal service passes through IAP, which validates the user's Google identity and can additionally evaluate device certificate, OS version, corporate-managed status, and geographic context before forwarding the request with a short-lived signed token attached. That token is not cached by the internal network, and the access decision is re-evaluated on each request.

The architectural consequence is that being inside the VPC confers nothing, provided every protected service accepts traffic only through IAP and verifies the signed token IAP attaches. Under that condition, a service running in the same GKE cluster as another service cannot call that service's API without a valid IAP-validated token. This is the "never trust, always verify" principle applied at the application layer: not as a configuration option, but as the only way to make requests to protected resources.
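The per-request evaluation described above can be sketched in a few lines of Python. This is a toy model, not IAP's implementation or API: the `RequestContext` and `AccessPolicy` types, their fields, and the `evaluate` function are all hypothetical names chosen for illustration. The point it makes is that the source address is carried along but never consulted.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional, Tuple


@dataclass(frozen=True)
class RequestContext:
    user: Optional[str]          # verified identity, or None if unauthenticated
    device_managed: bool         # device posture signal
    os_version: Tuple[int, int]  # (major, minor)
    source_ip: str               # kept for audit, deliberately unused below


@dataclass(frozen=True)
class AccessPolicy:
    allowed_users: FrozenSet[str]
    require_managed_device: bool
    min_os_version: Tuple[int, int]


def evaluate(ctx: RequestContext, policy: AccessPolicy) -> Tuple[bool, str]:
    """Decide a single request from identity and device posture only."""
    if ctx.user is None:
        return False, "no verified identity"
    if ctx.user not in policy.allowed_users:
        return False, "user not authorised for this service"
    if policy.require_managed_device and not ctx.device_managed:
        return False, "device is not corporate-managed"
    if ctx.os_version < policy.min_os_version:
        return False, "OS version below policy minimum"
    return True, "ok"


# Hypothetical policy and request: a known user on an internal address,
# but using an unmanaged device.
policy = AccessPolicy(frozenset({"alice@example.com"}), True, (14, 0))
internal_but_unmanaged = RequestContext("alice@example.com", False, (14, 2), "10.0.0.5")
print(evaluate(internal_but_unmanaged, policy))
```

The request from `10.0.0.5` is refused even though it comes from a private address, because nothing in `evaluate` reads `source_ip`.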

Workload Identity: eliminating the secret that can be stolen

The weakest link in most cloud security models is not the identity system but the credential. Service accounts need to authenticate to GCP APIs. The traditional way to do this is a service account key: a JSON file containing a private key, downloaded from the console, stored somewhere, rotated manually. In practice, these keys end up in git repositories, CI/CD environment variables, container images, and Kubernetes secrets, from any of which they can be extracted by an attacker with filesystem or API access. The key can be used from anywhere, including from outside the cloud environment, and it remains valid until someone deletes it. Calls made with a stolen key may appear in Cloud Audit Logs, but they are hard to tell apart from legitimate use, because the key itself is the authentication credential.

Workload Identity eliminates the key entirely. Instead of a long-lived credential that can be exfiltrated, each GKE workload runs as a Kubernetes service account, which is in turn federated to a GCP service account via an IAM policy binding. The GKE metadata server on each node issues short-lived OAuth tokens to pods running on that node, scoped to the bound GCP service account. The tokens expire after a short lifetime (one hour by default). A token copied out of a pod stops working when it expires, and there is no key file from which to mint another. Tokens are issued by GCP's own infrastructure on request, not stored anywhere in the application.

The security property this creates is more important than the operational convenience: a compromised application container cannot be used to steal credentials, because there are no credentials to steal. The token it can obtain is already scoped to the minimum IAM permissions for that service account, expires shortly, and is logged in Cloud Audit Logs on every use. An attacker who compromises a container has access to exactly what that container is legitimately authorised to do, for the duration of the current token, and the access is recorded.
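The federation between a Kubernetes service account and a GCP service account comes down to two pieces of configuration: an IAM binding on the GCP service account and an annotation on the Kubernetes service account. The sketch below only builds those strings so their shape is visible. The function name and the example project, namespace, and account names are hypothetical, and it assumes the cluster and the GCP service account live in the same project.

```python
def workload_identity_binding(project_id: str, namespace: str,
                              ksa_name: str, gsa_name: str) -> dict:
    """Build the IAM binding and KSA annotation that link a KSA to a GSA.

    Assumes the GKE cluster and the GCP service account share one project.
    """
    gsa_email = f"{gsa_name}@{project_id}.iam.gserviceaccount.com"
    return {
        # Granted on the GCP service account: lets the KSA impersonate it.
        "iam_binding": {
            "service_account": gsa_email,
            "role": "roles/iam.workloadIdentityUser",
            "member": f"serviceAccount:{project_id}.svc.id.goog[{namespace}/{ksa_name}]",
        },
        # Set on the Kubernetes service account: names the GSA to act as.
        "ksa_annotation": {"iam.gke.io/gcp-service-account": gsa_email},
    }


# Hypothetical names, for illustration only.
binding = workload_identity_binding("tenant-a", "agents", "audit-agent", "audit-agent-sa")
print(binding["iam_binding"]["member"])
print(binding["ksa_annotation"])
```

Nothing in this configuration is secret. Both values can sit in a public repository, which is the practical difference from a service account key.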

Figure 2 — Workload Identity versus service account key: what an attacker can steal

Service account keys are credentials that can be exfiltrated and used from anywhere, indefinitely. Workload Identity issues short-lived, GKE-bound tokens that cannot be stolen — there is no persistent credential to extract. An attacker who compromises a container has access only to what that container is legitimately authorised to do, for the lifetime of the current token.

VPC Service Controls: the perimeter that doesn't rely on network topology

The third component of the zero-trust stack addresses the "assume breach" principle at the data layer. Even if an attacker compromises a service account, escalates privileges through a misconfigured IAM binding, or exploits a vulnerability in an application container, VPC Service Controls can prevent them from exfiltrating data from GCP-managed services like BigQuery, Cloud Storage, or AlloyDB.

VPC-SC works by defining a service perimeter — a boundary around a set of GCP projects and services — and enforcing that data cannot flow across that boundary without satisfying explicit access level conditions. A BigQuery dataset inside the perimeter cannot be queried from outside the perimeter, even by a principal with the correct IAM roles, unless that principal satisfies the configured access conditions: correct identity, correct network origin, correct device posture. An API call that would normally succeed fails with a VPC-SC policy violation, which is logged to Cloud Audit Logs and can trigger a Cloud Monitoring alert.

The security property this creates is one that IAM alone cannot provide. IAM controls who can access a resource. VPC-SC controls from where that access can happen. These are orthogonal controls, and both are necessary. An attacker who has obtained a valid service account token through a compromised container can make IAM-authorised API calls — VPC-SC denies those calls if they originate from outside the defined perimeter. The data exfiltration path is closed regardless of the application-layer security posture.
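The "who" versus "from where" distinction can be modelled as two independent checks that must both pass. The following is a simplified sketch, not the VPC-SC API: the `ApiCall` type, the `authorise` function, and the result strings are hypothetical, and real perimeters also support access levels and ingress and egress rules that are omitted here.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, Optional, Set


@dataclass(frozen=True)
class ApiCall:
    principal: str
    permission: str
    resource_project: str
    source_project: Optional[str]  # None models a caller outside any known project


def iam_allows(call: ApiCall, grants: Dict[str, Set[str]]) -> bool:
    """IAM answers 'who': does this principal hold this permission?"""
    return call.permission in grants.get(call.principal, set())


def perimeter_allows(call: ApiCall, perimeter: FrozenSet[str]) -> bool:
    """The perimeter answers 'from where': is the caller inside the boundary?"""
    if call.resource_project not in perimeter:
        return True  # resource is not protected by this perimeter
    return call.source_project in perimeter


def authorise(call: ApiCall, grants: Dict[str, Set[str]],
              perimeter: FrozenSet[str]) -> str:
    """Both checks must pass; neither substitutes for the other."""
    if not iam_allows(call, grants):
        return "DENIED_IAM"
    if not perimeter_allows(call, perimeter):
        return "DENIED_VPC_SC"
    return "ALLOWED"


# Hypothetical tenant: one agent identity with read access to BigQuery data.
AGENT = "agent@tenant-a.iam.gserviceaccount.com"
grants = {AGENT: {"bigquery.tables.getData"}}
perimeter = frozenset({"tenant-a"})

from_inside = ApiCall(AGENT, "bigquery.tables.getData", "tenant-a", "tenant-a")
stolen_token = ApiCall(AGENT, "bigquery.tables.getData", "tenant-a", None)
print(authorise(from_inside, grants, perimeter))
print(authorise(stolen_token, grants, perimeter))
```

The second call carries exactly the same identity and permission as the first. Only its origin differs, and that alone is enough to deny it.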

In a multi-tenant deployment — one GCP project per enterprise client, as in both the deal desk and knowledge graph architectures — VPC-SC provides an additional guarantee: tenant data cannot be accessed from another tenant's project, regardless of IAM configuration. The perimeter boundary is at the project level. Cross-tenant data access is structurally impossible, not just policy-prohibited.

CMEK: the encryption key that belongs to the customer, not the platform

Customer-Managed Encryption Keys (CMEK) address a trust question that is easy to overlook: do you trust the cloud platform to manage the encryption of your data? For most use cases, Google-managed encryption keys are sufficient — data is encrypted at rest, and Google controls the key rotation. For enterprise deployments with regulatory requirements, data residency obligations, or contractual data sovereignty commitments, this is often not enough.

CMEK allows you to bring your own encryption key, stored in Cloud KMS, for the GCP services that hold your data and support CMEK: BigQuery datasets, Cloud Storage buckets, AlloyDB instances, Pub/Sub topics, and more. The key material never leaves Cloud KMS. It is used to envelope-encrypt the data encryption keys that GCP uses internally. If you revoke the key, the data becomes inaccessible to you, to GCP, and to anyone else. Access to the key is governed by IAM policies that you control, and key use is recorded in the audit logs for Cloud KMS.
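Envelope encryption and the effect of revoking the key can be shown with a small model. This sketch is illustrative only: `ToyKms` is a hypothetical stand-in for a key service, not the Cloud KMS client, and the cipher is a toy SHA-256 keystream chosen so the example runs on the standard library. It must not be used to protect real data.

```python
import hashlib
import secrets


def _keystream_xor(key: bytes, nonce: bytes, data: bytes) -> bytes:
    """Toy stream cipher (SHA-256 in counter mode). Illustration only."""
    out = bytearray()
    for offset in range(0, len(data), 32):
        counter = (offset // 32).to_bytes(8, "big")
        pad = hashlib.sha256(key + nonce + counter).digest()
        out.extend(b ^ p for b, p in zip(data[offset:offset + 32], pad))
    return bytes(out)


class ToyKms:
    """Hypothetical key service: the key-encryption key never leaves it."""

    def __init__(self) -> None:
        self._kek = secrets.token_bytes(32)
        self._enabled = True

    def _require_enabled(self) -> None:
        if not self._enabled:
            raise PermissionError("key is disabled or revoked")

    def wrap(self, dek: bytes) -> bytes:
        self._require_enabled()
        nonce = secrets.token_bytes(16)
        return nonce + _keystream_xor(self._kek, nonce, dek)

    def unwrap(self, wrapped: bytes) -> bytes:
        self._require_enabled()
        return _keystream_xor(self._kek, wrapped[:16], wrapped[16:])

    def revoke(self) -> None:
        self._enabled = False


def encrypt_record(kms: ToyKms, plaintext: bytes) -> dict:
    """Encrypt data with a fresh data key, then store only the wrapped key."""
    dek = secrets.token_bytes(32)
    nonce = secrets.token_bytes(16)
    return {
        "wrapped_dek": kms.wrap(dek),
        "nonce": nonce,
        "ciphertext": _keystream_xor(dek, nonce, plaintext),
    }


def decrypt_record(kms: ToyKms, record: dict) -> bytes:
    """Reading the data requires asking the key service to unwrap the data key."""
    dek = kms.unwrap(record["wrapped_dek"])
    return _keystream_xor(dek, record["nonce"], record["ciphertext"])
```

The stored record contains everything except the ability to unwrap the data key. Once `revoke()` is called, `decrypt_record` raises `PermissionError` even though the ciphertext itself is untouched.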

The architectural property this creates is: Google cannot access your data unless you allow it to. In practice, for most enterprise deployments, this is a compliance posture statement rather than a practical security measure, because Google's platform security is extremely strong. But for financial institutions with regulatory requirements, healthcare organisations with HIPAA obligations, or any enterprise with a contractual data sovereignty commitment, "technically Google cannot access my data" is a statement that customer-managed keys make it possible to assert and that Google-managed keys do not.

In the architecture described throughout this series, each enterprise tenant has a dedicated CMEK key ring in Cloud KMS. The key ring is in the tenant's project. The keys rotate annually by default. Every service that holds tenant data — AlloyDB, BigQuery audit tables, Cloud Storage corpus — is configured to use the tenant-specific key. Cross-tenant key access is structurally impossible because the key rings are in separate projects with separate IAM policies.

Chronicle and the append-only audit ledger

The fifth component of the zero-trust stack is the audit trail, and the property it must have is immutability. An audit trail that can be modified by the systems it records is not an audit trail. It is evidence that an audit trail was intended, with contents that cannot be relied upon.

Chronicle is Google's cloud-native SIEM, built on a petabyte-scale analytics infrastructure with a specific security property: writes are append-only. A log entry, once written to Chronicle, cannot be modified or deleted by any application-layer process. The only principals who can interact with Chronicle audit data are the Chronicle service itself and the Chronicle API, which enforces the append-only semantics. Even an attacker who has compromised the application that emits audit events cannot modify the audit records those events produced.

This matters because the audit trail serves a different function from the application logs. Application logs record what the system did. The Chronicle audit trail records what the system was permitted to do, what it actually did, and who authorised it — in a format that can be retrieved years later and that cannot have been tampered with in the interim. YARA-L detection rules run continuously against the audit stream, alerting on anomalous patterns in real time. The audit trail is both the compliance evidence and the threat detection surface.

In the architecture described in this series, the Audit Agent is the component responsible for emitting to Chronicle. Its failure mode is defined explicitly in the FMEA: if Chronicle is unavailable, the agent holds the response until emission succeeds or a retry deadline is exceeded. No response is delivered to the user without a corresponding audit record. This is not a preference. It is an architectural constraint enforced by the execution model: the audit emission is a blocking step in the agent's workflow, not a fire-and-forget side effect.
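The blocking behaviour described above can be sketched as a function that returns the response only after the audit record has been emitted. This is a minimal illustration under stated assumptions, not the series' actual Audit Agent: `respond_with_audit`, `AuditUnavailable`, the `emit` callable, and the use of `ConnectionError` to signal an unavailable backend are all hypothetical.

```python
import time
from typing import Any, Callable


class AuditUnavailable(Exception):
    """Raised when the audit record could not be emitted before the deadline."""


def respond_with_audit(response: Any, audit_record: dict,
                       emit: Callable[[dict], None],
                       deadline_s: float = 5.0, backoff_s: float = 0.1,
                       clock: Callable[[], float] = time.monotonic,
                       sleep: Callable[[float], None] = time.sleep) -> Any:
    """Return `response` only after `emit(audit_record)` has succeeded."""
    give_up_at = clock() + deadline_s
    attempt = 0
    while True:
        attempt += 1
        try:
            emit(audit_record)   # blocking step: nothing is returned before this
            return response
        except ConnectionError as exc:
            # Stop if another wait would take us past the retry deadline.
            if clock() + backoff_s > give_up_at:
                raise AuditUnavailable(
                    f"audit emission failed after {attempt} attempt(s)"
                ) from exc
            sleep(backoff_s)
```

Because the only `return` sits directly after a successful `emit`, there is no code path that hands back a response without a corresponding audit record. The failure case surfaces as an exception the caller has to handle.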

Terraform IaC: the infrastructure that cannot drift from its specification

The sixth and most often overlooked component of a zero-trust GCP architecture is Infrastructure as Code: specifically, the requirement that every infrastructure resource is defined in Terraform and that no manual console changes are permitted. This is not a DevOps best practice. It is a zero-trust requirement.

The problem with manually created infrastructure is that it cannot be audited. If a GCP resource was created by a human clicking through the console, there is no guarantee that its current configuration matches any specification that has been reviewed and approved. VPC firewall rules can be added. IAM bindings can be expanded. Service accounts can be created. Each of these changes may be logged in Cloud Audit Logs, but the logs record what happened — not whether what happened was intended, reviewed, or consistent with the architecture's security posture.

Terraform, enforced through a GitOps pipeline (Cloud Build triggering on main branch commits), closes this gap. Every infrastructure change must be expressed as a Terraform plan, reviewed through a pull request process, and applied through a CI/CD pipeline that holds the permissions to apply it. The Terraform state is the authoritative record of what infrastructure exists. Security Command Center findings, alongside regular comparison of live resources against Terraform state, can surface configuration drift (resources that exist but are not in Terraform state) so that it can be alerted on. The security principle is: if it's not in the code, it shouldn't exist, and if it does exist without being in the code, something has gone wrong.
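One simple drift signal is available from Terraform itself: `terraform plan -detailed-exitcode` exits with 0 when live infrastructure matches the code, 2 when the plan contains changes, and 1 on error. The sketch below wraps that convention. The function names are hypothetical, and it assumes the `terraform` binary is on the path and the working directory is already initialised. It catches managed resources that were changed out of band and code that has not yet been applied. It does not catch resources created entirely outside Terraform, which need a separate inventory check.

```python
import subprocess


def classify_plan_exit(code: int) -> str:
    """Map the exit code of `terraform plan -detailed-exitcode` to a status."""
    if code == 0:
        return "in_sync"   # plan succeeded, no changes
    if code == 2:
        return "drift"     # plan succeeded, changes are present
    return "error"         # plan itself failed


def check_drift(workdir: str) -> str:
    """Run a read-only plan in `workdir` and report whether anything differs."""
    result = subprocess.run(
        ["terraform", "plan", "-detailed-exitcode", "-input=false", "-no-color"],
        cwd=workdir,
        capture_output=True,
        text=True,
    )
    return classify_plan_exit(result.returncode)
```

Run on a schedule against the main branch, a `drift` result means either someone changed a managed resource by hand or a merged change was never applied. Both are worth an alert.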

Infrastructure that cannot be reproduced from code is infrastructure that cannot be audited, cannot be disaster-recovered, and cannot be verified against a security baseline. Terraform is not automation. It is the definition of what the infrastructure is allowed to be.

The six components as a system

Individually, each of these six components (BeyondCorp IAP, Workload Identity, VPC Service Controls, CMEK, Chronicle, and Terraform IaC) provides a meaningful security improvement over a conventional architecture. Together, they implement zero-trust as a philosophy rather than a feature: the principle that no layer of the system is trusted by virtue of its position, every access is verified, every action is recorded, and the infrastructure that enforces these properties is itself defined in code that cannot be changed without a reviewed and recorded process.

Figure 3 — Zero-trust component map: mechanism, GCP primitive, and what it makes structurally impossible

Component	Zero-trust principle	GCP primitive	What it makes impossible	Category
BeyondCorp IAP	Never trust, always verify. Network location confers no access.	Identity-Aware Proxy, context-aware access policies, device certificate validation	Accessing a protected service from inside the VPC without a verified identity and device posture. Lateral movement after network compromise.	Enforce
Workload Identity	Least privilege. No persistent credentials to steal.	GKE Workload Identity, Kubernetes service account federation, node metadata server	Exfiltrating a service account credential from a compromised container. Using a stolen credential from outside GKE. Credentials persisting past token expiry.	Prevent
VPC Service Controls	Assume breach. Data cannot leave the perimeter even if the application is compromised.	VPC Service Controls perimeters, access levels, ingress and egress policies	Exfiltrating data from BigQuery, Cloud Storage, or AlloyDB from outside the defined perimeter, regardless of IAM permissions. Cross-tenant data access in project-per-tenant deployments.	Prevent
CMEK (Cloud KMS)	Data sovereignty. Platform cannot access data without customer authorisation.	Customer-managed encryption keys, Cloud KMS key rings per tenant, envelope encryption for all GCP storage services	Platform-level access to encrypted data without customer key authorisation. Cross-tenant key access in project-per-tenant deployments (key rings are project-scoped).	Enforce
Chronicle SIEM	Assume breach. Every action is recorded and that record cannot be altered.	Chronicle append-only write semantics, YARA-L detection rules, 24-month hot retention	Modifying an audit record after the fact. Silently bypassing a governed action without a trace. Compromising the security of the audit trail from the application layer.	Audit
Terraform IaC	Never trust implicit configuration. Infrastructure must be verified against code.	Terraform 1.7+, Cloud Build GitOps pipeline, Security Command Center drift detection	Infrastructure resources existing outside the defined architecture. Unreviewed configuration changes taking effect. Inability to reproduce infrastructure from code after an incident.	Detect

The sequence in which these components are implemented matters as much as implementing all of them. Terraform should be the first to be established — everything else is defined in code from day one. Workload Identity eliminates the credential problem before any application code is written. VPC-SC defines the data perimeter before any data is loaded. CMEK is configured at service creation — retrofitting it to an existing dataset requires migration. Chronicle logging should be active before any agent is deployed. BeyondCorp IAP is the last piece to lock down, typically, because it requires understanding what access patterns are legitimate before you can define context-aware access policies without locking out legitimate users.

An architecture that implements all six in this order, from infrastructure code, produces a system where the security properties are not dependent on anyone remembering to follow the security policy. They are properties of the system. The human vigilance they require is the vigilance to write good Terraform, review pull requests, and respond to Chronicle alerts, not the vigilance to correctly enforce a hundred separate security procedures under operational pressure. That difference, at production scale, over a period of years, is the difference between a security posture that holds and one that erodes.

References & Further Reading

Ward, R. et al. (2014–2018). "BeyondCorp" paper series. Google Technical Infrastructure. — The foundational papers defining the BeyondCorp model: "A New Approach to Enterprise Security" (2014), "Design to Deployment at Google" (2016), "The Access Proxy" (2017), "Migrating to BeyondCorp" (2017), "The Human Element" (2018). The architectural philosophy described in this article originates in this research.

Google Cloud, VPC Service Controls documentation — The technical specification for service perimeter configuration, access levels, ingress and egress policies, and the API-level enforcement model described in this article.

Google Cloud, Workload Identity documentation — The federation model, metadata server mechanics, and IAM binding structure for GKE Workload Identity. The specific security properties (no persistent credential, GKE-bound tokens, Cloud Audit Log coverage) are documented here.

Google Cloud, Chronicle SIEM documentation — The append-only write semantics, YARA-L detection rule engine, and 24-month hot retention model. The compliance evidence use case for Chronicle is described in the "Security operations" section.

NIST SP 800-207, "Zero Trust Architecture" (2020) — The NIST formal specification of zero-trust architecture, which provides the vendor-neutral framing of the three principles (never trust always verify, least privilege, assume breach) described in this article.

HashiCorp, Terraform documentation and Google Cloud provider — The IaC implementation model described in the article. The Security Command Centre drift detection integration for Terraform state is documented in the Google Cloud Security Command Centre API reference.

