Platform architecture

The Union.ai architecture consists of two virtual private clouds, referred to as planes: the control plane and the compute plane.

Control plane

The control plane:

  • Runs within the Union.ai AWS account.
  • Provides the user interface through which users can access authentication, authorization, observation, and management functions.
  • Is responsible for placing executions onto compute plane clusters and performing other cluster control and management functions.

Compute plane

All your workflow and task executions are performed in the compute plane, which runs within your AWS or GCP account. The compute plane’s clusters are provisioned and managed by the control plane through a resident Union.ai operator with minimal required permissions.

Union.ai operates one control plane for each supported region, which supports all compute planes within that region. You can choose the region in which to locate your compute plane. Currently, Union.ai supports the us-west, us-east, eu-west, and eu-central regions, and more are being added.

Compute plane nodes

Once the compute plane is deployed in your AWS or GCP account, your cluster runs different kinds of nodes with different responsibilities. Union.ai distinguishes between default nodes and worker nodes.

Default nodes guarantee the basic operation of the compute plane and are always running. Services that run on these nodes include autoscaling of the worker nodes, monitoring services, and the Union.ai operator, among others.

Worker nodes are responsible for executing your workloads. You have full control over the configuration of your worker nodes.

When worker nodes are not in use, they automatically scale down to the configured minimum. (The default is zero.)
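The scale-down behavior can be pictured with a small sketch. This is not Union.ai's autoscaler: the function name `target_worker_count`, its parameters, and the rule that each pending workload occupies one node are all hypothetical simplifications chosen for illustration.

```python
def target_worker_count(pending_workloads: int,
                        min_nodes: int = 0,
                        max_nodes: int = 10) -> int:
    """Return how many worker nodes a toy autoscaler would request.

    Assumption (not from Union.ai): one pending workload needs one node.
    """
    if min_nodes < 0 or max_nodes < min_nodes:
        raise ValueError("expected 0 <= min_nodes <= max_nodes")
    # Clamp demand between the configured minimum and maximum.
    return max(min_nodes, min(pending_workloads, max_nodes))


print(target_worker_count(0))               # idle, default minimum of zero
print(target_worker_count(0, min_nodes=2))  # idle, configured minimum of two
print(target_worker_count(25))              # demand capped at max_nodes
```

With no pending workloads, the result falls back to the configured minimum, which mirrors the default of zero described above.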

Union.ai operator

The Union.ai hybrid architecture lets you maintain ultimate ownership and control of your data and compute infrastructure while enabling Union.ai to handle the details of managing that infrastructure.

Management of the compute plane is mediated by a dedicated operator (the Union.ai operator) resident on that plane. This operator is designed to perform its functions with only the minimum set of required permissions. It allows the control plane to spin clusters up and down, and it gives Union.ai’s support engineers access to system-level logs and the ability to apply changes at the customer's request. It does not provide direct access to secrets or data.

In addition, communication is always initiated by the Union.ai operator in the compute plane toward the Union.ai control plane, not the other way around. This further enhances the security of your compute plane.
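The direction of communication can be illustrated with a minimal in-memory sketch. The `ControlPlane` and `Operator` classes and their methods are hypothetical and do not reflect Union.ai's actual protocol or API. The point is only that the control plane holds no reference to the operator and responds solely to requests the operator initiates.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class ControlPlane:
    """Toy control plane: it queues work and answers incoming requests."""
    pending: deque = field(default_factory=deque)

    def submit(self, execution_id: str) -> None:
        self.pending.append(execution_id)

    def handle_poll(self) -> list:
        # Runs only in response to an operator-initiated request.
        work = list(self.pending)
        self.pending.clear()
        return work


@dataclass
class Operator:
    """Toy operator living in the compute plane."""
    control_plane: ControlPlane
    started: list = field(default_factory=list)

    def poll_once(self) -> int:
        # The operator opens the connection; nothing calls into it.
        work = self.control_plane.handle_poll()
        self.started.extend(work)
        return len(work)


cp = ControlPlane()
cp.submit("exec-1")
cp.submit("exec-2")
op = Operator(cp)
print(op.poll_once())  # number of executions picked up on this poll
print(op.started)
```

Because every exchange starts from the compute plane side, this toy model needs no inbound entry point on the operator.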

Union.ai is SOC-2 Type 2 certified. A copy of the audit report is available upon request.

Registry data

Registry data is composed of:

  • Names of workflows, tasks, launch plans, and artifacts
  • Input and output types for workflows and tasks
  • Execution status, start time, end time, and duration of workflows and tasks
  • Version information for workflows, tasks, launch plans, and artifacts
  • Artifact definitions

This type of data is stored in the control plane and is used to manage the execution of your workflows. This does not include any workflow or task code, nor any data that is processed by your workflows or tasks.

Execution data

Execution data is composed of:

  • Event data
  • Workflow inputs
  • Workflow outputs
  • Data passed between tasks (task inputs and outputs)

This data is divided into two categories: raw data and literal data.

Raw data

Raw data is composed of:

  • Files and directories
  • Dataframes
  • Models
  • Python-pickled types

These are passed by reference between tasks and are always stored in an object store in your compute plane. This type of data is read (and may be temporarily cached) by the control plane as needed, but is never stored there.

Literal data

Literal data is composed of:

  • Primitive execution inputs (int, string, etc.)
  • JSON-serializable dataclasses

These are passed by value, not by reference, and may be stored in the Union.ai control plane.
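The difference between raw and literal data can be sketched in plain Python. This is an illustrative model only and not the Union.ai or Flyte type system: `classify`, `hand_off`, `TrainingConfig`, and the `store://` URIs are hypothetical names, and the classification rules are a simplified reading of the lists above.

```python
import json
from dataclasses import asdict, dataclass, is_dataclass
from pathlib import Path


@dataclass
class TrainingConfig:
    learning_rate: float
    epochs: int


def classify(value) -> str:
    """Return 'literal' or 'raw' for a value (illustrative rules only)."""
    if isinstance(value, (bool, int, float, str)):
        return "literal"
    if is_dataclass(value) and not isinstance(value, type):
        try:
            json.dumps(asdict(value))  # literal only if JSON-serializable
            return "literal"
        except TypeError:
            return "raw"
    return "raw"  # files, dataframes, models, pickled objects, ...


def hand_off(value, object_store: dict) -> dict:
    """Describe how a value would travel between tasks in this toy model."""
    if classify(value) == "literal":
        payload = asdict(value) if is_dataclass(value) else value
        return {"kind": "literal", "value": payload}  # passed by value
    # Raw data stays in the simulated compute-plane object store;
    # only a reference is handed on.
    uri = f"store://item-{len(object_store)}"
    object_store[uri] = value
    return {"kind": "raw", "uri": uri}


store = {}
print(hand_off(42, store))
print(hand_off(TrainingConfig(0.01, 3), store))
print(hand_off(Path("model.bin"), store))
```

In this model, the literal hand-offs carry the value itself, while the raw hand-off carries only a URI pointing into the store.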

Data privacy

If you need to maintain strict data privacy, do not pass private information in literal form between tasks, because literal data is passed by value and may be stored in the Union.ai control plane.