Implementation Guide
Airbyte Enterprise is in an early access stage for select priority users. Once you are qualified for an Airbyte Enterprise license key, you can deploy Airbyte with the following instructions.
Airbyte Enterprise must be deployed using Kubernetes to enable Airbyte's best performance and scale. The core components (API server, scheduler, etc.) run as deployments, while the scheduler launches connector-related pods on different nodes.
Prerequisites
There are three prerequisites to deploying Enterprise: installing Helm, provisioning a Kubernetes cluster, and configuring `kubectl` to connect to the cluster.
For production, we recommend deploying to EKS, GKE or AKS. If you are doing some local testing, follow the cluster setup instructions outlined here.
To install `kubectl`, please follow these instructions. To configure `kubectl` to connect to your cluster by using `kubectl use-context my-cluster-name`, see the following:
Configure kubectl to connect to your cluster

GKE:

- Configure `gcloud` with `gcloud auth login`.
- On the Google Cloud Console, the cluster page will have a "Connect" button, with a command to run locally: `gcloud container clusters get-credentials $CLUSTER_NAME --zone $ZONE_NAME --project $PROJECT_NAME`
- Use `kubectl config get-contexts` to show the contexts available.
- Run `kubectl config use-context $GKE_CONTEXT` to access the cluster from kubectl.

EKS:

- Configure your AWS CLI to connect to your project.
- Install eksctl.
- Run `eksctl utils write-kubeconfig --cluster=$CLUSTER_NAME` to make the context available to kubectl.
- Use `kubectl config get-contexts` to show the contexts available.
- Run `kubectl config use-context $EKS_CONTEXT` to access the cluster with kubectl.
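The EKS flow above can be sketched as a single shell session (the cluster and context names are placeholders for your own):

```shell
# Make the cluster's credentials available to kubectl (cluster name is a placeholder)
eksctl utils write-kubeconfig --cluster=my-airbyte-cluster

# Show the available contexts, then switch to the one for the cluster
kubectl config get-contexts
kubectl config use-context my-eks-context

# Sanity-check connectivity by listing the cluster's nodes
kubectl get nodes
```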
Deploy Airbyte Enterprise
Add Airbyte Helm Repository
Follow these instructions to add the Airbyte helm repository:
- Run `helm repo add airbyte https://airbytehq.github.io/helm-charts`, where `airbyte` is the name of the repository that will be indexed locally.
- Perform the repo indexing process, and ensure your helm repository is up-to-date by running `helm repo update`.
- You can then browse all charts uploaded to your repository by running `helm search repo airbyte`.
Clone & Configure Airbyte
- `git clone` the latest revision of the airbyte-platform repository.
- Create a new `airbyte.yml` file in the `configs` directory of the `airbyte-platform` folder. You may also copy `airbyte.sample.yml` to use as a template:

    cp configs/airbyte.sample.yml configs/airbyte.yml

- Add your Airbyte Enterprise license key to your `airbyte.yml`.
- Add your auth details to your `airbyte.yml`. Auth configurations aren't easy to modify after Airbyte is installed, so please double-check them to make sure they're accurate before proceeding.
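The clone-and-template steps above can be sketched as follows (assuming the public `airbytehq/airbyte-platform` GitHub repository):

```shell
# Clone the latest revision of the airbyte-platform repository
git clone https://github.com/airbytehq/airbyte-platform.git
cd airbyte-platform

# Create airbyte.yml from the provided sample, then edit it with
# your license key and auth details
cp configs/airbyte.sample.yml configs/airbyte.yml
```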
Configuring auth in your airbyte.yml file
To configure SSO with Okta, add the following at the end of your `airbyte.yml` file:
    auth:
      identity-providers:
        - type: okta
          domain: $OKTA_DOMAIN
          app-name: $OKTA_APP_INTEGRATION_NAME
          client-id: $OKTA_CLIENT_ID
          client-secret: $OKTA_CLIENT_SECRET
To configure basic auth (deploy without SSO), remove the entire `auth:` section from your `airbyte.yml` config file. You will authenticate with the instance admin user and password included in your `airbyte.yml`.
Configuring the Airbyte Database
For Self-Managed Enterprise deployments, we advise against using the default Postgres database (`airbyte/db`) that Airbyte spins up within the Kubernetes cluster. For production, you should instead use a dedicated instance for better reliability and backups (such as AWS RDS or GCP Cloud SQL).
We assume in the following that you've already configured a Postgres instance:
- In the `charts/airbyte/values.yaml` file, disable the default Postgres database (`airbyte/db`):
    postgresql:
      enabled: false
- In the `charts/airbyte/values.yaml` file, enable and configure the external Postgres database:
    externalDatabase:
      host: ## Database host
      user: ## Non-root username for the Airbyte database
      database: db-airbyte ## Database name
      port: 5432 ## Database port number
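Before wiring in these values, it can help to sanity-check connectivity to the instance from a machine with network access to it (the host, database, and user below are placeholders):

```shell
psql "host=airbyte-db.example.com port=5432 dbname=db-airbyte user=airbyte" -c 'SELECT 1;'
```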
For the non-root user's password which has database access, you may use `password`, `existingSecret` or `jdbcUrl`. We recommend using `existingSecret`, or injecting sensitive fields from your own external secret store. These parameters are mutually exclusive:
    externalDatabase:
      ...
      password: ## Password for non-root database user
      existingSecret: ## The name of an existing Kubernetes secret containing the password.
      existingSecretPasswordKey: ## The Kubernetes secret key containing the password.
      jdbcUrl: "jdbc:postgresql://<user>:<password>@localhost:5432/db-airbyte" ## Full database JDBC URL. You can also add additional arguments.
The optional `jdbcUrl` field should be entered in the following format: `jdbc:postgresql://localhost:5432/db-airbyte`. We recommend against using this unless you need to pass additional arguments to the JDBC driver (e.g. to handle SSL).
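For example, if your database enforces SSL, a `jdbcUrl` with extra driver arguments might look like the following sketch (the host is a placeholder; `ssl` and `sslmode` are standard PostgreSQL JDBC driver options):

```yaml
externalDatabase:
  ...
  jdbcUrl: "jdbc:postgresql://airbyte-db.example.com:5432/db-airbyte?ssl=true&sslmode=require"
```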
- Finally, add this configuration into the `global` section of the `charts/airbyte/values.yaml` as well:
    global:
      ...
      database:
        secretName: "airbyte-enterprise-airbyte-secrets" ## The name of the existing Kubernetes secret entered above.
        secretValue: "DATABASE_PASSWORD" ## The Kubernetes secret key entered above.
        host: "localhost" ## Database host entered above.
        port: "5432" ## Database port entered above.
If you've used `password` or `jdbcUrl` above instead of a pre-existing Kubernetes secret, then your `secretName` will be `{release-name}-airbyte-secrets`, and `secretValue` will be `DATABASE_PASSWORD`.
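If you go the pre-existing secret route, such a secret could be created along these lines (the secret name and password are placeholders; the key name must match your `secretValue`):

```shell
kubectl create secret generic airbyte-enterprise-airbyte-secrets \
  --from-literal=DATABASE_PASSWORD='replace-with-your-password'
```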
Configuring External Logging
For Self-Managed Enterprise deployments, we advise against using the default Minio storage (`airbyte/minio`) that Airbyte spins up within the Kubernetes cluster. For production, we recommend spinning up standalone log storage for additional reliability using tools such as S3 or GCS. It's then common practice to configure additional log forwarding from external log storage into your observability tool.
- In the `charts/airbyte/values.yaml` file, disable the default Minio instance (`airbyte/minio`):
    minio:
      enabled: false
- In the `charts/airbyte/values.yaml` file, enable and configure external log storage:
S3:
    global:
      ...
      logs:
        storage:
          type: "S3"
        minio:
          enabled: false
        s3:
          enabled: true
          bucket: "" ## S3 bucket name that you've created.
          bucketRegion: "" ## e.g. us-east-1
          accessKey: ## AWS Access Key.
            password: ""
            existingSecret: "" ## The name of an existing Kubernetes secret containing the AWS Access Key.
            existingSecretKey: "" ## The Kubernetes secret key containing the AWS Access Key.
          secretKey: ## AWS Secret Access Key.
            password:
            existingSecret: "" ## The name of an existing Kubernetes secret containing the AWS Secret Access Key.
            existingSecretKey: "" ## The Kubernetes secret key containing the AWS Secret Access Key.
For each of `accessKey` and `secretKey`, the `password` and `existingSecret` fields are mutually exclusive.
GCS:
    global:
      ...
      logs:
        storage:
          type: "GCS"
        minio:
          enabled: false
        gcs:
          bucket: airbyte-dev-logs ## GCS bucket name that you've created.
          credentials: "" ## Path to a mounted GCP credentials file (if not using credentialsJson).
          credentialsJson: "" ## Base64 encoded json GCP credentials file contents
Note that the `credentials` and `credentialsJson` fields are mutually exclusive.
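Since `credentialsJson` expects the base64-encoded contents of your GCP service-account key file, you can generate the value along these lines (the file below is a stand-in, not a real credential):

```shell
# Create a stand-in credentials file (contents are placeholders, not a real key)
printf '{"type":"service_account"}' > /tmp/gcp-credentials.json

# Base64-encode the file contents; strip newlines so the value is a single line
CREDENTIALS_JSON="$(base64 < /tmp/gcp-credentials.json | tr -d '\n')"
echo "$CREDENTIALS_JSON"
```

Paste the printed value into the `credentialsJson` field.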
Configuring Ingress
To access the Airbyte UI, you will need to manually attach an ingress configuration to your deployment. The following is a simplified definition of an ingress resource you could use for Self-Managed Enterprise:
    apiVersion: networking.k8s.io/v1
    kind: Ingress
    metadata:
      name: airbyte-ingress ## Name is illustrative; pick one that fits your conventions.
    spec:
      rules:
        - host: enterprise-demo.airbyte.com
          http:
            paths:
              - backend:
                  service:
                    name: airbyte-pro-airbyte-webapp-svc
                    port:
                      number: 30080
                path: /
                pathType: Prefix
              - backend:
                  service:
                    name: airbyte-pro-airbyte-keycloak-svc
                    port:
                      number: 30081
                path: /auth
                pathType: Prefix
You may configure ingress using a load balancer or an API Gateway. We do not currently support most service meshes (such as Istio). If you are having networking issues after fully deploying Airbyte, please verify that firewalls or missing permissions are not interfering with pod-to-pod communication. Please also verify that deployed pods have the right permissions to make requests to your external database.
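Assuming you save the manifest above as `ingress.yaml` (filename illustrative), it can be applied and verified with:

```shell
# Apply the ingress resource, then confirm it was created
kubectl apply -f ingress.yaml
kubectl get ingress
```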
Install Airbyte Enterprise
Install Airbyte Enterprise on helm using the following command:
    ./tools/bin/install_airbyte_pro_on_helm.sh

The default release name is `airbyte-pro`. You can change this via the `RELEASE_NAME` environment variable.
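For example, to install under a custom release name (the name here is illustrative):

```shell
RELEASE_NAME=airbyte-enterprise ./tools/bin/install_airbyte_pro_on_helm.sh
```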
Customizing your Airbyte Enterprise Deployment
In order to customize your deployment, you need to create a `values.yaml` file in a local folder and populate it with default configuration override values. A `values.yaml` example can be located in the `charts/airbyte` folder of the Airbyte repository.
After specifying your own configuration, run the following command:
    ./tools/bin/install_airbyte_pro_on_helm.sh --values path/to/values.yaml