Apache Polaris Guides
You can quickly get started with Polaris by playing with the docker-compose examples provided by the guides shown on the left of this page. Each guide has detailed instructions.
Prerequisites
- Docker
- Docker Compose
- jq (for some examples)
Getting Started Examples
- Spark: An example that uses an in-memory metastore, automatically bootstrapped, with Apache Spark and a Jupyter notebook.
- Telemetry: An example that includes Prometheus and Jaeger to collect metrics and traces from Apache Polaris. This example automatically creates a polaris_demo catalog.
- Keycloak: An example that uses Keycloak as an external identity provider (IDP) for authentication.
Authoring Guides
New or updated guides, especially those that use Docker Compose, should follow these guidelines. This ensures that the examples work as expected and that the guides pass the tests run in CI and locally.
Running guides CI tests locally
Requirements: Python 3.14 installed locally. Other Python 3 versions should work as well, but have not been validated.
To run tests for all guides in site/content/guides, run:

    cd site/it
    ./markdown-testing.py

To run tests for one or more guides in site/content/guides, use the following pattern:

    cd site/it
    ./markdown-testing.py content/guides/rustfs/index.md content/guides/telemetry/index.md

More information about guides-testing can be found in the site/it/README.md file in the
source repository.
Constraints
- docker compose invocations must be on a single line in a shell code block.
- When invoking the Spark SQL shell in a shell code block, the bin/spark-sql ... invocation must be the only statement in that code block.
- Currently, only one docker-compose file is supported per guide. Using multiple docker-compose files would require changes to the testing code.
Tips for Docker usage
- Do not use any of the --interactive or --tty options.
Tips for Docker Compose files
- All “daemon” services should have a healthcheck, and services that depend on them should declare the dependency using condition: service_healthy.
- Always put port-mappings in quotes, for example:

      ports:
        - "9874:9874"

- “Final setup services,” for example the service that sets up Polaris, are tricky to get right because they must not cause docker compose up --detach --wait to fail. That command yields an error if any service exits. To prevent this, a final setup service should end with an “endless” command that keeps it running, and should have a health check that reports its success.
- “Intermediate setup services,” for example the service that creates a bucket, do not and should not have a final sleep command, because that would delay the overall startup. The Polaris service must depend on the intermediate service that sets up the bucket with the condition service_completed_successfully.

  Working example of a final setup service:

      polaris-setup:
        image: alpine/curl:8.17.0
        depends_on:
          polaris:
            condition: service_healthy
        (...)
        entrypoint: "/bin/sh"
        command:
          - "-c"
          - >-
            do_stuff &&
            touch /tmp/polaris-setup-done &&
            tail -f /dev/null
        healthcheck:
          test: ["CMD", "test", "-f", "/tmp/polaris-setup-done"]

- Take care to propagate all failures of a setup service’s commands. This can be achieved by chaining the commands with && and using set -e at the beginning of executed scripts.
- For curl, use the --fail-with-body or --fail options, or add specialized logic to validate the HTTP response code.
- Service dependencies (depends_on) must be declared with the correct condition, which means using the “Long syntax” as described in the Docker Compose documentation.
- Dependencies on “setup” services MUST be declared using a syntax like this:

      polaris:
        image: apache/polaris:latest
        depends_on:
          setup_bucket:
            condition: service_completed_successfully

- Dependencies on “daemon” services with a “proper” health check should be declared using a syntax like this:

      polaris:
        image: apache/polaris:latest
        depends_on:
          minio:
            condition: service_healthy
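
The failure-propagation tips above can be sketched as a small POSIX shell helper. This is a minimal illustration under stated assumptions, not the script used by any Polaris guide: run_setup, create_catalog, and failing_step are hypothetical names, and the marker path is passed in as an argument. The helper checks each step explicitly rather than relying on set -e alone, because set -e is not applied inside a function that is called from an if, &&, or || context.

```shell
#!/bin/sh
# Sketch of a setup-service script. All function names here are hypothetical.

# run_setup MARKER STEP...
# Runs each STEP in order, stops at the first failure, and creates MARKER
# only if every step succeeded. A health check can then test for MARKER.
run_setup() {
  marker="$1"
  shift
  for step in "$@"; do
    # Propagate the failure immediately; later steps are not executed.
    "$step" || return 1
  done
  touch "$marker"
}

# Placeholder steps for illustration. A real step might wrap a call such as
# `curl --fail-with-body ...` so that HTTP errors also count as failures.
create_catalog() { true; }
failing_step() { false; }
```

In a compose file, the command of a final setup service could then look something like `run_setup /tmp/polaris-setup-done create_catalog && tail -f /dev/null`, so that the health check only passes once every step has succeeded.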
Tips for inspecting Docker Compose logs
When inspecting the logs of Docker Compose, watch the logged timestamps. The order of the log entries emitted by Docker Compose is not necessarily in the natural order of execution between different services. Timestamps of each log entry are emitted to the console and the test log files.
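
Because entries from different services can appear out of order, it may help to re-sort captured logs by timestamp. The sketch below is one possible way to do that, assuming each line has the form "<service>  | <timestamp> <message>" with an RFC 3339 UTC timestamp, which is what docker compose logs --timestamps is expected to print. The function name sort_compose_logs is hypothetical, and the format of the test log files may differ.

```shell
#!/bin/sh
# sort_compose_logs: read timestamped Docker Compose log lines on stdin and
# print them ordered by timestamp.
# Assumed line format: "<service>  | <RFC 3339 timestamp> <message>"
sort_compose_logs() {
  # Field 1 is the service name, field 2 the "|" separator, field 3 the
  # timestamp. RFC 3339 UTC timestamps sort correctly as plain strings.
  # -b ignores the blanks before the key; -s keeps the input order for
  # entries that share the same timestamp.
  LC_ALL=C sort -s -b -k3,3
}
```

A possible invocation is `docker compose logs --timestamps | sort_compose_logs`. Lines that do not match the assumed format are still printed, but their position in the output is not meaningful.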