Apache Polaris Guides

You can get started with Polaris quickly by trying the docker-compose examples in the guides listed on the left of this page. Each guide has detailed instructions.

Prerequisites

Getting Started Examples

  • Spark: An example that uses an in-memory metastore, automatically bootstrapped, with Apache Spark and a Jupyter notebook.

  • Telemetry: An example that includes Prometheus and Jaeger to collect metrics and traces from Apache Polaris. This example automatically creates a polaris_demo catalog.

  • Keycloak: An example that uses Keycloak as an external identity provider (IDP) for authentication.

Authoring Guides

When writing new guides or updating existing ones, especially those that use Docker Compose, follow these guidelines. This ensures that the examples work as expected and that the guides pass the tests run in CI and locally.

Running guides CI tests locally

Requirements: Python 3.14 installed locally. Other Python 3 versions should work as well, but have not been validated.

To run tests for all guides in site/content/guides, run

    cd site/it
    ./markdown-testing.py

To run tests for one or more guides in site/content/guides, use the following pattern:

    cd site/it
    ./markdown-testing.py content/guides/rustfs/index.md content/guides/telemetry/index.md

More information about guides-testing can be found in the site/it/README.md file in the source repository.

Constraints

  1. docker compose invocations must be on a single line in a shell code block.
  2. When invoking Spark SQL shell in a shell code block, the bin/spark-sql ... invocation must be the only statement in that code block.
  3. Currently, only one docker-compose file is supported per guide. Supporting multiple docker-compose files would require changes to the testing code.
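
The first constraint can be pre-checked before running the tests. The following is a minimal sketch, not the logic used by markdown-testing.py: find_split_compose_calls is a hypothetical helper that lists lines in a guide where a docker compose invocation ends with a backslash continuation and is therefore spread over several lines.

```shell
#!/bin/sh
# Hypothetical pre-check, not part of the Polaris test tooling.
# Prints "<line number>:<line>" for every line that contains
# "docker compose" and ends with a backslash continuation.
# Exits non-zero when no such line is found (grep semantics).
#   $1: path of the guide's markdown file
find_split_compose_calls() {
  grep -n -E 'docker compose.*\\[[:space:]]*$' "$1"
}
```

An empty result means that every docker compose invocation found by this simple pattern sits on a single line.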

Tips for Docker usage

  • Do not use the --interactive or --tty options.

Tips for Docker Compose files

  • All “daemon” services should have a healthcheck and depend on it using condition: service_healthy.

  • Always put port-mappings in quotes, for example:

    ports:
      - "9874:9874"
    
  • “Final setup services,” for example the service that sets up Polaris, are tricky to get right because they must not cause docker compose up --detach --wait to fail. That command yields an error if any service exits. To prevent this, final setup services should end with an “endless” command that keeps them running, and should have a health check that reports whether the setup succeeded.

    “Intermediate setup services,” for example the service that creates a bucket, do not and should not have a final sleep command, because that would delay the overall startup. The Polaris service must depend on the intermediate service that sets up the bucket, using the condition service_completed_successfully.

    Working example:

    polaris-setup:
      image: alpine/curl:8.17.0
      depends_on:
        polaris:
          condition: service_healthy
      (...)
      entrypoint: "/bin/sh"
      command:
        - "-c"
        - >-
          do_stuff &&
          touch /tmp/polaris-setup-done &&
          tail -f /dev/null
      healthcheck:
        test: ["CMD", "test", "-f", "/tmp/polaris-setup-done"]
    
  • Take care to propagate all failures of a setup service’s commands. This can be achieved by chaining the commands with && and using set -e at the beginning of executed scripts.

  • For curl, use the --fail-with-body or --fail options or add specialized logic to validate the HTTP response code.

  • Service dependencies (depends_on) must be declared with the correct condition. This requires the “long syntax” described in the Docker Compose documentation.

  • Dependencies on “setup” services MUST be declared using a syntax like this:

    polaris:
      image: apache/polaris:latest
      depends_on:
        setup_bucket:
          condition: service_completed_successfully
    
  • Dependencies on “daemon” services with a “proper” health check should be declared using a syntax like this:

    polaris:
      image: apache/polaris:latest
      depends_on:
        minio:
          condition: service_healthy
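
The tips on propagating failures and validating HTTP response codes can be sketched in shell as follows. This is a minimal illustration under stated assumptions, not code taken from the Polaris guides: is_success_status and finish_setup are hypothetical helper names, and the URL in the usage comment is a placeholder.

```shell
#!/bin/sh
# Sketch only: helper names are hypothetical, not part of the Polaris guides.

# Succeed only for 2xx HTTP status codes.
is_success_status() {
  case "$1" in
    2[0-9][0-9]) return 0 ;;
    *) return 1 ;;
  esac
}

# Chain the steps with && so the marker file is created only if the
# preceding check succeeded. A failing step makes the function fail.
#   $1: HTTP status code of a preceding request
#   $2: path of the marker file tested by the health check
finish_setup() {
  is_success_status "$1" &&
    touch "$2"
}

# Possible use inside a setup script ($url is a placeholder):
#   set -e
#   status=$(curl --silent --output /dev/null --write-out '%{http_code}' "$url")
#   finish_setup "$status" /tmp/polaris-setup-done
```

Because the marker file is only created at the end of the && chain, a health check that tests for the file reports success only when the whole sequence succeeded.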
    

Tips for inspecting Docker Compose logs

When inspecting Docker Compose logs, pay attention to the logged timestamps. Docker Compose does not necessarily emit log entries from different services in the order in which they were produced. The timestamp of each log entry is written to both the console and the test log files.
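
To read interleaved entries in chronological order, the lines can be sorted by timestamp. The sketch below assumes that each line has the form "<service> | <ISO-8601 UTC timestamp> <message>", which is how docker compose logs --timestamps typically prints entries. The format of the test log files may differ, and sort_by_timestamp is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch only: assumes lines of the form
#   <service> | <ISO-8601 UTC timestamp> <message>
# Sorts standard input by everything after the first "|", so the
# timestamp is the leading part of the sort key. ISO-8601 UTC
# timestamps of equal precision sort chronologically as plain text.
sort_by_timestamp() {
  LC_ALL=C sort -t '|' -k 2
}

# Possible use:
#   docker compose logs --timestamps | sort_by_timestamp
```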