Automating Postgres and pgvector Setup with Docker
Docker is a convenient way to manage databases in development because it is simple to run and keeps the database isolated from the host machine. This post shows how to build a Docker container that runs PostgreSQL together with the pgvector extension, which adds a vector data type and similarity search to the database.
Why Use pgvector with PostgreSQL?
The pgvector extension adds a vector data type and distance operators to PostgreSQL, so embeddings can be stored and compared right next to the rest of your data. This simplifies applications, such as machine learning workloads, that rely on similarity search over high-dimensional vectors.
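To make "vector operations" concrete, here is a minimal Python sketch of two distance measures that pgvector exposes as SQL operators: `<->` for Euclidean (L2) distance and `<=>` for cosine distance. The function names are illustrative and are not part of pgvector; inside the database the extension computes these for you.

```python
import math


def l2_distance(a, b):
    """Euclidean distance, the quantity pgvector's `<->` operator returns."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same number of dimensions")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_distance(a, b):
    """1 - cosine similarity, the quantity pgvector's `<=>` operator returns.

    This sketch does not handle zero-length vectors (division by zero).
    """
    if len(a) != len(b):
        raise ValueError("vectors must have the same number of dimensions")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


if __name__ == "__main__":
    # Hypothetical query and stored vectors, for illustration only.
    query = [1.0, 2.0, 3.0]
    stored = [[1.0, 2.0, 3.5], [4.0, 6.0, 3.0], [-1.0, -2.0, -3.0]]
    # A brute-force nearest-neighbour lookup by L2 distance.
    nearest = min(stored, key=lambda v: l2_distance(query, v))
    print(nearest)
```

Doing this in application code means pulling every stored vector out of the database first; with pgvector the same comparison runs inside PostgreSQL.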
Setting Up the Environment
Our approach leverages Docker to create a contained environment that requires minimal configuration and can be easily replicated across different machines. Here’s how you can do it step by step.
1. Docker Compose Configuration
First, create a docker-compose.yml file. It defines the PostgreSQL service, builds the image from postgres.Dockerfile, publishes the database on host port 5454, and mounts the SQL script that enables the pgvector extension.
version: "3.9"
services:
  pgvector-db:
    env_file:
      - ./postgres-pgvector/.env
    build:
      # Build context; assumes postgres.Dockerfile sits next to this file
      context: .
      dockerfile: postgres.Dockerfile
    container_name: postgres-pgvector
    ports:
      # host port 5454 -> container port 5432
      - "5454:5432"
    volumes:
      - db_data:/var/lib/postgresql/data
      - ./postgres/vector_extension.sql:/docker-entrypoint-initdb.d/0-vector_extension.sql
    networks:
      - default
volumes:
  db_data:
2. Dockerfile for PostgreSQL and pgvector
Next, create postgres.Dockerfile. It extends the official postgres:14.1 image (a pinned version rather than the latest), installs the packages needed to build pgvector, clones the pgvector repository, and then builds and installs the extension into the PostgreSQL server.
# Extend the official PostgreSQL 14.1 image
FROM postgres:14.1
# Install necessary dependencies for building pgvector
RUN apt-get update && apt-get install -y \
    build-essential \
    postgresql-server-dev-14 \
    git \
    clang-11 \
    llvm-11 \
    ca-certificates \
    && update-ca-certificates \
    && rm -rf /var/lib/apt/lists/*
# Set clang as the default compiler
ENV CC=clang-11
ENV CXX=clang++-11
# Clone and build pgvector
WORKDIR /tmp
RUN git clone https://github.com/pgvector/pgvector.git
WORKDIR /tmp/pgvector
RUN make
RUN make install
# No shared_preload_libraries entry is needed: pgvector's library is named
# "vector", and it is enabled per database by CREATE EXTENSION vector in the init script
3. Initializing the Vector Extension
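To enable the pgvector extension, you need a SQL script. In docker-compose.yml, the script is mounted into /docker-entrypoint-initdb.d, so the image's entrypoint runs it when the database is first initialized, that is, when the container starts with an empty data directory. Because the data lives in the db_data volume, the script does not run again on later starts.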
-- Create the 'vector' extension in the default database that the image creates at first start
CREATE EXTENSION IF NOT EXISTS vector;
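The script needs no database creation or connection details of its own because the base image has already created the database and user from environment variables (such as POSTGRES_USER, POSTGRES_PASSWORD, and POSTGRES_DB) loaded from the env_file referenced in docker-compose.yml.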
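As a rough sketch of how those details turn into something a client can use, the Python snippet below reads KEY=VALUE pairs in the style of an env file and builds a connection URL. The variable names are the ones the official postgres image reads; the helper names and the sample values are hypothetical, and the default port 5454 is the host port published in the compose file.

```python
from urllib.parse import quote


def parse_env(text):
    """Parse simple KEY=VALUE lines, skipping blanks and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


def build_dsn(env, host="localhost", port=5454):
    """Build a postgresql:// URL from the image's environment variables."""
    # The official image defaults the user to "postgres" and the
    # database name to the user name when they are not set.
    user = env.get("POSTGRES_USER", "postgres")
    database = env.get("POSTGRES_DB", user)
    # Percent-encode each part so special characters do not break the URL.
    user_part = quote(user, safe="")
    password_part = quote(env["POSTGRES_PASSWORD"], safe="")
    database_part = quote(database, safe="")
    return f"postgresql://{user_part}:{password_part}@{host}:{port}/{database_part}"


if __name__ == "__main__":
    # Hypothetical env file contents, for illustration only.
    sample = "POSTGRES_USER=app\nPOSTGRES_PASSWORD=s3cret/pw\nPOSTGRES_DB=vectors\n"
    print(build_dsn(parse_env(sample)))
```

The parser is deliberately simple and does not handle quoted values or variable substitution.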
And we are done. Start the service with docker compose up --build, then connect with your preferred SQL client on localhost:5454 using the credentials from the env file.
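Once connected, it is worth confirming that the extension was actually created. The helper below is a small sketch that works with any Python DB-API 2.0 connection; the function name is hypothetical, and it only queries the standard pg_extension catalog.

```python
def pgvector_version(conn):
    """Return the installed pgvector version string, or None if it is absent.

    `conn` is any DB-API 2.0 connection to the target database.
    """
    cur = conn.cursor()
    try:
        # The extension registers itself under the name "vector".
        cur.execute("SELECT extversion FROM pg_extension WHERE extname = 'vector'")
        row = cur.fetchone()
    finally:
        cur.close()
    return row[0] if row else None
```

Assuming a driver such as psycopg is installed, you would open a connection with its connect function and pass it to pgvector_version. A result of None suggests the init script did not run, which typically happens when the db_data volume already held a database from an earlier start.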
Conclusion
Using Docker to set up PostgreSQL with the pgvector extension gives you a setup that is easy to replicate and isolated from your local environment. This keeps development environments consistent across machines and reduces conflicts with software installed on the host.
With pgvector in place, developers can run vector operations directly inside the database, which is particularly useful for applications built on machine learning technologies.