ADBC (Arrow Database Connectivity) is a database access API designed around Apache Arrow, enabling efficient columnar data interchange between applications and databases. Unlike traditional row-oriented interfaces, ADBC allows you to retrieve query results directly as Arrow tables, avoiding costly row-to-column conversions.
This notebook walks through connecting to DuckLake from Python using ADBC with the standard DBAPI interface. You'll learn how to execute queries, fetch results as Arrow tables, perform bulk ingestion of Arrow data, and inspect database metadata.
Setup¶
Install the required dependencies:
%pip install -q dbc adbc-driver-manager pyarrow
Note: you may need to restart the kernel to use updated packages.
Install the DuckDB ADBC driver:
!dbc install -q duckdb
Import PyArrow and the ADBC driver manager:
import pyarrow as pa
from adbc_driver_manager import dbapi
Connection and Cursor¶
Create a DBAPI-style connection with DuckDB (in-memory):
connection = dbapi.connect(driver="duckdb", autocommit=True)
Create a cursor:
cursor = connection.cursor()
Install the ducklake extension and create a new database:
cursor.execute("INSTALL ducklake")
cursor.execute("ATTACH 'ducklake:my_ducklake.ducklake' AS my_ducklake;")
cursor.execute("USE my_ducklake;")
<adbc_driver_manager.dbapi.Cursor at 0x10758de80>
Query Execution¶
Execute a SQL query and get the results via the normal, row-oriented DBAPI interface:
cursor.execute("SELECT 1 AS id, 'Alice' AS name")
cursor.fetchone()
(1, 'Alice')
Alternatively, get the results as a PyArrow Table:
cursor.execute("SELECT 1 AS id, 'Alice' AS name")
cursor.fetch_arrow_table()
pyarrow.Table id: int32 name: string ---- id: [[1]] name: [["Alice"]]
Parameters can be bound to queries:
cursor.execute("SELECT ? + 1 AS favorite_num", parameters=(10,))
cursor.fetch_arrow_table()
pyarrow.Table favorite_num: int64 ---- favorite_num: [[11]]
Bulk Ingestion¶
Ingest Arrow data into a database table:
table = pa.table({"id": [1, 2, 3, 4], "name": ["Ian", "Matt", "David", "Bryce"]})
cursor.adbc_ingest(table_name="users", data=table, mode="create")
4
Append to the database table:
table = pa.table({"id": [5, 6], "name": ["Mandy", "Sam"]})
cursor.adbc_ingest(table_name="users", data=table, mode="append")
2
Query the ingested data:
cursor.execute("SELECT * FROM users")
cursor.fetchall()
[(1, 'Ian'), (2, 'Matt'), (3, 'David'), (4, 'Bryce'), (5, 'Mandy'), (6, 'Sam')]
Metadata¶
Get information about the database and driver:
connection.adbc_get_info()
{'vendor_name': 'duckdb',
'vendor_version': 'v1.5.1',
'driver_name': 'ADBC DuckDB Driver',
'driver_version': 'v1.5.1',
'driver_arrow_version': '(unknown)',
'driver_adbc_version': 1001000}
Query for tables and columns in the database:
info = (
connection.adbc_get_objects(catalog_filter="my_ducklake", table_name_filter="users")
.read_all()
.to_pylist()
)
catalog = info[0]
schema = catalog["catalog_db_schemas"][0]
tables = schema["db_schema_tables"]
tables[0]["table_name"]
'users'
[column["column_name"] for column in tables[0]["table_columns"]]
['id', 'name']
Get the Arrow schema of a table:
connection.adbc_get_table_schema("users")
id: int64 name: string
Cleanup¶
Detach from DuckLake:
cursor.execute("USE memory;")
cursor.execute("DETACH my_ducklake;")
<adbc_driver_manager.dbapi.Cursor at 0x10758de80>
Close the cursor and connection:
cursor.close()
connection.close()
Optionally, delete the DuckLake files:
import os
import shutil
os.remove("my_ducklake.ducklake")
shutil.rmtree("my_ducklake.ducklake.files")