Databricks Foundation Model APIs - Azure Databricks (2024)

This article provides an overview of the Foundation Model APIs in Azure Databricks. It includes requirements for use, supported models, and limitations.

What are Databricks Foundation Model APIs?

Mosaic AI Model Serving supports Foundation Model APIs, which allow you to access and query state-of-the-art open models from a serving endpoint. With Foundation Model APIs, you can quickly build applications that use a high-quality generative AI model without maintaining your own model deployment.

The Foundation Model APIs are provided in two pricing modes:

  • Pay-per-token: This is the easiest way to start accessing foundation models on Databricks and is recommended for beginning your journey with Foundation Model APIs. This mode is not designed for high-throughput applications or performant production workloads.
  • Provisioned throughput: This mode is recommended for all production workloads, especially those that require high throughput, performance guarantees, fine-tuned models, or have additional security requirements. Provisioned throughput endpoints are available with compliance certifications like HIPAA.

See Use Foundation Model APIs for guidance on how to use these two modes and the supported models.

Using the Foundation Model APIs, you can:

  • Query a generalized LLM to verify a project’s validity before investing more resources.
  • Query a generalized LLM to create a quick proof of concept for an LLM-based application before investing in training and deploying a custom model.
  • Use a foundation model, along with a vector database, to build a chatbot using retrieval augmented generation (RAG).
  • Replace proprietary models with open alternatives to optimize for cost and performance.
  • Efficiently compare LLMs to see which is the best candidate for your use case, or swap a production model with a better performing one.
  • Build an LLM application for development or production on top of a scalable, SLA-backed LLM serving solution that can support your production traffic spikes.

Requirements

  • Databricks API token to authenticate endpoint requests.
  • Serverless compute (for provisioned throughput models).
  • A workspace in a supported region:
    • Pay-per-token regions.
    • Provisioned throughput regions.

Note

For provisioned throughput workloads that use the DBRX Base model, see Foundation Model APIs limits for region availability.

Use Foundation Model APIs

You have multiple options for using the Foundation Model APIs.

The APIs are OpenAI-compatible, so you can use the OpenAI client for querying. You can also use the UI, the Foundation Model APIs Python SDK, the MLflow Deployments SDK, or the REST API to query supported models. Databricks recommends the OpenAI client SDK or the REST API for extended interactions, and the UI for trying out the feature.

See Query foundation models and external models for scoring examples.
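Because the endpoints are OpenAI-compatible, a query is just a standard chat-completions request. The sketch below builds such a payload with only the standard library; the workspace URL and token are placeholders, the endpoint name is the pay-per-token Llama endpoint listed later in this article, and the request itself is shown as a comment rather than sent.

```python
import json

# Placeholder workspace URL and token -- substitute your own values.
WORKSPACE_URL = "https://<your-workspace>.azuredatabricks.net"
API_TOKEN = "<your-databricks-api-token>"

def build_chat_request(model, messages, max_tokens=256):
    """Build an OpenAI-compatible chat-completions payload for a
    Foundation Model APIs endpoint."""
    return {"model": model, "messages": messages, "max_tokens": max_tokens}

payload = build_chat_request(
    "databricks-meta-llama-3-1-70b-instruct",
    [{"role": "user", "content": "What is retrieval augmented generation?"}],
)
body = json.dumps(payload)

# The actual request (not sent here) would be an HTTP POST to
#   f"{WORKSPACE_URL}/serving-endpoints/chat/completions"
# with the header "Authorization: Bearer <token>" -- or, equivalently,
# the OpenAI client pointed at base_url=f"{WORKSPACE_URL}/serving-endpoints".
```

The same payload shape works from the OpenAI Python client, the MLflow Deployments SDK, or raw REST calls, so you can switch access methods without changing your request structure.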

Pay-per-token Foundation Model APIs

Pay-per-token models are accessible in your Azure Databricks workspace and are recommended for getting started. To access them in your workspace, navigate to the Serving tab in the left sidebar. The Foundation Model APIs are located at the top of the Endpoints list view.

The following table summarizes the supported models for pay-per-token. See Supported models for pay-per-token for additional model information.

If you want to test out and chat with these models you can do so using the AI Playground. See Chat with LLMs and prototype GenAI apps using AI Playground.

Important

  • Starting July 23, 2024, Meta-Llama-3.1-70B-Instruct replaces support for Meta-Llama-3-70B-Instruct in Foundation Model APIs pay-per-token endpoints.
  • Meta-Llama-3.1-405B-Instruct is the largest openly available state-of-the-art large language model, built and trained by Meta and distributed by Azure Machine Learning using the AzureML Model Catalog.
  • The Llama 2 70B chat model is planned for retirement. After October 30, 2024, this model will no longer be supported.
  • The MPT 7B Instruct and MPT 30B Instruct models are now retired. See Retired models for recommended replacement models.
| Model | Task type | Endpoint | Notes |
| --- | --- | --- | --- |
| GTE Large (English) | Embedding | databricks-gte-large-en | |
| Meta-Llama-3.1-70B-Instruct | Chat | databricks-meta-llama-3-1-70b-instruct | |
| Meta-Llama-2-70B-Chat | Chat | databricks-llama-2-70b-chat | See Foundation Model APIs limits for region availability. |
| Meta-Llama-3.1-405B-Instruct* | Chat | databricks-meta-llama-3-1-405b-instruct | See Foundation Model APIs limits for region availability. |
| DBRX Instruct | Chat | databricks-dbrx-instruct | See Foundation Model APIs limits for region availability. |
| Mixtral-8x7B Instruct | Chat | databricks-mixtral-8x7b-instruct | See Foundation Model APIs limits for region availability. |
| BGE Large (English) | Embedding | databricks-bge-large-en | See Foundation Model APIs limits for region availability. |

* Reach out to your Databricks account team if you encounter endpoint failures or stabilization errors when using this model.

  • See Query foundation models and external models for guidance on how to query Foundation Model APIs.
  • See Foundation model REST API reference for required parameters and syntax.
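The embedding endpoints in the table above are queried through the same serving REST surface. A minimal sketch, assuming the standard invocations route and an `{"input": ...}` request body (the workspace URL is a placeholder, and the endpoint name comes from the pay-per-token table):

```python
import json

# Placeholder workspace URL -- substitute your own.
WORKSPACE_URL = "https://<your-workspace>.azuredatabricks.net"

def embedding_request(endpoint, texts):
    """Build the URL and JSON body for querying a pay-per-token
    embedding endpoint via the serving API's invocations route."""
    url = f"{WORKSPACE_URL}/serving-endpoints/{endpoint}/invocations"
    body = json.dumps({"input": texts})
    return url, body

url, body = embedding_request("databricks-gte-large-en", ["hello world"])
# POST `body` to `url` with an "Authorization: Bearer <token>" header;
# the response contains one embedding vector per input string.
```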

Provisioned throughput Foundation Model APIs

Provisioned throughput provides endpoints with optimized inference for foundation model workloads that require performance guarantees. Databricks recommends provisioned throughput for production workloads. See Provisioned throughput Foundation Model APIs for a step-by-step guide on how to deploy Foundation Model APIs in provisioned throughput mode.

Provisioned throughput support includes:

  • Base models of all sizes, such as DBRX Base. Base models can be accessed using the Databricks Marketplace, or you can download them from Hugging Face or another external source and register them in Unity Catalog. The latter approach works with any fine-tuned variant of the supported models, irrespective of the fine-tuning method used.
  • Fine-tuned variants of base models, such as LlamaGuard-7B. This includes models that are fine-tuned on proprietary data.
  • Fully custom weights and tokenizers, such as models trained from scratch, continued pre-trained, or otherwise adapted, as long as they use a supported base model architecture (such as CodeLlama, Yi-34B-Chat, or SOLAR-10.7B).
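Once a model is registered in Unity Catalog, a provisioned throughput endpoint is created through the serving endpoints REST API. The sketch below builds a plausible request body for that call; the Unity Catalog model path is hypothetical, and the field names (`served_entities`, `min_provisioned_throughput`, `max_provisioned_throughput`) are assumed from the serving endpoints API rather than confirmed by this article.

```python
def provisioned_endpoint_config(endpoint_name, uc_model, version,
                                min_tokens_per_sec, max_tokens_per_sec):
    """Assumed request body for POST /api/2.0/serving-endpoints when
    creating a provisioned throughput endpoint. Throughput bounds are
    expressed in tokens per second."""
    return {
        "name": endpoint_name,
        "config": {
            "served_entities": [{
                # Unity Catalog path: <catalog>.<schema>.<model> (hypothetical)
                "entity_name": uc_model,
                "entity_version": version,
                "min_provisioned_throughput": min_tokens_per_sec,
                "max_provisioned_throughput": max_tokens_per_sec,
            }]
        },
    }

cfg = provisioned_endpoint_config(
    "llama-prod", "main.models.meta_llama_3_1_70b", "1", 0, 100)
```

Setting the minimum to 0 lets the endpoint scale to zero when idle, at the cost of cold-start latency on the first request.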

The following table summarizes the supported model architectures for provisioned throughput.

Important

Meta Llama 3.2 is licensed under the LLAMA 3.2 Community License, Copyright © Meta Platforms, Inc. All Rights Reserved. Customers are responsible for ensuring their compliance with the terms of this license and the Llama 3.2 Acceptable Use Policy.

Meta Llama 3.1 is licensed under the LLAMA 3.1 Community License, Copyright © Meta Platforms, Inc. All Rights Reserved. Customers are responsible for ensuring compliance with applicable model licenses.

| Model architecture | Task types | Notes |
| --- | --- | --- |
| Meta Llama 3.2 3B | Chat or Completion | |
| Meta Llama 3.2 1B | Chat or Completion | |
| Meta Llama 3.1 | Chat or Completion | |
| Meta Llama 3 | Chat or Completion | |
| Meta Llama 2 | Chat or Completion | |
| DBRX | Chat or Completion | See Foundation Model APIs limits for region availability. |
| Mistral | Chat or Completion | |
| Mixtral | Chat or Completion | |
| MPT | Chat or Completion | |
| GTE v1.5 (English) | Embedding | |
| BGE v1.5 (English) | Embedding | |

Limitations

See Model Serving limits and regions.

Additional resources

  • Query foundation models and external models
  • Provisioned throughput Foundation Model APIs
  • Batch inference using Foundation Model API provisioned throughput
  • Supported models for pay-per-token
  • Foundation model REST API reference