Import Your Own Models into OCI Generative AI - A Major Leap Forward for Enterprise AI
Oracle Cloud Infrastructure (OCI) continues to accelerate innovation in the AI space. With the latest release, “Import Your Own Models into OCI Generative AI”, enterprises and developers now have unprecedented flexibility to bring their preferred open-source or third-party AI models directly into OCI Generative AI and operationalize them at scale.
This capability makes OCI one of the few hyperscalers that allows you to import, host, and serve Hugging Face–format models, create fully managed endpoints, and run them on optimized AI clusters in the cloud.
If you’ve been waiting for a simple way to run your own LLMs (Qwen, Gemma, Llama, Phi, GPT-OSS, and many more) securely inside your OCI tenancy, this release is a game changer.
Let’s dive in.
Why This Release Matters
Generative AI adoption is expanding rapidly, but many organizations still prefer:
- Model flexibility (use their preferred open-source models)
- Data privacy and security (run models in their tenancy)
- Performance tuning (choose compute shapes and optimize cost)
- Vendor independence (avoid lock-in with a single model provider)
With OCI’s new import capability, customers can now:
- Bring models from Hugging Face or OCI Object Storage
- Create scalable, secure inference endpoints
- Leverage high-performance AI clusters (A10, A100, H100, H200)
- Use models in the OCI Generative AI Playground, API, or SDK
Whether you're building chatbots, enterprise copilots, search & retrieval systems, or embedding pipelines, this feature unlocks full control over your AI stack.
Supported Model Architectures
OCI Generative AI now supports a wide range of state-of-the-art model families:
🔹 Chat Models
These enable conversational AI experiences. Supported architectures include:
- Alibaba Qwen 2 & Qwen 3: multilingual, multimodal capabilities
- Google Gemma: lightweight yet powerful for broad language tasks
- Meta Llama (Llama 2, 3, 3.1, 3.2, 3.3, 4): industry-leading open LLMs
- Microsoft Phi: efficient, compact, and cost-optimized
- OpenAI GPT-OSS: open-weight mixture-of-experts (MoE) architecture with strong reasoning
🔹 Embedding Models
- Mistral: delivers high-performance embeddings for vector search, RAG, and semantic matching
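Once an embedding model has been imported and hosted (the steps are covered later in this post), generating vectors is a short call through the inference SDK. Here is a minimal Python sketch; the region URL, endpoint OCID, and compartment OCID are hypothetical placeholders, and using DedicatedServingMode for an imported model's endpoint is an assumption:

```python
# Sketch: generate embeddings from a hosted embedding model's endpoint.
# All OCIDs and the region URL below are placeholders.
import oci
from oci.generative_ai_inference import GenerativeAiInferenceClient
from oci.generative_ai_inference import models as m

config = oci.config.from_file()  # assumes a configured ~/.oci/config
client = GenerativeAiInferenceClient(
    config,
    service_endpoint="https://inference.generativeai.us-chicago-1.oci.oraclecloud.com",
)

details = m.EmbedTextDetails(
    compartment_id="ocid1.compartment.oc1..example",
    serving_mode=m.DedicatedServingMode(
        endpoint_id="ocid1.generativeaiendpoint.oc1..example",
    ),
    inputs=["vector search", "semantic matching for RAG"],
)
vectors = client.embed_text(details).data.embeddings  # one vector per input
print(len(vectors), "embeddings of dimension", len(vectors[0]))
```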
Prerequisites Before Importing a Model
1. Importing from Hugging Face
You need:
- The model ID of any supported model
- A Hugging Face access token with read permissions, if required; this is needed for gated models such as Llama 3 / Llama 3.1 (see the access check sketch below)
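Before kicking off an import of a gated model, it can save time to confirm that your token actually has access. A minimal sketch using the huggingface_hub package; the model ID and token below are placeholders:

```python
# Sketch: verify that your token can access a gated model before importing.
from huggingface_hub import model_info

MODEL_ID = "meta-llama/Llama-3.1-8B-Instruct"  # hypothetical example model ID
HF_TOKEN = "hf_..."                            # your read-scoped access token

info = model_info(MODEL_ID, token=HF_TOKEN)    # raises if the token lacks access
print(info.id, "is accessible; gated =", getattr(info, "gated", False))
```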
2. Importing from Object Storage
Ensure:
- An IAM policy allowing access to Object Storage (a sample statement is sketched below)
- Model files stored in Hugging Face format, including:
  - config.json (must be exactly this filename)
  - tokenizer files
  - model weights
- Model capability must be one of:
  - TEXT_TO_TEXT
  - IMAGE_TEXT_TO_TEXT
  - EMBEDDING
  - RERANK
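As a rough sketch, the Object Storage policy could be a statement along the lines of `allow group GenAI-Users to read objects in compartment genai-demo`, where the group and compartment names are hypothetical; check the OCI IAM documentation for the exact statements the import flow requires in your tenancy.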
Dedicated AI Cluster Requirements
OCI provides a variety of GPU cluster options depending on the model size.
Examples include:
| Cluster Unit | GPU Type | GPUs | AI Unit Count |
|---|---|---|---|
| A10_X1 | NVIDIA A10 | 1 | 1.77 |
| A100_80G_X4 | NVIDIA A100 80GB | 4 | 12.96 |
| H100_X8 | NVIDIA H100 | 8 | 48.08 |
| H200_X8 | NVIDIA H200 | 8 | 49.76 |
Pricing = AI Unit Count × Price per AI Unit Hour (shown on OCI pricing page).
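As a worked example with a purely hypothetical rate: if an AI unit hour were priced at $10, an H100_X8 cluster (48.08 AI units) would cost 48.08 × $10 ≈ $480.80 per hour while active. Check the OCI pricing page for the actual per-unit rates.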
Step-by-Step: How to Import and Deploy Your Model
This workflow makes the process simple and intuitive for teams new to model deployment.
Step 1: Import the Model
Choose one of two options:
Option A: Directly from Hugging Face
Provide:
- Model name (the Hugging Face model ID)
- Optional: Hugging Face access token
OCI automatically fetches and validates the model files.
Option B: From OCI Object Storage
Upload your Hugging Face–format model to a bucket and initiate the import.
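For larger models, scripting the upload is convenient. A minimal sketch using the OCI Python SDK's UploadManager; the bucket name and local directory are hypothetical, and a configured ~/.oci/config is assumed:

```python
# Sketch: upload a local Hugging Face–format model folder to an OCI bucket.
import os
import oci

config = oci.config.from_file()
client = oci.object_storage.ObjectStorageClient(config)
namespace = client.get_namespace().data
upload = oci.object_storage.UploadManager(client)

BUCKET = "genai-model-imports"       # hypothetical bucket name
SRC_DIR = "./llama-3.1-8b-instruct"  # hypothetical local model directory

for root, _, files in os.walk(SRC_DIR):
    for name in files:
        path = os.path.join(root, name)
        object_name = os.path.relpath(path, SRC_DIR)  # preserve folder layout
        upload.upload_file(namespace, BUCKET, object_name, path)
        print("uploaded", object_name)
```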
Step 2: Create a Hosting Dedicated AI Cluster
- Select the compartment
- Choose the model architecture
- Pick the recommended cluster unit size
- Acknowledge the compute-hour commitment
- Deploy the cluster
Within minutes, the cluster becomes active.
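The same step can be scripted. Here is a sketch using the OCI Python SDK's Generative AI control-plane client; the OCIDs are placeholders and the unit shape string is an assumption, so use the shape the console recommends for your model's architecture:

```python
# Sketch: create a hosting Dedicated AI Cluster with the OCI Python SDK.
import oci

config = oci.config.from_file()
genai = oci.generative_ai.GenerativeAiClient(config)

details = oci.generative_ai.models.CreateDedicatedAiClusterDetails(
    compartment_id="ocid1.compartment.oc1..example",  # hypothetical OCID
    type="HOSTING",                                   # hosting, not fine-tuning
    unit_shape="H100_X8",                             # placeholder shape name
    unit_count=1,
    display_name="imported-model-hosting-cluster",
)
cluster = genai.create_dedicated_ai_cluster(details).data
print("cluster OCID:", cluster.id, "state:", cluster.lifecycle_state)
```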
Step 3: Create an Endpoint
Endpoints let you interact with the model securely.
Configure:
- Compartment
- Endpoint name
- Model & version
- Hosting cluster
- Networking (public endpoint for imported models)
- Tags (optional)
Once active, your endpoint is ready for use.
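Endpoint creation can likewise be automated. A sketch with the same control-plane client, assuming hypothetical OCIDs for the compartment, imported model, and hosting cluster:

```python
# Sketch: create an endpoint on the hosting cluster via the Python SDK.
# All OCIDs below are hypothetical placeholders.
import oci

config = oci.config.from_file()
genai = oci.generative_ai.GenerativeAiClient(config)

details = oci.generative_ai.models.CreateEndpointDetails(
    compartment_id="ocid1.compartment.oc1..example",
    model_id="ocid1.generativeaimodel.oc1..example",  # the imported model
    dedicated_ai_cluster_id="ocid1.generativeaidedicatedaicluster.oc1..example",
    display_name="imported-model-endpoint",
)
endpoint = genai.create_endpoint(details).data
print("endpoint OCID:", endpoint.id, "state:", endpoint.lifecycle_state)
```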
Step 4: Use the Model
You can now use your imported model via:
- The OCI Generative AI Playground
- API calls
- SDKs (Python, Java, etc.) and the OCI CLI
This means instant integration with:
- Chat apps
- RAG systems
- Enterprise copilots
- Embedding pipelines
- Backend services
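For a concrete starting point, here is a minimal Python sketch that chats with the imported model through its dedicated endpoint. The region URL and OCIDs are placeholders, and using the generic chat format with DedicatedServingMode for imported models is an assumption:

```python
# Sketch: send a chat request to the imported model's endpoint.
import oci
from oci.generative_ai_inference import GenerativeAiInferenceClient
from oci.generative_ai_inference import models as m

config = oci.config.from_file()
client = GenerativeAiInferenceClient(
    config,
    service_endpoint="https://inference.generativeai.us-chicago-1.oci.oraclecloud.com",
)

chat_details = m.ChatDetails(
    compartment_id="ocid1.compartment.oc1..example",
    serving_mode=m.DedicatedServingMode(
        endpoint_id="ocid1.generativeaiendpoint.oc1..example",
    ),
    chat_request=m.GenericChatRequest(  # generic (OpenAI-style) chat format
        messages=[m.UserMessage(content=[m.TextContent(text="Hello! What can you do?")])],
        max_tokens=256,
        temperature=0.7,
    ),
)
response = client.chat(chat_details)
print(response.data.chat_response.choices[0].message.content[0].text)
```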
Using the Model in the Playground
Once the endpoint is active:
- Navigate to Endpoints
- Select the endpoint
- Click “View in Playground”
- Start sending messages or prompts
The Playground displays your model as <model-name> (<endpoint-name>).
This helps teams test and compare models before deploying them into production.
Enterprise-Ready Controls
Imported models support:
- Public endpoints
- Monitoring
- Logging
- On-demand scaling
- Fine-grained IAM security
- Network isolation options
Note: Guardrails (Content Moderation, PII Protection, Prompt Injection Protection) currently apply only to pretrained and custom models, not to imported models.
Who Should Use This Feature?
This release is ideal for:
- Enterprises building RAG or conversational AI
- AI/ML teams wanting to host LLMs inside their own cloud boundary
- Developers needing full control over model selection
- Organizations migrating from Hugging Face, OpenAI, or on-prem LLM deployments
- Teams optimizing cost using flexible GPU cluster options
The Future of Bring-Your-Own-Model on OCI
This release marks the beginning of a new era in how organizations deploy AI on OCI. Combined with OCI’s high-performance GPUs, low-cost network, and enterprise-grade security, customers can now:
Train → Fine-tune → Import → Host → Deploy → Integrate, all within a single cloud ecosystem.
OCI is quickly becoming a top choice for scalable, secure, and flexible enterprise generative AI deployment.
Conclusion
The Import Your Own Model capability in OCI Generative AI empowers businesses to bring the best of the open-source AI ecosystem into their secure cloud environment. It blends flexibility, performance, and cost efficiency, giving companies complete control over their AI strategy.
Whether you're developing an enterprise chatbot, powering a search engine with embeddings, or running multimodal use cases, OCI now gives you everything you need end-to-end.