Import Your Own Models into OCI Generative AI - A Major Leap Forward for Enterprise AI
Oracle Cloud Infrastructure (OCI) continues to accelerate innovation in the AI space. With the latest release, “Import Your Own Models into OCI Generative AI”, enterprises and developers now have unprecedented flexibility to bring their preferred open-source or third-party AI models directly into OCI Generative AI and operationalize them at scale.
This capability makes OCI one of the few hyperscalers that allows you to import, host, and serve Hugging Face–format models, create fully managed endpoints, and run them on optimized AI clusters in the cloud.
If you’ve been waiting for a simple way to run your own LLMs (Qwen, Gemma, Llama, Phi, GPT-OSS, and many more) securely inside your OCI tenancy, this release is a game changer.
Let’s dive in.
Why This Release Matters
Generative AI adoption is expanding rapidly, but many organizations still prefer:
- Model flexibility (use their preferred open-source models)
- Data privacy and security (run models in their tenancy)
- Performance tuning (choose compute shapes and optimize cost)
- Vendor independence (avoid lock-in with a single model provider)
With OCI’s new import capability, customers can now:
- Bring models from Hugging Face or OCI Object Storage
- Create scalable, secure inference endpoints
- Leverage high-performance AI clusters (A10, A100, H100, H200)
- Use models in the OCI Generative AI Playground, API, or SDK
Whether you're building chatbots, enterprise copilots, search & retrieval systems, or embedding pipelines, this feature unlocks full control over your AI stack.
Supported Model Architectures
OCI Generative AI now supports a wide range of state-of-the-art model families:
🔹 Chat Models
These enable conversational AI experiences. Supported architectures include:
- Alibaba Qwen 2 & Qwen 3: multilingual, multimodal capabilities
- Google Gemma: lightweight yet powerful for broad language tasks
- Meta Llama (Llama 2, 3, 3.1, 3.2, 3.3, 4): industry-leading open LLMs
- Microsoft Phi: efficient, compact, and cost-optimized
- OpenAI GPT-OSS: open-weight mixture-of-experts (MoE) architecture with strong reasoning
🔹 Embedding Models
- Mistral: delivers high-performance embeddings for vector search, RAG, and semantic matching
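Once an embedding model has been imported and hosted (the steps are covered later in this post), generating vectors is a short call through the inference SDK. Here is a minimal Python sketch; the region URL, endpoint OCID, and compartment OCID are hypothetical placeholders, and using DedicatedServingMode for an imported model's endpoint is an assumption:

```python
# Sketch: generate embeddings from a hosted embedding model's endpoint.
# All OCIDs and the region URL below are placeholders.
import oci
from oci.generative_ai_inference import GenerativeAiInferenceClient
from oci.generative_ai_inference import models as m

config = oci.config.from_file()  # assumes a configured ~/.oci/config
client = GenerativeAiInferenceClient(
    config,
    service_endpoint="https://inference.generativeai.us-chicago-1.oci.oraclecloud.com",
)

details = m.EmbedTextDetails(
    compartment_id="ocid1.compartment.oc1..example",
    serving_mode=m.DedicatedServingMode(
        endpoint_id="ocid1.generativeaiendpoint.oc1..example",
    ),
    inputs=["vector search", "semantic matching for RAG"],
)
vectors = client.embed_text(details).data.embeddings  # one vector per input
print(len(vectors), "embeddings of dimension", len(vectors[0]))
```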
Prerequisites Before Importing a Model
1. Importing from Hugging Face
You need:
- The model ID of any supported model
- A Hugging Face access token with read permissions, if required; this is needed for gated models such as Llama 3 / Llama 3.1 (see the access check sketch below)
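Before kicking off an import of a gated model, it can save time to confirm that your token actually has access. A minimal sketch using the huggingface_hub package; the model ID and token below are placeholders:

```python
# Sketch: verify that your token can access a gated model before importing.
from huggingface_hub import model_info

MODEL_ID = "meta-llama/Llama-3.1-8B-Instruct"  # hypothetical example model ID
HF_TOKEN = "hf_..."                            # your read-scoped access token

info = model_info(MODEL_ID, token=HF_TOKEN)    # raises if the token lacks access
print(info.id, "is accessible; gated =", getattr(info, "gated", False))
```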
2. Importing from Object Storage
Ensure:
- An IAM policy allowing access to Object Storage (a sample statement is sketched below)
- Model files stored in Hugging Face format, including:
  - config.json (must be exactly this filename)
  - tokenizer files
  - model weights
- Model capability must be one of:
  - TEXT_TO_TEXT
  - IMAGE_TEXT_TO_TEXT
  - EMBEDDING
  - RERANK
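As a rough sketch, the Object Storage policy could be a statement along the lines of `allow group GenAI-Users to read objects in compartment genai-demo`, where the group and compartment names are hypothetical; check the OCI IAM documentation for the exact statements the import flow requires in your tenancy.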
Dedicated AI Cluster Requirements
OCI provides a variety of GPU cluster options depending on the model size.
Examples include:
| Cluster Unit | GPU Type | GPUs | AI Unit Count |
|---|---|---|---|
| A10_X1 | NVIDIA A10 | 1 | 1.77 |
| A100_80G_X4 | NVIDIA A100 80GB | 4 | 12.96 |
| H100_X8 | NVIDIA H100 | 8 | 48.08 |
| H200_X8 | NVIDIA H200 | 8 | 49.76 |
Pricing = AI Unit Count × Price per AI Unit Hour (shown on OCI pricing page).
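As a worked example with a purely hypothetical rate: if an AI unit hour were priced at $10, an H100_X8 cluster (48.08 AI units) would cost 48.08 × $10 ≈ $480.80 per hour while active. Check the OCI pricing page for the actual per-unit rates.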
Step-by-Step: How to Import and Deploy Your Model
This workflow makes the process simple and intuitive for teams new to model deployment.
Step 1: Import the Model
Choose one of two options:
Option A: Directly from Hugging Face
Provide:
- Model name (the Hugging Face model ID)
- Optional: Hugging Face access token
OCI automatically fetches and validates the model files.
Option B: From OCI Object Storage
Upload your Hugging Face–format model to a bucket and initiate the import.
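For larger models, scripting the upload is convenient. A minimal sketch using the OCI Python SDK's UploadManager; the bucket name and local directory are hypothetical, and a configured ~/.oci/config is assumed:

```python
# Sketch: upload a local Hugging Face–format model folder to an OCI bucket.
import os
import oci

config = oci.config.from_file()
client = oci.object_storage.ObjectStorageClient(config)
namespace = client.get_namespace().data
upload = oci.object_storage.UploadManager(client)

BUCKET = "genai-model-imports"       # hypothetical bucket name
SRC_DIR = "./llama-3.1-8b-instruct"  # hypothetical local model directory

for root, _, files in os.walk(SRC_DIR):
    for name in files:
        path = os.path.join(root, name)
        object_name = os.path.relpath(path, SRC_DIR)  # preserve folder layout
        upload.upload_file(namespace, BUCKET, object_name, path)
        print("uploaded", object_name)
```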
Step 2: Create a Hosting Dedicated AI Cluster
- Select the compartment
- Choose the model architecture
- Pick the recommended cluster unit size
- Acknowledge the compute-hour commitment
- Deploy the cluster
Within minutes, the cluster becomes active.
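The same step can be scripted. Here is a sketch using the OCI Python SDK's Generative AI control-plane client; the OCIDs are placeholders and the unit shape string is an assumption, so use the shape the console recommends for your model's architecture:

```python
# Sketch: create a hosting Dedicated AI Cluster with the OCI Python SDK.
import oci

config = oci.config.from_file()
genai = oci.generative_ai.GenerativeAiClient(config)

details = oci.generative_ai.models.CreateDedicatedAiClusterDetails(
    compartment_id="ocid1.compartment.oc1..example",  # hypothetical OCID
    type="HOSTING",                                   # hosting, not fine-tuning
    unit_shape="H100_X8",                             # placeholder shape name
    unit_count=1,
    display_name="imported-model-hosting-cluster",
)
cluster = genai.create_dedicated_ai_cluster(details).data
print("cluster OCID:", cluster.id, "state:", cluster.lifecycle_state)
```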
Step 3: Create an Endpoint
Endpoints let you interact with the model securely.
Configure:
- Compartment
- Endpoint name
- Model & version
- Hosting cluster
- Networking (public endpoint for imported models)
- Tags (optional)
Once active, your endpoint is ready for use.
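Endpoint creation can likewise be automated. A sketch with the same control-plane client, assuming hypothetical OCIDs for the compartment, imported model, and hosting cluster:

```python
# Sketch: create an endpoint on the hosting cluster via the Python SDK.
# All OCIDs below are hypothetical placeholders.
import oci

config = oci.config.from_file()
genai = oci.generative_ai.GenerativeAiClient(config)

details = oci.generative_ai.models.CreateEndpointDetails(
    compartment_id="ocid1.compartment.oc1..example",
    model_id="ocid1.generativeaimodel.oc1..example",  # the imported model
    dedicated_ai_cluster_id="ocid1.generativeaidedicatedaicluster.oc1..example",
    display_name="imported-model-endpoint",
)
endpoint = genai.create_endpoint(details).data
print("endpoint OCID:", endpoint.id, "state:", endpoint.lifecycle_state)
```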
Step 4: Use the Model
You can now use your imported model via:
- The OCI Generative AI Playground
- API calls
- SDKs (Python, Java, etc.) and the OCI CLI
This means instant integration with:
- Chat apps
- RAG systems
- Enterprise copilots
- Embedding pipelines
- Backend services
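For a concrete starting point, here is a minimal Python sketch that chats with the imported model through its dedicated endpoint. The region URL and OCIDs are placeholders, and using the generic chat format with DedicatedServingMode for imported models is an assumption:

```python
# Sketch: send a chat request to the imported model's endpoint.
import oci
from oci.generative_ai_inference import GenerativeAiInferenceClient
from oci.generative_ai_inference import models as m

config = oci.config.from_file()
client = GenerativeAiInferenceClient(
    config,
    service_endpoint="https://inference.generativeai.us-chicago-1.oci.oraclecloud.com",
)

chat_details = m.ChatDetails(
    compartment_id="ocid1.compartment.oc1..example",
    serving_mode=m.DedicatedServingMode(
        endpoint_id="ocid1.generativeaiendpoint.oc1..example",
    ),
    chat_request=m.GenericChatRequest(  # generic (OpenAI-style) chat format
        messages=[m.UserMessage(content=[m.TextContent(text="Hello! What can you do?")])],
        max_tokens=256,
        temperature=0.7,
    ),
)
response = client.chat(chat_details)
print(response.data.chat_response.choices[0].message.content[0].text)
```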
Using the Model in the Playground
Once the endpoint is active:
- Navigate to Endpoints
- Select the endpoint
- Click “View in Playground”
- Start sending messages or prompts
The Playground displays your model as <model-name> (<endpoint-name>).
This helps teams test and compare models before deploying them into production.
Enterprise-Ready Controls
Imported models support:
- Public endpoints
- Monitoring
- Logging
- On-demand scaling
- Fine-grained IAM security
- Network isolation options
Note: Guardrails (Content Moderation, PII Protection, Prompt Injection Protection) currently apply only to pretrained and custom models, not to imported models.
Who Should Use This Feature?
This release is ideal for:
- Enterprises building RAG or conversational AI
- AI/ML teams wanting to host LLMs inside their own cloud boundary
- Developers needing full control over model selection
- Organizations migrating from Hugging Face, OpenAI, or on-prem LLM deployments
- Teams optimizing cost using flexible GPU cluster options
The Future of Bring-Your-Own-Model on OCI
This release marks the beginning of a new era in how organizations deploy AI on OCI. Combined with OCI’s high-performance GPUs, low-cost network, and enterprise-grade security, customers can now:
Train → Fine-tune → Import → Host → Deploy → Integrate, all within a single cloud ecosystem.
OCI is quickly becoming a top choice for scalable, secure, and flexible enterprise generative AI deployment.
Conclusion
The Import Your Own Model capability in OCI Generative AI empowers businesses to bring the best of the open-source AI ecosystem into their secure cloud environment. It blends flexibility, performance, and cost efficiency, giving companies complete control over their AI strategy.
Whether you're developing an enterprise chatbot, powering a search engine with embeddings, or running multimodal use cases, OCI now gives you everything you need end-to-end.