OCI Generative AI On-Demand Models – From Setup to Chat App

Generative AI is transforming how organizations build intelligent applications, from interactive assistants to automated knowledge systems. Oracle Cloud Infrastructure (OCI) makes this power accessible through its Generative AI On-Demand Models, including options like Cohere’s Command R+ and Meta’s Llama 3.3.

On-demand models are economical: you pay only for the inference calls you make, with no dedicated AI clusters to provision or manage.

In this guide, we’ll take you through the complete journey — starting with configuring access and locating the right model OCID, and ending with a fully functional chat application built using the OCI Python SDK and Streamlit. By the end, you’ll know exactly how to move from setup to implementation and bring Generative AI into your own applications.


Getting Started

Before writing any code, you must configure OCI credentials to allow your application to call Generative AI services.

1. Generate an API Key

  1. Log in to the OCI Console.

  2. Click your profile icon → User Settings.

  3. Under Resources, select API Keys.

  4. Click Add API Key and either:

    • Generate a new key pair in OCI (download the private key .pem), or

    • Upload your own public key (if you already created one with openssl; see the commands below).
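
If you bring your own key pair, the standard openssl commands look like this (the ~/.oci/ paths are a convention, not a requirement):

openssl genrsa -out ~/.oci/oci_api_key.pem 2048
openssl rsa -pubout -in ~/.oci/oci_api_key.pem -out ~/.oci/oci_api_key_public.pem

Paste the contents of the public key file into the Add API Key dialog.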

After adding the key, OCI shows a configuration file preview containing:

  • User OCID

  • Tenancy OCID

  • Fingerprint

  • Region

Copy these values.

2. Save the Private Key

If OCI generated the key, download the .pem file and place it under ~/.oci/oci_api_key.pem.
Restrict access:

chmod 600 ~/.oci/oci_api_key.pem

3. Update the OCI Config File

Create or edit ~/.oci/config and add a section, for example:

[DEFAULT]
user=ocid1.user.oc1..aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
fingerprint=60:11:15:19:15:11:11:11:11:11:11:11:11:11:11:11
tenancy=ocid1.tenancy.oc1..aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
region=us-chicago-1
key_file=/root/.oci/oci_api_key.pem
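
To confirm the profile works before building anything, here is a minimal sanity check using the Python SDK (it just loads the config and fetches your own user record):

import oci

# Load the DEFAULT profile and check the required fields are present
config = oci.config.from_file("~/.oci/config", "DEFAULT")
oci.config.validate_config(config)

# Round-trip to OCI: fetching your own user record proves the key works
identity = oci.identity.IdentityClient(config)
print("Authenticated as:", identity.get_user(config["user"]).data.name)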

4. Fix File Permissions (if needed)

If you see warnings like:

Permissions on ~/.oci/config are too open

fix with:

oci setup repair-file-permissions --file ~/.oci/config
oci setup repair-file-permissions --file ~/.oci/oci_api_key.pem

Find a Model OCID

Each Generative AI model (e.g. Cohere Command R+, Meta Llama 3.3) has a unique OCID. You’ll need this OCID in your application.

Run the following command to list available models in your region:

oci generative-ai model-collection list-models \
  --compartment-id <your_compartment_ocid> \
  --region us-chicago-1

Sample output (truncated):

{
  "data": {
    "items": [
      {
        "base-model-id": null,
        "capabilities": [
          "UNKNOWN_ENUM_VALUE"
        ],
        "compartment-id": null,
        "defined-tags": {},
        "display-name": "meta.llama-guard-4-12b",
        "fine-tune-details": null,
        "freeform-tags": {},
        "id": "ocid1.generativeaimodel.oc1.us-chicago-1.amaaaaaask7dceyaf4q5ji7iw7k3h6ol4a6lgpk7jnjuc5xlq55z4kxyfecq",
        "is-long-term-supported": true,
        "lifecycle-details": "Creating Base Model",
        "lifecycle-state": "ACTIVE",
        "model-metrics": null,
        "system-tags": {},
        "time-created": "2025-08-19T19:44:09.634000+00:00",
        "time-dedicated-retired": null,
        "time-deprecated": "2025-08-01T00:00:00+00:00",
        "time-on-demand-retired": null,
        "type": "BASE",
        "vendor": "meta",
        "version": "1.0.0"
      },
      {
        "base-model-id": null,
        "capabilities": [
          "CHAT"
        ],
        "compartment-id": null,
        "defined-tags": {},
        "display-name": "xai.grok-4",
        "fine-tune-details": null,
        "freeform-tags": {},
        "id": "ocid1.generativeaimodel.oc1.us-chicago-1.amaaaaaask7dceya3bsfz4ogiuv3yc7gcnlry7gi3zzx6tnikg6jltqszm2q",
        "is-long-term-supported": true,
        "lifecycle-details": "Base Model created",
        "lifecycle-state": "ACTIVE",
        "model-metrics": null,
        "system-tags": {},
        "time-created": "2025-07-22T02:38:53.272000+00:00",
        "time-dedicated-retired": null,
        "time-deprecated": null,
        "time-on-demand-retired": null,
        "type": "BASE",
        "vendor": "xai",
        "version": "1.0.0"
      },
      ... (cohere.command-latest, cohere.command-plus-latest, and other entries omitted) ...
    ]
  }
}

From the output, copy the id value for the model you want. For example:

  • Cohere Command R+

    "display-name": "cohere.command-r-plus-08-2024", "id": "ocid1.generativeaimodel.oc1.us-chicago-1.amaaaaaask7dceyaodm6rdyxmdzlddweh4amobzoo4fatlao2pwnekexmosq"
  • Meta Llama 3.3 70B Instruct

    "display-name": "meta.llama-3.3-70b-instruct", "id": "ocid1.generativeaimodel.oc1.us-chicago-1.amaaaaaask7dceyazz5xnau6rie75wc2imyk4z54b6rg3z6rpbdlhox4cm7a"

You’ll use this OCID in the Python app.
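
The same listing is available programmatically through the Python SDK's control-plane client. A minimal sketch (the compartment OCID is a placeholder; here we filter for chat-capable models only):

import oci

config = oci.config.from_file("~/.oci/config", "DEFAULT")
client = oci.generative_ai.GenerativeAiClient(config)

# Placeholder -- use your own compartment (or tenancy) OCID
compartment_ocid = "<your_compartment_ocid>"

# Print only chat-capable models with their OCIDs
for model in client.list_models(compartment_id=compartment_ocid).data.items:
    if "CHAT" in (model.capabilities or []):
        print(f"{model.display_name}: {model.id}")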


Try the OCI Playground and Get Sample Code


Before writing your own application, it’s a good idea to experiment with the OCI Generative AI Playground:

  1. Go to the OCI Console → Analytics & AI → Generative AI → Playground.

  2. Select a model (e.g., Cohere Command R+ or Meta Llama 3.3).

  3. Enter a sample prompt and test your use case (chat, Q&A, summarization, etc.).

  4. Adjust parameters such as Max Tokens, Temperature, Top P, and Top K.

  5. Once you’re satisfied with the results, click View Code.


The Playground lets you download the equivalent code in:

Languages: Java, Python, TypeScript

Frameworks: Python-LangChain, Python-LlamaIndex
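
For example, the Python-LangChain route looks roughly like this. This is a sketch, not the exact Playground export, assuming the langchain-community package (whose ChatOCIGenAI wrapper targets this service); the parameter values are illustrative:

# pip install oci langchain-community
from langchain_community.chat_models import ChatOCIGenAI

llm = ChatOCIGenAI(
    model_id="cohere.command-r-plus-08-2024",  # a model name or OCID
    service_endpoint="https://inference.generativeai.us-chicago-1.oci.oraclecloud.com",
    compartment_id="<your_compartment_ocid>",
    model_kwargs={"temperature": 0.7, "max_tokens": 400},
)

print(llm.invoke("Summarize OCI Generative AI in one sentence.").content)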


🐍 Application Code

Here’s the Streamlit-based sample chat app (chat_app.py):

(update compartment_id, the endpoint URL for your region, and model_id in the code below)


https://github.com/narasimharaok-cloud9/Reusable_GenerativeAI/blob/main/chat_app.py


import streamlit as st
import oci

# ---------------------------
# OCI Config Setup
# ---------------------------
compartment_id = "ocid1.tenancy.oc1..aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"
CONFIG_PROFILE = "DEFAULT"
config = oci.config.from_file("~/.oci/config", CONFIG_PROFILE)

endpoint = "https://inference.generativeai.us-chicago-1.oci.oraclecloud.com"
generative_ai_inference_client = oci.generative_ai_inference.GenerativeAiInferenceClient(
    config=config,
    service_endpoint=endpoint,
    retry_strategy=oci.retry.NoneRetryStrategy(),
    timeout=(10, 240),
)

# ---------------------------
# Streamlit UI Setup
# ---------------------------
st.set_page_config(page_title="Oracle Generative AI Chat", page_icon="💬")
st.title("💬 Oracle Generative AI - Chat")

# Keep the conversation history across Streamlit reruns
if "messages" not in st.session_state:
    st.session_state["messages"] = []

# Replay earlier turns so the full conversation stays on screen
for msg in st.session_state["messages"]:
    with st.chat_message(msg["role"]):
        st.markdown(msg["content"])

if prompt := st.chat_input("Type your message here..."):
    st.session_state["messages"].append({"role": "user", "content": prompt})
    with st.chat_message("user"):
        st.markdown(prompt)

    with st.chat_message("assistant"):
        with st.spinner("Thinking..."):
            # Build a Cohere-format chat request (for Cohere models)
            chat_request = oci.generative_ai_inference.models.CohereChatRequest()
            chat_request.message = prompt
            chat_request.max_tokens = 800
            chat_request.temperature = 0.7
            chat_request.frequency_penalty = 1
            chat_request.top_p = 0.75
            chat_request.top_k = 0

            chat_detail = oci.generative_ai_inference.models.ChatDetails()
            chat_detail.serving_mode = oci.generative_ai_inference.models.OnDemandServingMode(
                model_id="ocid1.generativeaimodel.oc1.us-chicago-1.aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"  # Cohere Command R+ (replace with your choice)
            )
            chat_detail.chat_request = chat_request
            chat_detail.compartment_id = compartment_id

            response = generative_ai_inference_client.chat(chat_detail)
            reply = response.data.chat_response.text

            st.markdown(reply)
            st.session_state["messages"].append({"role": "assistant", "content": reply})
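
Note that CohereChatRequest only works with Cohere models. If you chose a Meta Llama model OCID instead, swap in the SDK's generic chat format. A minimal sketch of just the parts that change:

# Generic (role/message) chat format for Meta Llama models
content = oci.generative_ai_inference.models.TextContent()
content.text = prompt
message = oci.generative_ai_inference.models.Message()
message.role = "USER"
message.content = [content]

chat_request = oci.generative_ai_inference.models.GenericChatRequest()
chat_request.api_format = oci.generative_ai_inference.models.BaseChatRequest.API_FORMAT_GENERIC
chat_request.messages = [message]
chat_request.max_tokens = 800
chat_request.temperature = 0.7

# The generic response shape also differs from Cohere's:
reply = response.data.chat_response.choices[0].message.content[0].text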


Running the Application

  1. Install dependencies:

    pip install oci streamlit
  2. Run the app:

    streamlit run chat_app.py
  3. Open http://localhost:8501 (or the port you configured) in your browser to interact with your chatbot.
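
Streamlit serves on port 8501 by default; to use a different port, pass Streamlit's standard --server.port flag:

streamlit run chat_app.py --server.port 8080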


Conclusion

With OCI Generative AI’s on-demand models, developers can quickly prototype enterprise-ready AI assistants without managing infrastructure.

By combining the OCI Python SDK with Streamlit, you get an interactive chat UI that supports multiple turns, conversation history, and flexible model selection.


Author: Narasimharao Karanam


