Building Your Own ChatGPT: Structured Outputs with LLMs

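A. Introduction

As mentioned in "Building Your Own ChatGPT: OpenAI Chat Completion API Parameters Explained", an LLM's language ability makes it easy to extract specific key information (structured data) from a conversation (unstructured data). This is very useful when we need to capture particular details during a conversation.

OpenAI's Structured Outputs feature ensures that the model's response conforms to a JSON Schema defined in advance. When calling the REST API, we pass the JSON Schema directly to OpenAI; the SDK follows the same approach.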

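Models that support Structured Outputs

  • gpt-4o-mini-2024-07-18 and later
  • gpt-4o-2024-08-06 and later

Older models can fall back to JSON mode ({"type": "json_object"}).

OpenAI recommends using Structured Outputs whenever possible. In practice, Structured Outputs also handles complex structures better than JSON mode.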
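
As a rough illustration of the difference, the sketch below builds the request arguments for both modes without sending anything. The helper names json_mode_kwargs and structured_kwargs are made up for this example; only the two response_format shapes come from the API. JSON mode guarantees valid JSON but not a particular shape, so the expected keys have to be spelled out in the prompt, and the messages should mention "JSON" explicitly.

```python
def json_mode_kwargs(model: str, user_text: str) -> dict:
    """Arguments for chat.completions.create using JSON mode (hypothetical helper)."""
    return {
        "model": model,
        "messages": [
            {
                "role": "system",
                # JSON mode does not know our schema, so the keys are described in the prompt.
                "content": "Extract the event information. Reply in JSON with the keys "
                "name, date and participants.",
            },
            {"role": "user", "content": user_text},
        ],
        "response_format": {"type": "json_object"},
    }


def structured_kwargs(model: str, user_text: str, schema: dict) -> dict:
    """Arguments for chat.completions.create using Structured Outputs (hypothetical helper)."""
    return {
        "model": model,
        "messages": [
            {"role": "system", "content": "Extract the event information."},
            {"role": "user", "content": user_text},
        ],
        "response_format": {
            "type": "json_schema",
            "json_schema": {"name": "CalendarEvent", "schema": schema, "strict": True},
        },
    }


text = "Alice and Bob are going to a science fair on Friday."
legacy = json_mode_kwargs("gpt-4-turbo", text)  # model name chosen only as an example
print(legacy["response_format"])
```

Either dictionary can be unpacked into the call, for example client.chat.completions.create(**legacy).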

B. How to use Structured Outputs

Structured Outputs follows the rules of JSON Schema. When we want the Chat Completion API to return data in a specific format, we specify the output shape through the response_format parameter:

from openai import OpenAI
import json

client = OpenAI()

completion = client.chat.completions.create(
    model="gpt-4o-2024-08-06",
    messages=[
        {"role": "system", "content": "Extract the event information."},
        {
            "role": "user",
            "content": "Alice and Bob are going to a science fair on Friday.",
        },
    ],
    response_format={
        "type": "json_schema",
        "json_schema": {
            "name": "CalendarEvent",
            "description": "Extract the event name, date, and participants",
            "schema": {
                "properties": {
                    "name": {"title": "Name", "type": "string"},
                    "date": {"title": "Date", "type": "string"},
                    "participants": {
                        "items": {"type": "string"},
                        "title": "Participants",
                        "type": "array",
                    },
                },
                "required": ["name", "date", "participants"],
                "title": "CalendarEvent",
                "type": "object",
                "additionalProperties": False,
            },
            "strict": True,
        },
    },
)

event = json.loads(completion.choices[0].message.content)
print(event["name"], event["date"], event["participants"])

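In the example, response_format is set to json_schema. Its basic structure and constraints are as follows:

{
    "type": "json_schema",
    "json_schema": {
        "name": required; only a-z, A-Z, 0-9, _ and - are allowed; maximum length 64
        "description": helps the model understand how to use this structure
        "schema": ***** the JSON Schema goes here *****
        "strict": boolean or null; whether to follow the definition strictly. When True, only a subset of JSON Schema is supported. Defaults to False
    }
}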
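
The structure above can be wrapped in a small helper so the name constraint is checked before a request is sent. This is only a sketch: build_response_format is a hypothetical function, not part of the OpenAI SDK, and the regular expression simply encodes the name rule listed above.

```python
import re
from typing import Optional

# Name rule from the structure above: a-z, A-Z, 0-9, "_" and "-", at most 64 characters.
_NAME_PATTERN = re.compile(r"[a-zA-Z0-9_-]{1,64}")


def build_response_format(
    name: str,
    schema: dict,
    description: Optional[str] = None,
    strict: bool = True,
) -> dict:
    """Assemble a response_format dictionary (hypothetical helper, not an SDK function)."""
    if not _NAME_PATTERN.fullmatch(name):
        raise ValueError(f"invalid schema name: {name!r}")
    json_schema = {"name": name, "schema": schema, "strict": strict}
    if description is not None:
        json_schema["description"] = description
    return {"type": "json_schema", "json_schema": json_schema}


response_format = build_response_format(
    "CalendarEvent",
    {"type": "object", "properties": {}, "required": [], "additionalProperties": False},
    description="Extract the event name, date, and participants",
)
print(response_format["json_schema"]["name"])
```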

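Our JSON Schema goes into "schema". As the earlier example shows, just three fields (name, date, participants) already take a lot of writing. For this reason OpenAI provides extra support; the Python SDK does so through Python's pydantic module: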

from pydantic import BaseModel

from openai import OpenAI

client = OpenAI()


class CalendarEvent(BaseModel):
    name: str
    date: str
    participants: list[str]

    class Config:
        extra = "forbid"


completion = client.beta.chat.completions.parse(
    model="gpt-4o-2024-08-06",
    messages=[
        {"role": "system", "content": "Extract the event information."},
        {
            "role": "user",
            "content": "Alice and Bob are going to a science fair on Friday.",
        },
    ],
    response_format=CalendarEvent,
)

event = completion.choices[0].message.parsed
print(event.name, event.date, event.participants)

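The code is clearly more concise, and the final fields (name, date, participants) are now read as attributes of the event object.

Note that we used to call the model with client.chat.completions.create(...), but here the call is client.beta.chat.completions.parse(), which is not yet a stable release.

If we want to keep calling the API through client.chat.completions.create(...), and avoid both the instability of a beta interface and the chance that it changes later, we can rely on pydantic. BaseModel defines the class method model_json_schema, and because CalendarEvent inherits from BaseModel, CalendarEvent.model_json_schema() returns the complete JSON Schema. The code becomes: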

from openai import (
    OpenAI,
    APIConnectionError,
    APIResponseValidationError,
    APIStatusError,
    LengthFinishReasonError,
    ContentFilterFinishReasonError,
)

from pydantic import BaseModel, ValidationError
from typing import List


class CalendarEvent(BaseModel):
    name: str
    date: str
    participants: List[str]

    class Config:
        extra = "forbid"


client = OpenAI()

try:
    completion = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": "Extract the event information."},
            {
                "role": "user",
                "content": "Alice and Bob are going to a science fair on Friday.",
            },
        ],
        response_format={
            "type": "json_schema",
            "json_schema": {
                "name": "CalendarEvent",
                "description": "Extract the event name, date, and participants",
                "schema": CalendarEvent.model_json_schema(),
                "strict": True,
            },
        },
    )
    print(completion.choices[0].message.content)

    if completion.choices[0].message.refusal:
        # With json_schema, OpenAI may refuse the request; the reason is in the refusal field
        print(completion.choices[0].message.refusal)
    else:
        event: CalendarEvent = CalendarEvent.model_validate_json(
            completion.choices[0].message.content
        )
        print(event.name, event.date, event.participants)
except APIConnectionError as e:
    # Connection error or Request timed out.
    print(e.message)
except APIResponseValidationError as e:
    # The data returned by OpenAI could not be validated: Data returned by API invalid for expected schema.
    print(e.code, e.type, e.message)
    print(e.status_code)
except APIStatusError as e:
    # 4xx and 5xx errors
    print(e.code, e.type, e.message)
    print(e.status_code)
    print(e.request_id)
except LengthFinishReasonError as e:
    # Could not parse response content as the length limit was reached.
    print(e)
except ContentFilterFinishReasonError as e:
    # Could not parse response content as the request was rejected by the content filter.
    print(e)
except ValidationError as e:
    # pydantic failed to validate the model output
    print(e.json())

With BaseModel's class methods model_json_schema and model_validate_json, the code stays tidy and the output is validated a second time, which guards against unexpected results and makes later field access easier.

Besides validating the model output with pydantic (ValidationError), the call to client.chat.completions.create is also wrapped with the error classes defined by the SDK. The error types are:

  • APIConnectionError: Connection error or Request timed out.
  • APIResponseValidationError: the data returned by OpenAI could not be validated; Data returned by API invalid for expected schema.
  • APIStatusError: 4xx and 5xx errors.
  • LengthFinishReasonError: Could not parse response content as the length limit was reached.
  • ContentFilterFinishReasonError: Could not parse response content as the request was rejected by the content filter.

Each error exposes different information; see the example code.

When using Structured Outputs, OpenAI may refuse a user's request for safety reasons (Refusals with Structured Outputs). In that case the response carries a refusal field whose value is a string describing the refusal; it can be read from completion.choices[0].message.refusal.
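
The refusal check can be isolated in one function so callers never parse a refused message by accident. The sketch below is an assumption-laden illustration: ModelRefusal and parse_or_raise are hypothetical names, and SimpleNamespace objects stand in for completion.choices[0].message so the example runs without calling the API.

```python
import json
from types import SimpleNamespace


class ModelRefusal(Exception):
    """Raised when the model declined to answer (hypothetical helper class)."""


def parse_or_raise(message) -> dict:
    """Return the parsed JSON content, or raise ModelRefusal if a refusal is present.

    `message` is any object with .refusal and .content attributes,
    such as completion.choices[0].message.
    """
    if getattr(message, "refusal", None):
        raise ModelRefusal(message.refusal)
    return json.loads(message.content)


# Stand-ins for real API messages, used only to demonstrate both paths.
answered = SimpleNamespace(
    refusal=None,
    content='{"name": "science fair", "date": "Friday", "participants": ["Alice", "Bob"]}',
)
refused = SimpleNamespace(refusal="I'm sorry, I can't help with that.", content=None)

print(parse_or_raise(answered)["participants"])
try:
    parse_or_raise(refused)
except ModelRefusal as exc:
    print("refused:", exc)
```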

C. Use cases

1. Chain of thought (step-by-step math solution)

from pydantic import BaseModel
from openai import OpenAI

class Step(BaseModel):
    explanation: str
    output: str

    class Config:
        extra = "forbid"


class MathReasoning(BaseModel):
    steps: list[Step]
    final_answer: str

    class Config:
        extra = "forbid"

client = OpenAI()
completion = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {
            "role": "system",
            "content": "You are a helpful math tutor. Guide the user through the solution step by step.",
        },
        {"role": "user", "content": "how can I solve 8x + 7 = -23"},
    ],
    response_format={
        "type": "json_schema",
        "json_schema": {
            "name": "MathReasoning",
            "description": "Math reasoning steps",
            "schema": MathReasoning.model_json_schema(),
            "strict": True,
        },
    },
)

math_reasoning: MathReasoning = MathReasoning.model_validate_json(
    completion.choices[0].message.content
)
for step in math_reasoning.steps:
    print(step.explanation)
    print(step.output)

2. Unstructured data extraction

from pydantic import BaseModel
from openai import OpenAI


class ResearchPaperExtraction(BaseModel):
    title: str
    authors: list[str]
    abstract: str
    keywords: list[str]

    class Config:
        extra = "forbid"


client = OpenAI()
completion = client.chat.completions.create(
    model="gpt-4o-2024-08-06",
    messages=[
        {
            "role": "system",
            "content": "You are an expert at structured data extraction. You will be given unstructured text from a research paper and should convert it into the given structure.",
        },
        {"role": "user", "content": "..."},
    ],
    response_format={
        "type": "json_schema",
        "json_schema": {
            "name": "ResearchPaperExtraction",
            "schema": ResearchPaperExtraction.model_json_schema(),
            "strict": True,
        },
    },
)

research_paper = ResearchPaperExtraction.model_validate_json(
    completion.choices[0].message.content
)
print(research_paper)

3. UI generation

from enum import Enum
from typing import List
from pydantic import BaseModel
from openai import OpenAI


class UIType(str, Enum):
    div = "div"
    button = "button"
    header = "header"
    section = "section"
    field = "field"
    form = "form"


class Attribute(BaseModel):
    name: str
    value: str

    class Config:
        extra = "forbid"


class UI(BaseModel):
    type: UIType
    label: str
    children: List["UI"]
    attributes: List[Attribute]

    class Config:
        extra = "forbid"


UI.model_rebuild()  # This is required to enable recursive types


class Response(BaseModel):
    ui: UI

    class Config:
        extra = "forbid"


client = OpenAI()
completion = client.chat.completions.create(
    model="gpt-4o-2024-08-06",
    messages=[
        {
            "role": "system",
            "content": "You are a UI generator AI. Convert the user input into a UI.",
        },
        {"role": "user", "content": "Make a User Profile Form"},
    ],
    response_format={
        "type": "json_schema",
        "json_schema": {
            "name": "Response",
            "description": "Generate a UI",
            "schema": Response.model_json_schema(),
            "strict": True,
        },
    },
)

ui = Response.model_validate_json(completion.choices[0].message.content)
print(ui)

4. Moderation

from enum import Enum
from typing import Optional
from pydantic import BaseModel
from openai import OpenAI


class Category(str, Enum):
    violence = "violence"
    sexual = "sexual"
    self_harm = "self_harm"


class ContentCompliance(BaseModel):
    is_violating: bool
    category: Optional[Category]
    explanation_if_violating: Optional[str]

    class Config:
        extra = "forbid"


client = OpenAI()
completion = client.chat.completions.create(
    model="gpt-4o-2024-08-06",
    messages=[
        {
            "role": "system",
            "content": "Determine if the user input violates specific guidelines and explain if they do.",
        },
        {"role": "user", "content": "How do I prepare for a job interview?"},
    ],
    response_format={
        "type": "json_schema",
        "json_schema": {
            "name": "ContentCompliance",
            "description": "Content Compliance",
            "schema": ContentCompliance.model_json_schema(),
            "strict": True,
        },
    },
)

compliance = ContentCompliance.model_validate_json(
    completion.choices[0].message.content
)
print(compliance)

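D. Caveats

1. Make sure the input (messages) is relevant to the defined structure

If the input (messages) has little to do with the defined structure, the model may produce fabricated content. You can add a sentence to the prompt telling the model to return empty parameters or a specific sentence when it detects that the input does not match the task.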

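2. Handling mistakes and prompt engineering

Structured Outputs can still contain mistakes. If you find errors, try the following adjustments:

  • Adjust your instructions
  • Provide examples in the system instructions
  • Split the task into simpler subtasks

See OpenAI's prompt engineering guide for details.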

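3. Keep the JSON structure consistent

To keep the JSON Schema from drifting away from the corresponding types in your programming language, it is strongly recommended to use the native Pydantic (Python) or zod (JavaScript) support.

If you prefer to specify the JSON Schema directly, you can add CI rules that flag edits to either the JSON Schema or the underlying data objects, or add a CI step that generates the JSON Schema from the type definitions automatically (or vice versa).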
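
A CI check of this kind can be as small as comparing two dictionaries. The sketch below assumes the committed schema has been loaded from a file and the generated one comes from something like CalendarEvent.model_json_schema(); both are written as literals here so the example runs on its own, and schema_drift is a hypothetical helper name.

```python
import json


def schema_drift(generated: dict, committed: dict, *, order_sensitive: bool = False) -> bool:
    """Return True when the two schemas differ (hypothetical CI helper).

    With order_sensitive=True the order of keys also counts, because
    json.dumps writes keys in insertion order.
    """
    if order_sensitive:
        return json.dumps(generated) != json.dumps(committed)
    return generated != committed  # plain dict comparison ignores key order


committed = {
    "type": "object",
    "properties": {"name": {"type": "string"}, "date": {"type": "string"}},
    "required": ["name", "date"],
    "additionalProperties": False,
}
# Same content, but the two properties are declared in the opposite order.
generated = {
    "type": "object",
    "properties": {"date": {"type": "string"}, "name": {"type": "string"}},
    "required": ["name", "date"],
    "additionalProperties": False,
}

print(schema_drift(generated, committed))
print(schema_drift(generated, committed, order_sensitive=True))
```

In a CI job, a True result would typically end the step with a non-zero exit code so the mismatch is reviewed before merging.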

4. JSON Schema types supported by Structured Outputs

  • String
  • Number
  • Boolean
  • Integer
  • Object
  • Array
  • Enum
  • anyOf

5. The root object cannot use anyOf

The root of the schema must be an object and cannot use anyOf.

6. All fields must be required

In the examples, the JSON Schema produced by CalendarEvent.model_json_schema() already lists every field in required:

{
    'additionalProperties': False, 
    'properties': {
        'name': {'title': 'Name', 'type': 'string'}, 
        'date': {'title': 'Date', 'type': 'string'}, 
        'participants': {'items': {'type': 'string'}, 'title': 'Participants', 'type': 'array'}
    }, 
    'required': ['name', 'date', 'participants'], 
    'title': 'CalendarEvent', 
    'type': 'object'
}

If you write the JSON Schema by hand instead of generating it with a package, take extra care with this rule.
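
For hand-written schemas, a small recursive check can catch the two most common slips: a property missing from required, and an object without "additionalProperties": false. This is a sketch under stated assumptions: strict_violations is a hypothetical helper, it only walks properties, items, anyOf and $defs, and it does not follow $ref.

```python
def strict_violations(schema: dict, path: str = "$") -> list:
    """List strict-mode problems in a hand-written JSON Schema (hypothetical helper)."""
    problems = []
    if schema.get("type") == "object" or "properties" in schema:
        props = schema.get("properties", {})
        required = schema.get("required", [])
        missing = [name for name in props if name not in required]
        if missing:
            problems.append(f"{path}: not listed in required: {missing}")
        if schema.get("additionalProperties") is not False:
            problems.append(f"{path}: additionalProperties must be false")
        for name, sub in props.items():
            problems += strict_violations(sub, f"{path}.{name}")
    if isinstance(schema.get("items"), dict):
        problems += strict_violations(schema["items"], f"{path}[]")
    for index, sub in enumerate(schema.get("anyOf", [])):
        problems += strict_violations(sub, f"{path}.anyOf[{index}]")
    for name, sub in schema.get("$defs", {}).items():
        problems += strict_violations(sub, f"$defs.{name}")
    return problems


good = {
    "type": "object",
    "properties": {"name": {"type": "string"}, "date": {"type": "string"}},
    "required": ["name", "date"],
    "additionalProperties": False,
}
bad = {
    "type": "object",
    "properties": {"name": {"type": "string"}, "date": {"type": "string"}},
    "required": ["name"],
}

print(strict_violations(good))
for problem in strict_violations(bad):
    print(problem)
```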

To express that a field may be empty, use Union in Python:

from typing import List, Union

from pydantic import BaseModel

class CalendarEvent(BaseModel):
    name: Union[str, None]
    date: str
    participants: List[str]

    class Config:
        extra = "forbid"

7. Structural limits

A schema can have at most 100 object properties and at most 5 levels of nesting.
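
Both numbers can be estimated from a schema before it is sent. The sketch below is an approximation: count_properties and nesting_depth are hypothetical helpers, they treat the root object as level 1 and each nested object as one more level (an assumption about how the API counts), and they do not follow $ref.

```python
def _children(schema: dict):
    """Yield sub-schemas reachable through properties, items and anyOf."""
    yield from schema.get("properties", {}).values()
    items = schema.get("items")
    if isinstance(items, dict):
        yield items
    yield from schema.get("anyOf", [])


def count_properties(schema: dict) -> int:
    """Total number of object properties in the schema, nested ones included."""
    own = len(schema.get("properties", {}))
    return own + sum(count_properties(child) for child in _children(schema))


def nesting_depth(schema: dict) -> int:
    """Number of object levels; the root object counts as level 1 (an assumption)."""
    own = 1 if "properties" in schema else 0
    return own + max((nesting_depth(child) for child in _children(schema)), default=0)


# Hand-written equivalent of the MathReasoning structure used earlier.
math_schema = {
    "type": "object",
    "properties": {
        "steps": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {
                    "explanation": {"type": "string"},
                    "output": {"type": "string"},
                },
            },
        },
        "final_answer": {"type": "string"},
    },
}

print(count_properties(math_schema), nesting_depth(math_schema))
```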

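8. Schema size limits

  • Within a schema, the total string length of all property names, definition names, enum values and const values must not exceed 15,000 characters.
  • A schema may contain at most 500 enum values in total; when a single enum has more than 250 values, the total string length of its values must not exceed 7,500 characters.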
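
A rough budget check can be run locally. The sketch below adds up the strings named in the first bullet and counts enum values; measure_schema is a hypothetical helper, and measuring with len() is an assumption about how the service counts length.

```python
def measure_schema(schema: dict) -> tuple:
    """Return (total string length, number of enum values) for a schema.

    Counts property names, definition names, enum values and const values.
    Hypothetical helper; lengths are measured with len().
    """
    total_length = 0
    enum_values = 0
    stack = [schema]
    while stack:
        node = stack.pop()
        if not isinstance(node, dict):
            continue
        for group in ("properties", "$defs"):
            for name, sub in node.get(group, {}).items():
                total_length += len(name)
                stack.append(sub)
        values = node.get("enum", [])
        enum_values += len(values)
        total_length += sum(len(str(value)) for value in values)
        if "const" in node:
            total_length += len(str(node["const"]))
        if isinstance(node.get("items"), dict):
            stack.append(node["items"])
        stack.extend(node.get("anyOf", []))
    return total_length, enum_values


compliance_schema = {
    "type": "object",
    "properties": {
        "is_violating": {"type": "boolean"},
        "category": {"type": "string", "enum": ["violence", "sexual", "self_harm"]},
    },
}

length, enums = measure_schema(compliance_schema)
print(length, enums)
```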

9. additionalProperties must always be false

With pydantic in Python, this is set through class Config.

10. Key order

With Structured Outputs, the output is produced in the same order as the keys in the schema.
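
One practical consequence, offered here as a suggestion rather than an official rule, is that field order is a design decision: placing steps before final_answer lets the model write out its reasoning before it commits to an answer. The sketch below checks the order of a response against the schema; keys_follow_schema is a hypothetical helper and the content string is a hand-written stand-in for a model response.

```python
import json


def keys_follow_schema(schema: dict, content: str) -> bool:
    """True if the top-level keys of the JSON text appear in schema order."""
    expected = list(schema.get("properties", {}))
    actual = list(json.loads(content))  # json.loads keeps the key order of the text
    return actual == expected


schema = {
    "type": "object",
    "properties": {
        "steps": {"type": "array", "items": {"type": "string"}},
        "final_answer": {"type": "string"},
    },
    "required": ["steps", "final_answer"],
    "additionalProperties": False,
}

# Hand-written stand-in for completion.choices[0].message.content.
content = '{"steps": ["8x = -30", "x = -30 / 8"], "final_answer": "x = -3.75"}'
print(keys_follow_schema(schema, content))
```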

Some type-specific keywords are not yet supported:

  • Strings: minLength, maxLength, pattern, format
  • Numbers: minimum, maximum, multipleOf
  • Objects: patternProperties, unevaluatedProperties, propertyNames, minProperties, maxProperties
  • Arrays: unevaluatedItems, contains, minContains, maxContains, minItems, maxItems, uniqueItems

If you set "strict": True and use an unsupported part of JSON Schema, the API rejects the request with an error.
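
Since these keywords only fail once a request is made, it can help to scan a schema for them locally first. The sketch below uses exactly the keywords listed above; find_unsupported is a hypothetical helper, and it takes care not to mistake a property that happens to be named like a keyword (for example a field called "format") for the keyword itself.

```python
UNSUPPORTED_KEYWORDS = {
    # strings
    "minLength", "maxLength", "pattern", "format",
    # numbers
    "minimum", "maximum", "multipleOf",
    # objects
    "patternProperties", "unevaluatedProperties", "propertyNames",
    "minProperties", "maxProperties",
    # arrays
    "unevaluatedItems", "contains", "minContains", "maxContains",
    "minItems", "maxItems", "uniqueItems",
}


def find_unsupported(schema: dict, path: str = "$") -> list:
    """Return (path, keyword) pairs for unsupported keywords (hypothetical helper)."""
    found = [(path, keyword) for keyword in sorted(UNSUPPORTED_KEYWORDS & schema.keys())]
    # Keys inside "properties" and "$defs" are names, not keywords, so only their values are scanned.
    for name, sub in schema.get("properties", {}).items():
        found += find_unsupported(sub, f"{path}.{name}")
    for name, sub in schema.get("$defs", {}).items():
        found += find_unsupported(sub, f"$defs.{name}")
    if isinstance(schema.get("items"), dict):
        found += find_unsupported(schema["items"], f"{path}[]")
    for index, sub in enumerate(schema.get("anyOf", [])):
        found += find_unsupported(sub, f"{path}.anyOf[{index}]")
    return found


event_schema = {
    "type": "object",
    "properties": {
        "name": {"type": "string", "minLength": 1},
        "format": {"type": "string"},  # a property named "format", not the keyword
        "participants": {"type": "array", "items": {"type": "string"}, "maxItems": 10},
    },
    "required": ["name", "format", "participants"],
    "additionalProperties": False,
}

for location, keyword in find_unsupported(event_schema):
    print(location, keyword)
```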

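E. JSON Schema in Tool Calling

The Chat Completion API has a tools parameter for passing in tools, which lets OpenAI decide whether a tool call is needed to answer the question more precisely. The parameters each tool requires are also defined with JSON Schema. While direct Pydantic support in the Python SDK is still in beta, you can use the approach from this article: define the structure with Pydantic, then emit the JSON Schema with the class method model_json_schema.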
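
A minimal sketch of what such a tool definition looks like is shown below. build_tool and the tool name create_calendar_event are hypothetical; the parameters schema is written by hand here so the example runs on its own, but it could equally come from CalendarEvent.model_json_schema().

```python
def build_tool(name: str, description: str, parameters: dict, strict: bool = True) -> dict:
    """Wrap a JSON Schema as one entry of the tools list (hypothetical helper)."""
    return {
        "type": "function",
        "function": {
            "name": name,
            "description": description,
            "parameters": parameters,
            "strict": strict,
        },
    }


calendar_parameters = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "date": {"type": "string"},
        "participants": {"type": "array", "items": {"type": "string"}},
    },
    "required": ["name", "date", "participants"],
    "additionalProperties": False,
}

tools = [
    build_tool(
        "create_calendar_event",  # hypothetical tool name
        "Create a calendar event from the extracted details",
        calendar_parameters,
    )
]
print(tools[0]["function"]["name"])
```

The resulting list would be passed as tools=tools to client.chat.completions.create(...).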

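For how to use Tool Calling, see "Building Your Own ChatGPT: Deep Resource Integration with OpenAI Tool Calling". Once you are familiar with Tool Calling, you will notice that it can also do what response_format={"type": "json_schema", ...} does. So which one should you choose? The official documentation suggests:

  • If you are connecting the model to tools, functions, data, etc. in your system, then you should use function calling.
  • If you want to structure the model’s output when it responds to the user, then you should use a structured response_format.

In short: if the extracted data is then used to call other tools (including, but not limited to, external data or internal databases), use Tool Calling (formerly called function calling); if you only want to pull out specific information, use response_format. Combining the two makes a program smarter and more flexible across different situations.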

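F. Summary

This article covered how to produce Structured Outputs with OpenAI's models, with a focus on turning unstructured data from a conversation into specific structured information.

It also walked through examples for several use cases, such as step-by-step math solutions, unstructured data extraction and UI generation, together with working source code and a list of caveats.

Although OpenAI's Python SDK supports writing the JSON Schema directly with pydantic, that direct support is still in beta; the stable route is to write the JSON Schema yourself. If that feels too tedious, you can follow the approach in this article: keep handling the JSON Schema with pydantic, and use its class methods to emit the schema and validate what the model returns.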