Output Parser - DONGLE

# Concept
- An Output Parser transforms the response of an LLM (e.g., converting a `str` response into a list or dictionary).
- `langchain.schema` provides a variety of Output Parsers.

# Parser Shape
### BaseOutputParser
- The basic pattern is to define a parser class that subclasses `BaseOutputParser`, then implement a `parse` function that reshapes the data into the value you want.
- Import it with `from langchain.schema import BaseOutputParser`.
- **Set up the template** so that the parser receives input in the format it expects.
- Note that the response of `predict_messages` is stored in its **`content` attribute**.

```python
from langchain.chat_models import ChatOpenAI
from langchain.prompts import ChatPromptTemplate
from langchain.schema import BaseOutputParser

# Assumption: `chat` can be any chat model; ChatOpenAI is used for illustration.
chat = ChatOpenAI(temperature=0.1)


class CommaOutputParser(BaseOutputParser):
    def parse(self, text):
        # Split on commas and trim the whitespace around each item.
        items = text.strip().split(",")
        return list(map(str.strip, items))


template = ChatPromptTemplate.from_messages(
    [
        (
            "system",
            "You are a list generating machine. Everything you are asked will be "
            "answered with a comma separated list of max {max_items} in lowercase. "
            "Do NOT reply with anything else",
        ),
        ("human", "{question}"),
    ]
)

prompt = template.format_messages(max_items=10, question="What are the colors?")
result = chat.predict_messages(prompt)

p = CommaOutputParser()
p.parse(result.content)
```

### StrOutputParser
- An `Output Parser` that converts the result of a `Chain` into a `String`.
- Import it with `from langchain.schema.output_parser import StrOutputParser`.
- With `StrOutputParser`, you do not need to read the `.content` attribute yourself after calling `invoke()`.

```python
from langchain.schema.output_parser import StrOutputParser

# `prompt`, `llm`, `content`, and `question` are assumed to be defined earlier.
chain = prompt | llm | StrOutputParser()

response = chain.invoke({"content": content, "question": question})
print(response)
```

# StructuredOutputParser
- OpenAI officially announced **Structured Outputs** on August 6, 2024. It guarantees that the model follows a developer-specified structure expressed as a JSON Schema.
- It goes further than the earlier JSON mode: beyond producing valid JSON, it guarantees **full adherence to the schema**.
- To use it, **the types must be enforced** with `Pydantic` in Python or `Zod` in JavaScript.

## Python example

```python
# Define the schema with Pydantic
from typing import Literal

from pydantic import BaseModel


class MovieReview(BaseModel):
    sentiment: Literal["positive", "neutral", "negative"]
    reason: str


# Configure structured output on `ChatOpenAI`
from langchain_core.output_parsers.openai_tools import PydanticToolsParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_openai import ChatOpenAI

# Tools are attached with `bind_tools`, not through the constructor.
llm = ChatOpenAI(model="gpt-4o-2024-08-06", temperature=0).bind_tools(
    [MovieReview],       # schema used for the structured output
    tool_choice="auto",  # allow the model to call the tool
)

# Combine the prompt and the parser
prompt = ChatPromptTemplate.from_template(
    "Analyze the sentiment of the following sentence: {text}"
)

chain = prompt | llm | PydanticToolsParser(tools=[MovieReview])

# Result
result = chain.invoke({"text": "The movie was good, but the ending was disappointing."})
print(result)
```

## JavaScript example
### Without pipe

```javascript
import { z } from 'zod';
import { ChatOpenAI } from '@langchain/openai';
import { StructuredOutputParser } from 'langchain/output_parsers';

const schema = z.object({
  title: z.string(),
  tags: z.array(z.string()),
});

const parser = StructuredOutputParser.fromZodSchema(schema);

const llm = new ChatOpenAI({
  modelName: 'gpt-4o-2024-08-06',
  temperature: 0,
});

// The format instructions steer the model toward output that matches the Zod schema.
const prompt = `Title: "The Future of Artificial Intelligence"
Structured information about this title:
${parser.getFormatInstructions()}`;

const result = await llm.invoke(prompt);
const parsed = await parser.parse(result.content); // parse the raw text against the schema
console.log(parsed);
```

### Pipe

```typescript
import { ChatOpenAI } from '@langchain/openai';
import { ChatPromptTemplate } from '@langchain/core/prompts';
import { StructuredOutputParser } from '@langchain/core/output_parsers';
import { z } from 'zod';
import { RunnableSequence } from '@langchain/core/runnables';

const openaiApiKey = import.meta.env.VITE_OPENAI_API_KEY;

const schema = z.object({
  tags: z.array(z.string()),
});

const parser = StructuredOutputParser.fromZodSchema(schema);

const model = new ChatOpenAI({
  modelName: 'gpt-4o-mini',
  temperature: 0.7,
  apiKey: openaiApiKey,
});

export async function getSuggestionTag(title: string) {
  const prompt = ChatPromptTemplate.fromTemplate(
    `
Your task is to infer and suggest 3 to 5 relevant tags based solely on the given article title.

Do not just extract keywords. Instead, **analyze the title carefully**, and **infer the implied topic, context, and author’s intent**.
Use your knowledge of common article patterns, technical domains, and writing purposes to make an educated guess.

**Guidelines:**
- Tags must be relevant, specific, and helpful for categorizing or searching the article.
- Tags must be in lowercase and formatted as a list of strings in valid JSON.
- Avoid vague or generic terms like "article" or "information".
- Use your reasoning to go beyond what is explicitly written.
- Output should be in Korean.

Input Title: "{title}"

{format_instructions}
`.trim(),
  );

  // prompt -> model -> parser, executed in order
  const chain = RunnableSequence.from([prompt, model, parser]);

  const response = await chain.invoke({
    title,
    format_instructions: parser.getFormatInstructions(),
  });

  return response;
}
```

## `JsonOutputParser` vs `StructuredOutputParser`

| Criterion | `JsonOutputParser` | `StructuredOutputParser` |
| ---------------- | ------------------------- | ---------------------------------- |
| **Enforcement of output structure** | ❌ Low (the LLM may emit incorrect JSON) | ✅ High (explicitly steers the model to follow the schema) |
| **Schema-based validation** | ❌ Must be handled manually | ✅ Automatic validation with `Zod` or `Pydantic` |
| **Tolerance of JSON errors** | Medium (vulnerable to simple JSON errors) | High (raises an error when parsing against the schema fails) |
| **Built-in type conversion/validation** | ❌ No | ✅ Yes |
| **Optimized for recent models** | ❌ Based on plain text output | ✅ Assumes GPT-4o's structured output feature |

## WithStructuredOutput
- The model is instructed to produce the correct format from the start, so parsing is far less likely to fail.
- `StructuredOutputParser` and `JsonOutputParser` **parse** a value after the **model** has generated it, whereas **withStructuredOutput** makes the model generate structured data directly.
- **withStructuredOutput** automatically selects the most suitable method **for each model**.
  - OpenAI models: JSON mode or tool calling
  - Anthropic models: tool calling
  - Other models: the structured output method best suited to each model

### JavaScript

```typescript
import { ChatOpenAI } from '@langchain/openai';
import { ChatPromptTemplate } from '@langchain/core/prompts';
import { z } from 'zod';

const openaiApiKey = import.meta.env.VITE_OPENAI_API_KEY;

const tagSchema = z.object({
  tags: z.array(z.string()).describe('Array of suggested tags based on title analysis'),
});

const model = new ChatOpenAI({
  modelName: 'gpt-4o-mini',
  temperature: 0.7,
  apiKey: openaiApiKey,
});

export async function getSuggestionTag(title: string) {
  const prompt = ChatPromptTemplate.fromTemplate(
    `
Your task is to infer and suggest 3 to 5 relevant tags based solely on the given article title.

Do not just extract keywords. Instead, **analyze the title carefully**, and **infer the implied topic, context, and author’s intent**.
Use your knowledge of common article patterns, technical domains, and writing purposes to make an educated guess.

**Guidelines:**
- Tags must be relevant, specific, and helpful for categorizing or searching the article.
- Tags must be in lowercase and formatted as a list of strings in valid JSON.
- Avoid vague or generic terms like "article" or "information".
- Use your reasoning to go beyond what is explicitly written.
- Output should be in Korean.

Input Title: "{title}"
`,
  );

  // The model itself returns data shaped like `tagSchema`; no separate parser is needed.
  const structuredModel = model.withStructuredOutput(tagSchema, {
    name: 'tag_suggestion',
  });

  const response = await structuredModel.invoke(await prompt.format({ title }));

  return response;
}
```

### Python

```python
from dotenv import load_dotenv
from langchain_core.prompts import ChatPromptTemplate
from langchain_openai import ChatOpenAI
from pydantic import BaseModel, Field

# Load environment variables
load_dotenv()

llm = ChatOpenAI(model="gpt-4o-mini", temperature=0.7)


# Define the Pydantic model
class TagRequest(BaseModel):
    tags: list[str] = Field(description="Array of suggested tags based on title analysis")


# Define the prompt
prompt = ChatPromptTemplate.from_template(
    """
Your task is to infer and suggest 3 to 5 relevant tags based solely on the given article title.

Do not just extract keywords. Instead, **analyze the title carefully**, and **infer the implied topic, context, and author’s intent**.
Use your knowledge of common article patterns, technical domains, and writing purposes to make an educated guess.

**Guidelines:**
- Tags must be relevant, specific, and helpful for categorizing or searching the article.
- Tags must be in lowercase and formatted as a list of strings in valid JSON.
- Avoid vague or generic terms like "article" or "information".
- Use your reasoning to go beyond what is explicitly written.
- Output should be in Korean.

Input Title: "{title}"
""",
)

# Define the chain
chain = prompt | llm.with_structured_output(TagRequest)


# Tag generation function
def generate_tag(text):
    return chain.invoke({"title": text})
```
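
The gap between "valid JSON" and "JSON that matches the schema" is why schema validation matters in the approaches above. It can be illustrated without calling any model. The following is a minimal sketch using only the standard library. `validate_tags` is a hypothetical helper written for this note, not part of LangChain or Pydantic, and it only mimics the kind of check a schema such as `TagRequest` performs on the `tags` field.

```python
import json


def validate_tags(raw: str) -> list[str]:
    """Hypothetical helper: parse a JSON string and check it has the shape {"tags": [str, ...]}."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        # The text is not JSON at all.
        raise ValueError(f"not valid JSON: {exc}") from exc

    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")

    tags = data.get("tags")
    if not isinstance(tags, list) or not all(isinstance(t, str) for t in tags):
        # The text is valid JSON, but it does not follow the schema.
        raise ValueError("'tags' must be a list of strings")

    return tags


# Valid JSON that also matches the schema.
print(validate_tags('{"tags": ["ai", "llm"]}'))

# Valid JSON with the wrong shape: plain JSON parsing alone would accept this.
try:
    validate_tags('{"tags": "ai, llm"}')
except ValueError as err:
    print("rejected:", err)
```

In practice a Pydantic model or Zod schema replaces the hand-written checks, but the failure mode it guards against is the same: output that parses as JSON yet does not have the expected fields or types.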