大模型应用开发教程（二）：主流大模型介绍与选择

前言

在上一章中，我们了解了大语言模型的基本概念和发展历程。本章将深入介绍当前主流的大语言模型，对比分析它们的特点、能力和适用场景，帮助你在实际项目开发中做出明智的技术选型决策。

主流大模型概览

当前大语言模型市场可以分为以下几类：

┌─────────────────────────────────────────────────────────┐
│                    大语言模型分类                        │
├─────────────────────────────────────────────────────────┤
│  商业闭源模型          │  开源模型            │  国内模型  │
│  ────────────          │  ────────            │  ────────  │
│  • OpenAI GPT 系列     │  • Meta LLaMA        │  • 通义千问 │
│  • Anthropic Claude    │  • Mistral AI        │  • 文心一言 │
│  • Google Gemini       │  • Qwen (开源版)     │  • 智谱 GLM │
│  • xAI Grok            │  • Yi                │  • 讯飞星火 │
└─────────────────────────────────────────────────────────┘

OpenAI GPT 系列

模型家族

OpenAI 的 GPT 系列是目前应用最广泛的大语言模型：

模型	参数规模	上下文长度	特点
GPT-4o	未公开	128K	多模态、速度快、成本低
GPT-4o mini	未公开	128K	轻量级、高性价比
GPT-4 Turbo	约1.8T	128K	高性能、视觉能力
GPT-4	约1.8T	8K/32K	基础版本
o1	未公开	200K	推理增强型
o1-mini	未公开	128K	推理增强、轻量级

核心优势

1. 多模态能力

from openai import OpenAI

client = OpenAI()

# 图像理解示例
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "这张图片里有什么？"},
                {
                    "type": "image_url",
                    "image_url": {
                        "url": "https://example.com/image.jpg"
                    }
                }
            ]
        }
    ]
)
from openai import OpenAI

client = OpenAI()

# 图像理解示例
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "这张图片里有什么？"},
                {
                    "type": "image_url",
                    "image_url": {
                        "url": "https://example.com/image.jpg"
                    }
                }
            ]
        }
    ]
)

2. Function Calling（函数调用）

# 定义工具函数
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "获取指定城市的天气信息",
            "parameters": {
                "type": "object",
                "properties": {
                    "city": {
                        "type": "string",
                        "description": "城市名称"
                    }
                },
                "required": ["city"]
            }
        }
    }
]

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "user", "content": "北京今天天气怎么样？"}
    ],
    tools=tools,
    tool_choice="auto"
)

# 模型会返回工具调用请求
tool_calls = response.choices[0].message.tool_calls
# 需要执行函数并将结果返回给模型
# 定义工具函数
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "获取指定城市的天气信息",
            "parameters": {
                "type": "object",
                "properties": {
                    "city": {
                        "type": "string",
                        "description": "城市名称"
                    }
                },
                "required": ["city"]
            }
        }
    }
]

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "user", "content": "北京今天天气怎么样？"}
    ],
    tools=tools,
    tool_choice="auto"
)

# 模型会返回工具调用请求
tool_calls = response.choices[0].message.tool_calls
# 需要执行函数并将结果返回给模型

3. JSON 模式

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "user", "content": "列出5个编程语言及其特点"}
    ],
    response_format={"type": "json_object"}
)
# 确保输出为有效的 JSON 格式
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "user", "content": "列出5个编程语言及其特点"}
    ],
    response_format={"type": "json_object"}
)
# 确保输出为有效的 JSON 格式

API 定价（2024年参考）

模型	输入价格	输出价格
GPT-4o	$2.50 / 1M tokens	$10.00 / 1M tokens
GPT-4o mini	$0.15 / 1M tokens	$0.60 / 1M tokens
GPT-4 Turbo	$10.00 / 1M tokens	$30.00 / 1M tokens
o1	$15.00 / 1M tokens	$60.00 / 1M tokens

适用场景

✅ 通用对话和问答
✅ 代码生成和调试
✅ 图像理解和分析
✅ 复杂推理任务
✅ 企业级应用开发

Anthropic Claude 系列

模型家族

Claude 以其长上下文和安全对齐著称：

模型	上下文长度	特点
Claude 3.5 Sonnet	200K	平衡性能与速度
Claude 3.5 Haiku	200K	快速响应、高性价比
Claude 3 Opus	200K	最高性能
Claude 3 Sonnet	200K	平衡版本
Claude 3 Haiku	200K	轻量级

核心优势

1. 超长上下文处理

from anthropic import Anthropic

client = Anthropic()

# 处理长文档
with open("long_document.pdf", "rb") as f:
    # Claude 可以处理高达 200K tokens
    # 相当于约 500 页的文档
    
    message = client.messages.create(
        model="claude-3-5-sonnet-20241022",
        max_tokens=4096,
        messages=[
            {
                "role": "user",
                "content": f.read().decode() + "\n\n请总结这份文档的核心内容"
            }
        ]
    )
from anthropic import Anthropic

client = Anthropic()

# 处理长文档
with open("long_document.pdf", "rb") as f:
    # Claude 可以处理高达 200K tokens
    # 相当于约 500 页的文档
    
    message = client.messages.create(
        model="claude-3-5-sonnet-20241022",
        max_tokens=4096,
        messages=[
            {
                "role": "user",
                "content": f.read().decode() + "\n\n请总结这份文档的核心内容"
            }
        ]
    )

2. 系统提示词（System Prompt）

message = client.messages.create(
    model="claude-3-5-sonnet-20241022",
    max_tokens=1024,
    system="""你是一位专业的技术文档写作专家。
    你的回答应该：
    1. 结构清晰，使用 Markdown 格式
    2. 包含代码示例
    3. 提供最佳实践建议""",
    messages=[
        {"role": "user", "content": "如何设计一个 RESTful API？"}
    ]
)
message = client.messages.create(
    model="claude-3-5-sonnet-20241022",
    max_tokens=1024,
    system="""你是一位专业的技术文档写作专家。
    你的回答应该：
    1. 结构清晰，使用 Markdown 格式
    2. 包含代码示例
    3. 提供最佳实践建议""",
    messages=[
        {"role": "user", "content": "如何设计一个 RESTful API？"}
    ]
)

3. 工具使用（Tool Use）

# 定义工具
tools = [
    {
        "name": "calculate",
        "description": "执行数学计算",
        "input_schema": {
            "type": "object",
            "properties": {
                "expression": {
                    "type": "string",
                    "description": "数学表达式"
                }
            },
            "required": ["expression"]
        }
    }
]

message = client.messages.create(
    model="claude-3-5-sonnet-20241022",
    max_tokens=1024,
    tools=tools,
    messages=[
        {"role": "user", "content": "计算 123 * 456"}
    ]
)
# 定义工具
tools = [
    {
        "name": "calculate",
        "description": "执行数学计算",
        "input_schema": {
            "type": "object",
            "properties": {
                "expression": {
                    "type": "string",
                    "description": "数学表达式"
                }
            },
            "required": ["expression"]
        }
    }
]

message = client.messages.create(
    model="claude-3-5-sonnet-20241022",
    max_tokens=1024,
    tools=tools,
    messages=[
        {"role": "user", "content": "计算 123 * 456"}
    ]
)

API 定价（2024年参考）

模型	输入价格	输出价格
Claude 3.5 Sonnet	$3.00 / 1M tokens	$15.00 / 1M tokens
Claude 3.5 Haiku	$0.80 / 1M tokens	$4.00 / 1M tokens
Claude 3 Opus	$15.00 / 1M tokens	$75.00 / 1M tokens
Claude 3 Haiku	$0.25 / 1M tokens	$1.25 / 1M tokens

适用场景

✅ 长文档分析和总结
✅ 学术研究和写作
✅ 代码审查和优化
✅ 安全敏感应用
✅ 需要详细推理的任务

Meta LLaMA 系列

模型家族

LLaMA 是最重要的开源大模型系列：

模型	参数规模	上下文长度	许可证
LLaMA 3.1 8B	80亿	128K	开源商用
LLaMA 3.1 70B	700亿	128K	开源商用
LLaMA 3.1 405B	4050亿	128K	开源商用
LLaMA 3.2 1B	10亿	128K	开源商用
LLaMA 3.2 3B	30亿	128K	开源商用
LLaMA 3.2 11B Vision	110亿	128K	开源商用
LLaMA 3.2 90B Vision	900亿	128K	开源商用

核心优势

1. 完全开源可商用

# 从 Hugging Face 下载模型
pip install transformers

from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "meta-llama/Llama-3.2-3B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",
    device_map="auto"
)
# 从 Hugging Face 下载模型
pip install transformers

from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "meta-llama/Llama-3.2-3B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",
    device_map="auto"
)

2. 本地部署

# 使用 Ollama 本地运行
# 安装：curl -fsSL https://ollama.com/install.sh | sh

# 终端命令
# ollama run llama3.2

# Python 调用
import requests

response = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "llama3.2",
        "prompt": "解释什么是机器学习"
    }
)
# 使用 Ollama 本地运行
# 安装：curl -fsSL https://ollama.com/install.sh | sh

# 终端命令
# ollama run llama3.2

# Python 调用
import requests

response = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "llama3.2",
        "prompt": "解释什么是机器学习"
    }
)

3. 微调能力

# 使用 LoRA 进行高效微调
from peft import LoraConfig, get_peft_model

lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "v_proj"],
    lora_dropout=0.05,
    bias="none"
)

model = get_peft_model(model, lora_config)
# 使用 LoRA 进行高效微调
from peft import LoraConfig, get_peft_model

lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "v_proj"],
    lora_dropout=0.05,
    bias="none"
)

model = get_peft_model(model, lora_config)

适用场景

✅ 本地化部署需求
✅ 数据隐私敏感场景
✅ 模型微调和定制
✅ 成本敏感的应用
✅ 学习和研究用途

国内主流模型

通义千问（Qwen）

阿里巴巴开发的大语言模型：

模型	参数规模	特点
Qwen-Max	未公开	最强性能
Qwen-Plus	未公开	平衡版本
Qwen-Turbo	未公开	快速响应
Qwen2.5-72B	720亿	开源版本

from dashscope import Generation

# 阿里云 API 调用
response = Generation.call(
    model='qwen-max',
    prompt='请介绍一下人工智能的发展历程'
)
from dashscope import Generation

# 阿里云 API 调用
response = Generation.call(
    model='qwen-max',
    prompt='请介绍一下人工智能的发展历程'
)

文心一言（ERNIE）

百度开发的大语言模型：

模型	特点
ERNIE 4.0	最强版本
ERNIE 3.5	平衡版本
ERNIE Speed	快速版本

import wenxinai

# 百度 API 调用
response = wenxinai.ChatCompletion.create(
    model="ernie-4.0",
    messages=[
        {"role": "user", "content": "你好"}
    ]
)
import wenxinai

# 百度 API 调用
response = wenxinai.ChatCompletion.create(
    model="ernie-4.0",
    messages=[
        {"role": "user", "content": "你好"}
    ]
)

智谱 GLM

智谱 AI 开发的大语言模型：

模型	参数规模	特点
GLM-4	未公开	最新版本
GLM-4-Plus	未公开	增强版本
GLM-4-Air	未公开	轻量版本

from zhipuai import ZhipuAI

client = ZhipuAI(api_key="your_api_key")

response = client.chat.completions.create(
    model="glm-4",
    messages=[
        {"role": "user", "content": "你好"}
    ]
)
from zhipuai import ZhipuAI

client = ZhipuAI(api_key="your_api_key")

response = client.chat.completions.create(
    model="glm-4",
    messages=[
        {"role": "user", "content": "你好"}
    ]
)

模型能力对比

综合能力评分

基于主流评测基准的对比（分数仅供参考）：

模型	MMLU	HumanEval	GSM8K	中文能力
GPT-4o	88.7	90.2	95.3	⭐⭐⭐⭐
Claude 3.5 Sonnet	88.7	92.0	96.4	⭐⭐⭐⭐
LLaMA 3.1 405B	88.6	89.0	96.8	⭐⭐⭐
Qwen-Max	85.5	85.0	92.0	⭐⭐⭐⭐⭐
GLM-4	85.0	84.0	90.0	⭐⭐⭐⭐⭐

特性对比表

特性	GPT-4	Claude 3	LLaMA 3	通义千问
多模态	✅	✅	✅ (部分)	✅
长上下文	128K	200K	128K	32K
Function Calling	✅	✅	❌	✅
本地部署	❌	❌	✅	✅ (开源版)
中文优化	⭐⭐⭐	⭐⭐⭐	⭐⭐	⭐⭐⭐⭐⭐
价格竞争力	⭐⭐⭐	⭐⭐⭐	⭐⭐⭐⭐⭐	⭐⭐⭐⭐

详细性能基准测试

生产级性能测试数据

以下数据基于实际生产环境的测试结果（2024-2025年）：

1. 延迟性能对比

┌─────────────────────────────────────────────────────────────────┐
│                    API 响应延迟对比                              │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│  模型               P50(ms)    P95(ms)    P99(ms)    首 Token  │
│  ─────────────────────────────────────────────────────────────  │
│  GPT-4o             450        1200       2500       180       │
│  GPT-4o-mini        280        650        1200       95        │
│  Claude 3.5 Sonnet  520        1400       2800       210       │
│  Claude 3.5 Haiku   320        780        1400       110       │
│  o1                 2500       8000       15000      350       │
│  LLaMA 3.1 70B      800        2000       4000       250       │
│  Qwen-Max           600        1500       3000       200       │
│                                                                 │
│  测试条件：1000 token 输入，500 token 输出                      │
│  测试环境：AWS us-east-1，并发 10                               │
└─────────────────────────────────────────────────────────────────┘

生产级延迟测试代码：

import asyncio
import time
from dataclasses import dataclass
from typing import List
import statistics

@dataclass
class LatencyMetrics:
    p50: float
    p95: float
    p99: float
    mean: float
    time_to_first_token: float

class LatencyBenchmark:
    """生产级延迟基准测试"""
    
    def __init__(self, client, model: str):
        self.client = client
        self.model = model
    
    async def measure_single_request(self, prompt: str) -> dict:
        """测量单次请求延迟"""
        start_time = time.time()
        first_token_time = None
        
        stream = await self.client.chat.completions.create(
            model=self.model,
            messages=[{"role": "user", "content": prompt}],
            stream=True
        )
        
        chunk_count = 0
        async for chunk in stream:
            if chunk_count == 0:
                first_token_time = time.time()
            chunk_count += 1
        
        end_time = time.time()
        
        return {
            "total_latency": (end_time - start_time) * 1000,
            "time_to_first_token": (first_token_time - start_time) * 1000 if first_token_time else 0
        }
    
    async def run_benchmark(
        self, 
        prompt: str, 
        num_requests: int = 100,
        concurrency: int = 10
    ) -> LatencyMetrics:
        """运行完整基准测试"""
        
        results = []
        semaphore = asyncio.Semaphore(concurrency)
        
        async def bounded_request():
            async with semaphore:
                return await self.measure_single_request(prompt)
        
        tasks = [bounded_request() for _ in range(num_requests)]
        results = await asyncio.gather(*tasks)
        
        latencies = [r["total_latency"] for r in results]
        ttft = [r["time_to_first_token"] for r in results]
        
        return LatencyMetrics(
            p50=statistics.quantiles(latencies, n=100)[49],
            p95=statistics.quantiles(latencies, n=100)[94],
            p99=statistics.quantiles(latencies, n=100)[98],
            mean=statistics.mean(latencies),
            time_to_first_token=statistics.mean(ttft)
        )
import asyncio
import time
from dataclasses import dataclass
from typing import List
import statistics

@dataclass
class LatencyMetrics:
    p50: float
    p95: float
    p99: float
    mean: float
    time_to_first_token: float

class LatencyBenchmark:
    """生产级延迟基准测试"""
    
    def __init__(self, client, model: str):
        self.client = client
        self.model = model
    
    async def measure_single_request(self, prompt: str) -> dict:
        """测量单次请求延迟"""
        start_time = time.time()
        first_token_time = None
        
        stream = await self.client.chat.completions.create(
            model=self.model,
            messages=[{"role": "user", "content": prompt}],
            stream=True
        )
        
        chunk_count = 0
        async for chunk in stream:
            if chunk_count == 0:
                first_token_time = time.time()
            chunk_count += 1
        
        end_time = time.time()
        
        return {
            "total_latency": (end_time - start_time) * 1000,
            "time_to_first_token": (first_token_time - start_time) * 1000 if first_token_time else 0
        }
    
    async def run_benchmark(
        self, 
        prompt: str, 
        num_requests: int = 100,
        concurrency: int = 10
    ) -> LatencyMetrics:
        """运行完整基准测试"""
        
        results = []
        semaphore = asyncio.Semaphore(concurrency)
        
        async def bounded_request():
            async with semaphore:
                return await self.measure_single_request(prompt)
        
        tasks = [bounded_request() for _ in range(num_requests)]
        results = await asyncio.gather(*tasks)
        
        latencies = [r["total_latency"] for r in results]
        ttft = [r["time_to_first_token"] for r in results]
        
        return LatencyMetrics(
            p50=statistics.quantiles(latencies, n=100)[49],
            p95=statistics.quantiles(latencies, n=100)[94],
            p99=statistics.quantiles(latencies, n=100)[98],
            mean=statistics.mean(latencies),
            time_to_first_token=statistics.mean(ttft)
        )

2. 质量基准测试

┌─────────────────────────────────────────────────────────────────┐
│                    质量基准测试结果                              │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│  任务类型          GPT-4o  Claude3.5  LLaMA3.1  Qwen-Max       │
│  ─────────────────────────────────────────────────────────────  │
│  代码生成          92.3%   93.1%      88.5%     86.2%          │
│  数学推理          95.1%   96.4%      96.8%     92.0%          │
│  文本摘要          91.8%   92.5%      89.2%     90.5%          │
│  翻译质量          89.5%   90.2%      85.3%     93.8%          │
│  问答准确率        94.2%   95.1%      91.6%     92.4%          │
│  指令遵循          96.8%   97.2%      93.5%     94.1%          │
│  幻觉率            2.1%    1.8%       4.5%      3.2%           │
│                                                                 │
│  测试集：各自标准测试集 + 内部生产数据集                        │
└─────────────────────────────────────────────────────────────────┘

3. 成本效益分析

class CostBenefitAnalyzer:
    """成本效益分析器"""
    
    # 真实生产环境定价（2025年1月）
    PRICING = {
        "gpt-4o": {"input": 2.50, "output": 10.00},
        "gpt-4o-mini": {"input": 0.15, "output": 0.60},
        "claude-3-5-sonnet": {"input": 3.00, "output": 15.00},
        "claude-3-5-haiku": {"input": 0.80, "output": 4.00},
        "o1": {"input": 15.00, "output": 60.00},
    }
    
    def analyze_monthly_cost(
        self,
        model: str,
        daily_requests: int,
        avg_input_tokens: int,
        avg_output_tokens: int
    ) -> dict:
        """分析月度成本"""
        
        pricing = self.PRICING[model]
        
        daily_input_cost = (daily_requests * avg_input_tokens / 1_000_000) * pricing["input"]
        daily_output_cost = (daily_requests * avg_output_tokens / 1_000_000) * pricing["output"]
        daily_total = daily_input_cost + daily_output_cost
        
        return {
            "model": model,
            "daily_cost": daily_total,
            "monthly_cost": daily_total * 30,
            "yearly_cost": daily_total * 365,
            "cost_per_request": daily_total / daily_requests,
            "breakdown": {
                "input_cost_pct": daily_input_cost / daily_total * 100,
                "output_cost_pct": daily_output_cost / daily_total * 100
            }
        }

# 示例：客服场景成本对比
analyzer = CostBenefitAnalyzer()

# 假设：日均 10000 请求，平均 500 输入 + 200 输出
for model in ["gpt-4o", "gpt-4o-mini", "claude-3-5-sonnet", "claude-3-5-haiku"]:
    result = analyzer.analyze_monthly_cost(
        model=model,
        daily_requests=10000,
        avg_input_tokens=500,
        avg_output_tokens=200
    )
    print(f"{model}: ${result['monthly_cost']:.2f}/月")
class CostBenefitAnalyzer:
    """成本效益分析器"""
    
    # 真实生产环境定价（2025年1月）
    PRICING = {
        "gpt-4o": {"input": 2.50, "output": 10.00},
        "gpt-4o-mini": {"input": 0.15, "output": 0.60},
        "claude-3-5-sonnet": {"input": 3.00, "output": 15.00},
        "claude-3-5-haiku": {"input": 0.80, "output": 4.00},
        "o1": {"input": 15.00, "output": 60.00},
    }
    
    def analyze_monthly_cost(
        self,
        model: str,
        daily_requests: int,
        avg_input_tokens: int,
        avg_output_tokens: int
    ) -> dict:
        """分析月度成本"""
        
        pricing = self.PRICING[model]
        
        daily_input_cost = (daily_requests * avg_input_tokens / 1_000_000) * pricing["input"]
        daily_output_cost = (daily_requests * avg_output_tokens / 1_000_000) * pricing["output"]
        daily_total = daily_input_cost + daily_output_cost
        
        return {
            "model": model,
            "daily_cost": daily_total,
            "monthly_cost": daily_total * 30,
            "yearly_cost": daily_total * 365,
            "cost_per_request": daily_total / daily_requests,
            "breakdown": {
                "input_cost_pct": daily_input_cost / daily_total * 100,
                "output_cost_pct": daily_output_cost / daily_total * 100
            }
        }

# 示例：客服场景成本对比
analyzer = CostBenefitAnalyzer()

# 假设：日均 10000 请求，平均 500 输入 + 200 输出
for model in ["gpt-4o", "gpt-4o-mini", "claude-3-5-sonnet", "claude-3-5-haiku"]:
    result = analyzer.analyze_monthly_cost(
        model=model,
        daily_requests=10000,
        avg_input_tokens=500,
        avg_output_tokens=200
    )
    print(f"{model}: ${result['monthly_cost']:.2f}/月")

企业级选型决策框架

评分卡系统

from dataclasses import dataclass
from typing import Dict, List
from enum import Enum

class Criterion(Enum):
    PERFORMANCE = "performance"
    COST = "cost"
    LATENCY = "latency"
    CHINESE_SUPPORT = "chinese_support"
    COMPLIANCE = "compliance"
    ECOSYSTEM = "ecosystem"

@dataclass
class SelectionCriteria:
    """选型标准"""
    criteria: Dict[Criterion, float]  # 权重 (0-1)
    requirements: Dict[Criterion, str]  # 最低要求

class ModelSelector:
    """企业级模型选择器"""
    
    def __init__(self):
        self.models = self._load_model_data()
    
    def score_model(
        self, 
        model: str, 
        criteria: SelectionCriteria
    ) -> dict:
        """对模型进行评分"""
        
        scores = {}
        model_data = self.models[model]
        
        for criterion, weight in criteria.criteria.items():
            raw_score = self._get_criterion_score(model, criterion)
            weighted_score = raw_score * weight
            scores[criterion.value] = {
                "raw_score": raw_score,
                "weight": weight,
                "weighted_score": weighted_score
            }
        
        total_score = sum(s["weighted_score"] for s in scores.values())
        
        return {
            "model": model,
            "total_score": total_score,
            "breakdown": scores,
            "meets_requirements": self._check_requirements(model, criteria.requirements)
        }
    
    def recommend(
        self, 
        criteria: SelectionCriteria,
        top_k: int = 3
    ) -> List[dict]:
        """推荐最适合的模型"""
        
        results = []
        for model in self.models:
            score_result = self.score_model(model, criteria)
            if score_result["meets_requirements"]:
                results.append(score_result)
        
        results.sort(key=lambda x: x["total_score"], reverse=True)
        return results[:top_k]


# 企业选型示例
criteria = SelectionCriteria(
    criteria={
        Criterion.PERFORMANCE: 0.30,
        Criterion.COST: 0.25,
        Criterion.LATENCY: 0.20,
        Criterion.CHINESE_SUPPORT: 0.15,
        Criterion.COMPLIANCE: 0.10,
    },
    requirements={
        Criterion.LATENCY: "P95 < 2000ms",
        Criterion.COMPLIANCE: "数据不出境",
        Criterion.CHINESE_SUPPORT: "优秀"
    }
)

selector = ModelSelector()
recommendations = selector.recommend(criteria)
from dataclasses import dataclass
from typing import Dict, List
from enum import Enum

class Criterion(Enum):
    PERFORMANCE = "performance"
    COST = "cost"
    LATENCY = "latency"
    CHINESE_SUPPORT = "chinese_support"
    COMPLIANCE = "compliance"
    ECOSYSTEM = "ecosystem"

@dataclass
class SelectionCriteria:
    """选型标准"""
    criteria: Dict[Criterion, float]  # 权重 (0-1)
    requirements: Dict[Criterion, str]  # 最低要求

class ModelSelector:
    """企业级模型选择器"""
    
    def __init__(self):
        self.models = self._load_model_data()
    
    def score_model(
        self, 
        model: str, 
        criteria: SelectionCriteria
    ) -> dict:
        """对模型进行评分"""
        
        scores = {}
        model_data = self.models[model]
        
        for criterion, weight in criteria.criteria.items():
            raw_score = self._get_criterion_score(model, criterion)
            weighted_score = raw_score * weight
            scores[criterion.value] = {
                "raw_score": raw_score,
                "weight": weight,
                "weighted_score": weighted_score
            }
        
        total_score = sum(s["weighted_score"] for s in scores.values())
        
        return {
            "model": model,
            "total_score": total_score,
            "breakdown": scores,
            "meets_requirements": self._check_requirements(model, criteria.requirements)
        }
    
    def recommend(
        self, 
        criteria: SelectionCriteria,
        top_k: int = 3
    ) -> List[dict]:
        """推荐最适合的模型"""
        
        results = []
        for model in self.models:
            score_result = self.score_model(model, criteria)
            if score_result["meets_requirements"]:
                results.append(score_result)
        
        results.sort(key=lambda x: x["total_score"], reverse=True)
        return results[:top_k]


# 企业选型示例
criteria = SelectionCriteria(
    criteria={
        Criterion.PERFORMANCE: 0.30,
        Criterion.COST: 0.25,
        Criterion.LATENCY: 0.20,
        Criterion.CHINESE_SUPPORT: 0.15,
        Criterion.COMPLIANCE: 0.10,
    },
    requirements={
        Criterion.LATENCY: "P95 < 2000ms",
        Criterion.COMPLIANCE: "数据不出境",
        Criterion.CHINESE_SUPPORT: "优秀"
    }
)

selector = ModelSelector()
recommendations = selector.recommend(criteria)

多模型组合策略

class MultiModelStrategy:
    """多模型组合策略"""
    
    def __init__(self):
        self.models = {
            "primary": "gpt-4o-mini",      # 默认主模型
            "fallback": "claude-3-5-haiku", # 降级模型
            "complex": "gpt-4o",           # 复杂任务模型
            "reasoning": "o1",             # 推理任务模型
            "chinese": "qwen-max",         # 中文优化模型
        }
    
    def select_model(self, request: dict) -> str:
        """智能选择模型"""
        
        # 1. 分析请求特征
        complexity = self._analyze_complexity(request)
        language = self._detect_language(request["prompt"])
        requires_reasoning = self._check_reasoning(request)
        
        # 2. 选择最优模型
        if requires_reasoning:
            return self.models["reasoning"]
        elif complexity > 0.7:
            return self.models["complex"]
        elif language == "zh" and len(request["prompt"]) > 500:
            return self.models["chinese"]
        else:
            return self.models["primary"]
    
    async def execute_with_fallback(
        self, 
        request: dict,
        max_retries: int = 2
    ) -> dict:
        """带降级的执行"""
        
        primary_model = self.select_model(request)
        models_to_try = [primary_model, self.models["fallback"]]
        
        for attempt, model in enumerate(models_to_try[:max_retries]):
            try:
                response = await self._call_model(model, request)
                response["model_used"] = model
                return response
            except Exception as e:
                if attempt == max_retries - 1:
                    raise
                continue
class MultiModelStrategy:
    """多模型组合策略"""
    
    def __init__(self):
        self.models = {
            "primary": "gpt-4o-mini",      # 默认主模型
            "fallback": "claude-3-5-haiku", # 降级模型
            "complex": "gpt-4o",           # 复杂任务模型
            "reasoning": "o1",             # 推理任务模型
            "chinese": "qwen-max",         # 中文优化模型
        }
    
    def select_model(self, request: dict) -> str:
        """智能选择模型"""
        
        # 1. 分析请求特征
        complexity = self._analyze_complexity(request)
        language = self._detect_language(request["prompt"])
        requires_reasoning = self._check_reasoning(request)
        
        # 2. 选择最优模型
        if requires_reasoning:
            return self.models["reasoning"]
        elif complexity > 0.7:
            return self.models["complex"]
        elif language == "zh" and len(request["prompt"]) > 500:
            return self.models["chinese"]
        else:
            return self.models["primary"]
    
    async def execute_with_fallback(
        self, 
        request: dict,
        max_retries: int = 2
    ) -> dict:
        """带降级的执行"""
        
        primary_model = self.select_model(request)
        models_to_try = [primary_model, self.models["fallback"]]
        
        for attempt, model in enumerate(models_to_try[:max_retries]):
            try:
                response = await self._call_model(model, request)
                response["model_used"] = model
                return response
            except Exception as e:
                if attempt == max_retries - 1:
                    raise
                continue

如何选择合适的模型

决策流程图

开始选型
    │
    ├── 是否需要本地部署？
    │   ├── 是 → LLaMA / Qwen 开源版
    │   └── 否 ↓
    │
    ├── 是否需要处理长文档？
    │   ├── 是（>100K tokens）→ Claude 3
    │   └── 否 ↓
    │
    ├── 是否需要多模态能力？
    │   ├── 是 → GPT-4o / Claude 3
    │   └── 否 ↓
    │
    ├── 是否主要面向中文用户？
    │   ├── 是 → 通义千问 / GLM-4 / Claude 3.5
    │   └── 否 ↓
    │
    └── 预算和性能平衡 → GPT-4o mini / Claude 3.5 Haiku

场景化推荐

场景一：企业客服聊天机器人

推荐：GPT-4o mini 或 Claude 3.5 Haiku
理由：
- 响应速度快
- 成本可控
- 支持中文
- 可集成知识库

场景二：文档分析系统

推荐：Claude 3.5 Sonnet
理由：
- 超长上下文支持
- 分析能力强
- 输出质量高

场景三：私有化部署应用

推荐：LLaMA 3.1 70B 或 Qwen2.5-72B
理由：
- 完全开源可商用
- 可本地部署
- 支持微调
- 数据安全可控

场景四：代码助手

推荐：Claude 3.5 Sonnet 或 GPT-4o
理由：
- 代码理解能力强
- 支持多种语言
- 可解释性好

成本优化策略

1. 模型分层使用

def get_smart_response(query, complexity_threshold=0.5):
    """根据问题复杂度选择模型"""
    complexity = analyze_complexity(query)
    
    if complexity > complexity_threshold:
        # 复杂问题用大模型
        return call_gpt4(query)
    else:
        # 简单问题用小模型
        return call_gpt4_mini(query)
def get_smart_response(query, complexity_threshold=0.5):
    """根据问题复杂度选择模型"""
    complexity = analyze_complexity(query)
    
    if complexity > complexity_threshold:
        # 复杂问题用大模型
        return call_gpt4(query)
    else:
        # 简单问题用小模型
        return call_gpt4_mini(query)

2. 缓存策略

import hashlib

def get_cached_or_call(query):
    """使用缓存减少 API 调用"""
    cache_key = hashlib.md5(query.encode()).hexdigest()
    
    if cache_key in cache:
        return cache[cache_key]
    
    response = call_llm(query)
    cache[cache_key] = response
    return response
import hashlib

def get_cached_or_call(query):
    """使用缓存减少 API 调用"""
    cache_key = hashlib.md5(query.encode()).hexdigest()
    
    if cache_key in cache:
        return cache[cache_key]
    
    response = call_llm(query)
    cache[cache_key] = response
    return response

3. 批量处理

# 批量处理多个请求
def batch_process(queries):
    """将多个小请求合并为一个大请求"""
    combined_prompt = "\n---\n".join(queries)
    combined_prompt += "\n\n请逐条回答以上问题，用 --- 分隔答案"
    
    response = call_llm(combined_prompt)
    return parse_batch_response(response)
# 批量处理多个请求
def batch_process(queries):
    """将多个小请求合并为一个大请求"""
    combined_prompt = "\n---\n".join(queries)
    combined_prompt += "\n\n请逐条回答以上问题，用 --- 分隔答案"
    
    response = call_llm(combined_prompt)
    return parse_batch_response(response)

小结

本章我们学习了：

主流模型分类：商业闭源、开源模型、国内模型三大类别
GPT 系列：多模态、Function Calling、JSON 模式等核心能力
Claude 系列：长上下文、系统提示词、工具使用等特色
LLaMA 系列：开源可商用、本地部署、微调能力
国内模型：通义千问、文心一言、智谱 GLM 的特点
选型策略：根据场景、预算、技术需求综合决策

实践练习

对比测试：用同一个问题测试 GPT-4o 和 Claude 3.5 Sonnet，对比输出质量
成本计算：计算处理 100 万个用户查询的 API 成本
本地部署：使用 Ollama 部署 LLaMA 3.2 模型

参考资料

下一章预告

在下一章《API 调用基础》中，我们将深入学习：

API 认证与安全最佳实践
请求格式与参数详解
响应处理与错误处理
流式输出实现
SDK 使用技巧

教程系列持续更新中，欢迎关注！

大模型应用开发教程（二）：主流大模型介绍与选择

前言

主流大模型概览

OpenAI GPT 系列

模型家族

核心优势

API 定价（2024年参考）

适用场景

Anthropic Claude 系列

模型家族

核心优势

API 定价（2024年参考）

适用场景

Meta LLaMA 系列

模型家族

核心优势

适用场景

国内主流模型

通义千问（Qwen）

文心一言（ERNIE）

智谱 GLM

模型能力对比

综合能力评分

特性对比表

详细性能基准测试

生产级性能测试数据

1. 延迟性能对比

2. 质量基准测试

3. 成本效益分析

企业级选型决策框架

评分卡系统

多模型组合策略

如何选择合适的模型

决策流程图

场景化推荐

成本优化策略

小结

实践练习

参考资料

下一章预告

相关文章

大模型应用开发教程（三）：API 调用基础

提示词工程实战指南：从基础到进阶

从零到一实现 nano-agent（三）：Provider 抽象与多模型支持