Compare commits


51 Commits

Author SHA1 Message Date
Dogtiti
ba8e2414c6 fix: type error 2024-12-27 22:35:40 +08:00
Dogtiti
fe7f726c4b fix: size 2024-12-27 21:57:23 +08:00
Dogtiti
6b914b7ced feature: support glm Cogview 2024-12-27 21:52:22 +08:00
Dogtiti
175c52b13b chore: update vercel url 2024-12-27 20:39:08 +08:00
Dogtiti
0db21cf836 Merge pull request #18 from Dogtiti/merge
Merge
2024-12-27 20:30:44 +08:00
Dogtiti
5b1a759f86 Update 2024-12-27 20:29:44 +08:00
RiverRay
0c3d4462ca Merge pull request #5976 from ChatGPTNextWeb/Leizhenpeng-patch-1
Update README.md
2024-12-23 22:47:59 +08:00
RiverRay
3c859fc29f Update README.md 2024-12-23 22:47:16 +08:00
Dogtiti
1d15666713 Merge pull request #5919 from Yiming3/feature/flexible-visual-model
feat: runtime configuration of vision-capable models
2024-12-22 10:37:57 +08:00
Yiming Zhang
a127ae1fb4 docs: add VISION_MODELS section to README files 2024-12-21 13:12:41 -05:00
Yiming Zhang
ea1329f73e fix: add optional chaining to prevent errors when accessing visionModels 2024-12-21 04:07:58 -05:00
Yiming Zhang
149d732cb7 Merge remote-tracking branch 'upstream/main' into feature/flexible-visual-model 2024-12-21 03:53:05 -05:00
Yiming Zhang
210b29bfbe refactor: remove NEXT_PUBLIC_ prefix from VISION_MODELS env var 2024-12-21 03:51:54 -05:00
Dogtiti
acc2e97aab Merge pull request #5959 from dupl/gemini
add gemini-exp-1206, gemini-2.0-flash-thinking-exp-1219
2024-12-21 16:30:09 +08:00
dupl
93ac0e5017 Reorganized the Gemini model 2024-12-21 15:26:33 +08:00
Yiming Zhang
ed8c3580c8 test: add unit tests for isVisionModel utility function 2024-12-20 19:07:00 -05:00
dupl
0a056a7c5c add gemini-exp-1206, gemini-2.0-flash-thinking-exp-1219 2024-12-21 08:00:37 +08:00
Yiming Zhang
74c4711cdd Merge remote-tracking branch 'upstream/main' into feature/flexible-visual-model 2024-12-20 18:34:07 -05:00
Dogtiti
eceec092cf Merge pull request #5932 from fengzai6/update-google-models
Update google models to add gemini-2.0
2024-12-21 00:43:02 +08:00
Dogtiti
42743410a8 Merge pull request #5940 from ChatGPTNextWeb/dependabot/npm_and_yarn/testing-library/react-16.1.0
chore(deps-dev): bump @testing-library/react from 16.0.1 to 16.1.0
2024-12-21 00:41:45 +08:00
Dogtiti
0f04756d4c Merge pull request #5936 from InitialXKO/main
Rename the "以文搜图" mask to "AI文生图"; fine-tune the prompt for more stable, watermark-free image generation
2024-12-21 00:40:45 +08:00
Dogtiti
ab158bf042 Merge pull request #9 from Dogtiti/Dogtiti-patch-1
Update
2024-12-20 22:18:49 +08:00
Dogtiti
14e3b409cc Update 2024-12-20 22:17:12 +08:00
dependabot[bot]
acdded8161 chore(deps-dev): bump @testing-library/react from 16.0.1 to 16.1.0
Bumps [@testing-library/react](https://github.com/testing-library/react-testing-library) from 16.0.1 to 16.1.0.
- [Release notes](https://github.com/testing-library/react-testing-library/releases)
- [Changelog](https://github.com/testing-library/react-testing-library/blob/main/CHANGELOG.md)
- [Commits](https://github.com/testing-library/react-testing-library/compare/v16.0.1...v16.1.0)

---
updated-dependencies:
- dependency-name: "@testing-library/react"
  dependency-type: direct:development
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
2024-12-16 10:57:34 +00:00
InitialXKO
e939ce5a02 Rename the "以文搜图" mask to "AI文生图"; fine-tune the prompt for more stable, watermark-free image generation 2024-12-13 22:29:14 +08:00
Nacho.L
46a0b100f7 Update versionKeywords 2024-12-13 08:29:43 +08:00
Nacho.L
e27e8fb0e1 Update google models 2024-12-13 07:22:16 +08:00
Yiming Zhang
a433d1606c feat: use regex patterns for vision models and allow adding capabilities to models through env var NEXT_PUBLIC_VISION_MODELS. 2024-12-10 00:22:45 -05:00
Dogtiti
83cea3a90d Merge pull request #5879 from frostime/textline-custom-model
🎨 style(setting): Place custom-model's input in a separate row.
2024-11-28 12:02:42 +08:00
frostime
759a09a76c 🎨 style(setting): Place custom-model's input in a separate row. 2024-11-27 13:11:18 +08:00
Dogtiti
2623a92763 Merge pull request #5850 from code-october/fix-o1
Fix o1
2024-11-25 12:31:36 +08:00
Dogtiti
3932c594c7 Merge pull request #5861 from code-october/update-model
update new model for gpt-4o and gemini-exp
2024-11-22 20:59:30 +08:00
code-october
b7acb89096 update new model for gpt-4o and gemini-exp 2024-11-22 09:48:50 +00:00
code-october
ef24d3e633 use stream when request o1 2024-11-21 03:46:10 +00:00
code-october
23350c842b fix o1 in disableGPT4 2024-11-21 03:45:07 +00:00
Dogtiti
a2adfbbd32 Merge pull request #5821 from Sherlocksuper/scroll
feat: support more user-friendly scrolling
2024-11-16 15:24:46 +08:00
Lloyd Zhou
f22cec1eb4 Merge pull request #5827 from ConnectAI-E/fix/markdown-embed-codeblock
fix: rendering error when a code block embeds a smaller code block
2024-11-15 16:03:27 +08:00
opchips
e56216549e fix: rendering error when a code block embeds a smaller code block 2024-11-15 11:56:26 +08:00
Sherlock
19facc7c85 feat: support more user-friendly scrolling 2024-11-14 21:31:45 +08:00
Lloyd Zhou
b08ce5630c Merge pull request #5819 from ConnectAI-E/fix-gemini-summary
Fix gemini summary
2024-11-13 15:17:44 +08:00
DDMeaqua
b41c012d27 chore: shouldStream 2024-11-13 15:12:46 +08:00
Lloyd Zhou
a392daab71 Merge pull request #5816 from ConnectAI-E/feature/artifacts-svg
artifacts support svg
2024-11-13 14:58:33 +08:00
DDMeaqua
0628ddfc6f chore: update 2024-11-13 14:27:41 +08:00
DDMeaqua
7eda14f138 fix: [#5308] gemini conversation summary 2024-11-13 14:24:44 +08:00
opchips
9a86c42c95 update 2024-11-12 16:33:55 +08:00
Lloyd Zhou
819d249a09 Merge pull request #5815 from LovelyGuYiMeng/main
Update vision model matching keywords
2024-11-12 15:04:11 +08:00
LovelyGuYiMeng
8d66fedb1f Update visionKeywords 2024-11-12 14:28:11 +08:00
Lloyd Zhou
7cf89b53ce Merge pull request #5812 from ConnectAI-E/fix/rerender-chat
fix: use current session id to trigger rerender
2024-11-12 13:49:51 +08:00
Dogtiti
459c373f13 Merge pull request #5807 from ChatGPTNextWeb/dependabot/npm_and_yarn/testing-library/jest-dom-6.6.3
chore(deps-dev): bump @testing-library/jest-dom from 6.6.2 to 6.6.3
2024-11-11 20:59:56 +08:00
Dogtiti
1d14a991ee fix: use current session id to trigger rerender 2024-11-11 20:30:59 +08:00
dependabot[bot]
05ef5adfa7 chore(deps-dev): bump @testing-library/jest-dom from 6.6.2 to 6.6.3
Bumps [@testing-library/jest-dom](https://github.com/testing-library/jest-dom) from 6.6.2 to 6.6.3.
- [Release notes](https://github.com/testing-library/jest-dom/releases)
- [Changelog](https://github.com/testing-library/jest-dom/blob/main/CHANGELOG.md)
- [Commits](https://github.com/testing-library/jest-dom/compare/v6.6.2...v6.6.3)

---
updated-dependencies:
- dependency-name: "@testing-library/jest-dom"
  dependency-type: direct:development
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
2024-11-11 10:53:00 +00:00
21 changed files with 369 additions and 81 deletions

View File

@@ -1,16 +1,17 @@
<div align="center">
<a href='#企业版'>
<img src="./docs/images/ent.svg" alt="icon"/>
<a href='https://nextchat.dev/chat'>
<img src="https://github.com/user-attachments/assets/287c510f-f508-478e-ade3-54d30453dc18" width="1000" alt="icon"/>
</a>
<h1 align="center">NextChat (ChatGPT Next Web)</h1>
English / [简体中文](./README_CN.md)
One-Click to get a well-designed cross-platform ChatGPT web UI, with GPT3, GPT4 & Gemini Pro support.
One-Click to get a well-designed cross-platform ChatGPT web UI, with Claude, GPT4 & Gemini Pro support.
一键免费部署你的跨平台私人 ChatGPT 应用, 支持 GPT3, GPT4 & Gemini Pro 模型。
一键免费部署你的跨平台私人 ChatGPT 应用, 支持 Claude, GPT4 & Gemini Pro 模型。
[![Saas][Saas-image]][saas-url]
[![Web][Web-image]][web-url]
@@ -31,7 +32,7 @@ One-Click to get a well-designed cross-platform ChatGPT web UI, with GPT3, GPT4
[MacOS-image]: https://img.shields.io/badge/-MacOS-black?logo=apple
[Linux-image]: https://img.shields.io/badge/-Linux-333?logo=ubuntu
[<img src="https://vercel.com/button" alt="Deploy on Vercel" height="30">](https://vercel.com/new/clone?repository-url=https%3A%2F%2Fgithub.com%2FChatGPTNextWeb%2FChatGPT-Next-Web&env=OPENAI_API_KEY&env=CODE&project-name=nextchat&repository-name=NextChat) [<img src="https://zeabur.com/button.svg" alt="Deploy on Zeabur" height="30">](https://zeabur.com/templates/ZBUEFA) [<img src="https://gitpod.io/button/open-in-gitpod.svg" alt="Open in Gitpod" height="30">](https://gitpod.io/#https://github.com/Yidadaa/ChatGPT-Next-Web) [<img src="https://img.shields.io/badge/BT_Deploy-Install-20a53a" alt="BT Deply Install" height="30">](https://www.bt.cn/new/download.html) [<img src="https://svgshare.com/i/1AVg.svg" alt="Deploy to Alibaba Cloud" height="30">](https://computenest.aliyun.com/market/service-f1c9b75e59814dc49d52)
[<img src="https://vercel.com/button" alt="Deploy on Vercel" height="30">](https://vercel.com/new/clone?repository-url=https://github.com/Dogtiti/ChatGPT-Next-Web-EarlyBird&env=OPENAI_API_KEY&env=CODE&project-name=nextchat-earlyBird&repository-name=NextChat-EarlyBird) [<img src="https://zeabur.com/button.svg" alt="Deploy on Zeabur" height="30">](https://zeabur.com/templates/ZBUEFA) [<img src="https://gitpod.io/button/open-in-gitpod.svg" alt="Open in Gitpod" height="30">](https://gitpod.io/#https://github.com/Yidadaa/ChatGPT-Next-Web) [<img src="https://img.shields.io/badge/BT_Deploy-Install-20a53a" alt="BT Deply Install" height="30">](https://www.bt.cn/new/download.html)
[<img src="https://github.com/user-attachments/assets/903482d4-3e87-4134-9af1-f2588fa90659" height="60" width="288" >](https://monica.im/?utm=nxcrp)
@@ -355,6 +356,13 @@ For ByteDance: use `modelName@bytedance=deploymentName` to customize model name
Change default model
### `VISION_MODELS` (optional)
> Default: Empty
> Example: `gpt-4-vision,claude-3-opus,my-custom-model` means add vision capabilities to these models in addition to the default pattern matches (which detect models containing keywords like "vision", "claude-3", "gemini-1.5", etc).
Add additional models to have vision capabilities, beyond the default pattern matching. Multiple models should be separated by commas.
### `WHITE_WEBDAV_ENDPOINTS` (optional)
You can use this option if you want to increase the number of webdav service addresses you are allowed to access, as required by the format
@@ -469,7 +477,7 @@ If you want to add a new translation, read this [document](./docs/translation.md
## Donation
[Buy Me a Coffee](https://www.buymeacoffee.com/yidadaa)
[Buy Me a Coffee](https://1kafei.com/dogtiti)
## Special Thanks
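
The `VISION_MODELS` variable documented above is consumed client-side as a plain comma-separated list of exact model names (see the `app/utils.ts` diff near the end of this compare). A minimal sketch of that lookup, with `visionModels` standing in for the value read from the build config:

```ts
// Sketch: exact-match lookup against a comma-separated VISION_MODELS value.
// `visionModels` stands in for the string exposed through the build config.
function isEnvVisionModel(model: string, visionModels: string): boolean {
  const envModels = visionModels
    .split(",")
    .map((m) => m.trim())
    .filter((m) => m.length > 0);
  return envModels.includes(model);
}

// Matches are exact, not substring-based:
console.log(isEnvVisionModel("my-custom-model", "gpt-4-vision,claude-3-opus,my-custom-model")); // true
console.log(isEnvVisionModel("my-custom-model-v2", "gpt-4-vision,claude-3-opus,my-custom-model")); // false
```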

View File

@@ -33,7 +33,7 @@
1. 准备好你的 [OpenAI API Key](https://platform.openai.com/account/api-keys);
2. 点击右侧按钮开始部署:
[![Deploy with Vercel](https://vercel.com/button)](https://vercel.com/new/clone?repository-url=https%3A%2F%2Fgithub.com%2FYidadaa%2FChatGPT-Next-Web&env=OPENAI_API_KEY&env=CODE&env=GOOGLE_API_KEY&project-name=chatgpt-next-web&repository-name=ChatGPT-Next-Web),直接使用 Github 账号登录即可,记得在环境变量页填入 API Key 和[页面访问密码](#配置页面访问密码) CODE
[![Deploy with Vercel](https://vercel.com/button)](https://vercel.com/new/clone?repository-url=https://github.com/Dogtiti/ChatGPT-Next-Web-EarlyBird&env=OPENAI_API_KEY&env=CODE&project-name=nextchat-earlyBird&repository-name=NextChat-EarlyBird),直接使用 Github 账号登录即可,记得在环境变量页填入 API Key 和[页面访问密码](#配置页面访问密码) CODE
3. 部署完毕后,即可开始使用;
4. (可选)[绑定自定义域名](https://vercel.com/docs/concepts/projects/domains/add-a-domain)Vercel 分配的域名 DNS 在某些区域被污染了,绑定自定义域名即可直连。
@@ -235,6 +235,13 @@ ChatGLM Api Url.
更改默认模型
### `VISION_MODELS` (可选)
> 默认值:空
> 示例:`gpt-4-vision,claude-3-opus,my-custom-model` 表示为这些模型添加视觉能力,作为对默认模式匹配的补充(默认会检测包含"vision"、"claude-3"、"gemini-1.5"等关键词的模型)。
在默认模式匹配之外,添加更多具有视觉能力的模型。多个模型用逗号分隔。
### `DEFAULT_INPUT_TEMPLATE` (可选)
自定义默认的 template用于初始化『设置』中的『用户输入预处理』配置项

View File

@@ -30,7 +30,7 @@
1. [OpenAI API Key](https://platform.openai.com/account/api-keys)を準備する;
2. 右側のボタンをクリックしてデプロイを開始:
[![Deploy with Vercel](https://vercel.com/button)](https://vercel.com/new/clone?repository-url=https%3A%2F%2Fgithub.com%2FYidadaa%2FChatGPT-Next-Web&env=OPENAI_API_KEY&env=CODE&env=GOOGLE_API_KEY&project-name=chatgpt-next-web&repository-name=ChatGPT-Next-Web) 、GitHubアカウントで直接ログインし、環境変数ページにAPI Keyと[ページアクセスパスワード](#設定ページアクセスパスワード) CODEを入力してください;
[![Deploy with Vercel](https://vercel.com/button)](https://vercel.com/new/clone?repository-url=https://github.com/Dogtiti/ChatGPT-Next-Web-EarlyBird&env=OPENAI_API_KEY&env=CODE&project-name=nextchat-earlyBird&repository-name=NextChat-EarlyBird) 、GitHubアカウントで直接ログインし、環境変数ページにAPI Keyと[ページアクセスパスワード](#設定ページアクセスパスワード) CODEを入力してください;
3. デプロイが完了したら、すぐに使用を開始できます;
4. (オプション)[カスタムドメインをバインド](https://vercel.com/docs/concepts/projects/domains/add-a-domain)Vercelが割り当てたドメインDNSは一部の地域で汚染されているため、カスタムドメインをバインドすると直接接続できます。
@@ -217,6 +217,13 @@ ByteDance モードでは、`modelName@bytedance=deploymentName` 形式でモデ
デフォルトのモデルを変更します。
### `VISION_MODELS` (オプション)
> デフォルト:空
> 例:`gpt-4-vision,claude-3-opus,my-custom-model` は、これらのモデルにビジョン機能を追加します。これはデフォルトのパターンマッチング("vision"、"claude-3"、"gemini-1.5"などのキーワードを含むモデルを検出)に加えて適用されます。
デフォルトのパターンマッチングに加えて、追加のモデルにビジョン機能を付与します。複数のモデルはカンマで区切ります。
### `DEFAULT_INPUT_TEMPLATE` (オプション)
『設定』の『ユーザー入力前処理』の初期設定に使用するテンプレートをカスタマイズします。

View File

@@ -14,7 +14,7 @@ function getModels(remoteModelRes: OpenAIListModelResponse) {
if (config.disableGPT4) {
remoteModelRes.data = remoteModelRes.data.filter(
(m) =>
!(m.id.startsWith("gpt-4") || m.id.startsWith("chatgpt-4o")) ||
!(m.id.startsWith("gpt-4") || m.id.startsWith("chatgpt-4o") || m.id.startsWith("o1")) ||
m.id.startsWith("gpt-4o-mini"),
);
}
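
The effect of the widened filter: with `disableGPT4` set, any model id starting with `gpt-4`, `chatgpt-4o`, or now `o1` is dropped, while `gpt-4o-mini` survives. A standalone sketch of the predicate:

```ts
// Sketch of the disableGPT4 predicate after this change.
const keepModel = (id: string): boolean =>
  !(id.startsWith("gpt-4") || id.startsWith("chatgpt-4o") || id.startsWith("o1")) ||
  id.startsWith("gpt-4o-mini");

const ids = ["gpt-4o", "gpt-4o-mini", "o1-preview", "gpt-3.5-turbo"];
console.log(ids.filter(keepModel)); // ["gpt-4o-mini", "gpt-3.5-turbo"]
```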

View File

@@ -25,12 +25,103 @@ import { getMessageTextContent } from "@/app/utils";
import { RequestPayload } from "./openai";
import { fetch } from "@/app/utils/stream";
interface BasePayload {
model: string;
}
interface ChatPayload extends BasePayload {
messages: ChatOptions["messages"];
stream?: boolean;
temperature?: number;
presence_penalty?: number;
frequency_penalty?: number;
top_p?: number;
}
interface ImageGenerationPayload extends BasePayload {
prompt: string;
size?: string;
user_id?: string;
}
interface VideoGenerationPayload extends BasePayload {
prompt: string;
duration?: number;
resolution?: string;
user_id?: string;
}
type ModelType = "chat" | "image" | "video";
export class ChatGLMApi implements LLMApi {
private disableListModels = true;
private getModelType(model: string): ModelType {
if (model.startsWith("cogview-")) return "image";
if (model.startsWith("cogvideo-")) return "video";
return "chat";
}
private getModelPath(type: ModelType): string {
switch (type) {
case "image":
return ChatGLM.ImagePath;
case "video":
return ChatGLM.VideoPath;
default:
return ChatGLM.ChatPath;
}
}
private createPayload(
messages: ChatOptions["messages"],
modelConfig: any,
options: ChatOptions,
): BasePayload {
const modelType = this.getModelType(modelConfig.model);
const lastMessage = messages[messages.length - 1];
const prompt =
typeof lastMessage.content === "string"
? lastMessage.content
: lastMessage.content.map((c) => c.text).join("\n");
switch (modelType) {
case "image":
return {
model: modelConfig.model,
prompt,
size: options.config.size,
} as ImageGenerationPayload;
default:
return {
messages,
stream: options.config.stream,
model: modelConfig.model,
temperature: modelConfig.temperature,
presence_penalty: modelConfig.presence_penalty,
frequency_penalty: modelConfig.frequency_penalty,
top_p: modelConfig.top_p,
} as ChatPayload;
}
}
private parseResponse(modelType: ModelType, json: any): string {
switch (modelType) {
case "image": {
const imageUrl = json.data?.[0]?.url;
return imageUrl ? `![Generated Image](${imageUrl})` : "";
}
case "video": {
const videoUrl = json.data?.[0]?.url;
return videoUrl ? `<video controls src="${videoUrl}"></video>` : "";
}
default:
return this.extractMessage(json);
}
}
path(path: string): string {
const accessStore = useAccessStore.getState();
let baseUrl = "";
if (accessStore.useCustomConfig) {
@@ -51,7 +142,6 @@ export class ChatGLMApi implements LLMApi {
}
console.log("[Proxy Endpoint] ", baseUrl, path);
return [baseUrl, path].join("/");
}
@@ -79,24 +169,16 @@ export class ChatGLMApi implements LLMApi {
},
};
const requestPayload: RequestPayload = {
messages,
stream: options.config.stream,
model: modelConfig.model,
temperature: modelConfig.temperature,
presence_penalty: modelConfig.presence_penalty,
frequency_penalty: modelConfig.frequency_penalty,
top_p: modelConfig.top_p,
};
const modelType = this.getModelType(modelConfig.model);
const requestPayload = this.createPayload(messages, modelConfig, options);
const path = this.path(this.getModelPath(modelType));
console.log("[Request] glm payload: ", requestPayload);
console.log(`[Request] glm ${modelType} payload: `, requestPayload);
const shouldStream = !!options.config.stream;
const controller = new AbortController();
options.onController?.(controller);
try {
const chatPath = this.path(ChatGLM.ChatPath);
const chatPayload = {
method: "POST",
body: JSON.stringify(requestPayload),
@@ -104,12 +186,23 @@ export class ChatGLMApi implements LLMApi {
headers: getHeaders(),
};
// make a fetch request
const requestTimeoutId = setTimeout(
() => controller.abort(),
REQUEST_TIMEOUT_MS,
);
if (modelType === "image" || modelType === "video") {
const res = await fetch(path, chatPayload);
clearTimeout(requestTimeoutId);
const resJson = await res.json();
console.log(`[Response] glm ${modelType}:`, resJson);
const message = this.parseResponse(modelType, resJson);
options.onFinish(message, res);
return;
}
const shouldStream = !!options.config.stream;
if (shouldStream) {
const [tools, funcs] = usePluginStore
.getState()
@@ -117,7 +210,7 @@ export class ChatGLMApi implements LLMApi {
useChatStore.getState().currentSession().mask?.plugin || [],
);
return stream(
chatPath,
path,
requestPayload,
getHeaders(),
tools as any,
@@ -125,7 +218,6 @@ export class ChatGLMApi implements LLMApi {
controller,
// parseSSE
(text: string, runTools: ChatMessageTool[]) => {
// console.log("parseSSE", text, runTools);
const json = JSON.parse(text);
const choices = json.choices as Array<{
delta: {
@@ -154,7 +246,7 @@ export class ChatGLMApi implements LLMApi {
}
return choices[0]?.delta?.content;
},
// processToolMessage, include tool_calls message and tool call results
// processToolMessage
(
requestPayload: RequestPayload,
toolCallMessage: any,
@@ -172,7 +264,7 @@ export class ChatGLMApi implements LLMApi {
options,
);
} else {
const res = await fetch(chatPath, chatPayload);
const res = await fetch(path, chatPayload);
clearTimeout(requestTimeoutId);
const resJson = await res.json();
@@ -184,6 +276,7 @@ export class ChatGLMApi implements LLMApi {
options.onError?.(e as Error);
}
}
async usage() {
return {
used: 0,
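
Taken together, the ChatGLM refactor routes each request by model-name prefix: the prefix picks both the payload shape and the endpoint (`ChatPath`, `ImagePath`, or `VideoPath`, added to `constant.ts` further down). A condensed sketch of that routing, using the path strings from this compare:

```ts
// Condensed sketch of the ChatGLM model-type routing in this diff.
type ModelType = "chat" | "image" | "video";

const GLM_PATHS: Record<ModelType, string> = {
  chat: "api/paas/v4/chat/completions",
  image: "api/paas/v4/images/generations",
  video: "api/paas/v4/videos/generations",
};

function getModelType(model: string): ModelType {
  if (model.startsWith("cogview-")) return "image";
  if (model.startsWith("cogvideo-")) return "video";
  return "chat";
}

// Example: cogview-3-plus is routed to the image-generation endpoint.
console.log(GLM_PATHS[getModelType("cogview-3-plus")]); // "api/paas/v4/images/generations"
console.log(GLM_PATHS[getModelType("glm-4-flash")]);    // "api/paas/v4/chat/completions"
```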

View File

@@ -29,7 +29,7 @@ import { RequestPayload } from "./openai";
import { fetch } from "@/app/utils/stream";
export class GeminiProApi implements LLMApi {
path(path: string): string {
path(path: string, shouldStream = false): string {
const accessStore = useAccessStore.getState();
let baseUrl = "";
@@ -51,8 +51,10 @@ export class GeminiProApi implements LLMApi {
console.log("[Proxy Endpoint] ", baseUrl, path);
let chatPath = [baseUrl, path].join("/");
if (shouldStream) {
chatPath += chatPath.includes("?") ? "&alt=sse" : "?alt=sse";
}
chatPath += chatPath.includes("?") ? "&alt=sse" : "?alt=sse";
return chatPath;
}
extractMessage(res: any) {
@@ -60,6 +62,7 @@ export class GeminiProApi implements LLMApi {
return (
res?.candidates?.at(0)?.content?.parts.at(0)?.text ||
res?.at(0)?.candidates?.at(0)?.content?.parts.at(0)?.text ||
res?.error?.message ||
""
);
@@ -166,7 +169,10 @@ export class GeminiProApi implements LLMApi {
options.onController?.(controller);
try {
// https://github.com/google-gemini/cookbook/blob/main/quickstarts/rest/Streaming_REST.ipynb
const chatPath = this.path(Google.ChatPath(modelConfig.model));
const chatPath = this.path(
Google.ChatPath(modelConfig.model),
shouldStream,
);
const chatPayload = {
method: "POST",

View File

@@ -24,7 +24,7 @@ import {
stream,
} from "@/app/utils/chat";
import { cloudflareAIGatewayUrl } from "@/app/utils/cloudflare";
import { DalleSize, DalleQuality, DalleStyle } from "@/app/typing";
import { ModelSize, DalleQuality, DalleStyle } from "@/app/typing";
import {
ChatOptions,
@@ -73,7 +73,7 @@ export interface DalleRequestPayload {
prompt: string;
response_format: "url" | "b64_json";
n: number;
size: DalleSize;
size: ModelSize;
quality: DalleQuality;
style: DalleStyle;
}
@@ -224,7 +224,7 @@ export class ChatGPTApi implements LLMApi {
// O1 not support image, tools (plugin in ChatGPTNextWeb) and system, stream, logprobs, temperature, top_p, n, presence_penalty, frequency_penalty yet.
requestPayload = {
messages,
stream: !isO1 ? options.config.stream : false,
stream: options.config.stream,
model: modelConfig.model,
temperature: !isO1 ? modelConfig.temperature : 1,
presence_penalty: !isO1 ? modelConfig.presence_penalty : 0,
@@ -247,7 +247,7 @@ export class ChatGPTApi implements LLMApi {
console.log("[Request] openai payload: ", requestPayload);
const shouldStream = !isDalle3 && !!options.config.stream && !isO1;
const shouldStream = !isDalle3 && !!options.config.stream;
const controller = new AbortController();
options.onController?.(controller);
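
Per the `use stream when request o1` commit, streaming is no longer forced off for o1; only the sampling parameters stay pinned. A sketch of the resulting payload, where deriving `isO1` from a name prefix is an assumption of this sketch:

```ts
// Sketch: o1-aware payload after this change. Streaming now passes through
// unchanged; only the sampling parameters are pinned for o1 models.
interface ModelConfigSketch {
  model: string;
  temperature: number;
  presence_penalty: number;
  frequency_penalty: number;
}

function buildChatPayload(cfg: ModelConfigSketch, stream: boolean) {
  const isO1 = cfg.model.startsWith("o1"); // assumption for this sketch
  return {
    model: cfg.model,
    stream, // previously: !isO1 ? stream : false
    temperature: !isO1 ? cfg.temperature : 1,
    presence_penalty: !isO1 ? cfg.presence_penalty : 0,
    frequency_penalty: !isO1 ? cfg.frequency_penalty : 0,
  };
}

const payload = buildChatPayload(
  { model: "o1-preview", temperature: 0.5, presence_penalty: 0.2, frequency_penalty: 0.2 },
  true,
);
console.log(payload.stream, payload.temperature); // true 1
```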

View File

@@ -72,6 +72,8 @@ import {
isDalle3,
showPlugins,
safeLocalStorage,
getModelSizes,
supportsCustomSize,
} from "../utils";
import { uploadImage as uploadImageRemote } from "@/app/utils/chat";
@@ -79,7 +81,7 @@ import { uploadImage as uploadImageRemote } from "@/app/utils/chat";
import dynamic from "next/dynamic";
import { ChatControllerPool } from "../client/controller";
import { DalleSize, DalleQuality, DalleStyle } from "../typing";
import { DalleQuality, DalleStyle, ModelSize } from "../typing";
import { Prompt, usePromptStore } from "../store/prompt";
import Locale from "../locales";
@@ -519,10 +521,11 @@ export function ChatActions(props: {
const [showSizeSelector, setShowSizeSelector] = useState(false);
const [showQualitySelector, setShowQualitySelector] = useState(false);
const [showStyleSelector, setShowStyleSelector] = useState(false);
const dalle3Sizes: DalleSize[] = ["1024x1024", "1792x1024", "1024x1792"];
const modelSizes = getModelSizes(currentModel);
const dalle3Qualitys: DalleQuality[] = ["standard", "hd"];
const dalle3Styles: DalleStyle[] = ["vivid", "natural"];
const currentSize = session.mask.modelConfig?.size ?? "1024x1024";
const currentSize =
session.mask.modelConfig?.size ?? ("1024x1024" as ModelSize);
const currentQuality = session.mask.modelConfig?.quality ?? "standard";
const currentStyle = session.mask.modelConfig?.style ?? "vivid";
@@ -673,7 +676,7 @@ export function ChatActions(props: {
/>
)}
{isDalle3(currentModel) && (
{supportsCustomSize(currentModel) && (
<ChatAction
onClick={() => setShowSizeSelector(true)}
text={currentSize}
@@ -684,7 +687,7 @@ export function ChatActions(props: {
{showSizeSelector && (
<Selector
defaultSelectedValue={currentSize}
items={dalle3Sizes.map((m) => ({
items={modelSizes.map((m) => ({
title: m,
value: m,
}))}
@@ -960,9 +963,24 @@ function _Chat() {
(scrollRef.current.scrollTop + scrollRef.current.clientHeight),
) <= 1
: false;
const isAttachWithTop = useMemo(() => {
const lastMessage = scrollRef.current?.lastElementChild as HTMLElement;
// if scrollRef is not ready or there is no message, return false
if (!scrollRef?.current || !lastMessage) return false;
const topDistance =
lastMessage!.getBoundingClientRect().top -
scrollRef.current.getBoundingClientRect().top;
// leave some space for user question
return topDistance < 100;
}, [scrollRef?.current?.scrollHeight]);
const isTyping = userInput !== "";
// if user is typing, should auto scroll to bottom
// if user is not typing, should auto scroll to bottom only if already at bottom
const { setAutoScroll, scrollDomToBottom } = useScrollToBottom(
scrollRef,
isScrolledToBottom,
(isScrolledToBottom || isAttachWithTop) && !isTyping,
);
const [hitBottom, setHitBottom] = useState(true);
const isMobileScreen = useMobileScreen();
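
The scrolling change reduces to one boolean fed into `useScrollToBottom`: auto-scroll when the user is already at the bottom, or when the newest message is still pinned near the top of the viewport, but never while the user is typing. A truth-value sketch of that decision:

```ts
// Sketch of the auto-scroll condition added in this diff.
function shouldAutoScroll(
  isScrolledToBottom: boolean,
  isAttachWithTop: boolean,
  isTyping: boolean,
): boolean {
  return (isScrolledToBottom || isAttachWithTop) && !isTyping;
}

console.log(shouldAutoScroll(false, true, false)); // true: answer still near the top, keep following it
console.log(shouldAutoScroll(true, false, true));  // false: typing suppresses auto-scroll
```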
@@ -2071,6 +2089,6 @@ function _Chat() {
export function Chat() {
const chatStore = useChatStore();
const sessionIndex = chatStore.currentSessionIndex;
return <_Chat key={sessionIndex}></_Chat>;
const session = chatStore.currentSession();
return <_Chat key={session.id}></_Chat>;
}
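
Keying `_Chat` on `session.id` rather than `currentSessionIndex` is the actual fix from #5812: indexes are reused when sessions are deleted or reordered, so React kept stale component state across logically different sessions; ids are stable. A minimal sketch of the pattern, with the `Session`/`SessionView` names being hypothetical:

```tsx
// Sketch: key on a stable id, not a positional index, so React remounts the
// subtree whenever the logical session changes. Names here are hypothetical.
import React from "react";

interface Session {
  id: string;
  topic: string;
}

function SessionView({ session }: { session: Session }) {
  return <div>{session.topic}</div>;
}

function ChatShell({ current }: { current: Session }) {
  // Keyed by index, two different sessions occupying the same slot would not
  // remount SessionView; keyed by id, they always do.
  return <SessionView key={current.id} session={current} />;
}
```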

View File

@@ -37,7 +37,8 @@ export function Avatar(props: { model?: ModelType; avatar?: string }) {
return (
<div className="no-dark">
{props.model?.startsWith("gpt-4") ||
props.model?.startsWith("chatgpt-4o") ? (
props.model?.startsWith("chatgpt-4o") ||
props.model?.startsWith("o1") ? (
<BlackBotIcon className="user-avatar" />
) : (
<BotIcon className="user-avatar" />

View File

@@ -90,7 +90,11 @@ export function PreCode(props: { children: any }) {
const refText = ref.current.querySelector("code")?.innerText;
if (htmlDom) {
setHtmlCode((htmlDom as HTMLElement).innerText);
} else if (refText?.startsWith("<!DOCTYPE")) {
} else if (
refText?.startsWith("<!DOCTYPE") ||
refText?.startsWith("<svg") ||
refText?.startsWith("<?xml")
) {
setHtmlCode(refText);
}
}, 600);
@@ -244,6 +248,10 @@ function escapeBrackets(text: string) {
function tryWrapHtmlCode(text: string) {
// try add wrap html code (fixed: html codeblock include 2 newline)
// ignore embed codeblock
if (text.includes("```")) {
return text;
}
return text
.replace(
/([`]*?)(\w*?)([\n\r]*?)(<!DOCTYPE html>)/g,
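
The new guard fixes #5827: if the message already contains a fenced block, `tryWrapHtmlCode` leaves it alone instead of wrapping HTML that is merely quoted inside another code block. A sketch of the guard's effect, with the wrapping regex simplified relative to the real one:

```ts
// Sketch of the guard: text that already contains a fenced block is returned
// untouched; a bare HTML document would otherwise get wrapped.
const FENCE = "`".repeat(3); // avoids a literal fence inside this example

function tryWrapHtmlCodeSketch(text: string): string {
  if (text.includes(FENCE)) {
    return text; // ignore embedded code blocks
  }
  // simplified stand-in for the real wrapping regex
  return text.replace(/(<!DOCTYPE html>[\s\S]*)/, `${FENCE}html\n$1\n${FENCE}`);
}

const embedded = `${FENCE}html\n<!DOCTYPE html><p>hi</p>\n${FENCE}`;
console.log(tryWrapHtmlCodeSketch(embedded) === embedded); // true: left alone
console.log(tryWrapHtmlCodeSketch("<!DOCTYPE html><p>hi</p>").startsWith(FENCE)); // true: wrapped
```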

View File

@@ -1771,9 +1771,11 @@ export function Settings() {
<ListItem
title={Locale.Settings.Access.CustomModel.Title}
subTitle={Locale.Settings.Access.CustomModel.SubTitle}
vertical={true}
>
<input
aria-label={Locale.Settings.Access.CustomModel.Title}
style={{ width: "100%", maxWidth: "unset", textAlign: "left" }}
type="text"
value={config.customModels}
placeholder="model1,model2,model3"

View File

@@ -40,6 +40,7 @@ export const getBuildConfig = () => {
buildMode,
isApp,
template: process.env.DEFAULT_INPUT_TEMPLATE ?? DEFAULT_INPUT_TEMPLATE,
visionModels: process.env.VISION_MODELS || "",
};
};

View File

@@ -129,14 +129,15 @@ export const getServerSideConfig = () => {
if (customModels) customModels += ",";
customModels += DEFAULT_MODELS.filter(
(m) =>
(m.name.startsWith("gpt-4") || m.name.startsWith("chatgpt-4o")) &&
(m.name.startsWith("gpt-4") || m.name.startsWith("chatgpt-4o") || m.name.startsWith("o1")) &&
!m.name.startsWith("gpt-4o-mini"),
)
.map((m) => "-" + m.name)
.join(",");
if (
(defaultModel.startsWith("gpt-4") ||
defaultModel.startsWith("chatgpt-4o")) &&
defaultModel.startsWith("chatgpt-4o") ||
defaultModel.startsWith("o1")) &&
!defaultModel.startsWith("gpt-4o-mini")
)
defaultModel = "";

View File

@@ -233,6 +233,8 @@ export const XAI = {
export const ChatGLM = {
ExampleEndpoint: CHATGLM_BASE_URL,
ChatPath: "api/paas/v4/chat/completions",
ImagePath: "api/paas/v4/images/generations",
VideoPath: "api/paas/v4/videos/generations",
};
export const DEFAULT_INPUT_TEMPLATE = `{{input}}`; // input / time / model / lang
@@ -264,6 +266,7 @@ export const KnowledgeCutOffDate: Record<string, string> = {
"gpt-4o": "2023-10",
"gpt-4o-2024-05-13": "2023-10",
"gpt-4o-2024-08-06": "2023-10",
"gpt-4o-2024-11-20": "2023-10",
"chatgpt-4o-latest": "2023-10",
"gpt-4o-mini": "2023-10",
"gpt-4o-mini-2024-07-18": "2023-10",
@@ -290,6 +293,22 @@ export const DEFAULT_TTS_VOICES = [
"shimmer",
];
export const VISION_MODEL_REGEXES = [
/vision/,
/gpt-4o/,
/claude-3/,
/gemini-1\.5/,
/gemini-exp/,
/gemini-2\.0/,
/learnlm/,
/qwen-vl/,
/qwen2-vl/,
/gpt-4-turbo(?!.*preview)/, // Matches "gpt-4-turbo" but not "gpt-4-turbo-preview"
/^dall-e-3$/, // Matches exactly "dall-e-3"
];
export const EXCLUDE_VISION_MODEL_REGEXES = [/claude-3-5-haiku-20241022/];
const openaiModels = [
"gpt-3.5-turbo",
"gpt-3.5-turbo-1106",
@@ -303,6 +322,7 @@ const openaiModels = [
"gpt-4o",
"gpt-4o-2024-05-13",
"gpt-4o-2024-08-06",
"gpt-4o-2024-11-20",
"chatgpt-4o-latest",
"gpt-4o-mini",
"gpt-4o-mini-2024-07-18",
@@ -315,10 +335,23 @@ const openaiModels = [
];
const googleModels = [
"gemini-1.0-pro",
"gemini-1.0-pro", // Deprecated on 2/15/2025
"gemini-1.5-pro-latest",
"gemini-1.5-pro",
"gemini-1.5-pro-002",
"gemini-1.5-pro-exp-0827",
"gemini-1.5-flash-latest",
"gemini-pro-vision",
"gemini-1.5-flash-8b-latest",
"gemini-1.5-flash",
"gemini-1.5-flash-8b",
"gemini-1.5-flash-002",
"gemini-1.5-flash-exp-0827",
"learnlm-1.5-pro-experimental",
"gemini-exp-1114",
"gemini-exp-1121",
"gemini-exp-1206",
"gemini-2.0-flash-exp",
"gemini-2.0-flash-thinking-exp-1219",
];
const anthropicModels = [
@@ -400,6 +433,15 @@ const chatglmModels = [
"glm-4-long",
"glm-4-flashx",
"glm-4-flash",
"glm-4v-plus",
"glm-4v",
"glm-4v-flash", // free
"cogview-3-plus",
"cogview-3",
"cogview-3-flash", // free
// 目前无法适配轮询任务
// "cogvideox",
// "cogvideox-flash", // free
];
let seq = 1000; // 内置的模型序号生成器从1000开始
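
Two of the new vision patterns deserve a note: `gpt-4-turbo(?!.*preview)` uses a negative lookahead so preview builds stay non-vision, and `^dall-e-3$` is anchored to the exact id. A quick check of both, assuming (as `isVisionModel` below confirms) that any single match counts:

```ts
// Sketch: behavior of the lookahead and anchored patterns above.
const patterns = [/gpt-4-turbo(?!.*preview)/, /^dall-e-3$/];
const matches = (model: string) => patterns.some((re) => re.test(model));

console.log(matches("gpt-4-turbo-2024-04-09")); // true
console.log(matches("gpt-4-turbo-preview"));    // false (lookahead rejects "preview")
console.log(matches("dall-e-3"));               // true
console.log(matches("dall-e-3-hd"));            // false (exact match only)
```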

View File

@@ -3,7 +3,7 @@ import { BuiltinMask } from "./typing";
export const CN_MASKS: BuiltinMask[] = [
{
avatar: "1f5bc-fe0f",
name: "以文搜图",
name: "AI文生图",
context: [
{
id: "text-to-pic-0",
@@ -28,7 +28,7 @@ export const CN_MASKS: BuiltinMask[] = [
id: "text-to-pic-3",
role: "system",
content:
"助手善于判断用户意图当确定需要提供图片时助手会变得沉默寡言只使用以下格式输出markdown图片![描述](https://image.pollinations.ai/prompt/描述)因为这个语法可以自动按照提示生成并渲染图片。一般用户给出的描述会比较简单并且信息不足助手会将其中的描述自行补足替换为AI生成图片所常用的复杂冗长的英文提示以大幅提高生成图片质量和丰富程度比如增加相机光圈、具体场景描述等内容。助手会避免用代码块或原始块包围markdown标记因为那样只会渲染出代码块或原始块而不是图片。",
"助手善于判断用户意图当确定需要提供图片时助手会变得沉默寡言只使用以下格式输出markdown图片![description](https://image.pollinations.ai/prompt/description?nologo=true)因为这个语法可以自动按照提示生成并渲染图片。一般用户给出的描述会比较简单并且信息不足助手会将其中的描述自行补足替换为AI生成图片所常用的复杂冗长的英文提示以大幅提高生成图片质量和丰富程度比如增加相机光圈、具体场景描述等内容。助手会避免用代码块或原始块包围markdown标记因为那样只会渲染出代码块或原始块而不是图片。url中的空格等符号需要转义。",
date: "",
},
],

View File

@@ -1,5 +1,5 @@
import { LLMModel } from "../client/api";
import { DalleSize, DalleQuality, DalleStyle } from "../typing";
import { DalleQuality, DalleStyle, ModelSize } from "../typing";
import { getClientConfig } from "../config/client";
import {
DEFAULT_INPUT_TEMPLATE,
@@ -78,7 +78,7 @@ export const DEFAULT_CONFIG = {
compressProviderName: "",
enableInjectSystemPrompts: true,
template: config?.template ?? DEFAULT_INPUT_TEMPLATE,
size: "1024x1024" as DalleSize,
size: "1024x1024" as ModelSize,
quality: "standard" as DalleQuality,
style: "vivid" as DalleStyle,
},

View File

@@ -11,3 +11,14 @@ export interface RequestMessage {
export type DalleSize = "1024x1024" | "1792x1024" | "1024x1792";
export type DalleQuality = "standard" | "hd";
export type DalleStyle = "vivid" | "natural";
export type ModelSize =
| "1024x1024"
| "1792x1024"
| "1024x1792"
| "768x1344"
| "864x1152"
| "1344x768"
| "1152x864"
| "1440x720"
| "720x1440";

View File

@@ -5,6 +5,9 @@ import { RequestMessage } from "./client/api";
import { ServiceProvider } from "./constant";
// import { fetch as tauriFetch, ResponseType } from "@tauri-apps/api/http";
import { fetch as tauriStreamFetch } from "./utils/stream";
import { VISION_MODEL_REGEXES, EXCLUDE_VISION_MODEL_REGEXES } from "./constant";
import { getClientConfig } from "./config/client";
import { ModelSize } from "./typing";
export function trimTopic(topic: string) {
// Fix an issue where double quotes still show in the Indonesian language
@@ -252,25 +255,16 @@ export function getMessageImages(message: RequestMessage): string[] {
}
export function isVisionModel(model: string) {
// Note: This is a better way using the TypeScript feature instead of `&&` or `||` (ts v5.5.0-dev.20240314 I've been using)
const excludeKeywords = ["claude-3-5-haiku-20241022"];
const visionKeywords = [
"vision",
"claude-3",
"gemini-1.5-pro",
"gemini-1.5-flash",
"gpt-4o",
"gpt-4o-mini",
];
const isGpt4Turbo =
model.includes("gpt-4-turbo") && !model.includes("preview");
const clientConfig = getClientConfig();
const envVisionModels = clientConfig?.visionModels
?.split(",")
.map((m) => m.trim());
if (envVisionModels?.includes(model)) {
return true;
}
return (
!excludeKeywords.some((keyword) => model.includes(keyword)) &&
(visionKeywords.some((keyword) => model.includes(keyword)) ||
isGpt4Turbo ||
isDalle3(model))
!EXCLUDE_VISION_MODEL_REGEXES.some((regex) => regex.test(model)) &&
VISION_MODEL_REGEXES.some((regex) => regex.test(model))
);
}
@@ -278,6 +272,28 @@ export function isDalle3(model: string) {
return "dall-e-3" === model;
}
export function getModelSizes(model: string): ModelSize[] {
if (isDalle3(model)) {
return ["1024x1024", "1792x1024", "1024x1792"];
}
if (model.toLowerCase().includes("cogview")) {
return [
"1024x1024",
"768x1344",
"864x1152",
"1344x768",
"1152x864",
"1440x720",
"720x1440",
];
}
return [];
}
export function supportsCustomSize(model: string): boolean {
return getModelSizes(model).length > 0;
}
export function showPlugins(provider: ServiceProvider, model: string) {
if (
provider == ServiceProvider.OpenAI ||
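
The size helpers make the chat UI model-agnostic: `supportsCustomSize` is just `getModelSizes(model).length > 0`, so the size selector appears for DALL·E 3 and CogView alike without hard-coded model checks. A usage sketch, with the import path assumed repo-relative:

```ts
// Usage sketch of the helpers added above; import path is an assumption
// (app/utils.ts in this repository).
import { getModelSizes, supportsCustomSize } from "./app/utils";

console.log(supportsCustomSize("dall-e-3"));        // true
console.log(getModelSizes("dall-e-3"));             // ["1024x1024", "1792x1024", "1024x1792"]
console.log(supportsCustomSize("cogview-3-flash")); // true (seven CogView sizes)
console.log(supportsCustomSize("gpt-4o"));          // false: size selector stays hidden
```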

View File

@@ -59,8 +59,8 @@
"@tauri-apps/api": "^1.6.0",
"@tauri-apps/cli": "1.5.11",
"@testing-library/dom": "^10.4.0",
"@testing-library/jest-dom": "^6.6.2",
"@testing-library/react": "^16.0.1",
"@testing-library/jest-dom": "^6.6.3",
"@testing-library/react": "^16.1.0",
"@types/jest": "^29.5.14",
"@types/js-yaml": "4.0.9",
"@types/lodash-es": "^4.17.12",

View File

@@ -0,0 +1,67 @@
import { isVisionModel } from "../app/utils";
describe("isVisionModel", () => {
const originalEnv = process.env;
beforeEach(() => {
jest.resetModules();
process.env = { ...originalEnv };
});
afterEach(() => {
process.env = originalEnv;
});
test("should identify vision models using regex patterns", () => {
const visionModels = [
"gpt-4-vision",
"claude-3-opus",
"gemini-1.5-pro",
"gemini-2.0",
"gemini-exp-vision",
"learnlm-vision",
"qwen-vl-max",
"qwen2-vl-max",
"gpt-4-turbo",
"dall-e-3",
];
visionModels.forEach((model) => {
expect(isVisionModel(model)).toBe(true);
});
});
test("should exclude specific models", () => {
expect(isVisionModel("claude-3-5-haiku-20241022")).toBe(false);
});
test("should not identify non-vision models", () => {
const nonVisionModels = [
"gpt-3.5-turbo",
"gpt-4-turbo-preview",
"claude-2",
"regular-model",
];
nonVisionModels.forEach((model) => {
expect(isVisionModel(model)).toBe(false);
});
});
test("should identify models from VISION_MODELS env var", () => {
process.env.VISION_MODELS = "custom-vision-model,another-vision-model";
expect(isVisionModel("custom-vision-model")).toBe(true);
expect(isVisionModel("another-vision-model")).toBe(true);
expect(isVisionModel("unrelated-model")).toBe(false);
});
test("should handle empty or missing VISION_MODELS", () => {
process.env.VISION_MODELS = "";
expect(isVisionModel("unrelated-model")).toBe(false);
delete process.env.VISION_MODELS;
expect(isVisionModel("unrelated-model")).toBe(false);
expect(isVisionModel("gpt-4-vision")).toBe(true);
});
});

View File

@@ -2114,10 +2114,10 @@
lz-string "^1.5.0"
pretty-format "^27.0.2"
"@testing-library/jest-dom@^6.6.2":
version "6.6.2"
resolved "https://registry.yarnpkg.com/@testing-library/jest-dom/-/jest-dom-6.6.2.tgz#8186aa9a07263adef9cc5a59a4772db8c31f4a5b"
integrity sha512-P6GJD4yqc9jZLbe98j/EkyQDTPgqftohZF5FBkHY5BUERZmcf4HeO2k0XaefEg329ux2p21i1A1DmyQ1kKw2Jw==
"@testing-library/jest-dom@^6.6.3":
version "6.6.3"
resolved "https://registry.yarnpkg.com/@testing-library/jest-dom/-/jest-dom-6.6.3.tgz#26ba906cf928c0f8172e182c6fe214eb4f9f2bd2"
integrity sha512-IteBhl4XqYNkM54f4ejhLRJiZNqcSCoXUOG2CPK7qbD322KjQozM4kHQOfkG2oln9b9HTYqs+Sae8vBATubxxA==
dependencies:
"@adobe/css-tools" "^4.4.0"
aria-query "^5.0.0"
@@ -2127,10 +2127,10 @@
lodash "^4.17.21"
redent "^3.0.0"
"@testing-library/react@^16.0.1":
version "16.0.1"
resolved "https://registry.yarnpkg.com/@testing-library/react/-/react-16.0.1.tgz#29c0ee878d672703f5e7579f239005e4e0faa875"
integrity sha512-dSmwJVtJXmku+iocRhWOUFbrERC76TX2Mnf0ATODz8brzAZrMBbzLwQixlBSanZxR6LddK3eiwpSFZgDET1URg==
"@testing-library/react@^16.1.0":
version "16.1.0"
resolved "https://registry.yarnpkg.com/@testing-library/react/-/react-16.1.0.tgz#aa0c61398bac82eaf89776967e97de41ac742d71"
integrity sha512-Q2ToPvg0KsVL0ohND9A3zLJWcOXXcO8IDu3fj11KhNt0UlCWyFyvnCIBkd12tidB2lkiVRG8VFqdhcqhqnAQtg==
dependencies:
"@babel/runtime" "^7.12.5"