
Sync works from WeChat Channels (视频号) and Baijiahao (百家号)

Ethanfly 2 days ago
Parent
Commit
28773b6878

+ 6 - 0
client/src/components.d.ts

@@ -16,8 +16,12 @@ declare module 'vue' {
     ElButton: typeof import('element-plus/es')['ElButton']
     ElButtonGroup: typeof import('element-plus/es')['ElButtonGroup']
     ElCheckbox: typeof import('element-plus/es')['ElCheckbox']
+    ElCheckboxGroup: typeof import('element-plus/es')['ElCheckboxGroup']
     ElConfigProvider: typeof import('element-plus/es')['ElConfigProvider']
     ElContainer: typeof import('element-plus/es')['ElContainer']
+    ElDatePicker: typeof import('element-plus/es')['ElDatePicker']
+    ElDescriptions: typeof import('element-plus/es')['ElDescriptions']
+    ElDescriptionsItem: typeof import('element-plus/es')['ElDescriptionsItem']
     ElDialog: typeof import('element-plus/es')['ElDialog']
     ElDivider: typeof import('element-plus/es')['ElDivider']
     ElDrawer: typeof import('element-plus/es')['ElDrawer']
@@ -45,6 +49,8 @@ declare module 'vue' {
     ElTabPane: typeof import('element-plus/es')['ElTabPane']
     ElTabs: typeof import('element-plus/es')['ElTabs']
     ElTag: typeof import('element-plus/es')['ElTag']
+    ElText: typeof import('element-plus/es')['ElText']
+    ElUpload: typeof import('element-plus/es')['ElUpload']
     Icons: typeof import('./components/icons/index.vue')['default']
     RouterLink: typeof import('vue-router')['RouterLink']
     RouterView: typeof import('vue-router')['RouterView']

+ 66 - 0
server/python/WECHAT_VIDEO_SYNC_WORKS.md

@@ -0,0 +1,66 @@
+# WeChat Channels (weixin_video) works-sync logic
+
+## 1. Overall flow
+
+```
+Work management "Sync works" / task queue sync_works
+    → WorkService.syncWorks(userId, accountId?, platform?)
+    → After filtering by account, call HeadlessBrowserService.fetchAccountInfo(platform, cookies, ...) per account
+    → If the platform is in supportedPlatforms (including weixin_video), prefer Python for fetching works
+    → fetchWorksViaPython('weixin_video', cookies, onProgress)
+    → POST /works to the Python service, with platform mapped to 'weixin'
+    → Python platforms/weixin.py WeixinPublisher.get_works(cookies, page, page_size)
+    → Returns works + total + has_more; Node loops over pages accordingly
+    → With the full works list in hand, call fetchAccountInfoWithPlaywright to fill in account info (avatar, followers, etc.)
+    → WorkService diffs the returned works against the DB: add/update/delete local works
+```
+
+## 2. Node side (HeadlessBrowserService)
+
+- **Platform mapping**: `weixin_video` → requests to Python use `platform: 'weixin'` (`pythonPlatform = platform === 'weixin_video' ? 'weixin' : platform`).
+- **Pagination**: Channels does **not** use cursor pagination; it uses **page numbers**:
+  - `useCursorPagination = platform === 'xiaohongshu' || platform === 'douyin'` → false for Channels.
+  - Hence `pageParam = pageIndex` (0, 1, 2, ...), and each request carries `page` and `page_size`.
+- **Page size**: platforms other than Xiaohongshu/Douyin use a uniform `pageSize = 50`.
+- **Stop condition**: the loop ends when `!result.has_more || pageWorks.length === 0 || newCount === 0`.
+
+## 3. Python side (platforms/weixin.py)
+
+- **Entry point**: `get_works(cookies, page=0, page_size=20)`, invoked via `run_get_works` (the /works route in app.py passes page and page_size).
+- **Current implementation**:
+  1. Opens the fixed page `https://channels.weixin.qq.com/platform/post/list` and waits for `div.post-feed-item`.
+  2. Uses `self.page.locator('div.post-feed-item')` to collect all work items in the current DOM; `item_count = await post_items.count()`.
+  3. Iterates only over the first `min(item_count, page_size)` items, parsing cover, title, time, and play/like/comment/share/collect counts.
+  4. **The `page` parameter is unused**: there is no page-driven scrolling, no "load more" click, and no next-page request; every call scrapes the same DOM.
+  5. Return values:
+     - `total = len(works)` (the size of the current batch, not the platform-wide total);
+     - `has_more = item_count > page_size` (only whether the current on-screen DOM count exceeds page_size).
+
+## 4. Known issues
+
+1. **Pagination is not implemented**
+   - Node keeps requesting page=1, 2, … while `has_more` holds, but Python scrapes the same page and the same DOM every time, so from the second page onward the data is usually duplicated and Node stops once `newCount === 0`.
+   - Net effect: **only the first screen/batch of works is fetched** (a few dozen items); if the list is virtually scrolled, the first-screen DOM holds even fewer.
+
+2. **work_id is unstable**
+   - It is built as `work_id = f"weixin_{i}_{hash(title)}_{hash(publish_time)}"`; the same work may get a different `i` across batches or retries, and `hash()` can collide (and is salted per interpreter process). If the page exposes a unique ID (via an API or a data attribute), using the real ID is more robust.
+
+3. **total / has_more do not mean what Node expects**
+   - Node uses `declaredTotal` and `has_more` to decide whether to keep requesting; Python's `total` is only the batch size, and `has_more` only reflects whether the current screen exceeds page_size, so neither indicates whether the platform actually has more works.
+
+4. **Debug code left in**
+   - Leftovers such as `print("1111111111111111")` and large DOM dumps remain; remove them or gate them behind configurable logging.
+
+## 5. Suggested improvements
+
+1. **Implement real pagination**
+   - If the Channels creator console has a "load more" button or infinite scroll: inside `get_works`, scroll or click according to `page`, then collect the newly added `div.post-feed-item` elements and deduplicate.
+   - If a list API exists (similar to Douyin's work_list): call it directly and parse cursor/offset pagination, returning `next_page` so Node can request by cursor (this also requires changing Node's pagination strategy for weixin_video).
+2. **Use stable work IDs**
+   - Take the work's unique ID from the DOM or API (if available) as `work_id`, for dedup and for matching against the local DB.
+3. **Clean up debugging and normalize logging**
+   - Remove meaningless prints; make DOM dumps debug-level or gated by an environment variable.
+
+---
+
+This document was written against the current code; if weixin.py or HeadlessBrowserService changes, the actual code takes precedence.
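The unstable `work_id` issue above is aggravated by CPython's per-process hash randomization: `hash(title)` changes between interpreter runs, so IDs never match across syncs. A deterministic fallback can be sketched as below; `stable_work_id` is a hypothetical helper, and a real platform ID (e.g. `objectId`) should still be preferred whenever one is available:

```python
import hashlib

def stable_work_id(platform: str, title: str, publish_time: str) -> str:
    """Deterministic fallback ID built from stable fields.

    hashlib digests are reproducible across processes, unlike the
    built-in hash(), which is salted per interpreter run.
    """
    # \x1f separator avoids "a"+"bc" colliding with "ab"+"c"
    raw = f"{title}\x1f{publish_time}".encode("utf-8")
    return f"{platform}_{hashlib.sha1(raw).hexdigest()[:16]}"
```

Unlike the current `f"weixin_{i}_{hash(title)}_{hash(publish_time)}"`, this also drops the positional index `i`, so the same work keeps the same ID even when its position in the list shifts.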

BIN
server/python/platforms/__pycache__/baijiahao.cpython-311.pyc


BIN
server/python/platforms/__pycache__/douyin.cpython-311.pyc


BIN
server/python/platforms/__pycache__/weixin.cpython-311.pyc
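For reference, the Node-side paging loop and its stop conditions described in the sync doc above (`has_more` false, empty page, or `newCount === 0`) can be sketched as follows; this is a language-neutral illustration in Python, and `fetch_page` is a hypothetical stand-in for the `POST /works` round-trip:

```python
def fetch_all_works(fetch_page, max_pages=100):
    """Collect works page by page with the documented stop conditions:
    stop when has_more is false, a page comes back empty, or a page
    contributes no new work_ids (i.e. the backend repeated itself)."""
    seen = {}
    for page_index in range(max_pages):
        # fetch_page(page_index) -> (works, has_more), mirroring POST /works
        works, has_more = fetch_page(page_index)
        new = [w for w in works if w["work_id"] not in seen]
        for w in new:
            seen[w["work_id"]] = w
        if not has_more or not works or not new:
            break
    return list(seen.values())
```

With the current weixin.py behavior (same DOM on every call), a loop like this terminates on the second page because no new IDs appear, which matches the "only the first batch" symptom described in the sync doc.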


+ 214 - 84
server/python/platforms/baijiahao.py

@@ -823,9 +823,14 @@ class BaijiahaoPublisher(BasePublisher):
     async def get_works(self, cookies: str, page: int = 0, page_size: int = 20) -> WorksResult:
         """
         获取百家号作品列表
-        使用直接 HTTP API 调用,不使用浏览器
+        优先使用内容管理页的接口(pcui/article/lists)。
+
+        说明:
+        - 该接口通常需要自定义请求头 token(JWT),仅靠 Cookie 可能会返回“未登录”
+        - 这里使用 Playwright 打开内容页,从 localStorage/sessionStorage/页面脚本中自动提取 token,
+          再在页面上下文中发起 fetch(携带 cookie + token),以提高成功率
         """
-        import aiohttp
+        import re
         
         print(f"\n{'='*60}")
         print(f"[{self.platform_name}] 获取作品列表 (使用 API)")
@@ -835,90 +840,213 @@ class BaijiahaoPublisher(BasePublisher):
         works: List[WorkItem] = []
         total = 0
         has_more = False
+        next_page = ""
         
         try:
-            # parse cookies
+            # Parse and set cookies (Playwright)
             cookie_list = self.parse_cookies(cookies)
-            cookie_dict = {c['name']: c['value'] for c in cookie_list}
-            
-            headers = {
-                'Accept': 'application/json, text/plain, */*',
-                'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36',
-                # Cookie is managed by the session
-                'Referer': 'https://baijiahao.baidu.com/builder/rc/content',
-                'Accept-Language': 'zh-CN,zh;q=0.9,en;q=0.8',
-                'Accept-Encoding': 'gzip, deflate, br',
-                'Connection': 'keep-alive',
-                'Sec-Fetch-Dest': 'empty',
-                'Sec-Fetch-Mode': 'cors',
-                'Sec-Fetch-Site': 'same-origin',
-                'sec-ch-ua': '"Not_A Brand";v="8", "Chromium";v="120", "Google Chrome";v="120"',
-                'sec-ch-ua-mobile': '?0',
-                'sec-ch-ua-platform': '"Windows"'
-            }
-            
-            # compute the start offset
-            start = page * page_size
-            
-            async with aiohttp.ClientSession(cookies=cookie_dict) as session:
-                print(f"[{self.platform_name}] 调用 article/lists API (start={start}, count={page_size})...")
-                
-                async with session.get(
-                    f'https://baijiahao.baidu.com/pcui/article/lists?start={start}&count={page_size}&article_type=video',
-                    headers=headers,
-                    timeout=aiohttp.ClientTimeout(total=30)
-                ) as response:
-                    api_result = await response.json()
-                
-                print(f"[{self.platform_name}] article/lists API 完整响应: {json.dumps(api_result, ensure_ascii=False)[:500]}")
-                print(f"[{self.platform_name}] API 响应: errno={api_result.get('errno')}")
-                
-                # check login status
-                if api_result.get('errno') != 0:
-                    error_msg = api_result.get('errmsg', 'unknown error')
-                    errno = api_result.get('errno')
-                    print(f"[{self.platform_name}] API returned error: errno={errno}, msg={error_msg}")
-                    
-                    if errno == 110:
-                        raise Exception("Cookie 已过期,请重新登录")
-                    
-                    raise Exception(error_msg)
-                
-                # parse the works list
-                data = api_result.get('data', {})
-                article_list = data.get('article_list', [])
-                has_more = data.get('has_more', False)
-                total = data.get('total', 0)
-                
-                print(f"[{self.platform_name}] 获取到 {len(article_list)} 个作品,总数: {total}")
-                
-                for article in article_list:
-                    work_id = str(article.get('article_id', ''))
-                    if not work_id:
-                        continue
-                    
-                    # handle the cover image
-                    cover_url = ''
-                    cover_images = article.get('cover_images', [])
-                    if cover_images and len(cover_images) > 0:
-                        cover_url = cover_images[0]
-                        if cover_url and cover_url.startswith('//'):
-                            cover_url = 'https:' + cover_url
-                    
-                    works.append(WorkItem(
+            await self.init_browser()
+            await self.set_cookies(cookie_list)
+
+            if not self.page:
+                raise Exception("Page not initialized")
+
+            # Open the content-management page first so the Referer/session for this page is ready
+            # Node passes page=0,1,...; the API's currentPage is 1,2,...
+            current_page = int(page) + 1
+            page_size = int(page_size)
+            content_url = (
+                "https://baijiahao.baidu.com/builder/rc/content"
+                f"?currentPage={current_page}&pageSize={page_size}"
+                "&search=&type=&collection=&startDate=&endDate="
+            )
+            await self.page.goto(content_url, wait_until="domcontentloaded", timeout=60000)
+            await asyncio.sleep(2)
+
+            # 1) Extract the token (JWT)
+            token = await self.page.evaluate(
+                """
+                () => {
+                  const isJwtLike = (v) => {
+                    if (!v || typeof v !== 'string') return false;
+                    const s = v.trim();
+                    if (s.length < 60) return false;
+                    const parts = s.split('.');
+                    if (parts.length !== 3) return false;
+                    return parts.every(p => /^[A-Za-z0-9_-]+$/.test(p) && p.length > 10);
+                  };
+
+                  const pickFromStorage = (storage) => {
+                    try {
+                      const keys = Object.keys(storage || {});
+                      for (const k of keys) {
+                        const v = storage.getItem(k);
+                        if (isJwtLike(v)) return v;
+                      }
+                    } catch {}
+                    return "";
+                  };
+
+                  // localStorage / sessionStorage
+                  let t = pickFromStorage(window.localStorage);
+                  if (t) return t;
+                  t = pickFromStorage(window.sessionStorage);
+                  if (t) return t;
+
+                  // meta tags
+                  const meta = document.querySelector('meta[name="token"], meta[name="bjh-token"]');
+                  const metaToken = meta && meta.getAttribute('content');
+                  if (isJwtLike(metaToken)) return metaToken;
+
+                  // quick scan of common globals
+                  const candidates = [
+                    (window.__INITIAL_STATE__ && window.__INITIAL_STATE__.token) || "",
+                    (window.__PRELOADED_STATE__ && window.__PRELOADED_STATE__.token) || "",
+                    (window.__NUXT__ && window.__NUXT__.state && window.__NUXT__.state.token) || "",
+                  ];
+                  for (const c of candidates) {
+                    if (isJwtLike(c)) return c;
+                  }
+
+                  return "";
+                }
+                """
+            )
+
+            # 2) If there is still no token, fall back to extracting it from the page HTML
+            if not token:
+                html = await self.page.content()
+                m = re.search(r'([A-Za-z0-9_-]{10,}\.[A-Za-z0-9_-]{10,}\.[A-Za-z0-9_-]{10,})', html)
+                if m:
+                    token = m.group(1)
+
+            if not token:
+                raise Exception("未能从页面提取 token(可能未登录或触发风控),请重新登录百家号账号后再试")
+
+            # 3) Call the API (fetch in the page context, cookies attached automatically)
+            api_url = (
+                "https://baijiahao.baidu.com/pcui/article/lists"
+                f"?currentPage={current_page}"
+                f"&pageSize={page_size}"
+                "&search=&type=&collection=&startDate=&endDate="
+                "&clearBeforeFetch=false"
+                "&dynamic=1"
+            )
+            resp = await self.page.evaluate(
+                """
+                async ({ url, token }) => {
+                  const r = await fetch(url, {
+                    method: 'GET',
+                    credentials: 'include',
+                    headers: {
+                      'accept': 'application/json, text/plain, */*',
+                      ...(token ? { token } : {}),
+                    },
+                  });
+                  const text = await r.text();
+                  return { ok: r.ok, status: r.status, text };
+                }
+                """,
+                {"url": api_url, "token": token},
+            )
+
+            if not resp or not resp.get("ok"):
+                status = resp.get("status") if isinstance(resp, dict) else "unknown"
+                raise Exception(f"百家号接口请求失败: HTTP {status}")
+
+            api_result = json.loads(resp.get("text") or "{}")
+            print(f"[{self.platform_name}] pcui/article/lists 响应: errno={api_result.get('errno')}, errmsg={api_result.get('errmsg')}")
+
+            if api_result.get("errno") != 0:
+                errno = api_result.get("errno")
+                errmsg = api_result.get("errmsg", "unknown error")
+                # errno 20040001 usually means "not logged in"
+                if errno in (110, 20040001):
+                    raise Exception("Baijiahao not logged in or Cookie/token expired; log in again before syncing")
+                raise Exception(f"Baijiahao API error: errno={errno}, errmsg={errmsg}")
+
+            data = api_result.get("data", {}) or {}
+            items = data.get("list", []) or []
+            page_info = data.get("page", {}) or {}
+            total = int(page_info.get("totalCount", 0) or 0)
+            total_page = int(page_info.get("totalPage", 0) or 0)
+            cur_page = int(page_info.get("currentPage", current_page) or current_page)
+            has_more = bool(total_page and cur_page < total_page)
+            next_page = cur_page + 1 if has_more else ""
+
+            print(f"[{self.platform_name}] 获取到 {len(items)} 个作品,总数: {total}, currentPage={cur_page}, totalPage={total_page}")
+
+            def _pick_cover(item: dict) -> str:
+                cover = item.get("crosswise_cover") or item.get("vertical_cover") or ""
+                if cover:
+                    return cover
+                raw = item.get("cover_images") or ""
+                try:
+                    # cover_images may be a JSON string
+                    parsed = json.loads(raw) if isinstance(raw, str) else raw
+                    if isinstance(parsed, list) and parsed:
+                        first = parsed[0]
+                        if isinstance(first, dict):
+                            return first.get("src") or first.get("ori_src") or ""
+                        if isinstance(first, str):
+                            return first
+                except Exception:
+                    pass
+                return ""
+
+            def _pick_duration(item: dict) -> int:
+                for k in ("rmb_duration", "duration", "long"):
+                    try:
+                        v = int(item.get(k) or 0)
+                        if v > 0:
+                            return v
+                    except Exception:
+                        pass
+                # displaytype_exinfo may carry ugcvideo.video_info.durationInSecond
+                ex = item.get("displaytype_exinfo") or ""
+                try:
+                    exj = json.loads(ex) if isinstance(ex, str) and ex else (ex if isinstance(ex, dict) else {})
+                    ugc = (exj.get("ugcvideo") or {}) if isinstance(exj, dict) else {}
+                    vi = ugc.get("video_info") or {}
+                    v = int(vi.get("durationInSecond") or ugc.get("long") or 0)
+                    return v if v > 0 else 0
+                except Exception:
+                    return 0
+
+            def _pick_status(item: dict) -> str:
+                qs = str(item.get("quality_status") or "").lower()
+                st = str(item.get("status") or "").lower()
+                if qs == "rejected" or "reject" in st:
+                    return "rejected"
+                if st in ("draft", "unpublish", "unpublished"):
+                    return "draft"
+                # "publish" is the common status on Baijiahao
+                return "published"
+
+            for item in items:
+                # Prefer nid (the builder preview link uses it)
+                work_id = str(item.get("nid") or item.get("feed_id") or item.get("article_id") or item.get("id") or "")
+                if not work_id:
+                    continue
+
+                works.append(
+                    WorkItem(
                         work_id=work_id,
-                        title=article.get('title', ''),
-                        cover_url=cover_url,
-                        duration=0,
-                        status='published',
-                        publish_time=article.get('publish_time', ''),
-                        play_count=int(article.get('read_count', 0)),
-                        like_count=int(article.get('like_count', 0)),
-                        comment_count=int(article.get('comment_count', 0)),
-                        share_count=int(article.get('share_count', 0)),
-                    ))
-                
-                print(f"[{self.platform_name}] ✓ 成功解析 {len(works)} 个作品")
+                        title=str(item.get("title") or ""),
+                        cover_url=_pick_cover(item),
+                        video_url=str(item.get("url") or ""),
+                        duration=_pick_duration(item),
+                        status=_pick_status(item),
+                        publish_time=str(item.get("publish_time") or item.get("publish_at") or item.get("created_at") or ""),
+                        play_count=int(item.get("read_amount") or 0),
+                        like_count=int(item.get("like_amount") or 0),
+                        comment_count=int(item.get("comment_amount") or 0),
+                        share_count=int(item.get("share_amount") or 0),
+                        collect_count=int(item.get("collection_amount") or 0),
+                    )
+                )
+
+            print(f"[{self.platform_name}] ✓ 成功解析 {len(works)} 个作品")
             
         except Exception as e:
             import traceback
@@ -926,7 +1054,8 @@ class BaijiahaoPublisher(BasePublisher):
             return WorksResult(
                 success=False,
                 platform=self.platform_name,
-                error=str(e)
+                error=str(e),
+                debug_info="baijiahao_get_works_failed"
             )
         
         return WorksResult(
@@ -934,7 +1063,8 @@ class BaijiahaoPublisher(BasePublisher):
             platform=self.platform_name,
             works=works,
             total=total,
-            has_more=has_more
+            has_more=has_more,
+            next_page=next_page
         )
     
     async def check_login_status(self, cookies: str) -> dict:
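The in-page `isJwtLike` heuristic used above only checks token *shape* (three base64url segments, each longer than 10 characters, at least 60 characters overall), not validity. A Python port of the same check, sketched here in case token screening ever moves out of the page context (an assumption — this helper is not part of the current code):

```python
import re

# base64url run, longer than 10 chars, mirroring the JS p.length > 10 check
_JWT_PART = re.compile(r"^[A-Za-z0-9_-]{11,}$")

def is_jwt_like(value) -> bool:
    """Shape check mirroring the page-side heuristic: a dot-separated
    triple of base64url runs with total length >= 60. It does not
    verify the signature or decode the payload."""
    if not isinstance(value, str):
        return False
    s = value.strip()
    if len(s) < 60:
        return False
    parts = s.split(".")
    return len(parts) == 3 and all(_JWT_PART.match(p) for p in parts)
```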

+ 187 - 103
server/python/platforms/weixin.py

@@ -5,6 +5,8 @@
 """
 
 import asyncio
+import json
+import time
 import os
 from datetime import datetime
 from typing import List
@@ -969,19 +970,97 @@ class WeixinPublisher(BasePublisher):
             status='need_action'
         )
 
+    async def _get_works_fallback_dom(self, page_size: int) -> tuple:
+        """API 失败时从当前页面 DOM 抓取作品列表(兼容新账号/不同入口)"""
+        works: List[WorkItem] = []
+        total = 0
+        has_more = False
+        try:
+            for selector in ["div.post-feed-item", "[class*='post-feed']", "[class*='feed-item']", "div[class*='post']"]:
+                try:
+                    await self.page.wait_for_selector(selector, timeout=8000)
+                    break
+                except Exception:
+                    continue
+            post_items = self.page.locator("div.post-feed-item")
+            item_count = await post_items.count()
+            if item_count == 0:
+                post_items = self.page.locator("[class*='post-feed']")
+                item_count = await post_items.count()
+            for i in range(min(item_count, page_size)):
+                try:
+                    item = post_items.nth(i)
+                    cover_el = item.locator("div.media img.thumb").first
+                    cover_url = await cover_el.get_attribute("src") or "" if await cover_el.count() > 0 else ""
+                    if not cover_url:
+                        cover_el = item.locator("img").first
+                        cover_url = await cover_el.get_attribute("src") or "" if await cover_el.count() > 0 else ""
+                    title_el = item.locator("div.post-title").first
+                    title = (await title_el.text_content() or "").strip() if await title_el.count() > 0 else ""
+                    time_el = item.locator("div.post-time span").first
+                    publish_time = (await time_el.text_content() or "").strip() if await time_el.count() > 0 else ""
+                    play_count = like_count = comment_count = share_count = collect_count = 0
+                    data_items = item.locator("div.post-data div.data-item")
+                    for j in range(await data_items.count()):
+                        data_item = data_items.nth(j)
+                        count_text = (await data_item.locator("span.count").text_content() or "0").strip()
+                        if await data_item.locator("span.weui-icon-outlined-eyes-on").count() > 0:
+                            play_count = self._parse_count(count_text)
+                        elif await data_item.locator("span.weui-icon-outlined-like").count() > 0:
+                            like_count = self._parse_count(count_text)
+                        elif await data_item.locator("span.weui-icon-outlined-comment").count() > 0:
+                            comment_count = self._parse_count(count_text)
+                        elif await data_item.locator("use[xlink\\:href='#icon-share']").count() > 0:
+                            share_count = self._parse_count(count_text)
+                        elif await data_item.locator("use[xlink\\:href='#icon-thumb']").count() > 0:
+                            collect_count = self._parse_count(count_text)
+                    work_id = f"weixin_{i}_{hash(title)}_{hash(publish_time)}"
+                    works.append(WorkItem(
+                        work_id=work_id,
+                        title=title or "无标题",
+                        cover_url=cover_url,
+                        duration=0,
+                        status="published",
+                        publish_time=publish_time,
+                        play_count=play_count,
+                        like_count=like_count,
+                        comment_count=comment_count,
+                        share_count=share_count,
+                        collect_count=collect_count,
+                    ))
+                except Exception as e:
+                    print(f"[{self.platform_name}] DOM 解析作品 {i} 失败: {e}", flush=True)
+                    continue
+            total = len(works)
+            has_more = item_count > page_size
+            print(f"[{self.platform_name}] DOM 回退获取 {len(works)} 条", flush=True)
+        except Exception as e:
+            print(f"[{self.platform_name}] DOM 回退失败: {e}", flush=True)
+        return (works, total, has_more, "")
+    
     async def get_works(self, cookies: str, page: int = 0, page_size: int = 20) -> WorksResult:
-
-
-        print(f"1111111111111111111")
-        """获取视频号作品列表"""
+        """获取视频号作品列表(调用 post_list 接口)
+        page: 页码从 0 开始,或上一页返回的 rawKeyBuff/lastBuff 字符串
+        """
+        # 分页:首页 currentPage=1/rawKeyBuff=null,下一页用 currentPage 递增或 rawKeyBuff
+        if page is None or page == "" or (isinstance(page, int) and page == 0):
+            current_page = 1
+            raw_key_buff = None
+        elif isinstance(page, int):
+            current_page = page + 1
+            raw_key_buff = None
+        else:
+            current_page = 1
+            raw_key_buff = str(page)
+        ts_ms = str(int(time.time() * 1000))
         print(f"\n{'='*60}")
-        print(f"[{self.platform_name}] 获取作品列表")
-        print(f"[{self.platform_name}] page={page}, page_size={page_size}")
+        print(f"[{self.platform_name}] 获取作品列表 currentPage={current_page}, pageSize={page_size}, rawKeyBuff={raw_key_buff[:40] if raw_key_buff else 'null'}...")
         print(f"{'='*60}")
         
         works: List[WorkItem] = []
         total = 0
         has_more = False
+        next_page = ""
         
         try:
             await self.init_browser()
@@ -991,131 +1070,136 @@ class WeixinPublisher(BasePublisher):
             if not self.page:
                 raise Exception("Page not initialized")
             
-            # open the Channels creator center
-            await self.page.goto("https://channels.weixin.qq.com/platform/post/list") 
-            await asyncio.sleep(5)
-            print(f"1111111111111111")
-            # check login status
+            await self.page.goto("https://channels.weixin.qq.com/micro/content/post/list", timeout=30000)
+            await asyncio.sleep(3)
+            
             current_url = self.page.url
             if "login" in current_url:
-                print(f"2111111111111111")
-                raise Exception("Cookie 已过期,请重新登录") 
+                raise Exception("Cookie 已过期,请重新登录")
             
-            # Channels works list is scraped from the page
-            # wait for the list to load (longer timeout, plus a debug screenshot)
-            try:
-                await self.page.wait_for_selector('div.post-feed-item', timeout=15000)
-            except:
-                # on timeout, log the current URL and take a screenshot
-                current_url = self.page.url
-                print(f"[{self.platform_name}] Wait timed out, current URL: {current_url}")
-                # save a screenshot
-                screenshot_path = f"weixin_timeout_{int(asyncio.get_event_loop().time())}.png"
-                await self.page.screenshot(path=screenshot_path)
-                print(f"[{self.platform_name}] Screenshot saved: {screenshot_path}")
-                raise Exception(f"Page load timed out, current URL: {current_url}")
+            api_url = "https://channels.weixin.qq.com/micro/content/cgi-bin/mmfinderassistant-bin/post/post_list"
+            req_body = {
+                "pageSize": page_size,
+                "currentPage": current_page,
+                "userpageType": 11,
+                "stickyOrder": True,
+                "timestamp": ts_ms,
+                "_log_finder_uin": "",
+                "_log_finder_id": "",
+                "rawKeyBuff": raw_key_buff,
+                "pluginSessionId": None,
+                "scene": 7,
+                "reqScene": 7,
+            }
+            body_str = json.dumps(req_body)
             
-            # dump the DOM structure
-            page_html = await self.page.content()
-            print(f"[{self.platform_name}] ========== page DOM start ==========")
-            print(page_html[:5000])  # first 5000 characters
-            print(f"[{self.platform_name}] ========== page DOM end ==========")
+            response = await self.page.evaluate("""
+                async ([url, bodyStr]) => {
+                    try {
+                        const resp = await fetch(url, {
+                            method: 'POST',
+                            credentials: 'include',
+                            headers: {
+                                'Content-Type': 'application/json',
+                                'Accept': '*/*',
+                                'Referer': 'https://channels.weixin.qq.com/micro/content/post/list'
+                            },
+                            body: bodyStr
+                        });
+                        return await resp.json();
+                    } catch (e) {
+                        return { error: e.toString() };
+                    }
+                }
+            """, [api_url, body_str])
             
-            # grab all work items
-            post_items = self.page.locator('div.post-feed-item')
-            item_count = await post_items.count()
+            is_first_page = current_page == 1 and raw_key_buff is None
+            if response.get("error"):
+                print(f"[{self.platform_name}] API 请求失败: {response.get('error')}", flush=True)
+                if is_first_page:
+                    works, total, has_more, next_page = await self._get_works_fallback_dom(page_size)
+                    if works:
+                        return WorksResult(success=True, platform=self.platform_name, works=works, total=total, has_more=has_more, next_page=next_page)
+                return WorksResult(success=False, platform=self.platform_name, error=response.get("error", "API request failed"))
             
-            print(f"[{self.platform_name}] 找到 {item_count} 个作品项")
+            err_code = response.get("errCode", -1)
+            if err_code != 0:
+                err_msg = response.get("errMsg", "unknown")
+                print(f"[{self.platform_name}] API errCode={err_code}, errMsg={err_msg}, 完整响应(前800字): {json.dumps(response, ensure_ascii=False)[:800]}", flush=True)
+                if is_first_page:
+                    works, total, has_more, next_page = await self._get_works_fallback_dom(page_size)
+                    if works:
+                        return WorksResult(success=True, platform=self.platform_name, works=works, total=total, has_more=has_more, next_page=next_page)
+                return WorksResult(success=False, platform=self.platform_name, error=f"errCode={err_code}, errMsg={err_msg}")
             
-            for i in range(min(item_count, page_size)):
+            data = response.get("data") or {}
+            raw_list = data.get("list") or []
+            total = int(data.get("totalCount") or 0)
+            has_more = bool(data.get("continueFlag", False))
+            next_page = (data.get("lastBuff") or "").strip()
+            
+            print(f"[{self.platform_name}] API 响应: list_len={len(raw_list)}, totalCount={total}, continueFlag={has_more}, lastBuff={next_page[:50] if next_page else ''}...")
+            
+            if is_first_page and len(raw_list) == 0:
+                works_fb, total_fb, has_more_fb, _ = await self._get_works_fallback_dom(page_size)
+                if works_fb:
+                    return WorksResult(success=True, platform=self.platform_name, works=works_fb, total=total_fb, has_more=has_more_fb, next_page="")
+            
+            for item in raw_list:
                 try:
-                    item = post_items.nth(i)
-                    
-                    # cover
-                    cover_el = item.locator('div.media img.thumb').first
-                    cover_url = ''
-                    if await cover_el.count() > 0:
-                        cover_url = await cover_el.get_attribute('src') or ''
+                    work_id = str(item.get("objectId") or item.get("id") or "").strip()
+                    if not work_id:
+                        work_id = f"weixin_{hash(item.get('createTime',0))}_{hash(item.get('desc', {}).get('description',''))}"
                     
-                    # title
-                    title_el = item.locator('div.post-title').first
-                    title = ''
-                    if await title_el.count() > 0:
-                        title = await title_el.text_content() or ''
-                        title = title.strip()
+                    desc = item.get("desc") or {}
+                    title = (desc.get("description") or "").strip() or "无标题"
+                    cover_url = ""
+                    duration = 0
+                    media_list = desc.get("media") or []
+                    if media_list and isinstance(media_list[0], dict):
+                        m = media_list[0]
+                        cover_url = (m.get("coverUrl") or m.get("thumbUrl") or "").strip()
+                        duration = int(m.get("videoPlayLen") or 0)
                     
-                    # 获取发布时间
-                    time_el = item.locator('div.post-time span').first
-                    publish_time = ''
-                    if await time_el.count() > 0:
-                        publish_time = await time_el.text_content() or ''
-                        publish_time = publish_time.strip()
-                    
-                    # 获取统计数据
-                    import re
-                    data_items = item.locator('div.post-data div.data-item')
-                    data_count = await data_items.count()
-                    
-                    play_count = 0
-                    like_count = 0
-                    comment_count = 0
-                    share_count = 0
-                    collect_count = 0
-                    
-                    for j in range(data_count):
-                        data_item = data_items.nth(j)
-                        count_text = await data_item.locator('span.count').text_content() or '0'
-                        count_text = count_text.strip()
-                        
-                        # 判断图标类型
-                        if await data_item.locator('span.weui-icon-outlined-eyes-on').count() > 0:
-                            # 播放量
-                            play_count = self._parse_count(count_text)
-                        elif await data_item.locator('span.weui-icon-outlined-like').count() > 0:
-                            # 点赞
-                            like_count = self._parse_count(count_text)
-                        elif await data_item.locator('span.weui-icon-outlined-comment').count() > 0:
-                            # 评论
-                            comment_count = self._parse_count(count_text)
-                        elif await data_item.locator('use[xlink\\:href="#icon-share"]').count() > 0:
-                            # 分享
-                            share_count = self._parse_count(count_text)
-                        elif await data_item.locator('use[xlink\\:href="#icon-thumb"]').count() > 0:
-                            # 收藏
-                            collect_count = self._parse_count(count_text)
+                    create_ts = item.get("createTime") or 0
+                    if isinstance(create_ts, (int, float)) and create_ts:
+                        publish_time = datetime.fromtimestamp(create_ts).strftime("%Y-%m-%d %H:%M:%S")
+                    else:
+                        publish_time = str(create_ts) if create_ts else ""
                     
-                    # 生成临时 work_id
-                    work_id = f"weixin_{i}_{hash(title)}_{hash(publish_time)}"
+                    read_count = int(item.get("readCount") or 0)
+                    like_count = int(item.get("likeCount") or 0)
+                    comment_count = int(item.get("commentCount") or 0)
+                    forward_count = int(item.get("forwardCount") or 0)
+                    fav_count = int(item.get("favCount") or 0)
                     
                     works.append(WorkItem(
                         work_id=work_id,
-                        title=title or '无标题',
+                        title=title,
                         cover_url=cover_url,
-                        duration=0,
-                        status='published',
+                        duration=duration,
+                        status="published",
                         publish_time=publish_time,
-                        play_count=play_count,
+                        play_count=read_count,
                         like_count=like_count,
                         comment_count=comment_count,
-                        share_count=share_count,
-                        collect_count=collect_count,
+                        share_count=forward_count,
+                        collect_count=fav_count,
                     ))
                 except Exception as e:
-                    print(f"[{self.platform_name}] 解析作品 {i} 失败: {e}")
-                    import traceback
-                    traceback.print_exc()
+                    print(f"[{self.platform_name}] Failed to parse work item: {e}", flush=True)
                     continue
             
-            total = len(works)
-            has_more = item_count > page_size
-            print(f"[{self.platform_name}] 获取到 {total} 个作品")
+            if total == 0 and works:
+                total = len(works)
+            print(f"[{self.platform_name}] Fetched {len(works)} items this page, totalCount={total}, next_page={bool(next_page)}")
             
         except Exception as e:
             import traceback
             traceback.print_exc()
             return WorksResult(success=False, platform=self.platform_name, error=str(e))
         
-        return WorksResult(success=True, platform=self.platform_name, works=works, total=total, has_more=has_more)
+        return WorksResult(success=True, platform=self.platform_name, works=works, total=total, has_more=has_more, next_page=next_page)
     
     async def get_comments(self, cookies: str, work_id: str, cursor: str = "") -> CommentsResult:
         """Fetch comments for a weixin_video work"""

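The rewritten `get_works` above maps each entry of the finder API's `data.list` into a `WorkItem`. A standalone sketch of that mapping, using the field names from the diff (`objectId`, `desc.description`, `desc.media[0].coverUrl`, `createTime`, `readCount`); the function name and flat-dict return shape are illustrative, not part of the codebase:

```python
from datetime import datetime

def parse_weixin_item(item: dict) -> dict:
    """Map one raw finder-API list entry to a flat work record (illustrative sketch)."""
    desc = item.get("desc") or {}
    media = desc.get("media") or []
    cover_url, duration = "", 0
    if media and isinstance(media[0], dict):
        cover_url = (media[0].get("coverUrl") or media[0].get("thumbUrl") or "").strip()
        duration = int(media[0].get("videoPlayLen") or 0)

    ts = item.get("createTime") or 0
    publish_time = (
        datetime.fromtimestamp(ts).strftime("%Y-%m-%d %H:%M:%S")
        if isinstance(ts, (int, float)) and ts else ""
    )
    return {
        "work_id": str(item.get("objectId") or item.get("id") or "").strip(),
        "title": (desc.get("description") or "").strip() or "无标题",
        "cover_url": cover_url,
        "duration": duration,
        "publish_time": publish_time,
        "play_count": int(item.get("readCount") or 0),
        "like_count": int(item.get("likeCount") or 0),
    }
```

Note the defensive `or {}` / `or 0` coercions: the API sometimes returns `null` fields, so every access tolerates missing or `None` values, matching the style of the diff.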
+ 2 - 1
server/src/models/entities/Work.ts

@@ -17,7 +17,8 @@ export class Work {
   @Column({ type: 'varchar', length: 20 })
   platform!: string;
 
-  @Column({ name: 'platform_video_id', type: 'varchar', length: 100 })
+  /** Platform work ID; weixin_video objectId values are long, so allow 500 chars */
+  @Column({ name: 'platform_video_id', type: 'varchar', length: 500 })
   platformVideoId!: string;
 
   @Column({ type: 'varchar', length: 200, default: '' })

+ 38 - 8
server/src/services/HeadlessBrowserService.ts

@@ -725,7 +725,7 @@ class HeadlessBrowserService {
 
     let cursor: string | number = 0;
     const seenCursors = new Set<string>();
-    // 抖音和小红书使用 cursor 分页(next_page 作为下一页的 max_cursor),其他平台用 pageIndex
+    // Douyin and Xiaohongshu paginate by cursor; weixin_video uses a currentPage number (pageIndex 0, 1, 2, ...)
     const useCursorPagination = platform === 'xiaohongshu' || platform === 'douyin';
     for (let pageIndex = 0; pageIndex < maxPages; pageIndex++) {
       const pageParam = useCursorPagination ? cursor : pageIndex;
@@ -835,7 +835,7 @@ class HeadlessBrowserService {
           cursor = (typeof cursor === 'number' ? cursor + 1 : pageIndex + 1);
         }
 
-        // 抖音:仅当无下一页游标或本页 0 条时停止(不依赖 has_more/declaredTotal,避免只同步 20 条)
+        // Douyin: stop only when there is no next cursor or the current page is empty
         if (platform === 'douyin') {
           if (!hasNextCursor || pageWorks.length === 0) break;
         } else {
@@ -877,21 +877,51 @@ class HeadlessBrowserService {
     // Baijiahao: prefer Python's /account_info (includes follower and works counts), avoiding scattered-auth issues with direct Node API calls
     if (platform === 'baijiahao') {
       pythonAvailable = await this.checkPythonServiceAvailable();
+
+      let info: AccountInfo;
       if (pythonAvailable) {
         logger.info(`[Python API] Service available, fetching account_info for baijiahao`);
         try {
-          return await this.fetchAccountInfoViaPython(platform, cookies);
+          info = await this.fetchAccountInfoViaPython(platform, cookies);
+          info.source = 'python';
+          info.pythonAvailable = true;
         } catch (error) {
-          logger.warn(`[Python API] Failed to fetch account_info for baijiahao, falling back to direct API:`, error);
+          logger.warn(`[Python API] Failed to fetch account_info for baijiahao, will still try /works:`, error);
+          info = this.getDefaultAccountInfo(platform);
+          info.source = 'python';
+          info.pythonAvailable = true;
         }
       } else {
         logger.info(`[Python API] Service not available for baijiahao, falling back to direct API`);
+        // When the Python service is unavailable, fall back to the direct Node API (may still hit the scattered-auth issue)
+        info = await this.fetchBaijiahaoAccountInfoDirectApi(cookies);
+        info.source = 'api';
+        info.pythonAvailable = false;
+      }
+
+      // Baijiahao work sync needs the full list: prefer auto-paginated fetching via Python /works
+      if (pythonAvailable) {
+        try {
+          const { works: worksList, total: worksTotal } = await this.fetchWorksViaPython(
+            platform,
+            cookies,
+            options?.onWorksFetchProgress
+          );
+          info.worksList = worksList;
+          if (worksTotal && worksTotal > 0) {
+            info.worksCount = worksTotal;
+            info.worksListComplete = worksList.length >= worksTotal;
+          } else if (worksList.length > 0) {
+            info.worksCount = Math.max(info.worksCount || 0, worksList.length);
+            info.worksListComplete = undefined;
+          }
+          info.source = 'python';
+          info.pythonAvailable = true;
+        } catch (error) {
+          logger.warn(`[Python API] Failed to fetch works for baijiahao:`, error);
+        }
       }
 
-      // Python 不可用或失败时,回退到 Node 直连 API(可能仍会遇到分散认证问题)
-      const info = await this.fetchBaijiahaoAccountInfoDirectApi(cookies);
-      info.source = 'api';
-      info.pythonAvailable = pythonAvailable;
       return info;
     }