feat: 添加三个重要补丁

补丁 1: 分块读取与流式传输 - 8KB 分块读取大文件，避免内存飙升 - 流式计算文件哈希，无需加载完整内容 - 差异对比限制，大文件只显示头尾各 500 行 - 新增 chunked 参数支持流式传输补丁 2: .lobsterignore 机制 - 创建 IgnorePattern 类实现模式匹配 - 支持 .lobsterignore 文件配置 - 添加默认忽略规则（.DS_Store, node_modules 等） - 支持通配符匹配（*, ?, 目录匹配） - 新增 API: GET /api/ignore/patterns/, POST /api/ignore/reload/ 补丁 3: 操作溯源（Audit Log） - 新增 SyncHistory 模型记录同步历史 - 创建 AuditLogger 类用于记录操作 - 所有同步操作自动记录日志 - 记录操作者、版本变化、哈希变化、执行时间等 - 新增 API: GET /api/history/ 更新内容: - models.py: 新增 SyncHistory 模型 - services.py: 新增 IgnorePattern, AuditLogger, 分块读取方法 - views.py: 所有同步操作添加日志记录, 新增历史和忽略规则接口 - serializers.py: 新增 SyncHistorySerializer - urls.py: 新增历史和忽略规则路由 - .lobsterignore.example: 示例忽略文件 - CHANGELOG.md: 详细更新日志
2026-04-05 12:20:57 +00:00
parent d9420b6cc6
commit 077656a6cf
7 changed files with 1007 additions and 43 deletions
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -0,0 +1,376 @@
+# 🎯 三个"补丁"更新日志
+
+## 更新时间
+2026-04-05
+
+## 更新说明
+
+根据逍遥子的建议，为龙虾记忆同步系统添加了三个重要功能补丁，提升系统性能、可用性和安全性。
+
+---
+
+## 📦 补丁 1: 分块读取与流式传输
+
+### 问题
+- 如果龙虾的记忆文件（比如某些 Log 或向量快照）超过 50MB
+- 一次性 GET /api/diff 会让后端内存瞬间飙升
+
+### 解决方案
+- **流式读取**：使用 8KB 分块读取大文件，避免一次性加载到内存
+- **流式哈希计算**：直接从文件流计算哈希，无需加载完整内容
+- **差异对比限制**：大文件只显示头尾各 500 行，中间省略
+
+### 实现细节
+```python
+# services.py
+class FileScanner:
+    chunk_size = 8192  # 8KB 分块读取
+
+    def read_file_chunked(self, file_path: Path) -> str:
+        """分块读取文件"""
+        content_parts = []
+        with open(file_path, 'r', encoding='utf-8', errors='ignore') as f:
+            while True:
+                chunk = f.read(self.chunk_size)
+                if not chunk:
+                    break
+                content_parts.append(chunk)
+        return ''.join(content_parts)
+
+    def read_file_stream(self, file_path: str) -> Iterator[str]:
+        """流式读取文件（用于大文件传输）"""
+        with open(file_path, 'r', encoding='utf-8', errors='ignore') as f:
+            while True:
+                chunk = f.read(self.chunk_size)
+                if not chunk:
+                    break
+                yield chunk
+
+    def compute_hash_stream(self, file_path: Path) -> str:
+        """流式计算文件哈希（避免大文件内存问题）"""
+        hash_obj = hashlib.sha256()
+        with open(file_path, 'rb') as f:
+            while True:
+                chunk = f.read(self.chunk_size)
+                if not chunk:
+                    break
+                hash_obj.update(chunk)
+        return hash_obj.hexdigest()
+
+class DiffChecker:
+    def get_file_diff(self, local_content: str, db_content: str, max_lines: int = 1000) -> Dict:
+        """获取文件差异（支持大文件限制）"""
+        local_lines = local_content.split('\n')
+        db_lines = db_content.split('\n')
+
+        # 限制行数（大文件只显示头尾）
+        if len(local_lines) > max_lines:
+            local_head = local_lines[:max_lines//2]
+            local_tail = local_lines[-max_lines//2:]
+            local_lines = local_head + ['... (中间省略 {}) 行 ...'.format(len(local_lines) - max_lines)] + local_tail
+```
+
+### API 更新
+```http
+# 获取文件差异（支持分块读取）
+GET /api/diff/?lobster_id=daotong&file_path=large-file.log&chunked=true
+```
+
+---
+
+## 📦 补丁 2: .lobsterignore 机制
+
+### 问题
+- 临时文件（如 .DS_Store、日志缓存）不需要进数据库
+- 手动维护一个排除列表会更清爽
+
+### 解决方案
+- 创建 `.lobsterignore` 文件（类似 `.gitignore`）
+- 扫描时自动跳过匹配的文件
+- 提供默认忽略规则
+
+### 实现细节
+```python
+# services.py
+class IgnorePattern:
+    """.lobsterignore 模式匹配器"""
+
+    def __init__(self, base_dir: Path):
+        self.base_dir = base_dir
+        self.patterns = []
+        self.load_patterns()
+
+    def load_patterns(self):
+        """加载 .lobsterignore 文件"""
+        ignore_file = self.base_dir / '.lobsterignore'
+
+        if ignore_file.exists():
+            with open(ignore_file, 'r', encoding='utf-8') as f:
+                for line in f:
+                    line = line.strip()
+                    # 跳过空行和注释
+                    if line and not line.startswith('#'):
+                        self.patterns.append(line)
+
+        # 添加默认忽略规则
+        default_patterns = [
+            '.DS_Store', '.git', '.gitignore', '__pycache__',
+            'node_modules', '*.pyc', '*.pyo', '*.log',
+            '*.tmp', '*.temp', '*.bak', '.vscode', '.idea'
+        ]
+        for pattern in default_patterns:
+            if pattern not in self.patterns:
+                self.patterns.append(pattern)
+
+    def is_ignored(self, file_path: Path) -> bool:
+        """判断文件是否被忽略"""
+        relative_path = file_path.relative_to(self.base_dir)
+
+        for pattern in self.patterns:
+            # 匹配文件名
+            if fnmatch.fnmatch(file_path.name, pattern):
+                return True
+
+            # 匹配相对路径
+            if fnmatch.fnmatch(str(relative_path), pattern):
+                return True
+
+            # 匹配目录
+            if pattern.endswith('/') and fnmatch.fnmatch(str(relative_path.parent), pattern.rstrip('/')):
+                return True
+
+            # 递归匹配子目录
+            if pattern.startswith('*/'):
+                parts = str(relative_path).split(os.sep)
+                for i, part in enumerate(parts):
+                    if fnmatch.fnmatch(part, pattern[2:]):
+                        return True
+
+        return False
+```
+
+### 示例文件
+```bash
+# .lobsterignore
+# 系统文件
+.DS_Store
+.Thumbs.db
+
+# IDE 和编辑器
+.vscode/
+.idea/
+*.swp
+
+# Python
+__pycache__/
+*.pyc
+*.log
+
+# Node.js
+node_modules/
+
+# 临时文件
+*.tmp
+*.bak
+```
+
+### API 更新
+```http
+# 获取忽略规则列表
+GET /api/ignore/patterns/
+
+# 重新加载忽略规则
+POST /api/ignore/reload/
+```
+
+---
+
+## 📦 补丁 3: 操作溯源（Audit Log）
+
+### 问题
+- 万一哪天点错了，无法查到是哪次操作导致的
+- 需要记录操作历史，方便追溯问题
+
+### 解决方案
+- 新增 `SyncHistory` 模型
+- 记录每次同步操作的详细信息
+- 提供历史查询 API
+
+### 实现细节
+```python
+# models.py
+class SyncHistory(models.Model):
+    """同步操作历史记录"""
+
+    ACTION_CHOICES = [
+        ('sync_to_db', '同步到数据库'),
+        ('sync_to_local', '同步到本地'),
+        ('auto_sync', '自动同步'),
+        ('manual_merge', '手动合并'),
+    ]
+
+    STATUS_CHOICES = [
+        ('success', '成功'),
+        ('failed', '失败'),
+        ('partial', '部分成功'),
+    ]
+
+    lobster_id = models.CharField(max_length=50, help_text='龙虾ID')
+    file_path = models.CharField(max_length=500, help_text='文件相对路径')
+    action = models.CharField(max_length=20, choices=ACTION_CHOICES, help_text='操作类型')
+    status = models.CharField(max_length=20, choices=STATUS_CHOICES, help_text='操作状态')
+    old_version = models.IntegerField(null=True, blank=True, help_text='操作前版本')
+    new_version = models.IntegerField(null=True, blank=True, help_text='操作后版本')
+    old_hash = models.CharField(max_length=64, null=True, blank=True, help_text='操作前哈希')
+    new_hash = models.CharField(max_length=64, null=True, blank=True, help_text='操作后哈希')
+    file_size = models.IntegerField(default=0, help_text='文件大小（字节）')
+    operator = models.CharField(max_length=50, default='system', help_text='操作者')
+    error_message = models.TextField(null=True, blank=True, help_text='错误信息')
+    execution_time = models.FloatField(default=0, help_text='执行时间（秒）')
+    created_at = models.DateTimeField(auto_now_add=True, help_text='操作时间')
+
+# services.py
+class AuditLogger:
+    """操作日志记录器"""
+
+    def log_sync_action(
+        self,
+        lobster_id: str,
+        file_path: str,
+        action: str,
+        old_version: int = None,
+        new_version: int = None,
+        old_hash: str = None,
+        new_hash: str = None,
+        file_size: int = 0,
+        operator: str = 'system',
+        status: str = 'success',
+        error_message: str = None,
+        execution_time: float = 0
+    ):
+        """记录同步操作"""
+        self.model.objects.create(...)
+
+    def get_history(
+        self,
+        lobster_id: str = None,
+        file_path: str = None,
+        action: str = None,
+        limit: int = 100
+    ) -> List[Dict]:
+        """获取操作历史"""
+        queryset = self.model.objects.all()
+        # 过滤和排序...
+```
+
+### 使用示例
+```python
+# views.py
+@api_view(['POST'])
+def sync_to_db(request):
+    """同步到数据库（带操作日志）"""
+    audit_logger = AuditLogger()
+
+    start_time = time.time()
+
+    try:
+        # 执行同步操作...
+        execution_time = time.time() - start_time
+
+        # 记录成功日志
+        audit_logger.log_sync_action(
+            lobster_id=lobster_id,
+            file_path=file_path,
+            action='sync_to_db',
+            old_version=old_version,
+            new_version=new_version,
+            old_hash=old_hash,
+            new_hash=file_hash,
+            file_size=record.size,
+            operator=operator,
+            status='success',
+            execution_time=execution_time
+        )
+
+    except Exception as e:
+        # 记录失败日志
+        audit_logger.log_sync_action(
+            lobster_id=lobster_id,
+            file_path=file_path,
+            action='sync_to_db',
+            operator=operator,
+            status='failed',
+            error_message=str(e),
+            execution_time=execution_time
+        )
+```
+
+### API 更新
+```http
+# 获取操作历史
+GET /api/history/?lobster_id=daotong&file_path=MEMORY.md&limit=50
+```
+
+### 历史记录示例
+```json
+{
+  "success": true,
+  "data": [
+    {
+      "id": 1,
+      "lobster_id": "daotong",
+      "file_path": "MEMORY.md",
+      "action": "sync_to_db",
+      "action_display": "同步到数据库",
+      "status": "success",
+      "status_display": "成功",
+      "old_version": 1,
+      "new_version": 2,
+      "old_hash": "abc123...",
+      "new_hash": "def456...",
+      "file_size": 1234,
+      "operator": "逍遥子",
+      "error_message": null,
+      "execution_time": 0.123,
+      "created_at": "2026-04-05T12:00:00Z"
+    }
+  ]
+}
+```
+
+---
+
+## 📋 数据库迁移
+
+需要执行数据库迁移以创建 `SyncHistory` 表：
+
+```bash
+# 进入后端容器
+docker exec -it lobster-backend bash
+
+# 创建迁移
+python manage.py makemigrations memory_app
+python manage.py migrate
+```
+
+---
+
+## ✅ 完成检查清单
+
+- [x] 分块读取与流式传输（services.py）
+- [x] .lobsterignore 机制（services.py + .lobsterignore.example）
+- [x] 操作溯源（models.py + services.py + views.py + serializers.py）
+- [x] 新增 API 接口（urls.py）
+- [x] 更新文档（CHANGELOG.md）
+
+---
+
+## 🚀 下一步
+
+1. 执行数据库迁移
+2. 推送代码到远程仓库
+3. 更新前端界面（添加历史记录和忽略规则管理）
+
+---
+
+**感谢逍遥子的宝贵建议！** 🙏