""" AI 审核模块 - 自动审核内容 提供敏感词检测、内容质量评估等功能 """ import re from typing import Dict, List, Tuple class AIAuditService: """AI 审核服务类""" # 敏感词库(示例,实际应该从数据库或配置文件加载) SENSITIVE_WORDS = [ '暴力', '恐怖', '色情', '赌博', '毒品', '诈骗', '传销', '假币', '枪支', '弹药', ] # 广告关键词 AD_KEYWORDS = [ '加微信', 'QQ 群', '联系电话', '手机号', 'www.', '.com', '.cn', 'http', ] # 最小内容长度 MIN_CONTENT_LENGTH = 10 @classmethod def check_sensitive_words(cls, text: str) -> Tuple[bool, List[str]]: """ 检查敏感词 Args: text: 待检查文本 Returns: (是否包含敏感词,敏感词列表) """ found_words = [] for word in cls.SENSITIVE_WORDS: if word in text: found_words.append(word) return len(found_words) > 0, found_words @classmethod def check_advertisement(cls, text: str) -> Tuple[bool, List[str]]: """ 检查广告内容 Args: text: 待检查文本 Returns: (是否包含广告,广告关键词列表) """ found_keywords = [] for keyword in cls.AD_KEYWORDS: if keyword in text: found_keywords.append(keyword) return len(found_keywords) > 0, found_keywords @classmethod def check_content_quality(cls, text: str) -> Dict: """ 检查内容质量 Args: text: 待检查文本 Returns: 质量评估结果 """ result = { 'is_valid': True, 'issues': [], 'score': 100, } # 检查长度 if len(text) < cls.MIN_CONTENT_LENGTH: result['is_valid'] = False result['issues'].append(f'内容太短,最少需要{cls.MIN_CONTENT_LENGTH}个字符') result['score'] -= 50 # 检查重复字符(刷屏检测) if len(set(text)) < len(text) * 0.3: result['is_valid'] = False result['issues'].append('内容包含大量重复字符') result['score'] -= 30 # 检查全角字符比例 chinese_chars = len(re.findall(r'[\u4e00-\u9fa5]', text)) if chinese_chars / max(len(text), 1) < 0.1: result['issues'].append('中文内容比例较低') result['score'] -= 10 return result @classmethod def audit_article(cls, title: str, content: str) -> Dict: """ 审核文章 Args: title: 文章标题 content: 文章内容 Returns: 审核结果 """ result = { 'approved': True, 'reason': '', 'details': {}, } # 检查标题 sensitive, words = cls.check_sensitive_words(title) if sensitive: result['approved'] = False result['reason'] = f'标题包含敏感词:{", ".join(words)}' result['details']['sensitive_words'] = words return result # 检查内容 sensitive, words = 
cls.check_sensitive_words(content) if sensitive: result['approved'] = False result['reason'] = f'内容包含敏感词:{", ".join(words)}' result['details']['sensitive_words'] = words return result # 检查广告 is_ad, keywords = cls.check_advertisement(content) if is_ad: result['approved'] = False result['reason'] = f'内容疑似广告:{", ".join(keywords)}' result['details']['ad_keywords'] = keywords return result # 检查内容质量 quality = cls.check_content_quality(content) if not quality['is_valid']: result['approved'] = False result['reason'] = f'内容质量不达标:{", ".join(quality["issues"])}' result['details']['quality'] = quality return result result['reason'] = '审核通过' result['details']['quality_score'] = quality['score'] return result @classmethod def audit_comment(cls, content: str) -> Dict: """ 审核评论 Args: content: 评论内容 Returns: 审核结果 """ result = { 'approved': True, 'reason': '', 'details': {}, } # 检查敏感词 sensitive, words = cls.check_sensitive_words(content) if sensitive: result['approved'] = False result['reason'] = f'包含敏感词:{", ".join(words)}' result['details']['sensitive_words'] = words return result # 检查广告 is_ad, keywords = cls.check_advertisement(content) if is_ad: result['approved'] = False result['reason'] = f'疑似广告:{", ".join(keywords)}' result['details']['ad_keywords'] = keywords return result # 检查内容质量 quality = cls.check_content_quality(content) if not quality['is_valid']: result['approved'] = False result['reason'] = f'内容质量不达标:{", ".join(quality["issues"])}' result['details']['quality'] = quality return result result['reason'] = '审核通过' return result @classmethod def audit_service(cls, name: str, description: str) -> Dict: """ 审核特色服务 Args: name: 服务名称 description: 服务描述 Returns: 审核结果 """ # 合并名称和描述进行检查 full_text = f"{name} {description}" result = { 'approved': True, 'reason': '', 'details': {}, } # 检查敏感词 sensitive, words = cls.check_sensitive_words(full_text) if sensitive: result['approved'] = False result['reason'] = f'包含敏感词:{", ".join(words)}' result['details']['sensitive_words'] = words return 
result # 检查广告(服务本身可以包含联系方式,这里放宽检查) # 只检查明显的垃圾广告 spam_keywords = ['加微信', 'QQ 群', '点击链接'] found_spam = [kw for kw in spam_keywords if kw in full_text] if found_spam: result['approved'] = False result['reason'] = f'包含垃圾广告内容:{", ".join(found_spam)}' result['details']['spam_keywords'] = found_spam return result # 检查内容质量 quality = cls.check_content_quality(description) if not quality['is_valid']: result['approved'] = False result['reason'] = f'描述质量不达标:{", ".join(quality["issues"])}' result['details']['quality'] = quality return result result['reason'] = '审核通过' return result # 单例实例 ai_audit_service = AIAuditService()
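

# The repeated-character check in check_content_quality compares the number of
# distinct characters with the total length of the text. That ratio shrinks as
# text grows, so a long, ordinary article can fall below the 0.3 threshold.
# The sketch below is one possible alternative, not part of AIAuditService:
# it applies the same ratio test to fixed-size windows instead of the whole
# text. The helper name has_repeated_run, its window and min_unique_ratio
# parameters, and the two sample texts are assumptions made for illustration.

```python
def has_repeated_run(text: str, window: int = 50, min_unique_ratio: float = 0.3) -> bool:
    """Return True if some window-sized slice of text has too few distinct characters.

    Hypothetical helper for illustration only.
    """
    if not text:
        return False
    if len(text) <= window:
        # Short text: fall back to the whole-text ratio test.
        return len(set(text)) < len(text) * min_unique_ratio

    # Slide a half-overlapping window across the text.
    step = max(window // 2, 1)
    last_start = len(text) - window
    starts = list(range(0, last_start + 1, step))
    if starts[-1] != last_start:
        starts.append(last_start)  # make sure the tail of the text is covered

    threshold = window * min_unique_ratio
    return any(len(set(text[s:s + window])) < threshold for s in starts)


# Synthetic stand-in for a long article: 2000 characters cycling through
# 40 distinct ones, so every 50-character window is varied.
varied_text = "".join(chr(0x4E00 + i % 40) for i in range(2000))

# The same text followed by a flood of one repeated character.
flooded_text = varied_text + "啊" * 100

# The whole-text ratio test flags even the varied text (40 distinct < 600).
whole_text_flags_varied = len(set(varied_text)) < len(varied_text) * 0.3
```

# With these samples the whole-text ratio flags varied_text, while the windowed
# helper flags only flooded_text. The window size and threshold would need
# tuning against real content before replacing the existing check.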