Headers That Risk-Control Systems Watch Most Closely

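| Header | What risk control checks | Risk level | Evasion strategy |
| --- | --- | --- | --- |
| User-Agent | Whether it identifies a known crawler or automation tool; whether it matches the browser fingerprint | 🔴 High | Maintain a pool of real UAs and rotate randomly; keep it consistent with TLS and Client Hints |
| Cookie | Session state, login state, behavior-tracking identifiers (e.g. _ga, __cf_bm) | 🔴 High | Keep the session alive with a Session object, refresh periodically, simulate the login flow |
| Referer | Whether the request source is plausible; whether there are cross-site anomalies | 🟡 Medium | Simulate a normal browsing path; keep the Referer logically consistent with the request URL |
| Sec-Fetch-* | Request context (mode/site/dest), used to judge whether a script issued the request | 🟡 Medium | Understand what each value means before forging it; avoid logical contradictions |
| Sec-CH-UA-* | Client Hints, used for fingerprinting modern browsers | 🟡 Medium | Keep consistent with the UA version |
| X-Forwarded-For | Proxy IP detection, tracing the real IP | 🟡 Medium | Use high-anonymity proxies; avoid exposing the real IP |
| Accept-Language | Whether the language preference matches the IP's region | 🟢 Low | Set the language to match the proxy IP's region |
| Accept-Encoding | Whether compression such as gzip is supported | 🟢 Low | Always include gzip, deflate, br |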

Detailed Evasion Strategies

1. User-Agent Spoofing

python
import random

# Maintain a pool of real browser UAs
UA_POOL = [
    # Chrome on Windows
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/122.0.0.0 Safari/537.36",
    # Chrome on Mac
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/122.0.0.0 Safari/537.36",
    # Firefox on Windows
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:123.0) Gecko/20100101 Firefox/123.0",
    # Safari on Mac
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/17.2 Safari/605.1.15",
]

# Note: Firefox and Safari do not send Sec-CH-UA-* headers, so only pair
# Client Hints with the Chrome entries.
headers = {"User-Agent": random.choice(UA_POOL)}
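
Rotating the User-Agent on its own can leave it out of step with the Client Hints sent alongside it. One way to avoid that, sketched below, is to rotate whole profiles instead of bare UA strings. `PROFILES` and `pick_profile` are hypothetical names introduced for illustration, and the header values simply reuse the examples from this post.

```python
import random

# Hypothetical profiles: each bundles a User-Agent with the Client Hints
# that browser family would send. Values reuse the examples in this post.
PROFILES = [
    {
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/122.0.0.0 Safari/537.36",
        "Sec-CH-UA": '"Chromium";v="122", "Not(A:Brand";v="24", "Google Chrome";v="122"',
        "Sec-CH-UA-Mobile": "?0",
        "Sec-CH-UA-Platform": '"Windows"',
    },
    {
        # Firefox does not send Sec-CH-UA-* headers, so none are listed.
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:123.0) Gecko/20100101 Firefox/123.0",
    },
]


def pick_profile(rng=random):
    """Return a copy of one whole profile so UA and hints stay in sync."""
    return dict(rng.choice(PROFILES))
```

Calling `pick_profile()` once per session, rather than once per request, keeps the identity stable for the lifetime of that session.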

2. The Sec-Fetch-* Family (a focus of modern risk control)

The browser sets this group of headers automatically to describe the context of a request:

python
# Simulate a normal page navigation
headers = {
    "Sec-Fetch-Dest": "document",      # destination type: document/image/script/style
    "Sec-Fetch-Mode": "navigate",      # request mode: navigate/cors/no-cors/same-origin
    "Sec-Fetch-Site": "same-origin",   # initiator relationship: same-origin/same-site/cross-site/none
    "Sec-Fetch-User": "?1"             # user-activated navigation: ?1 means yes
}

# Simulate an AJAX request
headers_ajax = {
    "Sec-Fetch-Dest": "empty",
    "Sec-Fetch-Mode": "cors",
    "Sec-Fetch-Site": "same-origin",
}
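
| Sec-Fetch-Site value | Meaning |
| --- | --- |
| none | URL typed directly or opened from a bookmark |
| same-origin | Same-origin request |
| same-site | Same site, different origin |
| cross-site | Cross-site request |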
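
The value has to agree with the Referer, otherwise the two headers contradict each other. A minimal sketch of deriving it from the target URL and the Referer follows. `sec_fetch_site` and its helpers are hypothetical names, and the "site" comparison is a deliberate simplification: real browsers consult the Public Suffix List, while this version just compares the last two host labels.

```python
from urllib.parse import urlsplit


def _origin(url):
    """Return (scheme, host, port) with default ports filled in."""
    parts = urlsplit(url)
    default_port = {"http": 80, "https": 443}.get(parts.scheme)
    return parts.scheme, parts.hostname, parts.port or default_port


def _site(url):
    """Rough 'site' key: scheme plus the last two host labels.

    Simplification: real browsers use the Public Suffix List, so this is
    wrong for suffixes such as co.uk.
    """
    parts = urlsplit(url)
    host = parts.hostname or ""
    return parts.scheme, ".".join(host.split(".")[-2:])


def sec_fetch_site(target_url, referer=None):
    """Pick a Sec-Fetch-Site value consistent with the Referer."""
    if not referer:
        return "none"
    if _origin(referer) == _origin(target_url):
        return "same-origin"
    if _site(referer) == _site(target_url):
        return "same-site"
    return "cross-site"
```

For example, a request to `https://example.com/` with a Referer of `https://www.google.com/` yields `cross-site`, while a request with no Referer yields `none`.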

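3. Client Hints (Sec-CH-UA-*)

python
# Versions must match the User-Agent
headers = {
    "Sec-CH-UA": '"Chromium";v="122", "Not(A:Brand";v="24", "Google Chrome";v="122"',
    "Sec-CH-UA-Mobile": "?0",
    "Sec-CH-UA-Platform": '"Windows"',
    # The two high-entropy hints below are only sent by Chrome after the
    # server asks for them via Accept-CH; sending them unprompted can look odd.
    "Sec-CH-UA-Platform-Version": '"15.0.0"',
    "Sec-CH-UA-Full-Version-List": '"Chromium";v="122.0.6261.112", "Not(A:Brand";v="24.0.0.0", "Google Chrome";v="122.0.6261.112"',
}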
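
Because a version mismatch between the UA and the hints is easy to introduce by accident, a small self-check can help before a header set is used. The sketch below uses hypothetical helper names (`chrome_major_from_ua`, `hints_match_ua`) and assumes HTTPS requests, since Chrome sends no Client Hints over plain HTTP.

```python
import re


def chrome_major_from_ua(user_agent):
    """Extract the Chrome major version from a User-Agent, or None."""
    match = re.search(r"Chrome/(\d+)\.", user_agent)
    return match.group(1) if match else None


def hints_match_ua(headers):
    """Check that Sec-CH-UA advertises the same Chrome major as the UA."""
    ua_major = chrome_major_from_ua(headers.get("User-Agent", ""))
    sec_ch_ua = headers.get("Sec-CH-UA")
    if ua_major is None:
        # No Chrome token in the UA (e.g. Firefox, Safari): such browsers
        # send no Sec-CH-UA, so the header should be absent.
        return sec_ch_ua is None
    if sec_ch_ua is None:
        return False
    # Parse entries of the form "Brand";v="NN" into a brand -> version map.
    brands = dict(re.findall(r'"([^"]+)";v="(\d+)"', sec_ch_ua))
    return brands.get("Google Chrome") == ua_major
```

The function returns `False` for a Chrome 122 UA paired with hints that advertise a different major version.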

4. Cookie and Session Management

python
import requests

# Use a Session to keep state across requests
session = requests.Session()

# Visit the home page first to pick up the initial cookies
session.get("https://example.com")

# Later requests carry those cookies automatically
response = session.get("https://example.com/api/data")

  • Cookies commonly used by risk control
    • __cf_bm / cf_clearance: Cloudflare
    • _ga / _gid: Google Analytics tracking
    • JSESSIONID: Java session
    • __Secure-* / __Host-*: secure cookie prefixes

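5. Building the Referer

python
# Simulate arriving from a search engine
# (pairs with Sec-Fetch-Site: cross-site)
headers = {
    "Referer": "https://www.google.com/"
}

# Simulate an in-site navigation
# (pairs with Sec-Fetch-Site: same-origin)
headers = {
    "Referer": "https://example.com/products"
}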

Complete Headers Template

python
def get_chrome_headers(referer=None):
    """Build a full header set imitating Chrome 122 on Windows."""
    headers = {
        # Basics
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/122.0.0.0 Safari/537.36",
        "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.7",
        "Accept-Language": "zh-CN,zh;q=0.9,en;q=0.8",
        # requests can only decode "br" responses if a brotli package is installed
        "Accept-Encoding": "gzip, deflate, br",
        "Connection": "keep-alive",

        # Sec-Fetch family
        "Sec-Fetch-Dest": "document",
        "Sec-Fetch-Mode": "navigate",
        # Assumes any referer passed in is on the target's own origin;
        # an external referer (e.g. a search engine) would need "cross-site".
        "Sec-Fetch-Site": "none" if not referer else "same-origin",
        "Sec-Fetch-User": "?1",

        # Client Hints
        "Sec-CH-UA": '"Chromium";v="122", "Not(A:Brand";v="24", "Google Chrome";v="122"',
        "Sec-CH-UA-Mobile": "?0",
        "Sec-CH-UA-Platform": '"Windows"',

        # Other
        "Upgrade-Insecure-Requests": "1",
        "Cache-Control": "max-age=0",
    }

    if referer:
        headers["Referer"] = referer

    return headers

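Detection Dimensions Beyond Headers

TLS/JA3 Fingerprinting

  • The combination of parameters in the TLS handshake (cipher suites, extension order, and so on) forms a fingerprint
  • The JA3 fingerprint of the requests library differs clearly from that of a real browser
  • Solution: use curl_cffi, tls-client, or httpx with a custom SSL context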
python
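# curl_cffi can imitate a real browser's TLS fingerprint
from curl_cffi import requests

response = requests.get(
    "https://example.com",
    # Imitates Chrome 110's TLS fingerprint; pick a target whose version
    # matches the User-Agent you send
    impersonate="chrome110"
)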

HTTP/2 Fingerprinting (Akamai Fingerprint)

  • SETTINGS frame parameters, WINDOW_UPDATE values, priority tree structure
  • Each browser has its own distinctive HTTP/2 behavior pattern

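Behavior Analysis

| Check | Normal user | Crawler signature |
| --- | --- | --- |
| Request interval | Random, irregular | Fixed interval or too fast |
| Mouse movement | Curved paths | None, or straight-line jumps |
| Time on page | Seconds to minutes | Milliseconds, or none |
| Resource loading | Loads CSS/JS/images in full | Requests only the target page |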
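
The request-interval row is the easiest one to address in code. A minimal sketch, assuming a 2 to 5 second window and hypothetical helper names (`jittered_delay`, `polite_iter`), draws a fresh random pause between consecutive requests:

```python
import random
import time


def jittered_delay(low=2.0, high=5.0, rng=random):
    """Return a random pause length in seconds within [low, high]."""
    return rng.uniform(low, high)


def polite_iter(urls, low=2.0, high=5.0, sleep=time.sleep):
    """Yield URLs one at a time, pausing a random interval between them."""
    for index, url in enumerate(urls):
        if index:  # no pause before the first URL
            sleep(jittered_delay(low, high))
        yield url
```

The `sleep` parameter is injectable so the pacing logic can be exercised without actually waiting. In normal use it is left at its default and the loop body performs the request for each yielded URL.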

JavaScript Fingerprinting

  • Canvas fingerprint
  • WebGL renderer information
  • AudioContext fingerprint
  • navigator object properties

Solution: use Playwright/Puppeteer with a Stealth plugin


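Recommended Tools

| Tool | Best suited for | Characteristics |
| --- | --- | --- |
| requests | Simple sites | Lightweight, but its TLS fingerprint is conspicuous |
| httpx | When HTTP/2 support is needed | Async support; HTTP/2 enabled with http2=True (requires the httpx[http2] extra) |
| curl_cffi | When TLS detection must be bypassed | Can imitate a real browser's fingerprint |
| Playwright | Sites with heavy anti-scraping | A real browser; pair with stealth |
| undetected-chromedriver | Selenium scenarios | Automatically bypasses common detection |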

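Practical Checklist

  • [ ] UA and Client Hints versions agree
  • [ ] Sec-Fetch-* values are logically consistent
  • [ ] Cookies are obtained and maintained correctly
  • [ ] Referer matches the browsing path
  • [ ] Request intervals are randomized (2-5 seconds)
  • [ ] TLS fingerprinting has been considered
  • [ ] A proxy IP pool is used when necessary
  • [ ] robots.txt and the site's ToS are respected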
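
The last item can be checked programmatically with the standard library. The sketch below wraps `urllib.robotparser` in a hypothetical helper, `allowed_by_robots`, which takes the robots.txt text as a string so that it can be tested offline.

```python
from urllib.robotparser import RobotFileParser


def allowed_by_robots(robots_txt, user_agent, url):
    """Return True if the given robots.txt text permits fetching url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)
```

Against a live site, the same parser can fetch the file itself: call `parser.set_url("https://example.com/robots.txt")` followed by `parser.read()` instead of `parse()`.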