Python Crawlers: Configuring a Proxy in Scrapy

PHP中文网 • February 27, 2025, 21:20:34 • Programming

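When crawling a website, the most common problem is that the site limits requests per IP address as part of its anti-scraping measures. The best way around this is to rotate IP addresses while crawling, that is, to send requests through proxies.

The steps below show how to configure a proxy in Scrapy.

1. Create "middlewares.py" in the Scrapy project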

# base64 is needed only if the proxy requires authentication
import base64


# Custom downloader middleware that routes every request through a proxy
class ProxyMiddleware(object):
    # Override process_request
    def process_request(self, request, spider):
        # Set the location of the proxy
        request.meta['proxy'] = "http://YOUR_PROXY_IP:PORT"

        # Use the following lines if your proxy requires authentication
        proxy_user_pass = "USERNAME:PASSWORD"
        # Set up basic authentication for the proxy.
        # b64encode expects bytes and returns bytes, so encode first and decode the result.
        encoded_user_pass = base64.b64encode(proxy_user_pass.encode('utf-8')).decode('ascii')
        request.headers['Proxy-Authorization'] = 'Basic ' + encoded_user_pass

2. Add the following to the project settings file (./pythontab/settings.py)

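# Lower numbers run first for process_request, so ProxyMiddleware sets
# request.meta['proxy'] before HttpProxyMiddleware handles the request.
DOWNLOADER_MIDDLEWARES = {
    'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware': 110,
    'pythontab.middlewares.ProxyMiddleware': 100,
}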
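
The middleware in step 1 always uses one fixed proxy, while the goal stated at the top is IP rotation. The following is a minimal sketch of one way to rotate, assuming a pool of proxies is available. The class name RandomProxyMiddleware and the settings PROXY_LIST and PROXY_USER_PASS are hypothetical names chosen for illustration; they are not built into Scrapy. The class would be registered in DOWNLOADER_MIDDLEWARES the same way as ProxyMiddleware.

```python
import base64
import random


class RandomProxyMiddleware:
    """Downloader middleware that picks a random proxy for every request.

    PROXY_LIST and PROXY_USER_PASS are hypothetical custom settings, e.g.
    in settings.py:
        PROXY_LIST = ["http://YOUR_PROXY_IP_1:PORT", "http://YOUR_PROXY_IP_2:PORT"]
        PROXY_USER_PASS = "USERNAME:PASSWORD"  # omit if no authentication is needed
    """

    def __init__(self, proxies, user_pass=None):
        if not proxies:
            raise ValueError("PROXY_LIST must contain at least one proxy URL")
        self.proxies = list(proxies)
        self.user_pass = user_pass  # "USERNAME:PASSWORD" or None

    @classmethod
    def from_crawler(cls, crawler):
        # Scrapy calls this hook to build the middleware from the project settings.
        return cls(
            proxies=crawler.settings.getlist("PROXY_LIST"),
            user_pass=crawler.settings.get("PROXY_USER_PASS"),
        )

    def process_request(self, request, spider):
        # Choose a different proxy at random for each outgoing request.
        request.meta["proxy"] = random.choice(self.proxies)
        if self.user_pass:
            token = base64.b64encode(self.user_pass.encode("utf-8")).decode("ascii")
            request.headers["Proxy-Authorization"] = "Basic " + token
        # Returning None lets Scrapy continue processing the request.
        return None
```

Random choice is the simplest policy; it does not track which proxies have failed or been banned, so a real crawl would likely need to drop bad proxies from the pool as well.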

Published by PHP中文网. Please credit the source when reposting: https://www.chuangxiangniao.com/p/2285304.html
