Install:
sudo pip install scrapyd
Configuration:
# File ~/.scrapyd.conf, with the following contents:

[scrapyd]
eggs_dir          = /home/sirius/scrapyd/eggs
logs_dir          = /home/sirius/scrapyd/logs
items_dir         = /home/sirius/scrapyd/items
jobs_to_keep      = 5
dbs_dir           = /home/sirius/scrapyd/dbs
max_proc          = 0
max_proc_per_cpu  = 4
finished_to_keep  = 50
poll_interval     = 5
bind_address      = 0.0.0.0
http_port         = 6800
debug             = off
runner            = scrapyd.runner
application       = scrapyd.app.application
launcher          = scrapyd.launcher.Launcher
webroot           = scrapyd.website.Root

[services]
schedule.json     = scrapyd.webservice.Schedule
cancel.json       = scrapyd.webservice.Cancel
addversion.json   = scrapyd.webservice.AddVersion
listprojects.json = scrapyd.webservice.ListProjects
listversions.json = scrapyd.webservice.ListVersions
listspiders.json  = scrapyd.webservice.ListSpiders
delproject.json   = scrapyd.webservice.DeleteProject
delversion.json   = scrapyd.webservice.DeleteVersion
listjobs.json     = scrapyd.webservice.ListJobs
#daemonstatus.json = scrapyd.webservice.DaemonStatus
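With bind_address and http_port set as above, a quick way to verify the daemon is answering is to hit one of the JSON endpoints registered in [services]. A minimal sketch (assumes scrapyd is running locally on the port from this config; curl must be installed):

```shell
# Build the endpoint URL from the values configured above
SCRAPYD_HOST=localhost
SCRAPYD_PORT=6800
URL="http://${SCRAPYD_HOST}:${SCRAPYD_PORT}/listprojects.json"
echo "checking ${URL}"
# Requires a running scrapyd; a healthy node replies with JSON like
# {"status": "ok", "projects": [...]}
curl -s "${URL}" || echo "scrapyd is not reachable"
```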
Daemonization: supervisor is used here because scrapyd is simply too fragile; the moment you stop watching it, it dies.
Install:
sudo pip install supervisor
Configuration:
sudo mkdir -p /etc/supervisor/
# Import the default configuration
sudo su - root -c "echo_supervisord_conf > /etc/supervisor/supervisord.conf"

Then edit /etc/supervisor/supervisord.conf:

# Management interface
[inet_http_server]         ; inet (TCP) server disabled by default
port=127.0.0.1:9001        ; (ip_address:port specifier, *:port for all iface)
;username=user             ; (default is no username (open server))
;password=123              ; (default is no password (open server))

[supervisorctl]
;serverurl=unix:///tmp/supervisor.sock ; use a unix:// URL for a unix socket
serverurl=http://127.0.0.1:9001 ; use an http:// url to specify an inet socket
;username=chris            ; should be same as http_username if set
;password=123              ; should be same as http_password if set
;prompt=mysupervisor       ; cmd line prompt (default "supervisor")
;history_file=~/.sc_history ; use readline history if available

# The managed process
[program:scrapyd]
command=scrapyd
autostart=true
autorestart=unexpected
Create the file /usr/lib/systemd/system/supervisord.service with the following contents:

[Unit]
Description=supervisord - Supervisor process control system for UNIX
Documentation=http://supervisord.org
After=network.target

[Service]
Type=forking
ExecStart=/usr/bin/supervisord -c /etc/supervisor/supervisord.conf
ExecReload=/usr/bin/supervisorctl reload
ExecStop=/usr/bin/supervisorctl shutdown
User=

[Install]
WantedBy=multi-user.target

# Start it
sudo systemctl enable supervisord
sudo systemctl start supervisord
# Check
supervisorctl
# If all is well:
scrapyd    RUNNING   pid 8059, uptime 0:02:02
# Common commands
status                 # show process status
reload                 # reload the configuration
restart scrapyd        # restart the scrapyd job
update                 # apply supervisor configuration changes
tail -f scrapyd stderr # inspect the log
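In a monitoring script, the status output can be checked mechanically. A minimal sketch that parses a status line like the RUNNING example above (the sample line is hard-coded here for illustration; in practice it would come from `supervisorctl status scrapyd`):

```shell
# Sample line in the format supervisorctl prints (hard-coded for illustration)
line='scrapyd    RUNNING   pid 8059, uptime 0:02:02'
# The second whitespace-separated field is the process state
state=$(echo "$line" | awk '{print $2}')
if [ "$state" = "RUNNING" ]; then
    echo "scrapyd is healthy"
else
    echo "scrapyd is in state: $state"
fi
# prints: scrapyd is healthy
```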
Deploying spiders:
Deploy:
cd <project directory>
scrapyd-deploy <target> -p <project>
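scrapyd-deploy reads its target from the project's scrapy.cfg. A minimal sketch of the relevant section (the target name `local` and the project name `myproject` are assumptions for illustration, matching the local setup above):

```ini
# scrapy.cfg in the project root
[settings]
default = myproject.settings

[deploy:local]
url = http://localhost:6800/
project = myproject
```

With this in place, `scrapyd-deploy local -p myproject` packages the project as an egg and uploads it to the scrapyd instance at the given URL.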
Schedule a run:
curl http://localhost:6800/schedule.json -d project=myproject -d spider=somespider
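schedule.json replies with JSON containing a jobid, which is worth capturing: the same id identifies the run in listjobs.json and can cancel it via cancel.json. A sketch (assumes the server is running and uses the project/spider names from the example above):

```shell
# Schedule the spider and keep the server's JSON reply
resp=$(curl -s http://localhost:6800/schedule.json \
            -d project=myproject -d spider=somespider)
# Pull the jobid out of a reply like {"status": "ok", "jobid": "..."}
jobid=$(echo "$resp" | sed -n 's/.*"jobid": *"\([^"]*\)".*/\1/p')
echo "jobid: $jobid"
# Later, the run can be cancelled with the same id:
# curl http://localhost:6800/cancel.json -d project=myproject -d job=$jobid
```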