
Step into Scrapy

Posted on 2020-12-26 19:24:56


Setup

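pip install scrapy    # install Scrapy
scrapy    # list the available commands
scrapy startproject getTiobe    # create a new Scrapy project
cd getTiobe\getTiobe\spiders    # enter the spiders directory inside the project package
scrapy genspider tiobe www.tiobe.com/tiobe-index    # generate a spider file for this URL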

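Extraction
Find the CSS selector for the target element (for example, by inspecting it in the browser's developer tools).
Modify the parse method of the generated spider: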

import time  # goes at the top of the spider file

def parse(self, response):
    # Each <tr> in the body of the #top20 table is one language entry.
    for item in response.css('#top20 > tbody > tr'):
        yield {
            'rank_this-year': item.css('td:nth-child(1)::text').get().strip(),
            'rank_last-year': item.css('td:nth-child(2)::text').get().strip(),
            'programming_language': item.css('td:nth-child(4)::text').get().strip(),
            'ratings': item.css('td:nth-child(5)::text').get().strip(),
            'change': item.css('td:nth-child(6)::text').get().strip(),
            # Timestamp of the crawl, in local time.
            'date': time.strftime('%Y/%m/%d %H:%M:%S', time.localtime())
        }
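
One thing to watch in the parse method: .get() returns None when a selector matches nothing, so calling .strip() on the result raises AttributeError for any row with an empty or missing cell. A minimal sketch of a safer way to build each row follows. The helper names clean_cell and build_item are hypothetical (they are not part of Scrapy), and the sample values in the usage note are made up.

```python
import time
from typing import Optional, Sequence


def clean_cell(text: Optional[str]) -> str:
    """Return the stripped cell text, or '' when the selector matched nothing."""
    return text.strip() if text is not None else ''


def build_item(cells: Sequence[Optional[str]]) -> dict:
    """Build one result row from the raw texts of the six cells of a table row."""
    # Table columns 1, 2, 4, 5, 6 correspond to indexes 0, 1, 3, 4, 5 here;
    # column 3 is skipped, as in the parse method.
    return {
        'rank_this-year': clean_cell(cells[0]),
        'rank_last-year': clean_cell(cells[1]),
        'programming_language': clean_cell(cells[3]),
        'ratings': clean_cell(cells[4]),
        'change': clean_cell(cells[5]),
        'date': time.strftime('%Y/%m/%d %H:%M:%S'),
    }
```

Inside the spider, the same idea would read clean_cell(item.css('td:nth-child(1)::text').get()). Outside Scrapy it can be tried with plain values, for example build_item([' 1 ', '2', None, ' Python ', '10.0%', None]).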

Run

In settings.py (in the project package directory, next to the spiders directory), set the export format to JSON:
FEED_FORMAT = "json"
FEED_URI = "tiobe.json"
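
Note that Scrapy 2.1 and later deprecate FEED_FORMAT and FEED_URI in favor of a single FEEDS setting. Assuming such a version is installed, a roughly equivalent settings.py entry might look like this:

```python
# settings.py: maps each output path to its feed options (Scrapy 2.1+).
FEEDS = {
    "tiobe.json": {"format": "json"},
}
```

Use one style or the other rather than both.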
scrapy crawl tiobe
The scraped data can then be found in tiobe.json in the project directory.
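
To check the result, the exported file can be read back with the standard json module, since the JSON feed is written as one array of objects. This is only a sketch: load_items and top_languages are hypothetical helper names, and it assumes every item has the keys produced by the parse method, with rank_this-year holding a numeric string.

```python
import json


def load_items(path):
    """Load the list of scraped items from a JSON feed file."""
    with open(path, encoding='utf-8') as f:
        return json.load(f)


def top_languages(items, n=5):
    """Return the names of the n best-ranked languages."""
    # Ranks are stored as strings, so convert them before sorting.
    ranked = sorted(items, key=lambda item: int(item['rank_this-year']))
    return [item['programming_language'] for item in ranked[:n]]
```

For example, top_languages(load_items('tiobe.json'), n=3) would list the three highest-ranked languages from the crawl.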


Post author: Orekiyuta
Post link: http://canoe.orekiyuta.cn/archives/stepIntoScrapy/
Copyright Notice: All articles in this blog are licensed under CC BY-NC-SA 3.0 unless stating additionally.
