A simple crawler script in 20 lines of code

Simple fetch

main.py

import requests
response = requests.get("https://www.bing.com/")
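
The request above is sent but its result is never checked. A minimal sketch of a more defensive fetch, assuming a 10-second timeout is acceptable; the helper name `fetch_html` is hypothetical and not part of the original script:

```python
import requests


def fetch_html(url, timeout=10.0):
    """Download a page and return its decoded HTML text."""
    # a timeout keeps the script from hanging on an unresponsive server
    response = requests.get(url, timeout=timeout)
    # raise requests.HTTPError on 4xx/5xx responses instead of continuing silently
    response.raise_for_status()
    return response.text
```

Calling `fetch_html("https://www.bing.com/")` would return the page's HTML as a string, or raise an exception if the request fails.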

Fetching and parsing the HTML structure

main.py

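import requests
from bs4 import BeautifulSoup
from urllib.parse import urljoin

# site to crawl; only links under this prefix are followed
base_url = "https://www.xbiqugew.com"

# initialize the list of discovered urls
# with the first page to visit
urls = [base_url]
# remember every url already queued so no page is fetched twice
seen = {base_url}

# until all pages have been visited
while len(urls) != 0:
	# get the page to visit from the list
	current_url = urls.pop()

	# crawling logic
	response = requests.get(current_url, timeout=10)
	soup = BeautifulSoup(response.content, "html.parser")

	link_elements = soup.select("a[href]")
	for link_element in link_elements:
		# resolve relative links against the current page
		url = urljoin(current_url, link_element["href"])
		# follow only unseen links on the same site
		if url.startswith(base_url) and url not in seen:
			seen.add(url)
			urls.append(url)
	print(urls)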
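
The crawl loop above keeps only links that belong to the start site. A minimal sketch of that filtering step as a standalone function, using only the standard library's `urllib.parse`; the name `same_site_links` is hypothetical, and comparing hosts rather than matching a string is an assumption about what "same site" should mean here:

```python
from urllib.parse import urljoin, urlparse


def same_site_links(page_url, hrefs):
    """Resolve hrefs against page_url and keep those on the same host."""
    host = urlparse(page_url).netloc
    links = []
    for href in hrefs:
        # turn relative links (e.g. a hypothetical "/book/1/") into absolute urls
        absolute = urljoin(page_url, href)
        parsed = urlparse(absolute)
        # skip mailto:, javascript: and links that point to other hosts
        if parsed.scheme in ("http", "https") and parsed.netloc == host:
            links.append(absolute)
    return links
```

Inside the loop it could be called as `same_site_links(current_url, [a["href"] for a in soup.select("a[href]")])`, which keeps the parsing and the filtering logic separate.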
