Scraping Images

TanHaX

2023-05-11 / 2 comments / 415 views

This script scrapes product images from the Starbucks menu page, but the later downloads fail!

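import os
import sys
import urllib.request
from urllib.parse import urljoin

from bs4 import BeautifulSoup

BASE_URL = 'https://www.starbucks.com.cn'
menu_url = BASE_URL + '/menu/'

# Fetch the menu page; stop if the request fails.
try:
    response = urllib.request.urlopen(menu_url)
except Exception as err:
    print(f"Error: could not retrieve the URL ({err})")
    sys.exit(1)

content = response.read().decode('utf-8')
soup = BeautifulSoup(content, 'lxml')

# Each thumbnail link holds a preview div whose inline style carries the
# image path, plus a <strong> tag with the product name.
image_list = [(a.find('div', class_='preview')['style'].split('url("')[1].split('")')[0],
               a.find('strong').text) for a in soup.find_all('a', class_='thumbnail')]

folder_path = os.path.join(os.getcwd(), "starbucks")
os.makedirs(folder_path, exist_ok=True)

for image_path, name in image_list:
    # A product name may contain '/', which would be read as a path separator.
    safe_name = name.strip().replace('/', '_')
    try:
        urllib.request.urlretrieve(url=urljoin(BASE_URL, image_path),
                                   filename=os.path.join(folder_path, safe_name + '.jpg'))
    except Exception as err:
        # Print the underlying error so the cause of a failed download is visible.
        print(f"Error: could not retrieve {name} ({err})")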
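
Two spots in the script above are fragile. Splitting the style string on `url("` raises an `IndexError` as soon as the page uses single quotes or no quotes, and a product name containing `/` or another reserved character produces a file path that `urlretrieve` cannot write to. Whether either is the actual cause of the failed downloads is not confirmed here. The following is only a minimal sketch of more tolerant helpers. The names `extract_style_url` and `safe_filename` are hypothetical, and so is the sample image path.

```python
import re

# Matches url(...) in an inline style, with double, single, or no quotes.
_STYLE_URL = re.compile(r"""url\(\s*(['"]?)(.*?)\1\s*\)""")

# Characters that are reserved in Windows file names ('/' is also unsafe on Unix).
_UNSAFE = re.compile(r'[\\/:*?"<>|]')


def extract_style_url(style):
    """Return the first url(...) target in a CSS style string, or None."""
    match = _STYLE_URL.search(style or "")
    return match.group(2) if match else None


def safe_filename(name, replacement="_"):
    """Replace characters that cannot appear in a file name (hypothetical helper)."""
    cleaned = _UNSAFE.sub(replacement, name).strip()
    return cleaned or "unnamed"


# The style string below is made up for illustration, not taken from the real page.
print(extract_style_url('background-image: url("/images/products/latte.jpg")'))
print(safe_filename("Latte/Mocha"))
```

In the download loop, `safe_filename(name)` could stand in for the raw product name, and entries for which `extract_style_url` returns `None` could simply be skipped.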

Python

Copyright: TanHaX

Article link: https://nihaotang.com/archives/36.html

License: Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0)

小林运维呀

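I've read many of the posts on your blog and learned a great deal. As someone in the same field, I wonder whether we could become friends and learn from each other on the operations road, cutting paths through mountains and building bridges over rivers as we go.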

2023-05-17
1. Smilinghan (author)
  
  @ 小林运维呀
  
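  Thank you very much for reading my blog and for your kind words. I'm still a student who has only just stepped into operations, so my skills are limited, and I'm well aware of how important technical ability and continuous learning are. I believe that as we cut paths through mountains and build bridges over rivers, we can learn, discuss, and grow together. I wish you even more progress and success in the operations field!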
  
  2023-05-17