BeautifulSoup Usage

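    from bs4 import BeautifulSoup

    file = open(filename)
    soup = BeautifulSoup(file, 'lxml')
    # select_one() returns the first match; select() returns a list, which has no .string
    aElement = soup.select_one('div#autInfo div.info_c a')
    text1 = aElement.string
    text2 = soup.select_one('div#autDescription div.info_c').string

I picked these elements based on the content in your screenshot. The argument to select follows CSS selector syntax; you can look up the details of that syntax with a search engine such as Baidu.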
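One detail worth checking when adapting the snippet above: in BeautifulSoup 4, select() returns a list of matches, while select_one() returns only the first match (or None). The sketch below illustrates the difference. The HTML is a made-up stand-in for the page in the question; only the ids autInfo and autDescription and the class info_c come from the original.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for the real page
html = """
<div id="autInfo"><div class="info_c"><a href="/author/1">Jane Doe</a></div></div>
<div id="autDescription"><div class="info_c">Writes about parsers.</div></div>
"""
soup = BeautifulSoup(html, "html.parser")

# select() always returns a list, even when there is a single match
matches = soup.select("div#autInfo div.info_c a")

# select_one() returns the first matching tag, or None when nothing matches
link = soup.select_one("div#autInfo div.info_c a")
text1 = link.string

description = soup.select_one("div#autDescription div.info_c")
text2 = description.string if description is not None else None

print(len(matches), text1, text2)
```

Checking for None before reading .string avoids an AttributeError on pages where the element is missing.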

Create a string, for example:

    html = """<html><head><title>The Dormouse's story</title></head>
    <body>
    <p class="title" name="dromouse"><b>The Dormouse's story</b></p>
    <p class="story">Once upon a time there were three
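As a rough illustration of what can be done once such a string exists, the sketch below parses a shortened, self-contained version of it and reads a few values. The shortened document and the variable names are assumptions made for the example.

```python
from bs4 import BeautifulSoup

# Shortened version of the sample document, closed off so it parses cleanly
html = """<html><head><title>The Dormouse's story</title></head>
<body>
<p class="title" name="dromouse"><b>The Dormouse's story</b></p>
</body></html>"""

soup = BeautifulSoup(html, "html.parser")

title_text = soup.title.string   # text inside <title>
first_p = soup.p                 # first <p> tag in the document
p_classes = first_p["class"]     # class is multi-valued, so this is a list
p_name = first_p["name"]         # ordinary attributes come back as strings
bold_text = first_p.b.string

print(title_text, p_classes, p_name, bold_text)
```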

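This feature (select, which takes CSS selectors) exists only in BeautifulSoup4; BeautifulSoup3 and earlier do not have this method. The selector looks for sibling tags of the tag that matches link1. The # in front of link1 means the lookup is by id. ~ matches all siblings that follow the tag; + matches only the sibling that immediately follows it. It is also worth reading what other people have written on this.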
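The two sibling combinators can be compared side by side. This is a minimal sketch on made-up markup: the id link1 comes from the text above, while the other ids and the URLs are assumptions.

```python
from bs4 import BeautifulSoup

# Hypothetical markup with three sibling links
html = """
<p class="story">
<a id="link1" href="http://example.com/elsie">Elsie</a>
<a id="link2" href="http://example.com/lacie">Lacie</a>
<a id="link3" href="http://example.com/tillie">Tillie</a>
</p>
"""
soup = BeautifulSoup(html, "html.parser")

# "~" : every <a> sibling that comes after #link1
all_following = [a["id"] for a in soup.select("#link1 ~ a")]

# "+" : only the <a> sibling immediately after #link1
next_only = [a["id"] for a in soup.select("#link1 + a")]

print(all_following)  # ['link2', 'link3']
print(next_only)      # ['link2']
```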

Method 1: skip find and print directly: print(soup.meta['content'])
Method 2: print(meta['content'])
PS: watch out for pages that contain more than one meta tag.
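The warning about multiple meta tags matters because soup.meta only ever returns the first one. One possible way to handle that, sketched on an invented page (the tag contents are assumptions), is to filter by attribute:

```python
from bs4 import BeautifulSoup

# Hypothetical page head with several <meta> tags
html = """<html><head>
<meta charset="utf-8">
<meta name="keywords" content="soup, parser">
<meta name="description" content="A sample page">
</head><body></body></html>"""
soup = BeautifulSoup(html, "html.parser")

# soup.meta is the first <meta>; here it has no content attribute,
# so soup.meta['content'] would raise KeyError
first = soup.meta
first_content = first.get("content")  # .get() returns None instead of raising

# "name" must go through attrs, because name= already means the tag name
description = soup.find("meta", attrs={"name": "description"})
desc_content = description["content"]

# Collect content from every <meta> that actually has that attribute
all_contents = [m["content"] for m in soup.find_all("meta", content=True)]

print(first_content, desc_content, all_contents)
```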

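    from urllib.request import urlopen
    from bs4 import BeautifulSoup

    # BeautifulSoup parses markup; it does not download a URL, so fetch the page first
    html = urlopen("http://www.baidu.com").read()
    soup = BeautifulSoup(html, "lxml")
    print(soup.prettify())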

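beautiful soup: literally "a beautiful soup". soup n. soup, broth; thick fog; a predicament; vt. to increase horsepower. Example: She has a knack of landing herself right in the soup. (She keeps getting herself into awkward situations.) [Other] third person singular: soups; plural: soups; present participle: souping; past tense: souped; past participle: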

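First install it from the command line with pip install bs4 (on PyPI, bs4 is a thin wrapper package that pulls in beautifulsoup4). Then consult the BeautifulSoup API on the official site. To use it in a program: import bs4 as beautifulsoup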

    f = urllib2.urlopen(url)
    req = f.read()
    soup = BeautifulSoup(req)
    content = soup.findAll(attrs={"name":"readonlycounter2"})
    subId = content[0].string.split(',')[1]
    subName = soup.html.body.h1.span.string
    content = soup.findAll(attrs={"class":"
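The same lookups can be tried offline in Python 3 with bs4, where find_all is the current spelling of findAll. The sketch below uses an invented document: the attribute value readonlycounter2 comes from the snippet above, while the tag it sits on, its comma-separated text, and the heading are assumptions.

```python
from bs4 import BeautifulSoup

# Hypothetical page; only name="readonlycounter2" is taken from the snippet
html = """<html><body>
<h1><span>Sample subject</span></h1>
<span name="readonlycounter2">12,345,abc</span>
</body></html>"""
soup = BeautifulSoup(html, "html.parser")

# Match on an attribute called "name" via attrs, since name= means the tag name
content = soup.find_all(attrs={"name": "readonlycounter2"})

# Second comma-separated field of the first match
sub_id = content[0].string.split(",")[1]

# Dotted access walks to the first matching descendant at each step
sub_name = soup.html.body.h1.span.string

print(sub_id, sub_name)
```

Passing an explicit parser ("html.parser" here) keeps the result from depending on which parsers happen to be installed.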

Because your HTML is not well-formed XML (the tags do not come in matching pairs), you can only use an HTML parser.

    from bs4 import BeautifulSoup
    s = """</span><span style= 'font-size:12.0pt;color:#CC3399'>714659079qqcom 2014/09/10 10:14</span></p></div>"""
    soup =
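The snippet above breaks off at the parsing step. As one guess at how such a fragment might be handled (not necessarily what the original answer did), Python's built-in html.parser can be used, since it tolerates the unmatched closing tags:

```python
from bs4 import BeautifulSoup

# Fragment from the text above: closing tags with no matching opening tags
s = """</span><span style= 'font-size:12.0pt;color:#CC3399'>714659079qqcom 2014/09/10 10:14</span></p></div>"""

# Assumption: the built-in "html.parser" is used; the original choice is not shown
soup = BeautifulSoup(s, "html.parser")

# The stray </span>, </p> and </div> are dropped; the one complete <span> remains
span_text = soup.span.string
print(span_text)
```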

Source code:

    from bs4 import BeautifulSoup
    html_doc = '''111(222)编辑'''
    soup = BeautifulSoup(html_doc, "html.parser")
    # basic version
    didi = soup.b.next_element.strip()
    invest = soup.b.span.next_element.strip()
    # advanced version
    didi, invest = soup.b.stripped_strings
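The tags inside html_doc did not survive in the text above, so the two versions cannot be run as shown. The sketch below assumes a structure that fits the code (a b tag holding some text and a nested span); that markup is a guess, not the original.

```python
from bs4 import BeautifulSoup

# Assumed markup: text "111" inside <b>, followed by a nested <span> with "(222)"
html_doc = """<b> 111 <span> (222) </span></b><a>edit</a>"""
soup = BeautifulSoup(html_doc, "html.parser")

# Basic version: next_element is the node parsed right after the opening tag
didi_basic = soup.b.next_element.strip()
invest_basic = soup.b.span.next_element.strip()

# Advanced version: stripped_strings yields every non-empty string under <b>,
# already stripped; unpacking works because there are exactly two of them
didi, invest = soup.b.stripped_strings

print(didi, invest)
```

The unpacking form raises ValueError if the tag holds more or fewer than two strings, so the basic version is the safer choice when the structure varies.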
