
How do you read a PDF with Python?

2022-05-04 03:27:36 Cloud Computing

In Python, the pdfminer library can be used to read the contents of a PDF file.

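Installation commands (the original pdfminer package targets Python 2; pdfminer3k is its Python 3 port, and the code in this article uses Python 3 imports):

pip install pdfminer

pip install pdfminer3k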
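
Because both packages are imported under the module name pdfminer, one quick way to confirm that the installation worked is to check whether that module can be found. The following is a minimal sketch using only the standard library; the helper name module_available is hypothetical and is not part of pdfminer.

```python
import importlib.util


def module_available(name):
    """Return True if a top-level module with this name can be located on the import path."""
    return importlib.util.find_spec(name) is not None


if __name__ == "__main__":
    # Reports whether the module installed by the pip commands is visible to this interpreter.
    print("pdfminer importable:", module_available("pdfminer"))
```

If this prints False, the package was probably installed for a different Python interpreter than the one running the script.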

Python code for reading a PDF file:

from urllib.request import urlopen
from pdfminer.pdfinterp import PDFResourceManager, process_pdf
from pdfminer.converter import TextConverter
from pdfminer.layout import LAParams
from io import StringIO
from io import open

def readPDF(pdfFile):
    # The resource manager holds shared resources; the converter writes text into retstr
    rsrcmgr = PDFResourceManager()
    retstr = StringIO()
    laparams = LAParams()
    device = TextConverter(rsrcmgr, retstr, laparams=laparams)

    # Parse the file and send every page through the converter
    process_pdf(rsrcmgr, device, pdfFile)
    device.close()

    # Collect the extracted text before closing the buffer
    content = retstr.getvalue()
    retstr.close()
    return content

pdfFile = urlopen("http://pythonscraping.com/pages/warandpeace/chapter1.pdf")
outputString = readPDF(pdfFile)
print(outputString)
pdfFile.close()
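
PDF parsers generally need to jump around inside a file, while a network response is typically a forward-only stream. If passing the urlopen result directly causes trouble, one possible workaround is to read the whole download into memory first. This is a sketch under that assumption, not something the original article prescribes; the helper name to_seekable is hypothetical, and a BytesIO object stands in for the real network response so that the example runs without a network connection.

```python
from io import BytesIO


def to_seekable(stream):
    """Read every byte from a binary stream and return an in-memory, seekable file object."""
    return BytesIO(stream.read())


# Stand-in for a network response: any object whose read() returns bytes will do.
fake_response = BytesIO(b"%PDF-1.4 example bytes")
buffered = to_seekable(fake_response)

buffered.seek(0, 2)    # jump to the end, as a parser looking for trailing data might
size = buffered.tell()
buffered.seek(0)       # rewind before handing the buffer to a reader
```

With the code above, this would mean calling readPDF(to_seekable(pdfFile)) instead of readPDF(pdfFile). The trade-off is that the whole file is held in memory.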

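Classes used when parsing a PDF file:

PDFParser: reads data from a file
PDFDocument: stores the parsed data; it and PDFParser are linked to each other
PDFPageInterpreter: processes page content
PDFDevice: translates the processed content into the format you need
PDFResourceManager: stores shared resources such as fonts or images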
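
The way the classes listed above fit together can be sketched as follows. The sketch assumes the pdfminer3k API (other distributions, such as pdfminer.six, construct the document and iterate over pages differently), and the function name read_pdf_with_classes is hypothetical. TextConverter plays the role of the PDFDevice. The pdfminer imports are placed inside the function only so that the snippet loads even where the library is not installed.

```python
from io import StringIO


def read_pdf_with_classes(path):
    """Extract text from a local PDF by wiring the parsing classes together by hand."""
    # Deferred imports: assumes the pdfminer3k package layout.
    from pdfminer.pdfparser import PDFParser, PDFDocument
    from pdfminer.pdfinterp import PDFResourceManager, PDFPageInterpreter
    from pdfminer.converter import TextConverter
    from pdfminer.layout import LAParams

    with open(path, "rb") as fp:
        # PDFParser reads the file; PDFDocument stores what was parsed.
        parser = PDFParser(fp)
        document = PDFDocument()
        parser.set_document(document)   # the two objects reference each other
        document.set_parser(parser)
        document.initialize("")         # empty password

        # The resource manager is shared by the device and the interpreter.
        rsrcmgr = PDFResourceManager()
        output = StringIO()
        device = TextConverter(rsrcmgr, output, laparams=LAParams())
        interpreter = PDFPageInterpreter(rsrcmgr, device)

        # The interpreter processes each page and the device renders it as text.
        for page in document.get_pages():
            interpreter.process_page(page)

        device.close()
        text = output.getvalue()
        output.close()
    return text
```

A call such as read_pdf_with_classes("chapter1.pdf") would return the extracted text as a single string, where the file name is only a placeholder for a PDF on disk.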


Article link: https://www.xibujisuan.cn/16958.html


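©2022 西部计算 (Western Computing): following the progress of the East Data, West Computing initiative and the development of the server industry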
