Uploading and deleting S3 files and recognizing text in images with AWS
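Preparation
Install the AWS CLI
Download and run the installer for your operating system. The installation is straightforward, so it is not covered here.
After the installation finishes, run the following two commands to verify that the AWS CLI was installed successfully. On macOS, open the Terminal application; on Windows, open cmd.
- where aws (Windows) / which aws (macOS): shows the AWS CLI installation path
- aws --version: shows the AWS CLI version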
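The same two checks can also be run from Python, which is handy before calling the CLI from a script. This is only a minimal sketch: the function name aws_cli_info is made up for illustration, and it assumes the executable is called aws and is on your PATH.

```python
import shutil
import subprocess


def aws_cli_info(executable="aws"):
    """Return the CLI's path and version string, or None if it is not on PATH.

    `aws_cli_info` is a hypothetical helper name, not part of any AWS tooling.
    """
    # shutil.which performs the same lookup as `which aws` / `where aws`.
    path = shutil.which(executable)
    if path is None:
        return None
    # Equivalent of running `aws --version` in a terminal.
    result = subprocess.run(
        [path, "--version"], capture_output=True, text=True, check=False
    )
    # Some tools print their version on stderr, so fall back to it.
    version = (result.stdout or result.stderr).strip()
    return {"path": path, "version": version}
```

Calling aws_cli_info() returns None when the CLI is not installed, which makes it easy to show a clear error message instead of a failed command.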
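Initial configuration of the AWS CLI
Before using the AWS CLI, run the aws configure command to complete the initial configuration. It prompts for the following values:
- AWS Access Key ID and AWS Secret Access Key: click Create New Access Key to create an Access Key ID and Secret Access Key pair, and save it right away, because the pair can only be saved at creation time.
- Default region name: the code of the AWS Region to connect to. The code for each AWS Region is listed in the AWS documentation.
- Default output format: the format of the command-line output. JSON is the default for all output; any of the following formats can be used:
  - JSON (JavaScript Object Notation)
  - YAML (only available in AWS CLI v2)
  - Text
  - Table
For more detailed configuration options, see the AWS CLI configuration documentation.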
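aws configure stores its answers in two INI files, ~/.aws/credentials (the key pair) and ~/.aws/config (region and output format), both under a [default] section. The sketch below reads them back to confirm what was saved. The helper name read_default_profile and the aws_dir parameter are assumptions for illustration, and only the default profile is checked.

```python
import configparser
from pathlib import Path


def read_default_profile(aws_dir=None):
    """Summarize the [default] profile written by `aws configure`.

    `read_default_profile` and `aws_dir` are hypothetical names used for this sketch.
    """
    aws_dir = Path(aws_dir) if aws_dir else Path.home() / ".aws"

    # Region and output format live in the `config` file.
    config = configparser.ConfigParser()
    config.read(aws_dir / "config")  # missing files are silently skipped

    # The access key pair lives in the `credentials` file.
    credentials = configparser.ConfigParser()
    credentials.read(aws_dir / "credentials")

    return {
        "region": config.get("default", "region", fallback=None),
        "output": config.get("default", "output", fallback=None),
        # Only report whether a key exists; never print the secret itself.
        "has_access_key": credentials.has_option("default", "aws_access_key_id"),
    }
```

If region comes back as None, boto3 calls later in this post will need the region passed explicitly.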
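Getting access to an S3 bucket
The user whose credentials are configured on this machine needs permission to access an S3 bucket. An administrator normally grants this.
Getting access to image text recognition
The same user also needs permissions for Amazon Textract. An administrator normally grants this as well.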
The AWS SDK
Install the boto3 module. The botocore module is normally installed along with it.
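The usual way to install the module is pip install boto3 (assuming pip is your package manager). To confirm that both packages are present afterwards, a short check like the following can help. The helper names installed_version and sdk_versions are made up for this sketch.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version of `package`, or None if it is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def sdk_versions():
    """Report boto3 and botocore together, since they are installed as a pair."""
    return {name: installed_version(name) for name in ("boto3", "botocore")}
```

Calling sdk_versions() returns a dictionary keyed by package name, with None for anything that is not installed.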
Uploading a file
Method 1
Use the upload_file method to upload a file.
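A minimal sketch of this method, assuming the credentials configured earlier can write to the bucket. The wrapper name upload_with_upload_file and the optional client parameter are illustrative additions; the actual boto3 call is client.upload_file(Filename, Bucket, Key).

```python
def upload_with_upload_file(local_path, bucket, key, client=None):
    """Upload a local file to s3://bucket/key with the managed upload_file call.

    `upload_with_upload_file` is a hypothetical wrapper; `client` lets you pass
    an existing S3 client (or a stub when testing).
    """
    if client is None:
        import boto3  # imported lazily so a stub client can be used without boto3

        client = boto3.client("s3")
    # upload_file reads from disk and switches to multipart upload for large files.
    client.upload_file(local_path, bucket, key)
    return f"s3://{bucket}/{key}"
```

For example, upload_with_upload_file("invoice.png", "my-bucket", "scans/invoice.png") uploads the file, where the bucket and key names are placeholders. upload_file itself returns nothing, so the wrapper returns the object's S3 URI for convenience.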
Method 2
Use PutObject to upload a file.
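In boto3 the PutObject operation is exposed as client.put_object. The sketch below reads the whole file into memory and sends it in one request; the wrapper name upload_with_put_object and the client parameter are assumptions for illustration.

```python
from pathlib import Path


def upload_with_put_object(local_path, bucket, key, client=None):
    """Upload a file in a single PutObject request and return its ETag.

    `upload_with_put_object` is a hypothetical wrapper around client.put_object.
    """
    if client is None:
        import boto3  # imported lazily so a stub client can be used without boto3

        client = boto3.client("s3")
    # The whole file is held in memory, so this suits small files.
    body = Path(local_path).read_bytes()
    response = client.put_object(Bucket=bucket, Key=key, Body=body)
    return response.get("ETag")
```

Unlike upload_file, put_object sends a single request and returns the service response, which is useful when you need the ETag. For large files, upload_file is usually the better fit because it handles multipart uploads for you.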
Deleting a file
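Deleting an object is a single delete_object call on the S3 client. The following is a sketch only: the wrapper name delete_s3_file and the client parameter are illustrative, and it assumes the configured user has delete permission on the bucket.

```python
def delete_s3_file(bucket, key, client=None):
    """Delete s3://bucket/key and return the HTTP status code of the response.

    `delete_s3_file` is a hypothetical wrapper around client.delete_object.
    """
    if client is None:
        import boto3  # imported lazily so a stub client can be used without boto3

        client = boto3.client("s3")
    response = client.delete_object(Bucket=bucket, Key=key)
    # A successful delete typically reports HTTP 204 (No Content).
    return response.get("ResponseMetadata", {}).get("HTTPStatusCode")
```

Note that S3 generally reports success even when the key did not exist, so a successful status does not prove that a file was actually there beforehand.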
Recognizing text in images
Recognizing key-value documents such as invoices and bills
Plain text recognition
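For plain text, Textract's detect_document_text operation is enough. The sketch below reads an image straight from S3 and returns the detected lines. The function names extract_lines and detect_lines and the client parameter are assumptions for illustration, and the image is assumed to be in a bucket the configured user can read.

```python
def extract_lines(response):
    """Collect the text of every LINE block from a Textract response."""
    return [
        block["Text"]
        for block in response.get("Blocks", [])
        if block.get("BlockType") == "LINE"
    ]


def detect_lines(bucket, document, client=None):
    """Run detect_document_text on an S3 object and return its lines of text.

    `detect_lines` is a hypothetical helper; `document` is the object key.
    """
    if client is None:
        import boto3  # imported lazily so a stub client can be used without boto3

        client = boto3.client("textract")
    # Textract fetches the image from S3 itself, so no bytes are downloaded here.
    response = client.detect_document_text(
        Document={"S3Object": {"Bucket": bucket, "Name": document}}
    )
    return extract_lines(response)
```

The longer listing that follows calls analyze_document with the TABLES and FORMS features instead, which is the route to take for key-value documents such as invoices.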
import io

import boto3
from PIL import Image, ImageDraw


def ShowBoundingBox(draw, box, width, height, boxColor):
    # Textract reports box coordinates as ratios of the page size; scale to pixels.
    left = width * box['Left']
    top = height * box['Top']
    draw.rectangle([left, top, left + (width * box['Width']), top + (height * box['Height'])], outline=boxColor)


def ShowSelectedElement(draw, box, width, height, boxColor):
    # Same as ShowBoundingBox, but fills the box to mark a selected checkbox.
    left = width * box['Left']
    top = height * box['Top']
    draw.rectangle([left, top, left + (width * box['Width']), top + (height * box['Height'])], fill=boxColor)


def DisplayBlockInformation(block):
    # Print the fields of one Textract block.
    print('Id: {}'.format(block['Id']))
    if 'Text' in block:
        print('    Detected: ' + block['Text'])
    print('    Type: ' + block['BlockType'])

    if 'Confidence' in block:
        print('    Confidence: ' + "{:.2f}".format(block['Confidence']) + "%")

    if block['BlockType'] == 'CELL':
        print("    Cell information")
        print("        Column:" + str(block['ColumnIndex']))
        print("        Row:" + str(block['RowIndex']))
        print("        Column Span:" + str(block['ColumnSpan']))
        print("        RowSpan:" + str(block['RowSpan']))

    if 'Relationships' in block:
        print('    Relationships: {}'.format(block['Relationships']))
    print('    Geometry: ')
    print('        Bounding Box: {}'.format(block['Geometry']['BoundingBox']))
    print('        Polygon: {}'.format(block['Geometry']['Polygon']))

    if block['BlockType'] == "KEY_VALUE_SET":
        print('    Entity Type: ' + block['EntityTypes'][0])

    if block['BlockType'] == 'SELECTION_ELEMENT':
        print('    Selection element detected: ', end='')
        if block['SelectionStatus'] == 'SELECTED':
            print('Selected')
        else:
            print('Not selected')

    if 'Page' in block:
        # Page is an integer, so convert it before concatenating.
        print('Page: ' + str(block['Page']))
    print()


def process_text_analysis(bucket, document):
    # Download the image from S3 into memory.
    s3_connection = boto3.resource('s3')
    s3_object = s3_connection.Object(bucket, document)
    s3_response = s3_object.get()

    stream = io.BytesIO(s3_response['Body'].read())
    image = Image.open(stream)

    # Analyze the document for tables and forms (key-value pairs).
    client = boto3.client('textract')
    image_binary = stream.getvalue()
    response = client.analyze_document(Document={'Bytes': image_binary},
                                       FeatureTypes=["TABLES", "FORMS"])

    blocks = response['Blocks']
    width, height = image.size
    draw = ImageDraw.Draw(image)
    print('Detected Document Text')

    # Print each block and outline it on the image, color-coded by block type.
    for block in blocks:
        DisplayBlockInformation(block)
        if block['BlockType'] == "KEY_VALUE_SET":
            if block['EntityTypes'][0] == "KEY":
                ShowBoundingBox(draw, block['Geometry']['BoundingBox'], width, height, 'red')
            else:
                ShowBoundingBox(draw, block['Geometry']['BoundingBox'], width, height, 'green')

        if block['BlockType'] == 'TABLE':
            ShowBoundingBox(draw, block['Geometry']['BoundingBox'], width, height, 'blue')

        if block['BlockType'] == 'CELL':
            ShowBoundingBox(draw, block['Geometry']['BoundingBox'], width, height, 'yellow')

        if block['BlockType'] == 'SELECTION_ELEMENT':
            if block['SelectionStatus'] == 'SELECTED':
                ShowSelectedElement(draw, block['Geometry']['BoundingBox'], width, height, 'blue')

    # Show the annotated image.
    image.show()
    return len(blocks)


def main():
    bucket = ''    # fill in your S3 bucket name
    document = ''  # fill in the object key of the image to analyze
    block_count = process_text_analysis(bucket, document)
    print("Blocks detected: " + str(block_count))


if __name__ == "__main__":
    main()
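The listing above prints and draws every block, but it does not pair each key with its value. In an analyze_document response, a KEY block points to its VALUE block through a relationship of type VALUE, and both point to their words through relationships of type CHILD. Below is a minimal sketch of that pairing. The helper names block_text and key_value_pairs are made up for illustration, and it operates on response['Blocks'] without calling AWS.

```python
def block_text(block, blocks_by_id):
    """Join the WORD children of a block into a single string."""
    words = []
    for relationship in block.get("Relationships", []):
        if relationship["Type"] != "CHILD":
            continue
        for child_id in relationship["Ids"]:
            child = blocks_by_id[child_id]
            if child["BlockType"] == "WORD":
                words.append(child["Text"])
            elif child["BlockType"] == "SELECTION_ELEMENT":
                # Checkboxes have no text; report SELECTED / NOT_SELECTED instead.
                words.append(child["SelectionStatus"])
    return " ".join(words)


def key_value_pairs(blocks):
    """Map each form key to its value text, given response['Blocks']."""
    blocks_by_id = {block["Id"]: block for block in blocks}
    pairs = {}
    for block in blocks:
        is_key = (
            block["BlockType"] == "KEY_VALUE_SET"
            and "KEY" in block.get("EntityTypes", [])
        )
        if not is_key:
            continue
        values = []
        for relationship in block.get("Relationships", []):
            if relationship["Type"] == "VALUE":
                for value_id in relationship["Ids"]:
                    values.append(block_text(blocks_by_id[value_id], blocks_by_id))
        pairs[block_text(block, blocks_by_id)] = " ".join(values)
    return pairs
```

Passing the blocks from process_text_analysis's response to key_value_pairs would give a plain dictionary such as field label to field value, which is usually easier to work with for invoices than the raw block list.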