Python API

AWS

class RPA.Cloud.AWS.AWS(region: str = 'eu-west-1', robocorp_vault_name: str | None = None)

AWS is a library for operating with the Amazon AWS services S3, SQS, Textract, Comprehend, Redshift Data API and STS.

Services are initialized with keywords like Init S3 Client for S3.

AWS authentication

Authentication for AWS uses an access key ID and a secret access key, which can be given to the library in three different ways.

  • Method 1: as the environment variables AWS_KEY_ID and AWS_KEY.

  • Method 2: as keyword parameters, for example to Init Textract Client.

  • Method 3: as a Robocorp Vault secret. The vault name must be given in the library init or with the keyword Set Robocorp Vault. The secret keys are expected to match the environment variable names.

Note: starting from rpaframework-aws 1.0.3, the region can be given as the environment variable AWS_REGION or included as a Robocorp Vault secret with the same key name.

Redshift Data authentication: Depending on the authorization method, use one of the following combinations of request parameters, which can only be passed via method 2:

  • Secrets Manager - when connecting to a cluster, specify the Amazon Resource Name (ARN) of the secret, the database name, and the cluster identifier that matches the cluster in the secret. When connecting to a serverless endpoint, specify the Amazon Resource Name (ARN) of the secret and the database name.

  • Temporary credentials - when connecting to a cluster, specify the cluster identifier, the database name, and the database user name. Also, permission to call the redshift:GetClusterCredentials operation is required. When connecting to a serverless endpoint, specify the database name.

Role Assumption: With the STS service client, you can assume another role, which returns temporary credentials. The temporary credentials include an access key ID, a secret access key and a session token; see the keyword documentation for Assume Role for details on how the credentials are returned. You can use these temporary credentials with method 2, but you must also pass the session token.

Method 1. credentials using environment variables

*** Settings ***
Library   RPA.Cloud.AWS

*** Tasks ***
Init AWS services
    # NO parameters for client, expecting to get credentials
    # with AWS_KEY, AWS_KEY_ID and AWS_REGION environment variables
    Init S3 Client

Method 2. credentials with keyword parameters

*** Settings ***
Library   RPA.Cloud.AWS   region=us-east-1

*** Tasks ***
Init AWS services
    Init S3 Client  aws_key_id=${AWS_KEY_ID}  aws_key=${AWS_KEY}

Method 3. setting Robocorp Vault in the library init

*** Settings ***
Library   RPA.Cloud.AWS  robocorp_vault_name=aws

*** Tasks ***
Init AWS services
    Init S3 Client  use_robocorp_vault=${TRUE}

Method 3. setting Robocorp Vault with keyword

*** Settings ***
Library   RPA.Cloud.AWS

*** Tasks ***
Init AWS services
    Set Robocorp Vault     vault_name=aws
    Init Textract Client    use_robocorp_vault=${TRUE}

Requirements

This library depends on the boto3 library. Due to the size of that dependency, the library is available as a separate package, rpaframework-aws, but it can also be installed as an optional package of rpaframework.

The recommended installation is the rpaframework-aws package plus the rpaframework package, for example in a conda.yaml environment file. Remember to check the latest versions in the rpaframework GitHub repository.

channels:
  - conda-forge
dependencies:
  - python=3.7.5
  - pip=20.1
  - pip:
    - rpaframework==13.0.2
    - rpaframework-aws==1.0.3

Example

*** Settings ***
Library   RPA.Cloud.AWS   region=us-east-1

*** Variables ***
${BUCKET_NAME}        testbucket12213123123

*** Tasks ***
Upload a file into S3 bucket
    [Setup]   Init S3 Client
    Upload File      ${BUCKET_NAME}   ${/}path${/}to${/}file.pdf
    @{files}         List Files   ${BUCKET_NAME}
    FOR   ${file}  IN   @{files}
        Log  ${file}
    END
ROBOT_LIBRARY_DOC_FORMAT = 'REST'
ROBOT_LIBRARY_SCOPE = 'GLOBAL'
analyze_document(image_file: str | None = None, json_file: str | None = None, bucket_name: str | None = None, model: bool = False) bool

Analyzes an input document for relationships between detected items.

Parameters:
  • image_file – filepath (or object name) of image file

  • json_file – filepath to resulting json file

  • bucket_name – if given, image_file is read from this bucket

  • model – set to True to return a TextractDocument model, default False

Returns:

analysis response as JSON, or a TextractDocument model if model is True

Example:

${response}    Analyze Document    ${filename}    model=True
FOR    ${page}    IN    @{response.pages}
    Log Many    ${page.tables}
    Log Many    ${page.form}
    Log Many    ${page.lines}
    Log    ${page}
END
assume_role(role_arn: str, role_session_name: str, policy_arns: List[Dict] | None = None, policy: str | None = None, duration: int = 900, tags: List[Dict] | None = None, transitive_tag_keys: List[str] | None = None, external_id: str | None = None, serial_number: str | None = None, token_code: str | None = None, source_identity: str | None = None) Dict

Returns a set of temporary security credentials that you can use to access Amazon Web Services resources that you might not normally have access to. These temporary credentials consist of an access key ID, a secret access key, and a security token. Typically, you use Assume Role within your account or for cross-account access.

The credentials are returned as a dictionary with data structure similar to the following JSON:

{
    "Credentials": {
        "AccessKeyId": "string",
        "SecretAccessKey": "string",
        "SessionToken": "string",
        "Expiration": "2015-01-01"
    },
    "AssumedRoleUser": {
        "AssumedRoleId": "string",
        "Arn": "string"
    },
    "PackedPolicySize": 123,
    "SourceIdentity": "string"
}

These credentials can be used to re-initialize services available in this library with the assumed role instead of the original role.

NOTE: For detailed information on the available arguments to this keyword, please see the Boto3 STS documentation.

Parameters:
  • role_arn – The Amazon Resource Name (ARN) of the role to assume.

  • role_session_name – An identifier for the assumed role session.

  • policy_arns – The Amazon Resource Names (ARNs) of the IAM managed policies that you want to use as managed session policies. The policies must exist in the same account as the role.

  • policy – An IAM policy in JSON format that you want to use as an inline session policy.

  • duration – The duration, in seconds, of the role session. The value can range from 900 seconds (15 minutes, the default) up to the maximum session duration set for the role.

  • tags – A list of session tags that you want to pass. Each session tag consists of a key name and an associated value.

  • transitive_tag_keys – A list of keys for session tags that you want to set as transitive. If you set a tag key as transitive, the corresponding key and value pass to subsequent sessions in a role chain.

  • external_id – A unique identifier that might be required when you assume a role in another account. If the administrator of the account to which the role belongs provided you with an external ID, then provide that value in this parameter.

  • serial_number – The identification number of the MFA device that is associated with the user calling the assume_role keyword.

  • token_code – The value provided by the MFA device, if the trust policy of the role being assumed requires MFA.

  • source_identity – The source identity specified by the principal that is using the assume_role keyword.
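As a minimal sketch of re-initializing a service with the assumed role, the hypothetical helper below maps the Credentials part of an Assume Role response onto the aws_key_id, aws_key and session_token parameters documented for the Init ... Client keywords. The helper name and sample values are illustrative only, and the library itself is not called here.

```python
from typing import Any, Dict


def credentials_to_client_kwargs(assume_role_response: Dict[str, Any]) -> Dict[str, str]:
    """Hypothetical helper: map an Assume Role response to Init ... Client kwargs.

    The returned keys match the documented parameters of init_s3_client,
    init_textract_client and the other Init keywords.
    """
    credentials = assume_role_response["Credentials"]
    return {
        "aws_key_id": credentials["AccessKeyId"],
        "aws_key": credentials["SecretAccessKey"],
        # Temporary credentials are not accepted without their session token.
        "session_token": credentials["SessionToken"],
    }


# Sample response shaped like the JSON above (placeholder values).
sample_response = {
    "Credentials": {
        "AccessKeyId": "ASIAEXAMPLE",
        "SecretAccessKey": "example-secret",
        "SessionToken": "example-token",
        "Expiration": "2015-01-01",
    },
    "AssumedRoleUser": {"AssumedRoleId": "example-id", "Arn": "example-arn"},
}

client_kwargs = credentials_to_client_kwargs(sample_response)
print(client_kwargs)
# With the library loaded, the result could be passed on, for example:
# aws.init_s3_client(**client_kwargs)
```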

clients: dict = {}
convert_textract_response_to_model(response)

Convert an AWS Textract JSON response into a TextractDocument object, which has the following structure:

  • Document
      • Page
          • Tables
              • Rows
                  • Cells
          • Lines
              • Words
          • Form
              • Field

Parameters:

response – JSON response from AWS Textract service

Returns:

TextractDocument object

Example:

${response}    Analyze Document    ${filename}
${model}=    Convert Textract Response To Model    ${response}
FOR    ${page}    IN    @{model.pages}
    Log Many    ${page.tables}
    Log Many    ${page.form}
    Log Many    ${page.lines}
    Log    ${page}
END
create_bucket(bucket_name: str | None = None, **kwargs) bool

Create an S3 bucket with the given name.

Note: this keyword accepts additional parameters in key=value format.

More info on additional parameters.

Parameters:

bucket_name – name for the bucket

Returns:

boolean indicating status of operation

Robot Framework example:

Create Bucket  public-bucket   ACL=public-read-write
create_queue(queue_name: str | None = None)

Create a queue with the given name.

Parameters:

queue_name – name of the queue to create, defaults to None

Returns:

create queue response as dict

create_redshift_statement_parameters(**params) List[Dict[str, str]]

Returns a list of formatted dictionaries to be used as parameters in Redshift Data API SQL statements.

Example:

Assume the ${SQL} statement has the parameters :id and :name:

*** Tasks ***
Run parameterized Redshift statement
    ${params}=    Create Redshift Statement Parameters    id=123    name=Nokia
    # ${params} produces a data structure like so:
    #   [
    #       {"name": "id", "value": "123"},
    #       {"name": "name", "value": "Nokia"}
    #   ]
    # which can be used for the 'parameters' argument.
    ${response}=    Execute Redshift Statement    ${SQL}    ${params}
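The mapping this keyword performs can be sketched in plain Python. The function name below is hypothetical and this is not the library's implementation; values are converted with str() because the Redshift Data API expects each parameter value as a string.

```python
from typing import Any, Dict, List


def build_statement_parameters(**params: Any) -> List[Dict[str, str]]:
    """Hypothetical equivalent of Create Redshift Statement Parameters.

    Each keyword argument becomes one {"name": ..., "value": ...} entry,
    which the SQL statement references as :name.
    """
    return [{"name": name, "value": str(value)} for name, value in params.items()]


params = build_statement_parameters(id=123, name="Nokia")
print(params)
# → [{'name': 'id', 'value': '123'}, {'name': 'name', 'value': 'Nokia'}]
```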
delete_bucket(bucket_name: str | None = None) bool

Delete the S3 bucket with the given name.

Parameters:

bucket_name – name for the bucket

Returns:

boolean indicating status of operation

delete_files(bucket_name: str | None = None, files: list | None = None, **kwargs)

Delete files in the bucket

Note: this keyword accepts additional parameters in key=value format.

More info on additional parameters.

Parameters:
  • bucket_name – name for the bucket

  • files – list of files to delete

Returns:

number of files deleted or False

delete_message(receipt_handle: str | None = None)

Delete a message from the queue.

Parameters:

receipt_handle – message handle to delete

Returns:

delete message response as dict

delete_queue(queue_name: str | None = None)

Delete the queue with the given name.

Parameters:

queue_name – name of the queue to delete, defaults to None

Returns:

delete queue response as dict

describe_redshift_table(database: str, schema: str | None = None, table: str | None = None) Dict | List[Dict]

Describes a table, including its columns, based on metadata in the cluster.

If schema or table is not provided, the API searches all schemas for the provided table, or returns all tables in the schema or in the entire database.

The response is provided as a list of table metadata objects; use dot notation or the RPA.JSON library to access their members:

{
    "ColumnList": [
        {
            "columnDefault": "string",
            "isCaseSensitive": true,
            "isCurrency": false,
            "isSigned": false,
            "label": "string",
            "length": 123,
            "name": "string",
            "nullable": 123,
            "precision": 123,
            "scale": 123,
            "schemaName": "string",
            "tableName": "string",
            "typeName": "string"
        }
    ],
    "TableName": "string"
}
Parameters:
  • database – The name of the database that contains the tables to be described. If omitted, the connected database is used.

  • schema – The schema that contains the table. If no schema is specified, then matching tables for all schemas are returned.

  • table – The table name. If no table is specified, then all tables for all matching schemas are returned. If no table and no schema is specified, then all tables for all schemas in the database are returned.
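Because the keyword returns plain table metadata objects, the column information can also be read with ordinary dictionary access in Python. A minimal sketch, assuming the response is shaped like the JSON above; since the signature allows either a single object or a list, a single object is wrapped first. The helper name and sample data are hypothetical.

```python
from typing import Any, Dict, List, Tuple, Union

TableMeta = Dict[str, Any]


def columns_by_table(
    tables: Union[TableMeta, List[TableMeta]]
) -> Dict[str, List[Tuple[str, str]]]:
    """Hypothetical helper: map each TableName to its (column name, type name) pairs."""
    if isinstance(tables, dict):  # a single metadata object instead of a list
        tables = [tables]
    summary: Dict[str, List[Tuple[str, str]]] = {}
    for table in tables:
        summary[table["TableName"]] = [
            (column["name"], column["typeName"])
            for column in table.get("ColumnList", [])
        ]
    return summary


sample_tables = [
    {
        "TableName": "mytable",
        "ColumnList": [
            {"name": "id", "typeName": "int4"},
            {"name": "address", "typeName": "varchar"},
        ],
    }
]
print(columns_by_table(sample_tables))
# → {'mytable': [('id', 'int4'), ('address', 'varchar')]}
```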

detect_document_text(image_file: str | None = None, json_file: str | None = None, bucket_name: str | None = None) bool

Detects text in the input document.

Parameters:
  • image_file – filepath (or object name) of image file

  • json_file – filepath to resulting json file

  • bucket_name – if given, image_file is read from this bucket

Returns:

detection response as JSON

detect_entities(text: str | None = None, lang='en') dict

Inspects text for named entities and returns information about them.

Parameters:
  • text – A UTF-8 text string. The string must contain fewer than 5,000 bytes of UTF-8 encoded characters.

  • lang – language code of the text, defaults to “en”

detect_sentiment(text: str | None = None, lang='en') dict

Inspects text and returns an inference of the prevailing sentiment.

Parameters:
  • text – A UTF-8 text string. The string must contain fewer than 5,000 bytes of UTF-8 encoded characters.

  • lang – language code of the text, defaults to “en”
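Longer texts have to be split before they are passed to Detect Entities or Detect Sentiment. The hypothetical helper below is one way to do that, assuming the 5,000-byte limit stated above: it splits on character boundaries so that multi-byte UTF-8 characters are never cut in half. It is not part of the library, and because splitting mid-sentence can affect the inferred entities and sentiment, splitting on sentence boundaries may be preferable in practice.

```python
from typing import List

MAX_BYTES = 5000  # limit stated above: text must be fewer than 5,000 bytes


def split_utf8(text: str, max_bytes: int = MAX_BYTES) -> List[str]:
    """Hypothetical helper: split text into chunks of fewer than max_bytes UTF-8 bytes."""
    chunks: List[str] = []
    current: List[str] = []
    current_size = 0
    for char in text:
        char_size = len(char.encode("utf-8"))
        # Start a new chunk if adding this character would reach the limit.
        if current and current_size + char_size >= max_bytes:
            chunks.append("".join(current))
            current, current_size = [], 0
        current.append(char)
        current_size += char_size
    if current:
        chunks.append("".join(current))
    return chunks


sample_text = "Hyvää päivää! " * 1000  # "ä" takes 2 bytes in UTF-8
parts = split_utf8(sample_text)
print(len(parts), [len(part.encode("utf-8")) for part in parts])
# Each part could then be analyzed separately, e.g. aws.detect_sentiment(part)
```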

download_files(bucket_name: str | None = None, files: list | None = None, target_directory: str | None = None, **kwargs) list

Download files from the bucket to the local filesystem.

Note: this keyword accepts additional parameters in key=value format.

More info on additional parameters.

Parameters:
  • bucket_name – name for the bucket

  • files – list of S3 object names

  • target_directory – location for the downloaded files, default current directory

Returns:

number of files downloaded

execute_redshift_statement(sql: str, parameters: list | None = None, statement_name: str | None = None, with_event: bool = False, timeout: int = 40) Table | str

Runs an SQL statement, which can be data manipulation language (DML) or data definition language (DDL). This statement must be a single SQL statement.

SQL statements can be parameterized with named parameters through the use of the parameters argument. Parameters must be dictionaries with the following two keys:

  • name: The name of the parameter. In the SQL statement this will be referenced as :name.

  • value: The value of the parameter. Amazon Redshift implicitly converts to the proper data type. For more information, see Data types in the Amazon Redshift Database Developer Guide.

For simplicity, a helper keyword, `Create redshift statement parameters`, is available and can be used more naturally in Robot Framework contexts.

If tabular data is returned, this keyword tries to return it as a table (see RPA.Tables). If RPA.Tables is not available in the keyword’s scope, the data is returned as a list of dictionaries. Other types of data (SQL errors and result statements) are returned as strings.

NOTE: You may modify the maximum built-in wait time by providing a timeout in seconds (default 40 seconds).

Robot Framework example:

*** Tasks ***
Insert row into Redshift
    ${SQL}=    Set Variable    insert into mytable values (:id, :address)
    ${params}=    Create Redshift Statement Parameters
    ...    id=1
    ...    address=Seattle
    ${response}=    Execute Redshift Statement    ${SQL}    ${params}
    Log    ${response}

Python example:

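from RPA.Cloud.AWS import AWS

aws = AWS(region="us-east-1")
# Placeholder cluster details; credentials come from environment variables (method 1)
aws.init_redshift_data_client(cluster_identifier="my-cluster", database="dev", database_user="awsuser")
sql = "insert into mytable values (:id, :address)"
parameters = [
    {"name": "id", "value": "1"},
    {"name": "address", "value": "Seattle"},
]
response = aws.execute_redshift_statement(sql, parameters)
print(response)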
Parameters:
  • parameters – The parameters for the SQL statement. Must consist of a list of dictionaries with two keys: name and value.

  • sql – The SQL statement text to run.

  • statement_name – The name of the SQL statement. You can name the SQL statement when you create it to identify the query.

  • with_event – A value that indicates whether to send an event to the Amazon EventBridge event bus after the SQL statement runs.

  • timeout – Used to calculate the maximum wait. Exact timing depends on system variability because the underlying waiter does not use a timeout directly.
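The timeout note above can be pictured with a generic polling loop. The sketch below is an assumption about the general approach, not the library's code: the timeout is converted into a maximum number of attempts with a fixed delay, so the real elapsed time also includes the latency of each status check. FINISHED, FAILED and ABORTED are final status values reported by the Redshift Data API DescribeStatement operation; the function name, the delay value and the fake status source are hypothetical.

```python
import math
import time
from typing import Callable

FINAL_STATES = ("FINISHED", "FAILED", "ABORTED")


def wait_until_finished(
    get_status: Callable[[], str],
    timeout: int = 40,
    delay: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll get_status until it returns a final state or the attempts run out."""
    # The timeout only bounds the number of attempts; each status call adds
    # its own latency, so the wall-clock wait can exceed the timeout.
    max_attempts = max(1, math.ceil(timeout / delay))
    for attempt in range(max_attempts):
        status = get_status()
        if status in FINAL_STATES:
            return status
        if attempt < max_attempts - 1:
            sleep(delay)
    raise TimeoutError(f"statement not finished after {max_attempts} attempts")


# Fake status source standing in for repeated DescribeStatement calls.
statuses = iter(["SUBMITTED", "STARTED", "FINISHED"])
result = wait_until_finished(lambda: next(statuses), timeout=10, delay=2, sleep=lambda _: None)
print(result)  # → FINISHED
```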

execute_redshift_statement_asyncronously(sql: str, parameters: list | None = None, statement_name: str | None = None, with_event: bool = False) str

Submit an SQL statement for Redshift to execute asynchronously. Returns the statement ID, which can be used later to retrieve the statement results with Get Redshift Statement Results.

Parameters:
  • parameters – The parameters for the SQL statement. Must consist of a list of dictionaries with two keys: name and value.

  • sql – The SQL statement text to run.

  • statement_name – The name of the SQL statement. You can name the SQL statement when you create it to identify the query.

  • with_event – A value that indicates whether to send an event to the Amazon EventBridge event bus after the SQL statement runs.

generate_presigned_url(bucket_name: str, object_name: str, expires_in: int | None = None, **extra_params) tuple

Generate a presigned URL for the file.

Parameters:
  • bucket_name – name for the bucket

  • object_name – name of the file in the bucket

  • expires_in – optional expiration time for the url (in seconds). The default expiration time is 3600 seconds (one hour).

  • extra_params – allows setting any extra Params

Returns:

URL for accessing the file

get_cells()

Get parsed cells from the response

Returns:

cells

get_document_analysis(job_id: str | None = None, max_results: int = 1000, next_token: str | None = None, collect_all_results: bool = False) dict

Get the results of Textract asynchronous Document Analysis operation

Parameters:
  • job_id – job identifier, defaults to None

  • max_results – number of blocks to get at a time, defaults to 1000

  • next_token – pagination token for getting next set of results, defaults to None

  • collect_all_results – when set to True, waits until the analysis is complete and returns all blocks of the analysis result. By default (False), the blocks must be collected explicitly using the next_token parameter.

Returns:

dictionary

The response dictionary has the key JobStatus with the value SUCCEEDED when the analysis has completed.

Example:

Init Textract Client  %{AWS_KEY_ID}  %{AWS_KEY}  %{AWS_REGION}
${jobid}=    Start Document Analysis  s3bucket_name  invoice.pdf
# Wait for job completion and collect all blocks
${response}=    Get Document Analysis  ${jobid}  collect_all_results=True
# Model will contain all pages of the invoice.pdf
${model}=    Convert Textract Response To Model    ${response}
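When collect_all_results is left at False, the blocks have to be gathered by following next_token until no NextToken is returned. A minimal sketch, assuming the job has already reached SUCCEEDED and each response page is a dictionary with the Textract keys Blocks and NextToken. fetch_page is a hypothetical callable that would wrap, for example, aws.get_document_analysis(job_id, next_token=token); fake pages stand in for the service here.

```python
from typing import Any, Callable, Dict, List, Optional


def collect_blocks(fetch_page: Callable[[Optional[str]], Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Follow NextToken until the last page and return all blocks."""
    blocks: List[Dict[str, Any]] = []
    token: Optional[str] = None
    while True:
        page = fetch_page(token)
        blocks.extend(page.get("Blocks", []))
        token = page.get("NextToken")
        if not token:  # the last page carries no NextToken
            return blocks


# Fake paginated responses keyed by the token used to request them.
fake_pages = {
    None: {"JobStatus": "SUCCEEDED", "Blocks": [{"Id": "1"}, {"Id": "2"}], "NextToken": "t1"},
    "t1": {"JobStatus": "SUCCEEDED", "Blocks": [{"Id": "3"}]},
}
all_blocks = collect_blocks(lambda token: fake_pages[token])
print([block["Id"] for block in all_blocks])  # → ['1', '2', '3']
```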
get_document_text_detection(job_id: str | None = None, max_results: int = 1000, next_token: str | None = None, collect_all_results: bool = False) dict

Get the results of Textract asynchronous Document Text Detection operation

Parameters:
  • job_id – job identifier, defaults to None

  • max_results – number of blocks to get at a time, defaults to 1000

  • next_token – pagination token for getting next set of results, defaults to None

  • collect_all_results – when set to True, waits until the text detection is complete and returns all blocks of the result. By default (False), the blocks must be collected explicitly using the next_token parameter.

Returns:

dictionary

The response dictionary has the key JobStatus with the value SUCCEEDED when the text detection has completed.

Example:

Init Textract Client  %{AWS_KEY_ID}  %{AWS_KEY}  %{AWS_REGION}
${jobid}=    Start Document Text Detection  s3bucket_name  invoice.pdf
# Wait for job completion and collect all blocks
${response}=   Get Document Text Detection    ${jobid}  collect_all_results=True
# Model will contain all pages of the invoice.pdf
${model}=    Convert Textract Response To Model    ${response}
get_pages_and_text(textract_response: dict) dict

Get pages and text out of Textract response json

Parameters:

textract_response – JSON from Textract

Returns:

dictionary with page numbers as keys and lists of text lines as values
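The grouping this keyword returns can be sketched directly on the Textract JSON, assuming the standard Blocks list in which LINE blocks carry Text and, in multi-page results, a Page number. The function below is a hypothetical illustration, not the library's implementation; treating a missing Page as page 1 is an assumption for single-page responses.

```python
from typing import Any, Dict, List


def pages_and_text(textract_response: Dict[str, Any]) -> Dict[int, List[str]]:
    """Hypothetical helper: group LINE texts by page number."""
    pages: Dict[int, List[str]] = {}
    for block in textract_response.get("Blocks", []):
        if block.get("BlockType") != "LINE":
            continue  # skip PAGE, WORD, TABLE and other block types
        page_number = block.get("Page", 1)
        pages.setdefault(page_number, []).append(block.get("Text", ""))
    return pages


sample_textract = {
    "Blocks": [
        {"BlockType": "PAGE", "Page": 1},
        {"BlockType": "LINE", "Page": 1, "Text": "Invoice 123"},
        {"BlockType": "WORD", "Page": 1, "Text": "Invoice"},
        {"BlockType": "LINE", "Page": 2, "Text": "Total: 42 EUR"},
    ]
}
print(pages_and_text(sample_textract))
# → {1: ['Invoice 123'], 2: ['Total: 42 EUR']}
```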

get_redshift_statement_results(statement_id: str, timeout: int = 40) Table | int

Retrieve the results of a SQL statement previously submitted to Redshift. If that statement has not yet completed, this keyword will wait for results. See `Execute Redshift Statement` for additional information.

If the statement has tabular results, this keyword returns them as a table from RPA.Tables if that library is available, or as a list of dictionaries if not. If the statement does not have tabular results, it will return the number of rows affected.

Parameters:
  • statement_id – The statement ID used to retrieve the results.

  • timeout – An integer used to calculate the maximum wait. Exact timing depends on system variability because the underlying waiter does not use a timeout directly. Defaults to 40.

get_tables()

Get parsed tables from the response

Returns an RPA.Tables.Table if possible; otherwise returns a dictionary.

Returns:

tables

get_words()

Get parsed words from the response

Returns:

words

init_comprehend_client(aws_key_id: str | None = None, aws_key: str | None = None, region: str | None = None, use_robocorp_vault: bool = False, session_token: str | None = None)

Initialize AWS Comprehend client

Parameters:
  • aws_key_id – access key ID

  • aws_key – secret access key

  • region – AWS region

  • use_robocorp_vault – use secret stored in Robocorp Vault

  • session_token – a session token associated with temporary credentials, such as from Assume Role.

init_redshift_data_client(aws_key_id: str | None = None, aws_key: str | None = None, region: str | None = None, cluster_identifier: str | None = None, database: str | None = None, database_user: str | None = None, secret_arn: str | None = None, use_robocorp_vault: bool = False, session_token: str | None = None) None

Initialize AWS Redshift Data API client

Parameters:
  • aws_key_id – access key ID

  • aws_key – secret access key

  • region – AWS region

  • cluster_identifier – The cluster identifier. This parameter is required when connecting to a cluster and authenticating using either Secrets Manager or temporary credentials.

  • database – The name of the database. This parameter is required when authenticating using either Secrets Manager or temporary credentials.

  • database_user – The database user name. This parameter is required when connecting to a cluster and authenticating using temporary credentials.

  • secret_arn – The name or ARN of the secret that enables access to the database. This parameter is required when authenticating using Secrets Manager.

  • use_robocorp_vault – use secret stored in Robocorp Vault

  • session_token – a session token associated with temporary credentials, such as from Assume Role.

init_s3_client(aws_key_id: str | None = None, aws_key: str | None = None, region: str | None = None, use_robocorp_vault: bool = False, session_token: str | None = None) None

Initialize AWS S3 client

Parameters:
  • aws_key_id – access key ID

  • aws_key – secret access key

  • region – AWS region

  • use_robocorp_vault – use secret stored in Robocorp Vault

  • session_token – a session token associated with temporary credentials, such as from Assume Role.

init_sqs_client(aws_key_id: str | None = None, aws_key: str | None = None, region: str | None = None, queue_url: str | None = None, use_robocorp_vault: bool = False, session_token: str | None = None)

Initialize AWS SQS client

Parameters:
  • aws_key_id – access key ID

  • aws_key – secret access key

  • region – AWS region

  • queue_url – SQS queue url

  • use_robocorp_vault – use secret stored in Robocorp Vault

  • session_token – a session token associated with temporary credentials, such as from Assume Role.

init_sts_client(aws_key_id: str | None = None, aws_key: str | None = None, region: str | None = None, use_robocorp_vault: bool = False, session_token: str | None = None) None

Initialize AWS STS client.

Parameters:
  • aws_key_id – access key ID

  • aws_key – secret access key

  • region – AWS region

  • use_robocorp_vault – use secret stored in Robocorp Vault

  • session_token – a session token associated with temporary credentials, such as from Assume Role.

init_textract_client(aws_key_id: str | None = None, aws_key: str | None = None, region: str | None = None, use_robocorp_vault: bool = False, session_token: str | None = None)

Initialize AWS Textract client

Parameters:
  • aws_key_id – access key ID

  • aws_key – secret access key

  • region – AWS region

  • use_robocorp_vault – use secret stored in Robocorp Vault

  • session_token – a session token associated with temporary credentials, such as from Assume Role.

list_buckets() list

List all buckets for this account

Returns:

list of buckets

list_files(bucket_name: str, limit: int | None = None, search: str | None = None, prefix: str | None = None, **kwargs) list

List files in the bucket

Note: this keyword accepts additional parameters in key=value format.

More info on additional parameters.

Parameters:
  • bucket_name – name for the bucket

  • limit – limits the response to maximum number of items

  • search – JMESPath expression to filter objects

  • prefix – limits the response to keys that begin with the specified prefix

  • kwargs – allows setting all extra parameters for list_objects_v2 method

Returns:

list of files

Python examples

from RPA.Cloud.AWS import AWS

AWSlibrary = AWS()
AWSlibrary.init_s3_client()  # credentials from environment variables (method 1)

# List all files in a bucket
files = AWSlibrary.list_files("bucket_name")

# List files in a bucket matching `.yaml`
files = AWSlibrary.list_files(
    "bucket_name", search="Contents[?contains(Key, '.yaml')]"
)

# List files in a bucket matching `.png` and limit results to max 3
files = AWSlibrary.list_files(
    "bucket_name", limit=3, search="Contents[?contains(Key, '.png')]"
)

# List files in a bucket prefixed with `special` and get only 1
files = AWSlibrary.list_files(
    "bucket_name", prefix="special", limit=1
)

Robot Framework examples

# List all files in a bucket
@{files}=   List Files   bucket-name

# List files in a bucket matching `.yaml`
@{files}=   List Files
...    bucket-name
...    search=Contents[?contains(Key, '.yaml')]

# List files in a bucket matching `.png` and limit results to max 3
@{files}=  List Files
...   bucket-name
...   limit=3
...   search=Contents[?contains(Key, '.png')]

# List files in a bucket prefixed with `special` and get only 1
@{files}=   List Files
...   bucket-name
...   prefix=special
...   limit=1
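The search expressions above are JMESPath expressions that select from the Contents list of the list_objects_v2 response, where each entry is a dictionary describing one object, including its Key. For readers unfamiliar with JMESPath, the sketch below is a plain-Python equivalent of Contents[?contains(Key, '.yaml')] on a hand-made response; it returns only the keys for brevity and is not how the library applies the expression.

```python
from typing import Any, Dict, List


def keys_containing(response: Dict[str, Any], fragment: str) -> List[str]:
    """Plain-Python equivalent of Contents[?contains(Key, '<fragment>')], keys only."""
    return [obj["Key"] for obj in response.get("Contents", []) if fragment in obj["Key"]]


sample_listing = {
    "Contents": [
        {"Key": "config/robot.yaml", "Size": 120},
        {"Key": "images/logo.png", "Size": 2048},
        {"Key": "special/conda.yaml", "Size": 300},
    ]
}
print(keys_containing(sample_listing, ".yaml"))
# → ['config/robot.yaml', 'special/conda.yaml']
```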
list_redshift_databases() List[str]

List the databases in a cluster.

Database names are returned as a list of strings.

list_redshift_schemas(database: str | None = None, schema_pattern: str | None = None) List[Dict]

Lists the schemas in a database.

Schema names are returned as a list of strings.

Parameters:
  • database – The name of the database that contains the schemas to list. If omitted, the connected database is used.

  • schema_pattern – A pattern to filter results by schema name. Within a schema pattern, “%” means match any substring of 0 or more characters and “_” means match any one character. Only schema name entries matching the search pattern are returned. If schema_pattern is not specified, then all schemas are returned.

list_redshift_tables(database: str | None = None, schema_pattern: str | None = None, table_pattern: str | None = None) List[Dict]

List the tables in a database. If neither schema_pattern nor table_pattern is specified, all tables in the database are returned.

Returned objects are structured like the below JSON in a list:

{
    "name": "string",
    "schema": "string",
    "type": "string"
}
Parameters:
  • database – The name of the database that contains the tables to list. If omitted, the connected database is used.

  • schema_pattern – A pattern to filter results by schema name. Within a schema pattern, “%” means match any substring of 0 or more characters and “_” means match any one character. Only schema name entries matching the search pattern are returned. If schema_pattern is not specified, all tables that match table_pattern are returned. If neither schema_pattern nor table_pattern is specified, all tables are returned.

  • table_pattern – A pattern to filter results by table name. Within a table pattern, “%” means match any substring of 0 or more characters and “_” means match any one character. Only table name entries matching the search pattern are returned. If table_pattern is not specified, all tables that match schema_pattern are returned. If neither schema_pattern nor table_pattern is specified, all tables are returned.
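To check locally which names a “%”/“_” pattern would select, the wildcards can be translated into a regular expression. A minimal sketch of the matching rules described above; it treats every other character literally, does not handle escape sequences, and matches case-sensitively, all of which are assumptions rather than documented Redshift behavior. The function names are hypothetical.

```python
import re


def like_pattern_to_regex(pattern: str):
    """Translate a '%'/'_' pattern into a compiled regular expression."""
    parts = []
    for char in pattern:
        if char == "%":
            parts.append(".*")  # any substring of 0 or more characters
        elif char == "_":
            parts.append(".")  # exactly one character
        else:
            parts.append(re.escape(char))  # everything else is literal
    return re.compile("".join(parts), re.DOTALL)


def matches(pattern: str, name: str) -> bool:
    return like_pattern_to_regex(pattern).fullmatch(name) is not None


schemas = ["public", "pg_catalog", "sales", "sales_2023"]
print([s for s in schemas if matches("p%", s)])  # → ['public', 'pg_catalog']
# "_" is a wildcard, so "sales_%" needs at least one character after "sales".
print([s for s in schemas if matches("sales_%", s)])  # → ['sales_2023']
```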

logger = None
receive_message() dict

Receive a message from the queue.

Returns:

message as dict

region: str | None = None
robocorp_vault_name: str | None = None
send_message(message: str | None = None, message_attributes: dict | None = None) dict

Send a message to the queue.

Parameters:
  • message – body of the message

  • message_attributes – attributes of the message

Returns:

send message response as dict
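Amazon SQS expects message attributes in its MessageAttributes format, where each attribute names a DataType and carries a typed value. Assuming that message_attributes is passed through to SQS unchanged (this documentation does not confirm it), a hypothetical helper like the one below can build that structure from flat values. Numbers are sent as a StringValue with the Number data type, as SQS requires.

```python
from typing import Dict, Union

AttributeValue = Union[str, int, float]


def to_sqs_attributes(values: Dict[str, AttributeValue]) -> Dict[str, Dict[str, str]]:
    """Hypothetical helper: build SQS MessageAttributes from plain values."""
    attributes: Dict[str, Dict[str, str]] = {}
    for name, value in values.items():
        if isinstance(value, str):
            attributes[name] = {"DataType": "String", "StringValue": value}
        elif isinstance(value, (int, float)) and not isinstance(value, bool):
            # SQS transmits numbers as strings tagged with the Number type.
            attributes[name] = {"DataType": "Number", "StringValue": str(value)}
        else:
            raise TypeError(f"unsupported attribute type for {name!r}: {type(value).__name__}")
    return attributes


attrs = to_sqs_attributes({"priority": 1, "source": "robot"})
print(attrs)
# With the library loaded (assumed pass-through, not executed here):
# aws.send_message("hello", message_attributes=attrs)
```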

services: list = []
set_robocorp_vault(vault_name)

Set Robocorp Vault name

Parameters:

vault_name – Robocorp Vault name

start_document_analysis(bucket_name_in: str | None = None, object_name_in: str | None = None, object_version_in: str | None = None, bucket_name_out: str | None = None, prefix_object_out: str = 'textract_output')

Starts the asynchronous analysis of an input document for relationships between detected items such as key-value pairs, tables, and selection elements.

Parameters:
  • bucket_name_in – name of the S3 bucket for the input object, defaults to None

  • object_name_in – name of the input object, defaults to None

  • object_version_in – version of the input object, defaults to None

  • bucket_name_out – name of the S3 bucket where to save analysis result object, defaults to None

  • prefix_object_out – prefix for the name of the analysis result object in bucket_name_out, defaults to 'textract_output'

Returns:

job identifier

The input object can be in JPEG, PNG or PDF format and must be located in an Amazon S3 bucket.

By default, Amazon Textract stores the analysis result internally so that it can be accessed with the keyword Get Document Analysis. This can be overridden by giving the parameter bucket_name_out.

start_document_text_detection(bucket_name_in: str | None = None, object_name_in: str | None = None, object_version_in: str | None = None, bucket_name_out: str | None = None, prefix_object_out: str = 'textract_output')

Starts the asynchronous detection of text in a document. Amazon Textract can detect lines of text and the words that make up a line of text.

Parameters:
  • bucket_name_in – name of the S3 bucket for the input object, defaults to None

  • object_name_in – name of the input object, defaults to None

  • object_version_in – version of the input object, defaults to None

  • bucket_name_out – name of the S3 bucket where to save analysis result object, defaults to None

  • prefix_object_out – prefix for the name of the detection result object in bucket_name_out, defaults to 'textract_output'

Returns:

job identifier

The input object can be in JPEG, PNG or PDF format and must be located in an Amazon S3 bucket.

By default, Amazon Textract stores the detection result internally so that it can be accessed with the keyword Get Document Text Detection. This can be overridden by giving the parameter bucket_name_out.

upload_file(bucket_name: str | None = None, filename: str | None = None, object_name: str | None = None, **kwargs) tuple

Upload a single file into the bucket.

Parameters:
  • bucket_name – name for the bucket

  • filename – filepath for the file to be uploaded

  • object_name – name of the object in the bucket, defaults to None

Returns:

tuple of upload status and error

If object_name is not given, the base name of the file is used as object_name.

Note: this keyword accepts additional parameters in key=value format (see the code example below).

More info on additional parameters.

Robot Framework example:

&{extras}=    Evaluate    {'ContentType': 'image/png'}
${uploaded}    ${error}=    Upload File
...    mybucket
...    ${CURDIR}${/}image.png
...    image.png
...    ExtraArgs=${extras}
upload_files(bucket_name: str | None = None, files: list | None = None, **kwargs) list

Upload multiple files into bucket

Parameters:
  • bucket_name – name for the bucket

  • files – list of files, given in one of the two formats described below

Returns:

number of files uploaded

Giving files as list of filepaths:

['/path/to/file1.txt', '/path/to/file2.txt']

Giving files as list of dictionaries (including filepath and object name):

[{'filename': '/path/to/file1.txt', 'object_name': 'file1.txt'}, {'filename': '/path/to/file2.txt', 'object_name': 'file2.txt'}]
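The two accepted formats can be normalized into one shape before uploading. The sketch below assumes, by analogy with Upload File, that a missing object_name falls back to the file's base name; the helper name is hypothetical and the library may handle this differently. Extra keys such as ExtraArgs are kept as they are.

```python
import os
from typing import Any, Dict, List, Union

FileSpec = Union[str, Dict[str, Any]]


def normalize_upload_specs(files: List[FileSpec]) -> List[Dict[str, Any]]:
    """Hypothetical helper: turn both accepted formats into dictionaries."""
    normalized: List[Dict[str, Any]] = []
    for spec in files:
        # A bare filepath becomes a dictionary; dictionaries are copied.
        entry = {"filename": spec} if isinstance(spec, str) else dict(spec)
        # Assumed default, mirroring Upload File: fall back to the base name.
        if not entry.get("object_name"):
            entry["object_name"] = os.path.basename(entry["filename"])
        normalized.append(entry)
    return normalized


specs = normalize_upload_specs(
    [
        "/path/to/file1.txt",
        {"filename": "/path/to/file2.txt", "object_name": "docs/file2.txt"},
    ]
)
print(specs)
```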

Note: this keyword accepts additional parameters in key=value format (see the code example below).

More info on additional parameters.

Python example (passing ExtraArgs):

from RPA.Cloud.AWS import AWS

awslibrary = AWS()
awslibrary.init_s3_client()  # credentials from environment variables (method 1)

upload_files = [
    {
        "filename": "./image.png",
        "object_name": "image.png",
        "ExtraArgs": {"ContentType": "image/png", "Metadata": {"importance": "1"}},
    },
    {
        "filename": "./doc.pdf",
        "object_name": "doc.pdf",
        "ExtraArgs": {"ContentType": "application/pdf"},
    },
]
awslibrary.upload_files("mybucket", files=upload_files)