Python API

Nanonets

class RPA.DocumentAI.Nanonets.Nanonets

Library for using the Nanonets service for intelligent document processing (IDP).

The library requires rpaframework version 19.0.0 or later.

The service identifies fields in documents, which can be submitted in several file formats or via URL.

Robot Framework example usage

*** Settings ***
Library   RPA.DocumentAI.Nanonets
Library   RPA.Robocorp.Vault

*** Tasks ***
Identify document
    ${secrets}=   Get Secret  nanonets-auth
    Set Authorization    ${secrets}[apikey]
    ${result}=    Predict File
    ...  ${CURDIR}${/}files${/}eckero.jpg
    ...  ${secrets}[receipts-model-id]
    ${fields}=    Get Fields From Prediction Result    ${result}
    FOR    ${field}    IN    @{fields}
        Log To Console    Label:${field}[label] Text:${field}[ocr_text]
    END
    ${tables}=    Get Tables From Prediction Result    ${result}
    FOR    ${table}    IN    @{tables}
        FOR    ${rows}    IN    ${table}[rows]
            FOR    ${row}    IN    @{rows}
                ${cells}=    Evaluate    [cell['text'] for cell in $row]
                Log To Console    ROW:${{" | ".join($cells)}}
            END
        END
    END

Python example usage

from RPA.DocumentAI.Nanonets import Nanonets
from RPA.Robocorp.Vault import Vault

# Path to the document to process (same file as in the Robot Framework example)
file_to_scan = "./files/eckero.jpg"

secrets = Vault().get_secret("nanonets-auth")
nanolib = Nanonets()
nanolib.set_authorization(secrets["apikey"])
result = nanolib.predict_file(file_to_scan, secrets["receipts-model-id"])
# Print every recognized field as a label/text pair
fields = nanolib.get_fields_from_prediction_result(result)
for field in fields:
    print(f"Label: {field['label']} Text: {field['ocr_text']}")
# Print every table row as a pipe-separated line of cell texts
tables = nanolib.get_tables_from_prediction_result(result)
for table in tables:
    for row in table["rows"]:
        cells = [cell["text"] for cell in row]
        print(f"ROW: {' | '.join(cells)}")

ROBOT_LIBRARY_DOC_FORMAT = 'REST'

ROBOT_LIBRARY_SCOPE = 'GLOBAL'

get_all_models() → Dict

Get all available models related to the API key.

Returns

object containing available models

Robot Framework example:

${models}=  Get All Models
FOR  ${model}  IN  @{models}
    Log To Console  Model ID: ${model}[model_id]
    Log To Console  Model Type: ${model}[model_type]
END

Python example:

models = nanolib.get_all_models()
for model in models:
    print(f"model id: {model['model_id']}")
    print(f"model type: {model['model_type']}")
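
As a minimal sketch of how the returned models might be filtered, the helper below collects the IDs of all models of one type. The function name `model_ids_by_type` and the sample data are hypothetical; only the `model_id` and `model_type` keys come from the example above, and the sample values are made up for illustration.

```python
from typing import Dict, List


def model_ids_by_type(models: List[Dict], model_type: str) -> List[str]:
    """Return the IDs of all models whose ``model_type`` matches."""
    return [m["model_id"] for m in models if m.get("model_type") == model_type]


# Hypothetical sample shaped like the items iterated in the example above
sample_models = [
    {"model_id": "aaa-111", "model_type": "ocr"},
    {"model_id": "bbb-222", "model_type": "classification"},
    {"model_id": "ccc-333", "model_type": "ocr"},
]

ocr_ids = model_ids_by_type(sample_models, "ocr")
print(ocr_ids)
```

With a real library instance, the same helper would take the output of `nanolib.get_all_models()` in place of `sample_models`, assuming it iterates as shown in the example above.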

get_fields_from_prediction_result(prediction: Optional[Union[Dict[Hashable, Optional[Union[str, int, float, bool, list, dict]]], List[Optional[Union[str, int, float, bool, list, dict]]], str, int, float, bool, list, dict]]) → List

Helper keyword to get found fields from a prediction result.

For an example, see the Predict File keyword.

Parameters

prediction – prediction result dictionary

Returns

list of found fields
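
Since each found field carries a `label` and an `ocr_text` (as used in the Predict File example), the list can be turned into a lookup by label. The following is a sketch under that assumption; `fields_by_label` and the sample values are hypothetical and not part of the library.

```python
from collections import defaultdict
from typing import Dict, List


def fields_by_label(fields: List[Dict]) -> Dict[str, List[str]]:
    """Group the OCR text of each field under its label.

    A label can occur more than once, so every label maps to a list.
    """
    grouped: Dict[str, List[str]] = defaultdict(list)
    for field in fields:
        grouped[field["label"]].append(field["ocr_text"])
    return dict(grouped)


# Hypothetical sample shaped like the list of found fields
sample_fields = [
    {"label": "Merchant_Name", "ocr_text": "Example Store"},
    {"label": "Total_Amount", "ocr_text": "12.50"},
    {"label": "Total_Amount", "ocr_text": "12,50"},
]

lookup = fields_by_label(sample_fields)
print(lookup["Total_Amount"])
```

Mapping each label to a list rather than a single string avoids silently dropping values when the service reports the same label more than once.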

get_tables_from_prediction_result(prediction: Optional[Union[Dict[Hashable, Optional[Union[str, int, float, bool, list, dict]]], List[Optional[Union[str, int, float, bool, list, dict]]], str, int, float, bool, list, dict]]) → List

Helper keyword to get found tables from a prediction result.

For another example, see the Predict File keyword.

Parameters

prediction – prediction result dictionary

Returns

list of found tables

Robot Framework example:

# It is possible to create ``RPA.Tables`` compatible tables from the result
# (``Create Table`` requires importing the ``RPA.Tables`` library in Settings)
${tables}=    Get Tables From Prediction Result    ${result}
FOR    ${table}    IN    @{tables}
    ${rpatable}=    Create Table    ${table}[rows]
    FOR    ${row}    IN    @{rpatable}
        Log To Console    ${row}
    END
END

Python example:

from RPA.Tables import Tables

# It is possible to create ``RPA.Tables`` compatible tables from the result
tables = nanolib.get_tables_from_prediction_result(result)
for table in tables:
    rpatable = Tables().create_table(table["rows"])
    for row in rpatable:
        print(row)
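
If `RPA.Tables` is not needed, the rows can also be handled with the standard library alone. The sketch below serializes the rows of one table to CSV text, assuming each row is a list of cells with a `text` key, as in the Predict File example. The `rows_to_csv` helper and the sample table are hypothetical.

```python
import csv
import io
from typing import Dict, List


def rows_to_csv(rows: List[List[Dict]]) -> str:
    """Serialize table rows (lists of cells with a ``text`` key) to CSV."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    for row in rows:
        writer.writerow([cell["text"] for cell in row])
    return buffer.getvalue()


# Hypothetical sample shaped like one item of the list of found tables
sample_table = {
    "rows": [
        [{"text": "Item"}, {"text": "Price"}],
        [{"text": "Coffee"}, {"text": "2.50"}],
    ]
}

csv_text = rows_to_csv(sample_table["rows"])
print(csv_text)
```

Using `csv.writer` instead of joining with commas by hand means cell texts that contain commas or quotes are escaped correctly.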

ocr_fulltext(filename: str, filepath: str) → List

Run full-text OCR on a given file. Returns the recognized words and the full text.

The filename and the filepath need to be given separately.

Parameters
  • filename – name of the file

  • filepath – path of the file

Returns

the result in a list format

Robot Framework example:

${results}=  OCR Fulltext
...   invoice.pdf
...   ${CURDIR}${/}invoice.pdf
FOR  ${result}  IN  @{results}
    Log To Console  Filename: ${result}[filename]
    FOR  ${pagenum}  ${page}  IN ENUMERATE  @{result}[page_data]   start=1
        Log To Console  Page ${pagenum} raw Text: ${page}[raw_text]
    END
END

Python example:

results = nanolib.ocr_fulltext("IMG_8277.jpeg", "./IMG_8277.jpeg")
for result in results:
    print(f"FILENAME: {result['filename']}")
    for page in result["page_data"]:
        print(f"Page {page['page']+1}: {page['raw_text']}")
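
To work with the text of a whole file rather than page by page, the per-page texts can be joined. This is a minimal sketch that assumes the result structure used in the example above (`filename`, `page_data`, and per-page `page` and `raw_text` keys, with `page` being a zero-based index). The `full_text_by_file` helper and the sample data are hypothetical.

```python
from typing import Dict, List


def full_text_by_file(results: List[Dict]) -> Dict[str, str]:
    """Map each filename to the text of its pages, joined in page order."""
    texts = {}
    for result in results:
        # Sort explicitly so the output does not depend on response order
        pages = sorted(result["page_data"], key=lambda page: page["page"])
        texts[result["filename"]] = "\n".join(page["raw_text"] for page in pages)
    return texts


# Hypothetical sample shaped like the OCR results in the example above
sample_results = [
    {
        "filename": "invoice.pdf",
        "page_data": [
            {"page": 1, "raw_text": "second page"},
            {"page": 0, "raw_text": "first page"},
        ],
    }
]

texts = full_text_by_file(sample_results)
print(texts["invoice.pdf"])
```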

predict_file(filepath: str, model_id: str) → Optional[Union[Dict[Hashable, Optional[Union[str, int, float, bool, list, dict]]], List[Optional[Union[str, int, float, bool, list, dict]]], str, int, float, bool, list, dict]]

Get prediction result for a file by a given model id.

Parameters
  • filepath – filepath to the file

  • model_id – id of the Nanonets model to categorize a file

Returns

the prediction result, which can be passed to Get Fields From Prediction Result and Get Tables From Prediction Result

Robot Framework example:

${result}=  Predict File  ./document.pdf   ${MODEL_ID}
${fields}=    Get Fields From Prediction Result    ${result}
FOR    ${field}    IN    @{fields}
    Log To Console    Label:${field}[label] Text:${field}[ocr_text]
END
${tables}=    Get Tables From Prediction Result    ${result}
FOR    ${table}    IN    @{tables}
    FOR    ${rows}    IN    ${table}[rows]
        FOR    ${row}    IN    @{rows}
            ${cells}=    Evaluate    [cell['text'] for cell in $row]
            Log To Console    ROW:${{" | ".join($cells)}}
        END
    END
END

Python example:

result = nanolib.predict_file("./docu.pdf", secrets["receipts-model-id"])
fields = nanolib.get_fields_from_prediction_result(result)
for field in fields:
    print(f"Label: {field['label']} Text: {field['ocr_text']}")
tables = nanolib.get_tables_from_prediction_result(result)
for table in tables:
    for row in table["rows"]:
        cells = [cell["text"] for cell in row]
        print(f"ROW: {' | '.join(cells)}")

set_authorization(apikey: str) → None

Set the Nanonets request headers using the API key.

Parameters

apikey – Nanonets API key

Robot Framework example:

${secrets}=   Get Secret  nanonets-auth
Set Authorization    ${secrets}[apikey]

Python example:

secrets = Vault().get_secret("nanonets-auth")
nanolib = Nanonets()
nanolib.set_authorization(secrets["apikey"])