Python API

PDF

class RPA.PDF.PDF

PDF is a library for managing PDF documents.

It can be used to extract text from PDFs, add watermarks to pages, and decrypt/encrypt documents.

Merging and splitting PDFs is supported by Add Files To PDF keyword. Read the keyword documentation for examples.

There is also limited support for updating form field values. (check Set Field Value and Save Field Values for more info)

The input PDF file can be passed as an argument to the keywords, or it can be omitted if you first call Open PDF. A reference to the current active PDF will be stored in the library instance and can be changed by using the Switch To PDF keyword with another PDF file path, therefore you can asynchronously work with multiple PDFs.

Attention

Keep in mind that this library works with text-based PDFs, and it can’t extract information from an image-based (scan) PDF file. For accurate results, you have to use specialized external services wrapped by the RPA.DocumentAI library.

Portal example with video recording demo for parsing PDF invoices: https://github.com/robocorp/example-parse-pdf-invoice

Examples

Robot Framework

*** Settings ***
Library    RPA.PDF
Library    String

*** Tasks ***
Extract Data From First Page
    ${text} =    Get Text From PDF    report.pdf
    ${lines} =     Get Lines Matching Regexp    ${text}[${1}]    .+pain.+
    Log    ${lines}

Get Invoice Number
    Open Pdf    invoice.pdf
    ${matches} =  Find Text    Invoice Number
    Log List      ${matches}

Fill Form Fields
    Switch To Pdf    form.pdf
    ${fields} =     Get Input Fields   encoding=utf-16
    Log Dictionary    ${fields}
    Set Field Value    Given Name Text Box    Mark
    Save Field Values    output_path=${OUTPUT_DIR}${/}completed-form.pdf
    ...                  use_appearances_writer=${True}

from RPA.PDF import PDF
from robot.libraries.String import String

pdf = PDF()
string = String()

def extract_data_from_first_page():
    text = pdf.get_text_from_pdf("report.pdf")
    lines = string.get_lines_matching_regexp(text[1], ".+pain.+")
    print(lines)

def get_invoice_number():
    pdf.open_pdf("invoice.pdf")
    matches = pdf.find_text("Invoice Number")
    for match in matches:
        print(match)

def fill_form_fields():
    pdf.switch_to_pdf("form.pdf")
    fields = pdf.get_input_fields(encoding="utf-16")
    for key, value in fields.items():
        print(f"{key}: {value}")
    pdf.set_field_value("Given Name Text Box", "Mark")
    pdf.save_field_values(
        output_path="completed-form.pdf",
        use_appearances_writer=True
    )

ROBOT_LIBRARY_DOC_FORMAT = 'REST'

ROBOT_LIBRARY_SCOPE = 'GLOBAL'

add_library_components(library_components: List, translation: dict | None = None, translated_kw_names: list | None = None)

get_keyword_arguments(name)

get_keyword_documentation(name)

get_keyword_names()

get_keyword_source(keyword_name)

get_keyword_tags(name)

get_keyword_types(name)

run_keyword(name, args, kwargs=None)