If the document has tables, Doc Reader automatically identifies the tabular structures and extracts the contents as tables in the right-hand side panel under Line-Item details.

  1. Verify that the predicted columns are correct.
  2. Click the [icon] icon to display the table options:

    • Select the End of the table option, then select the text just below the table in the left panel to mark it as the end of the table.
    • When the Doc Reader identifies lines to separate columns, the Enable column-line based processing toggle button is enabled.
    • When the Doc Reader identifies lines to separate rows, the Enable row-line based processing toggle button is enabled.
    • You can override the above options and use the Row and Column Definition option to identify the rows and columns of the table.
    • If tables span multiple pages in the document, use the Headers at starting page only, Stop at the first end of table, or Data Capture Rule options as required.
  3. Click the Update Data button. Table data is extracted from the PDF and populated in the table in the right panel.

If both the Row and Column Definition and the Detect Rows based on line separation options are enabled, line-based row separation is applied first. If Doc Reader fails to identify lines, Row and Column Definition is used instead.
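
The precedence described above can be pictured with a short Python sketch. It is only an illustration of the fallback order: `choose_rows` and the two detector callables are hypothetical names, not part of Doc Reader, and the assumption is that a detector returns nothing when it cannot find rows.

```python
from typing import Callable, List, Optional

Rows = List[List[str]]


def choose_rows(
    detect_by_lines: Callable[[], Optional[Rows]],
    detect_by_definition: Callable[[], Optional[Rows]],
    line_detection_enabled: bool,
    definition_enabled: bool,
) -> Optional[Rows]:
    """Apply line-based row detection first, then fall back to the definition."""
    if line_detection_enabled:
        rows = detect_by_lines()
        if rows:  # separator lines were found and produced rows
            return rows
    if definition_enabled:
        # Assumed fallback: used only when line detection found nothing.
        return detect_by_definition()
    return None
```

In this sketch, the Row and Column Definition result is used only when line detection is off or returns no rows.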

Table across Multiple Pages

  • For documents where the table extends across multiple pages but the table headings occur only at the start of the table, enable the Headers at starting page only option to extract the table data from the whole document.

  • For documents where the table extends across multiple pages, disable the Stop at the first end of table option to extract the contents from all pages.

    • If the header structure of a table is different, that table is excluded from extraction.
    • Irrelevant data between the tables is excluded.

By default, the Stop at the first end of table option is enabled, and only the table content from the table header to the first end of table is extracted.
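
To make the effect of the Stop at the first end of table option concrete, here is a minimal Python sketch. It assumes the document has already been split into table blocks, each with a header and rows; `TableBlock` and `collect_rows` are hypothetical names used for illustration and do not describe Doc Reader's internals.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class TableBlock:
    header: List[str]       # column headings of this table block
    rows: List[List[str]]   # extracted cell values, one list per row


def collect_rows(blocks: List[TableBlock], stop_at_first_end: bool = True) -> List[List[str]]:
    """Collect rows from the first table, optionally continuing on later pages."""
    if not blocks:
        return []
    reference_header = blocks[0].header
    collected = list(blocks[0].rows)
    if stop_at_first_end:
        # Default behavior: stop at the first end of table.
        return collected
    for block in blocks[1:]:
        # Assumed rule: tables with a different header structure are excluded.
        if block.header == reference_header:
            collected.extend(block.rows)
    return collected
```

With the option left on, only the first block is returned. With it off, later blocks are merged only when their header matches the first one.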

  • Use the Data Capture Rule option to choose the desired table to be extracted.

    • You can choose the occurrence of the table from the drop-down.
      OR
    • Select a data point so that the table to be extracted comes after this reference label.
    • Click the SAVE button. Once the settings are saved, they apply to all documents of the same category.

Ensure that the data point you select is located above the table to be extracted.
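
The two ways of choosing a table with the Data Capture Rule (by occurrence, or by a reference data point above the table) can be sketched in Python as follows. The `pick_table` function and the `(page, y)` position tuples are assumptions made for illustration only. Pages are numbered from 1 and `y` grows downward.

```python
from typing import List, Optional, Tuple

Position = Tuple[int, float]  # (page number, vertical position on the page)


def pick_table(
    table_positions: List[Position],
    occurrence: Optional[int] = None,
    label_position: Optional[Position] = None,
) -> Optional[int]:
    """Return the index of the table to extract, or None if nothing matches."""
    # Order tables as they appear in the document: by page, then top to bottom.
    ordered = sorted(range(len(table_positions)), key=lambda i: table_positions[i])
    if occurrence is not None:
        # Occurrence is 1-based, as in "first table", "second table".
        if 1 <= occurrence <= len(ordered):
            return ordered[occurrence - 1]
        return None
    if label_position is not None:
        # Pick the first table that starts after the reference data point.
        for i in ordered:
            if table_positions[i] > label_position:
                return i
        return None
    return ordered[0] if ordered else None
```

A reference point placed below every table matches nothing in this sketch, which is why the data point must sit above the table to be extracted.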

Row and Column Definition

If the rows and columns in the table of the document are not aligned properly, Doc Reader cannot identify the rows of the table correctly.

You can use Row and Column Definition to identify the rows and columns for extraction. Based on the parameters provided, the rows are marked, and Doc Reader identifies the row to be extracted.

  1. Click the [icon] icon and select the Row and Column Definition option. The Row and Column Definition window is displayed in the right panel.

    1. Key Column: Any properly aligned column can be selected as the reference, or row marker.
    2. Alignment: You can select Top or Bottom from the drop-down.
      • Top: The row marker starts from the top of the text in the Key Column record and extends to the top of the text in the next record.
      • Bottom: The row marker starts from the bottom of the text in the Key Column record and extends to the bottom of the text in the previous record.
    3. Column Starts After: Defines the offset for the row marker. This is used when the data in the other columns is misaligned with the Key Column data.

      Depending on the position of data in the table, the row lines are automatically captured. If the row lines are not separating the rows correctly, you can use this option to define the exact location of the row separator.

  2. Provide values for the required parameters (Key Column, Alignment, and Column Starts After) based on the alignment of the data in the table.

  3. Click the Update Data button. The rows are identified, and the data is updated in the table.

In the PDF below, the Quantity column is selected in the Key Column field, and Top is selected in the Alignment field.
The row marker starts from the top of each text in the Quantity column and extends to the top of the text in the next record.
The rows are correctly identified, and the Description column is also displayed correctly.

[Image: sample PDF with Quantity as the Key Column and Alignment set to Top]

Change the Alignment field to Bottom. The row marker now starts from the bottom of each text in the Quantity column and extends to the bottom of the text in the previous record. The rows are identified, but the Description column is misaligned. Therefore, for this PDF, Alignment must be set to Top.

[Image: the same PDF with Alignment set to Bottom]
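
The difference between Top and Bottom alignment can be reproduced with a small Python sketch. It assumes each Key Column text is known by its top and bottom coordinates (with `y` growing downward) and that the `offset` argument shifts the markers in the way Column Starts After is described. The functions `row_bands` and `row_for` are hypothetical and only illustrate the geometry.

```python
from typing import List, Optional, Tuple

Box = Tuple[float, float]  # (top, bottom) of a text; y grows downward


def row_bands(key_boxes: List[Box], alignment: str,
              table_top: float, table_bottom: float,
              offset: float = 0.0) -> List[Box]:
    """Return one (start, end) vertical band per Key Column record."""
    boxes = sorted(key_boxes)
    if alignment == "Top":
        # Each row runs from the top of its key text to the top of the next one.
        starts = [top + offset for top, _ in boxes]
        ends = starts[1:] + [table_bottom]
    elif alignment == "Bottom":
        # Each row runs from the bottom of the previous key text to its own bottom.
        ends = [bottom + offset for _, bottom in boxes]
        starts = [table_top] + ends[:-1]
    else:
        raise ValueError("alignment must be 'Top' or 'Bottom'")
    return list(zip(starts, ends))


def row_for(y: float, bands: List[Box]) -> Optional[int]:
    """Return the index of the band containing y, or None."""
    for index, (start, end) in enumerate(bands):
        if start <= y < end:
            return index
    return None


# Quantity texts on single lines; Description wraps below the first line.
quantity = [(100.0, 110.0), (160.0, 170.0), (220.0, 230.0)]
description_lines = [102.0, 115.0, 128.0, 162.0]  # first three belong to row 0

top_bands = row_bands(quantity, "Top", table_top=90.0, table_bottom=300.0)
bottom_bands = row_bands(quantity, "Bottom", table_top=90.0, table_bottom=300.0)
print([row_for(y, top_bands) for y in description_lines])     # [0, 0, 0, 1]
print([row_for(y, bottom_bands) for y in description_lines])  # [0, 1, 1, 1]
```

With Top, all three wrapped Description lines stay in the first row. With Bottom, the second and third lines fall into the next row, which mirrors the misalignment described above.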
