Learning SYGR

Automatic document posting based on PDF file

In this tutorial we build a real life scenario, which you can use practically as it is in your business environment.

The Scenario

  • SYGR receives a PDF document from a business partner via a B2B interface. The document can be any format, simple or complex, important, that it contains enough information for being a Purchase Order.
  • We prepare a Web Service based on Python and Flask, which extracts the text content of this document, as it found.
  • We send the PDF file to this service and receive the text.
  • Then we prepare a few shot prompt for GPT, requesting it to grab out only the relevant information and send it back in a predefined format.
  • Using this we post the actual Purchase Order in SYGR and notify the Sales Department by e-mail.

Tutorial

During building this scenario, we learn how to

  • (well, to build a Python/Flask app to extract text from a PDF. This is not really a SYGR task, but we need it.)
  • recive a binary file and store it in SYGR so it is available for other SYGR servers
  • read back this file in another server
  • send an HTTP request to the previously defined Web service
  • build a prompt for GPT to extract exactly the information we need and send us in exactly in the format we require
  • send an e-mail

Web service with Python and Flask

To create the necessare service, is really very simple.
We create a file called pdfconv.py:

from flask import Flask, request
from PyPDF2 import PdfReader
from io import BytesIO

app = Flask(__name__)

@app.route('/pdfconv', methods=['GET','POST'])
def convert_file():

 simplebytes = request.data
 file = BytesIO(simplebytes)

# try to convert to text
  try:
   pdf_reader=PdfReader(file, strict=False)
   text=""
   for page in pdf_reader.pages:
    text+=page.extract_text()
   
   return text
   
  except Exception as e:
   print(e)
   return "Error reading the PDF file."

What happens here:

  • we read the binary file from request.data
  • this is a bytes object, which is not suitable for PyPDF, so we convert it to a file-like object
  • open it in PyPDF and
  • extract the text page by page.

You can download the code from here.

In a productive business environment you need to deploy it either in a cloud provider, or on one of your on premise hosts e.g. as a Docker container.
In this tutorial we simply run it from the command prompt

Tutorial

Message Converter

You know already from the previous tutorials, that in order to receive any HTTP requests in SYGR, we need to define a Message Converter Plugin.
In this case it will be an asynchronous, file plugin.
The file API of SYGR requires a Multipart form data media type input with only one item, where

  • the key of the item is "file"
  • and the content is the file itself
Like in this Postman example:

Tutorial

It then has to be sent to the file API, using URL

http(s)://[host]:[port]/[SYGR server name]/services/Rest/file/async/[plugin name]/default

Get the binary file

The Plugin API provides the file in

data.messageIn.getTransfer().get(0)

as a simply Object, so we need to convert it to byte[]:

if(data.messageIn.getTransfer() == null || data.messageIn.getTransfer().size() == 0) {
 util.log("ErpReceiveDocument ERROR: no data in msgin.Transfer");
 return;
}

byte[] pdf = ( byte[] ) data.messageIn.getTransfer().get(0);
if(pdf.length == 0) {
 util.log("ErpReceiveDocument ERROR: file length 0, stop.");
 return;
}

Store the file in SYGR

SYGR provides the possibility to store (even large) binary files in the SYGR databases.
Here we call these Attachments.

In the previous tutorial we have learned the role of the Store Types.
Store Types are a logical grouping of SYGR and database servers.
If we save something into a Store Type, that will be available

  • in every database server linked to the Store Type and
  • in every SYGR Store Server of this Store Type.
As the HTTP messages arrive to a Catcher server, which is not part of any Store Type, first we need to find out, where to save this file.
The easyest way is, if we know the name of at least one Entity Model in this Store Type, to ask the system.
For our scenario we have created the Entity Model "ErpProtoPo" as temporary storage of the incoming request, so we get the Store Type of this:

String storetype = util.getStoreTypeOfPotType(Constants.ENTPROTOPO);

The we create an Attachment Object, add the file to it and save:

Attachment atta = new Attachment();
atta.setContent(pdf);
String attaguid = util.createAttachment(storetype, atta);

The save method returns a unique ID of this attachment, which we store in the Creator we send to asynchronous processing. This way the Entity Init Plugin (most probably in another server running on another host) will find it.
We also set the name of the Entity Plugin, this will be executed automatically before the Entity is saved.

creator.setInitplugin(Constants.INITPLUGINPO);

Entity Init

The Entity Init Plugin runs after the prototype of the Entity arrived from the Catcher Server to one of the Store Servers, before actually saving is into the database.
Prerequisite is, that the field initPlugin is filled with the name of the required Plugin in the Creator Object within the MessageConverter.

In our case the Entity Init is doing quite a lot of work, let's see step by step.

Getting back the file

First of all we need to get back the PDF file which we have saved as Attachment in the Message Converter.
For this, we first need to read the GUID of the attachment from the Entity and then read the Attachment.
To be nice, we also delete the already unnecessary Attachment from the database, and convert the content of it to our PDF file.
Remarks:

  • You can see in the code, that we do not use the Store Type to get the Attachment. The reason is, that here we are in a Store Server, which is uniquely linked to one Store Type, so we even have no chance to read from somewhere else (well, yes, we have...).
  • The whole thing we are doing in the Entity Init Plugin, we could do already in the Message Converter. But then how can you learn the handling of the Attachments?

Attr attr = data.pot.getFlexi();
String attaguid = util.getNodeValue(attr, Constants.ATTAGUID);

Attachment atta = util.readAttachmentByGuid(attaguid);
util.deleteAttachment(attaguid);

byte[] pdf = atta.getContent();

Calling the Web Service

We have prepared a nice Web Service, which receives a PDF and extracts its text content.
Now it's time to use it.
For this we

  • read the URL of the service from a Business Parameter (here we use a Util method)
  • prepare some HTTP headers
  • and call the SYGR HTTP request sending

String uri = Util.getPdfTxtUrl(util);

ArrayList<HttpHeaderValue> headers = new ArrayList<>();
HttpHeaderValue header1 = new HttpHeaderValue();
header1.headerName = "Content-type";
header1.headerValue = "application/pdf";
headers.add(header1);

ArrayList<Object> result = util.execCommand(
 "HTTP", "BINARY", "TEXT", uri, headers, pdf);

We have seen this util.execCommand() earlier, it's a Jolly Joker for command execution.
In our case we call the HTTP request sender. The parameters here:

Parameter positionPurposeContent
1Main Command"HTTP" for HTTP request sending
2Request format"BINARY" for a direct binary file sending
3Expected response format"TEXT" for any text type response
4URLRequest URL
5HTTP headersThe headers we want to send
6BODY contentIts format depend on the request type, in this case simply the PDF file in a byte[]

The response String we get simply as the 1st Object in result, but we need to convert it to an actual String object.

Object retobj = result.get(0);
String pdfcontent = ( String ) retobj;
pdfcontent = pdfcontent.replaceAll(System.lineSeparator(), "\\\\n");

The 3rd line in this code looks strange. Why do we need it?
The reason is, that the GPT API very, but very much hates New Line characters.
It actually dies at once it receives one (not the complete OpenAI company, but your request).
However later, the SYGR GPT interface also takes care of it, never hurts to be careful.

Prompt Generation

The next step is, that we create a nice, so called Few Shot Prompt for GPT. In this

  • we pass the text of the PDF to it
  • tell exactly what information it has to extract from it
  • and send back in which format.
Our prompt will look something like this:

Text: [the PDF text]\\n
You are a professional company sales representative.
From the above purchase order text extract the following information:
the customer name, company name, street address, city, ZIP code
and from every item the material name and the quantity.
Do not include any explanations, only provide the answer like:\\n
name = the name of the customer\\n
company = the name of the company\\n
street = the street address\\n
city = the city\\n
zip = the ZIP code\\n
material = the material; quantity = the quantity;\\n
material = the material; quantity = the quantity;\\n
material = the material; quantity = the quantity;\\n
material = the material; quantity = the quantity;\\n

We request the answer in an easy machine readable text format.
We could use anything else, but by our experience JSON can be problematic, because GPT itselfs sends the response as JSON, and then our JSON will be a field value within that JSON, which confuses the parsers.
But feel free to experience.

With this prompt we just call GPT:

ArrayList<Object> alo = util.execCommand("AI", "CHATGPT", "SIMPLE", "aigpt", prompt);
String answer = ( String ) alo.get(1);

Please see, that we read the second returned object.
The first one (alo.get(0)) is the complete JSON answer from GPT, which also can be useful sometimes. The 2nd is just the answer.

Completing the work

Not too much left. We just

  • create and send an e-mail
  • create the Purches Order data based on the GPT answer
  • and change the Entity Model to the final one.

ExtAlert mail = new ExtAlert();
...
util.sendAlert(mail);

Attr attr1 = Util.getPoAttr(answer, util);
data.pot.setFlexi(attr1);
data.pot.setType(Constants.ENTERPPO);

Configuration

In the SYGR configuration there is nothing new.
We need 2 Entity Models: one is used by the Message Converter, the other is the final Model set by the Entity Init.

Tutorial

(Do not forget to link both to a Store Type!)

And we also need two Plugin definitions, one for the Message Converter and one for the Entity Init.

Tutorial

(And here do not forget to link the Message Converter Plugin to a Catcher Server!)

Let's see it working!

We are done. It was not so easy as the previous tutorials, but the result shows it was worth.

We have prepared a not too simple PDF document for our Purchase Order:

Tutorial

We send it to SYGR from Postman:

Tutorial

We have received an e-mail:

Tutorial

We can open the document in SYGR:

Tutorial

Downloadables

You can download the different codes:

Conclusion

We have arrived to the end of this tutorial.
We have learned how to convert an incoming PDF file to an actual document using different techniques (own Web service, GPT).
See you in our next tutorial, where we go into more details of the SYGR Automation System.

If you have questions, please contact us:
contact@sygr.ch
contact@sles-automation.com
+41 79 470 67 84