Einstein OCR provides optical character recognition (OCR) models that detect alphanumeric text in an image. OCR is the electronic or mechanical conversion of images of typed, handwritten, or printed text into machine-encoded text, whether the source is a scanned document, a photo of a document, or a scene photo.
We can use this capability to perform many tasks inside Salesforce. For example, if we receive form images, we can use Einstein OCR to parse the text and create records in Salesforce from that data, or we can scan a customer's business card and extract the contact details. Any similar task can be performed with this API.
When we call the API, we send in an image, and the JSON response contains various elements based on the value of the task parameter:
- String of alphanumeric characters that the model predicts.
- Confidence (probability) that the detected bounding box contains text.
- XY coordinates for the location of the character string within the image (also called a bounding box).
- For tabular data, the table row and column in which the text is located.
- For business cards, the entity type of the detected text such as ORG, PERSON, and so on.
Einstein OCR supports three use cases:
Detect Text In Image
To make any API request, we first need to generate an access token. You can find the detailed steps for generating a token here.
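As a rough sketch of that step, the token endpoint exchanges a signed JWT assertion (built from the einstein_platform.pem private key tied to your Einstein account) for a bearer token. The variable signedJwtAssertion below is a placeholder for that assertion, which you must generate first:

```apex
// Hedged sketch of the token request; signedJwtAssertion is assumed to
// already hold the signed JWT built from your Einstein private key.
HttpRequest tokenReq = new HttpRequest();
tokenReq.setMethod('POST');
tokenReq.setEndpoint('https://api.einstein.ai/v2/oauth2/token');
tokenReq.setHeader('Content-Type', 'application/x-www-form-urlencoded');
tokenReq.setBody('grant_type=urn:ietf:params:oauth:grant-type:jwt-bearer'
    + '&assertion=' + EncodingUtil.urlEncode(signedJwtAssertion, 'UTF-8'));

HttpResponse tokenRes = new Http().send(tokenReq);

// The access_token field of the JSON body is the Bearer token used below.
String access_token = (String) ((Map<String, Object>)
    JSON.deserializeUntyped(tokenRes.getBody())).get('access_token');
```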
Once we have the token, we are ready to make our request. For demo purposes, I have used the image below.
This is the sample request we used:
HttpRequest req = new HttpRequest();
req.setMethod('POST');
req.setEndpoint('https://api.einstein.ai/v2/vision/ocr');
req.setHeader('Content-Type', 'multipart/form-data; charset="UTF-8"; boundary="1ff13444ed8140c7a32fc4e6451aa76d"');
req.setHeader('Authorization', 'Bearer ' + access_token); // replace access_token with your access token
req.setHeader('Cache-Control', 'no-cache');

String form64 = '';
form64 += HttpFormBuilder.WriteBoundary();
form64 += HttpFormBuilder.WriteBodyParameter('sampleLocation', 'https://newstechnologystuff.com/wp-content/uploads/2018/10/untitled-design.png');
form64 += HttpFormBuilder.WriteBoundary();
form64 += HttpFormBuilder.WriteBodyParameter('modelId', 'OCRModel');
form64 += HttpFormBuilder.WriteBoundary(HttpFormBuilder.EndingType.CrLf);

Blob formBlob = EncodingUtil.base64Decode(form64);
String contentLength = String.valueOf(formBlob.size());
req.setBodyAsBlob(formBlob);
req.setHeader('Connection', 'keep-alive');
req.setHeader('Content-Length', contentLength);
req.setTimeout(60 * 1000);

Http h = new Http();
HttpResponse res = h.send(req);
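Once the response comes back, the predictions can be read out of the body. A minimal sketch, assuming the response follows the Einstein Vision OCR format with a top-level probabilities array:

```apex
// Parse the untyped JSON body and log each detected text element.
Map<String, Object> body = (Map<String, Object>) JSON.deserializeUntyped(res.getBody());
for (Object p : (List<Object>) body.get('probabilities')) {
    Map<String, Object> prediction = (Map<String, Object>) p;
    System.debug('Text: ' + prediction.get('label')
        + ', confidence: ' + prediction.get('probability'));
}
```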
We have passed two parameters here:
- sampleLocation, in which we pass the image URL, and
- modelId, for which we have used OCRModel, a model provided by Salesforce out of the box.
Below is the response we get:
So in the response, we get three parameters: label, probability, and the location coordinates (the bounding box).
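The shape of the response looks roughly like the sketch below; the label, probability, and coordinate values here are illustrative, not actual output from the sample image:

```json
{
  "task": "text",
  "probabilities": [
    {
      "probability": 0.982,
      "label": "Hello",
      "boundingBox": { "minX": 46, "minY": 50, "maxX": 142, "maxY": 106 }
    }
  ]
}
```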
Detect Text in Business Cards
To return text from a business card, specify a modelId of OCRModel and a task parameter value of contact. In addition to the text, the model returns the entity type for each text element that it detects.
In the response we get one extra tag, attributes, which helps us determine the entity type.
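Compared with the earlier request, only the form body changes; a sketch of the relevant lines, adding the task parameter alongside the model:

```apex
// Business-card request: same multipart body as before, but with
// modelId OCRModel and the task parameter set to contact.
form64 += HttpFormBuilder.WriteBoundary();
form64 += HttpFormBuilder.WriteBodyParameter('modelId', 'OCRModel');
form64 += HttpFormBuilder.WriteBoundary();
form64 += HttpFormBuilder.WriteBodyParameter('task', 'contact');
```

Each prediction in the response then carries an attributes element whose tag identifies the entity type (ORG, PERSON, and so on).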
Detect Text and Tables
Sometimes our image may contain tables. To return the table data for each text element, in addition to the other parameters, we need to specify the tabulatev2 model and a task parameter value of table.
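Again only the form body changes; a sketch of the relevant lines for the table case:

```apex
// Table request: swap in the tabulatev2 model and set the task to table.
form64 += HttpFormBuilder.WriteBoundary();
form64 += HttpFormBuilder.WriteBodyParameter('modelId', 'tabulatev2');
form64 += HttpFormBuilder.WriteBoundary();
form64 += HttpFormBuilder.WriteBodyParameter('task', 'table');
```

In this case, each prediction's attributes element includes the cell location (the row and column indexes) of the detected text.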
So we have covered all three methods, and you can find the complete code here.
We can certainly use this for many use cases. Let me know in the comments how you are planning to use it. Happy Programming 🙂