OCR can be a time-consuming process if its options are not configured well. The first recommendation is to OCR on the fly, as you need it, rather than turning it on during processing. If that will not work for you, the processing options below help keep OCR jobs from seeming to run forever.
OCR Settings in AD Ediscovery / Summation Pro.
- Do not OCR documents under 5120 bytes – This option prevents small files (under 5 KB) from going through the OCR engine. It mainly targets items such as email signature images.
- Do not OCR documents over 10485760 bytes – This option prevents large files (over 10 MB) from going through the OCR engine, so your job does not sit waiting for massive PDFs to finish OCR.
- Do not OCR Full color documents – This option is here to prevent photos from going through the OCR engine.
- Do not OCR if Text Size is over 5120 bytes – This option is to prevent PDFs that already have extracted text from going through the OCR engine.
Notes: Using the options above excludes documents from OCR; they are meant to keep you out of trouble areas with the OCR engine. The two most important options for PDFs are the second (do not OCR documents over 10485760 bytes) and the fourth (do not OCR if Text Size is over 5120 bytes). Together they keep processing reliable and prevent a massive PDF from stopping your evidence load.
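To make the combined effect of the four options easier to reason about, here is a minimal Python sketch of the exclusion logic. It is not how the product implements these settings; the `Document` class, the function names, and the reason strings are hypothetical and exist only for illustration. Only the threshold values come from the options listed above, and the sketch assumes "under" and "over" are strict comparisons.

```python
from dataclasses import dataclass
from typing import Optional

# Threshold values taken from the options above.
MIN_FILE_BYTES = 5120        # 5 KB: skip tiny files such as signature images
MAX_FILE_BYTES = 10485760    # 10 MB: skip massive files that stall a job
MAX_TEXT_BYTES = 5120        # skip files that already have extracted text


@dataclass
class Document:
    """Hypothetical record of the properties the options look at."""
    file_size: int            # size of the file in bytes
    text_size: int = 0        # size of already-extracted text in bytes
    full_color: bool = False  # True for full-color images such as photos


def ocr_skip_reason(doc: Document) -> Optional[str]:
    """Return why a document would be excluded from OCR, or None if it would be OCRed."""
    if doc.file_size < MIN_FILE_BYTES:
        return "file too small"
    if doc.file_size > MAX_FILE_BYTES:
        return "file too large"
    if doc.full_color:
        return "full color"
    if doc.text_size > MAX_TEXT_BYTES:
        return "already has text"
    return None


def should_ocr(doc: Document) -> bool:
    """True when none of the four exclusions applies."""
    return ocr_skip_reason(doc) is None
```

For example, under these assumptions a 200,000-byte scanned PDF with no extracted text would be sent to OCR, while a 50 MB PDF or a 1 KB signature image would be skipped.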
The same settings are available in AD Lab/FTK.