Deduplication

Created by: Tod Ewasko
Created date:
Last Updated date:

Questions

Why is this document marked as a duplicate?

How do I see duplicate documents?

How are emails deduplicated?

 

Answers

AccessData deduplicates emails and their attachments as a family. This means that the same PDF attached to two different emails will not be marked as a duplicate unless the emails themselves are also duplicates.
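
The family rule can be illustrated with a small Python sketch. This is not AccessData's implementation: the Email type, the email_key field, and the family_duplicates helper are hypothetical, and the sketch assumes that an attachment simply inherits the duplicate status of its parent email.

```python
from dataclasses import dataclass, field


@dataclass
class Email:
    # Hypothetical stand-in for whatever the email dedup options compare.
    email_key: str
    # Hashes of the files attached to this email.
    attachment_hashes: list = field(default_factory=list)


def family_duplicates(emails):
    """Return one flag per email: True if the whole family is a duplicate.

    Assumption: attachments are never compared on their own; they follow
    the duplicate status of their parent email.
    """
    seen = set()
    flags = []
    for email in emails:
        flags.append(email.email_key in seen)
        seen.add(email.email_key)
    return flags


# The same PDF (hash "pdf-1") is attached to every email below.
emails = [
    Email("email-A", ["pdf-1"]),
    Email("email-B", ["pdf-1"]),  # different email, same attachment
    Email("email-A", ["pdf-1"]),  # the first email again
]
flags = family_duplicates(emails)
```

Only the third family is flagged: the PDF on email-B is not treated as a duplicate because its parent email is unique.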

Files that are not email types are deduplicated by hashing the document. This process generates an MD5 hash that is compared against the hashes of other documents to determine whether the content is identical. A single-character difference results in a different MD5 hash.
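
To see why a one-character change matters, here is a minimal Python sketch using the standard hashlib module. It only demonstrates how MD5 comparison behaves; the md5_hex helper and the sample text are made up for illustration and say nothing about how the product reads or stores files.

```python
import hashlib


def md5_hex(data: bytes) -> str:
    """Return the MD5 digest of the given bytes as a hex string."""
    return hashlib.md5(data).hexdigest()


original = b"Quarterly report, final version."
copy = b"Quarterly report, final version."
edited = b"Quarterly report, final version!"  # last character changed

same = md5_hex(original) == md5_hex(copy)       # identical content
differs = md5_hex(original) != md5_hex(edited)  # one-character change
```

Here same and differs are both true: identical bytes always hash to the same value, while the edited copy produces a different hash.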

Deduplication for emails uses your processing options shown below in Summation:

These settings are used to deduplicate MSGs against emails within PSTs, and MSGs against other MSGs.

Note: Submit and Delivery times are compared down to one ten-millionth of a second (100 nanoseconds) when evaluating duplicates.
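
As a rough illustration of that resolution, the Python sketch below compares two timestamps in units of one ten-millionth of a second. It assumes the times are available as integer nanoseconds and that anything finer than one unit is ignored; the function names are hypothetical and the product's internal representation may differ.

```python
NANOS_PER_TICK = 100  # one ten-millionth of a second is 100 nanoseconds


def to_ticks(timestamp_ns: int) -> int:
    """Convert integer nanoseconds to whole ten-millionths of a second."""
    return timestamp_ns // NANOS_PER_TICK


def same_time(a_ns: int, b_ns: int) -> bool:
    """True if both timestamps fall in the same ten-millionth of a second."""
    return to_ticks(a_ns) == to_ticks(b_ns)


base = 1_700_000_000 * 1_000_000_000     # an arbitrary time, in nanoseconds
within_tick = same_time(base, base + 99)  # differs by less than one unit
next_tick = same_time(base, base + 100)   # differs by exactly one unit
```

Under these assumptions, within_tick is true and next_tick is false, so two emails whose Submit times differ by even one ten-millionth of a second would not match on that field.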

Email Types:

  • MSG
  • PST
  • NSF
  • EML
  • AOL
  • DBX
  • MBOX
  • Other

To view the duplicates in a case, choose Options -> Quick Filters -> Show Duplicates:

Then add DeduplicateType as a column. This column will be populated with one of three values:

  • Primary - At least one duplicate of this document exists in the database
  • Secondary - This document is a duplicate and will be filtered out when Hide Duplicates is on
  • (Blank) - This document has no duplicates in the database
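
The three values can be modeled with a short Python sketch. It is illustrative only: the function names and sample data are hypothetical, and it assumes that the first document seen in each group of identical hashes becomes the Primary, which the article does not confirm.

```python
from collections import defaultdict


def assign_dedup_types(docs):
    """docs is a list of (object_id, content_hash) pairs.

    Returns {object_id: label}, where label is "Primary", "Secondary",
    or "" (blank).
    """
    groups = defaultdict(list)
    for object_id, content_hash in docs:
        groups[content_hash].append(object_id)

    labels = {}
    for ids in groups.values():
        if len(ids) == 1:
            labels[ids[0]] = ""  # (Blank): no duplicates
        else:
            labels[ids[0]] = "Primary"  # assumption: first seen is Primary
            for other in ids[1:]:
                labels[other] = "Secondary"
    return labels


def hide_duplicates(docs, labels):
    """Mimic a Hide Duplicates filter by dropping Secondary documents."""
    return [oid for oid, _ in docs if labels[oid] != "Secondary"]


docs = [(1, "aaa"), (2, "bbb"), (3, "aaa"), (4, "aaa")]
labels = assign_dedup_types(docs)
visible = hide_duplicates(docs, labels)
```

With this sample data, document 1 is Primary, documents 3 and 4 are Secondary, document 2 is blank, and only documents 1 and 2 remain visible when duplicates are hidden.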

To find out which objects were flagged as duplicates of each other, review the Deduplication reports found on the reporting tab. Each report lists the ObjectID and the Primary ObjectID.
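
If the report data is exported, the ObjectID to Primary ObjectID pairs can be grouped to list every duplicate under its primary. The Python sketch below assumes the rows have already been read into dictionaries keyed by those two column names; the export format, the group_by_primary helper, and the sample IDs are all hypothetical.

```python
from collections import defaultdict


def group_by_primary(rows):
    """Map each Primary ObjectID to the ObjectIDs flagged as its duplicates."""
    groups = defaultdict(list)
    for row in rows:
        groups[row["Primary ObjectID"]].append(row["ObjectID"])
    return dict(groups)


# Made-up sample rows, not real report output.
rows = [
    {"ObjectID": 1052, "Primary ObjectID": 1001},
    {"ObjectID": 1077, "Primary ObjectID": 1001},
    {"ObjectID": 2040, "Primary ObjectID": 2003},
]
duplicates_of = group_by_primary(rows)
```

In this sample, objects 1052 and 1077 are grouped under primary 1001, and object 2040 under primary 2003.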

 

 

Overview

 

Deduplication is a powerful tool, and understanding the points above will help you quickly answer questions relating to this area.

