When processing PSTs, what are some best practices? What is a PST? Why is Summation Pro not showing the original time zone of an email?
A Microsoft PST (Personal Storage Table, also called a Personal Folders file) is a database container that stores email metadata along with embedded objects and attachments. Each email is stored as a record of fields, much as an Excel spreadsheet stores data in rows and columns. An email in a PST is therefore not a file in its own right but a combination of fields.
Dates and Time
Time fields are stored in the PST in UTC, and that is how we collect them. When you open a PST in Outlook, Outlook reads the time zone from your system's regional settings and converts the stored UTC values to that local time zone for display. The time you see therefore reflects the viewer's time zone, not the original sender's.
Note: In version 5.6.1, you can set the display time zone for the grid, image, and viewer displays when you create a case.
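To illustrate why a reviewer may not see the sender's original time zone, here is a minimal Python sketch (3.9+, using the standard `zoneinfo` module, which needs an IANA time zone database on the system) that takes one stored UTC value and renders it in several zones. The timestamp and zone names are arbitrary examples, not values from any real PST. The UTC value by itself does not say which zone the sender was in, so the displayed time depends entirely on the zone chosen for display.

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo  # Python 3.9+; needs an IANA tz database on the system


def to_display_time(stored_value: datetime, tz_name: str) -> datetime:
    """Convert a stored UTC timestamp to the named time zone for display."""
    if stored_value.tzinfo is None:
        # Treat naive values as UTC, since that is how the PST stores them.
        stored_value = stored_value.replace(tzinfo=timezone.utc)
    return stored_value.astimezone(ZoneInfo(tz_name))


sent_utc = datetime(2015, 3, 10, 14, 30, tzinfo=timezone.utc)
for zone in ("America/New_York", "Europe/London", "Asia/Tokyo"):
    print(zone, to_display_time(sent_utc, zone).isoformat())
# America/New_York 2015-03-10T10:30:00-04:00
# Europe/London 2015-03-10T14:30:00+00:00
# Asia/Tokyo 2015-03-10T23:30:00+09:00
```

The same stored instant shows three different wall-clock times, which is why choosing the display time zone matters.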
PST Data Loading
When you load PSTs, keep in mind that their internal structure can become corrupted by something as simple as copying the file across the network. This corruption can be repaired with Microsoft's ScanPST tool (the Inbox Repair Tool), described here: http://support.microsoft.com/kb/287497/.
The link below describes a local copy of "scanpst.exe" that you may have access to.
If time permits, we recommend running ScanPST on all PSTs before loading them into AD software. Doing so prevents problems that can otherwise show up in many areas:
- Natural Viewer not displaying objects from within the PST
- Filtered text not extracted for objects in the PST
- PSTs failing to reduce during production or export
- PSTs failing to expand
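ScanPST itself is a Microsoft GUI tool and is not scripted here. As a complementary check, the hypothetical helpers below (a sketch, not part of AD software) list the PSTs under a folder so none are missed when running ScanPST, and confirm that a network copy is byte-for-byte identical to its source by comparing MD5 hashes. A matching hash only shows the copy did not change in transit; it does not detect corruption that already exists in the source PST.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def md5_of_file(path: Path, chunk_size: int = 1024 * 1024) -> str:
    """Return the hex MD5 digest of a file, reading it in 1 MB chunks."""
    digest = hashlib.md5()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copy_and_verify(source: Path, destination: Path) -> bool:
    """Copy a file and report whether the copy matches the source byte for byte."""
    shutil.copy2(source, destination)
    return md5_of_file(source) == md5_of_file(destination)


def find_psts(root: Path) -> list[Path]:
    """List every .pst file under root (extension matched case-insensitively)."""
    return sorted(p for p in root.rglob("*") if p.is_file() and p.suffix.lower() == ".pst")


# Usage with a throwaway folder and placeholder bytes (not a real PST).
with tempfile.TemporaryDirectory() as tmp:
    folder = Path(tmp)
    original = folder / "custodian.pst"
    original.write_bytes(b"placeholder bytes standing in for a PST")
    print(copy_and_verify(original, folder / "custodian_copy.pst"))  # True
    print([p.name for p in find_psts(folder)])  # ['custodian.pst', 'custodian_copy.pst']
```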
By design, MD5 hashes are not calculated for emails within a PST. Each email record in a PST has a SearchKey value that is unique only within that PST; when the same email is copied into another PST, its SearchKey changes. Because of this field, hashing the entire email record would fail to match identical emails stored in different PSTs, so those duplicates would not be removed. Instead, emails are deduplicated using the fields selected in the processing settings. For more information, see the deduplication documentation.
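The sketch below shows why whole-record hashing misses these duplicates while field-based hashing catches them. The record layout and the `DEDUP_FIELDS` tuple are hypothetical stand-ins for the fields chosen in the processing settings; the only difference between the two copies of the email is the SearchKey.

```python
import hashlib

# Hypothetical deduplication fields; the real ones come from the processing settings.
DEDUP_FIELDS = ("From", "To", "Subject", "SentDate", "Body")


def fields_hash(record: dict[str, str], fields: tuple[str, ...]) -> str:
    """MD5 over the named fields in a fixed order, NUL-separated to avoid ambiguity."""
    digest = hashlib.md5()
    for name in fields:
        digest.update(name.encode("utf-8") + b"\x00")
        digest.update(record.get(name, "").encode("utf-8") + b"\x00")
    return digest.hexdigest()


def whole_record_hash(record: dict[str, str]) -> str:
    """MD5 over every field in the record, including SearchKey."""
    return fields_hash(record, tuple(sorted(record)))


email_in_pst_a = {
    "From": "alice@example.com",
    "To": "bob@example.com",
    "Subject": "Q3 forecast",
    "SentDate": "2015-03-10T14:30:00Z",
    "Body": "See attached.",
    "SearchKey": "A1B2C3",
}
# The same email copied into a second PST: only its SearchKey differs.
email_in_pst_b = {**email_in_pst_a, "SearchKey": "D4E5F6"}

print(whole_record_hash(email_in_pst_a) == whole_record_hash(email_in_pst_b))  # False
print(fields_hash(email_in_pst_a, DEDUP_FIELDS) == fields_hash(email_in_pst_b, DEDUP_FIELDS))  # True
```

Leaving SearchKey out of the hashed fields is what lets the two copies collapse into one.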
Because emails in a PST are records in a database rather than files, the file extension field is not populated for emails when they are ingested. It is populated for attachments that have file extensions.
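As a rough illustration, assuming a simplified, hypothetical item model (not the product's actual data structure), the rule could look like this in Python. Deriving an extension from an email's subject would be wrong, as the example subject ending in ".xlsx" shows, so emails always return `None`.

```python
from dataclasses import dataclass
from pathlib import PurePath
from typing import Optional


@dataclass
class Item:
    kind: str  # "email" or "attachment" in this hypothetical model
    name: str  # subject line for emails, file name for attachments


def file_extension(item: Item) -> Optional[str]:
    """Return the lowercase extension for attachments, or None when there is none."""
    if item.kind == "email":
        # An email in a PST is a set of fields, not a file, so it has no extension.
        return None
    suffix = PurePath(item.name).suffix  # e.g. ".XLSX", or "" if there is none
    return suffix[1:].lower() if suffix else None


print(file_extension(Item("email", "Re: budget.xlsx")))   # None
print(file_extension(Item("attachment", "Budget.XLSX")))  # xlsx
print(file_extension(Item("attachment", "README")))       # None
```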