Repackaging a PDF Portfolio using the Datalogics PDF Java Toolkit

Sample of the Week:

I have a real love/hate relationship with PDF Portfolios. On the one hand, they are a brilliant way to package multiple files, PDF or otherwise, in a single secure package and send them around. On the other hand, you can’t rely on the recipients having a consistent experience… even across the Adobe viewers. It’s really annoying. PDF’s popularity and longevity are due in large part to its ability to reliably communicate documents across platforms and viewers while maintaining visual fidelity: the document looks like what the author intended. Even in the worst PDF viewers, that fidelity is preserved, even if you can’t work with a form, comment on the file, or add the signature you were asked to add. But there is a way out… PDF Portfolios are really just an extension of the old-style PDF Packages, which were much simpler but far more consistent in their behavior.

What You Need to Know First:

PDF Packages were introduced in Acrobat 8.0, and they themselves were an extension of simple file attachments in a PDF file. With PDF Packages, you could combine multiple PDF files into a single “Package”, with each file remaining a discrete entity with its own security settings, and you could also combine Dynamic XFA Forms and static PDF Forms (AcroForms). What distinguished PDF Packages from ZIP files is that you could add descriptive text and metadata in context, and you’d get a little thumbnail to help you find the specific files you wanted across the whole set.

In Acrobat 9.0, PDF Packages grew up into PDF Portfolios. PDF Portfolios built on top of the architecture for attachments and PDF Packages so that they would “fail gracefully” in older viewers or viewers that didn’t support Portfolios. Depending on the viewer, the PDF Portfolio would behave like a PDF Package or just a PDF with a list of attachments in it.

In Acrobat X, PDF Portfolios evolved yet again to incorporate Flash-based “Layouts” that help the user navigate the collection of files. But they still fail gracefully in viewers that don’t support Portfolios.

But… if you do have a viewer that supports these Flash-based Layouts, the Portfolio wants you to install the Flash Player… and not just the regular Flash Player… you need to install two Flash Players.

If your system isn’t configured properly, the user experience gets really bad, really fast… and for users who are conditioned to PDF being easy and reliable, it’s frustrating.

So, in order to create a more consistent experience, a developer might want to “dumb-down” their PDF Portfolios and make them behave more like a PDF file with a list of attachments. Because PDF Portfolios were evolutionary rather than revolutionary, this ends up being pretty simple using the Datalogics PDF Java Toolkit.

Repackaging a PDF Portfolio:

In order to repackage a PDF Portfolio, turning it back into just a PDF file with a list of file attachments, the only thing you really need to do is remove the Collection dictionary from the catalog.
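As a rough illustration of that one step, here is a minimal sketch that models the document catalog as a plain `java.util.Map` keyed by PDF name. This is not the PDF Java Toolkit API; the class `CatalogSketch` and its methods are hypothetical and only show the shape of the operation: drop the `Collection` entry and leave everything else, including the attachments themselves, untouched.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Plain-Java model of a PDF catalog dictionary (hypothetical, not the Toolkit API). */
final class CatalogSketch {

    /** Removes the Collection entry; returns true if the catalog had one. */
    static boolean removeCollection(Map<String, Object> catalog) {
        return catalog.remove("Collection") != null;
    }

    /** Builds a made-up catalog that resembles the one in a PDF Portfolio. */
    static Map<String, Object> samplePortfolioCatalog() {
        Map<String, Object> catalog = new LinkedHashMap<>();
        catalog.put("Type", "Catalog");
        // The Collection dictionary is what marks the file as a Portfolio.
        catalog.put("Collection", new LinkedHashMap<String, Object>());
        // The Names dictionary holds the embedded files; it must survive.
        catalog.put("Names", new LinkedHashMap<String, Object>());
        return catalog;
    }
}
```

Calling `removeCollection` a second time on the same catalog returns `false`, so the step is safe to run on files that are not Portfolios.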



That’s it. We’re done.

Ok… technically, that’s all you need to do but there are a few other modifications you might want to make that will create a better user experience… which is the point of this exercise after all.

Because you want the user to be able to navigate the file attachments, you will want to set the PageMode in the file so that the viewer tries to open the attachments panel. Some viewers support this and some don’t, but it’s very convenient for the end user if the panel opens up automatically.
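In the PDF format, the catalog's `PageMode` entry takes the name `UseAttachments` to request the attachments panel. The sketch below shows that setting on the same kind of plain `Map` model of the catalog; it is an illustration under that assumption, not Toolkit code, and `PageModeSketch` is a hypothetical name.

```java
import java.util.Map;

/** Sets the page mode on a Map-based model of the catalog (hypothetical helper). */
final class PageModeSketch {

    /** PDF name that asks the viewer to show the attachments panel on open. */
    static final String USE_ATTACHMENTS = "UseAttachments";

    /** Overwrites any existing PageMode so the attachments panel is requested. */
    static void openAttachmentsPanel(Map<String, Object> catalog) {
        catalog.put("PageMode", USE_ATTACHMENTS);
    }
}
```

A viewer that does not honor the entry simply ignores it, so setting it costs nothing in viewers without an attachments panel.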


Next we’ll replace the default “Cover Sheet” that gets added to the PDF Portfolio when it’s created. This is the page that gets displayed when the viewer can’t process a PDF Portfolio and show it using its embedded layout. Since the file is no longer a PDF Portfolio, the text on the cover sheet no longer applies. The most efficient way to do this is to append a new page to the PDF Document and then delete the first one.
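The order of those two operations matters, presumably because a PDF must always contain at least one page: appending first means the document never passes through a zero-page state. The sketch below shows the ordering on a simple list of page labels. It is a stand-in for a real page tree, not Toolkit code, and `CoverSheetSketch` is a hypothetical name.

```java
import java.util.List;

/** Append-then-delete page swap on a list that stands in for the page tree. */
final class CoverSheetSketch {

    /** Appends the new cover page, then removes the original first page. */
    static void replaceCoverSheet(List<String> pages, String newCover) {
        if (pages.isEmpty()) {
            throw new IllegalArgumentException("document has no cover sheet to replace");
        }
        pages.add(newCover); // append first so the page count never reaches zero
        pages.remove(0);     // then drop the old cover sheet (index-based removal)
    }
}
```

For the usual one-page Portfolio cover sheet, the new page ends up as the only page in the document.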


And finally, if the original author added some descriptive text to the items in the Portfolio, it might be a good idea to migrate that information to the file attachment description metadata. The PDF Portfolio templates that come with Adobe Acrobat contain a “Summary” field that is generally used to enter a description of the file attachment. In this section we copy that data to the Description metadata field for the attachment so it can be seen in the Acrobat Attachments panel.
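A minimal sketch of that copy, again on plain maps rather than the Toolkit API: each attachment is modeled as a file specification dictionary, its Portfolio fields live in a collection item dictionary under `CI`, and the description goes in `Desc`. The field key `Summary` is an assumption based on the template field named above, and `SummaryMigrationSketch` is a hypothetical name.

```java
import java.util.List;
import java.util.Map;

/** Copies a Portfolio "Summary" field into each attachment's description. */
final class SummaryMigrationSketch {

    /** Returns how many file specifications received a description. */
    static int copySummaries(List<Map<String, Object>> fileSpecs) {
        int copied = 0;
        for (Map<String, Object> spec : fileSpecs) {
            Object item = spec.get("CI"); // collection item dictionary, if any
            if (!(item instanceof Map)) {
                continue; // attachment has no Portfolio fields
            }
            Object summary = ((Map<?, ?>) item).get("Summary"); // assumed field key
            if (summary instanceof String && !((String) summary).isEmpty()) {
                spec.put("Desc", summary); // overwrites any existing description
                copied++;
            }
        }
        return copied;
    }
}
```

Attachments without a summary are left alone, so any description they already carry is preserved.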


After making these changes, your file still won’t behave exactly the same in every PDF viewer but it will behave more consistently and will definitely behave the same across Adobe viewers and Adobe viewer versions, with or without Flash installed.

To get started working with PDF, download this Gist and request an evaluation copy of the Datalogics PDF Java Toolkit.
