Search optimization for PDF documents
Although Digital.gov’s Web Managers Community has recommended that website managers reduce or remove PDFs from government websites, this is not always possible. When a PDF is necessary, take special care to ensure that it can be found.
PDFs are not the ideal document type for search engine optimization and often rank lower in search results. Much of the SEO value of any file is derived from metadata inserted into the file. For PDFs, this metadata must be created in each document file using a program such as Adobe Acrobat, a step that is often overlooked. (In addition, our ranking algorithm places non-HTML documents lower than HTML documents.)
Similar to web pages, Search.gov relies on structured metadata in your PDF files to present them in search results.
By following these suggestions in preparing your PDF files, you will improve the quality of the data in our index and the file’s ability to appear in the results rankings.
1: Choose a descriptive file name
Example:
file-title-or-form-name-and-number.pdf
Similar to a title, a descriptive file name makes the file's content clear when a user downloads it. File names are used in query matching and term frequency matching. If a title is not set in the PDF file Properties, the file name will appear as the search result title instead.
It’s best practice to use hyphens to separate words, rather than underscores or spaces.
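As an illustration of the naming convention above, here is a minimal Python sketch that turns a document title into a lowercase, hyphen-separated file name. The function name pdf_file_name and the sample title are hypothetical, and lowercasing is an assumption of this sketch rather than a Search.gov requirement.

```python
import re
import unicodedata


def pdf_file_name(title: str) -> str:
    """Turn a document title into a lowercase, hyphen-separated PDF file name."""
    # Fold accented characters to their closest ASCII equivalents.
    ascii_title = (
        unicodedata.normalize("NFKD", title)
        .encode("ascii", "ignore")
        .decode("ascii")
    )
    # Replace each run of spaces, underscores, or punctuation with one hyphen.
    slug = re.sub(r"[^a-z0-9]+", "-", ascii_title.lower()).strip("-")
    if not slug:
        raise ValueError("title contains no usable characters")
    return f"{slug}.pdf"


print(pdf_file_name("Annual_Report 2023 (Final)"))
# prints "annual-report-2023-final.pdf"
```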
2: Ensure text in the document is searchable
Search engines cannot read the text in image-only PDFs. Many PDFs are now created digitally, with the text embedded in the file. However, a PDF created from a scan will often be an image without embedded text, meaning its content cannot be used to help find the file in search.
Searchable text is used in query matching and term frequency matching. Run all scanned PDFs through optical character recognition (OCR) software to convert the page images to fully searchable text.
Using OCR
Learn how OCR can transform printed documents into digital files and how to make a PDF file searchable.
3: Add a title
Example:
Title: Unique title of the PDF file
Develop a unique, document-specific title for the PDF. Search.gov uses this title much as it uses the HTML title tag, and displays it in the list of search results. If the title field is left blank in the PDF properties, the file name will be displayed instead. You can add a title to your file by updating the file Properties in a program such as Adobe Acrobat. Titles are used in query matching and term frequency matching.
4: Add a description
Example:
Subject: A description of the PDF’s content. This is a great place to use synonyms and keywords, especially those in plain language. Aim for 160 characters or fewer.
Create a well-crafted, plain language summary that is specific to the file. Search engines will often display it instead of a snippet of text from the PDF. Include the relevant keywords you want the file to rank well for. Ideally, limit your description to 160 characters to prevent it from being truncated on the search results page. You can add a description by updating the file Properties in a program such as Adobe Acrobat. (Note that in Adobe>Properties, the description field is labeled "Subject.") Descriptions are used in query matching and term frequency matching.
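To check a draft description against the 160-character guideline before pasting it into the Subject field, a small helper along these lines could be used. This is only a sketch: the name check_description and the sample text are hypothetical, and collapsing extra whitespace is an assumption about how you would want to count characters.

```python
MAX_DESCRIPTION_LENGTH = 160  # limit suggested in the guidance above


def check_description(description: str) -> tuple[str, bool]:
    """Collapse whitespace and report whether the description fits the limit."""
    # Join on single spaces so stray line breaks and double spaces are not counted.
    cleaned = " ".join(description.split())
    return cleaned, len(cleaned) <= MAX_DESCRIPTION_LENGTH


text, fits = check_description(
    "  How to apply for a fishing permit,\n including fees and deadlines. "
)
print(text)
print("Fits within limit:", fits)
```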
5: Add keywords
Example:
Keywords: Relevant Keyword, Applicable Keyword, Pertinent Keyword, Related Keyword
List the terms the public would use to find this document. You can add keywords by updating the file Properties in a program such as Adobe Acrobat. Separate keywords with commas: Adobe supports both commas and semicolons, but Search.gov currently supports only commas. Keywords are used in query matching and term frequency matching.
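If your keyword lists come from mixed sources, some may use semicolons. The following sketch rewrites such a list with commas only, as Search.gov expects. The function name normalize_keywords is hypothetical, and removing blank and duplicate entries is an extra tidy-up step assumed here, not something the guidance requires.

```python
import re


def normalize_keywords(raw: str) -> str:
    """Return a comma-separated keyword string with blanks and duplicates removed."""
    seen = set()
    keywords = []
    # Split on commas or semicolons so either separator is accepted as input.
    for part in re.split(r"[;,]", raw):
        keyword = part.strip()
        key = keyword.casefold()  # compare case-insensitively
        if keyword and key not in seen:
            seen.add(key)
            keywords.append(keyword)
    return ", ".join(keywords)


print(normalize_keywords("permits; fishing license ;Permits,, boating"))
# prints "permits, fishing license, boating"
```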
6: Declare the file language
Example:
Language: English
When a file language isn’t set, the Search.gov system does its best to analyze the content and make a determination. Typically this is not a problem, but if the file wasn’t run through OCR and all the system finds is an image file name (or the file is from an old scan where many letters were not correctly identified by OCR), then the Search.gov system may determine the wrong language. To avoid any issues, Search.gov advises setting the language, which is often an optional setting in OCR software. Language is used during Search.gov indexing.
7: Create HTML landing pages for your PDFs
If you are specifically looking to direct traffic to PDFs, consider creating an HTML landing page that is optimized for search using traditional semantic metadata. You could also choose to have only the landing page indexed, rather than indexing the PDFs and updating each one with document metadata.
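A landing page of this kind can be very small. The sketch below generates one with a title, description, keywords, and declared language, mirroring the PDF fields described in the steps above. The function name landing_page, the page layout, and the sample values are all hypothetical, and which meta tags a given search engine actually uses is an assumption to verify against that engine's documentation.

```python
from html import escape


def landing_page(title: str, description: str, keywords: list[str],
                 pdf_url: str, lang: str = "en") -> str:
    """Build a minimal HTML landing page that links to a PDF."""
    keyword_list = ", ".join(keywords)
    # escape() guards against characters such as & or " breaking the markup.
    return (
        "<!DOCTYPE html>\n"
        f'<html lang="{escape(lang)}">\n'
        "<head>\n"
        '  <meta charset="utf-8">\n'
        f"  <title>{escape(title)}</title>\n"
        f'  <meta name="description" content="{escape(description)}">\n'
        f'  <meta name="keywords" content="{escape(keyword_list)}">\n'
        "</head>\n"
        "<body>\n"
        f"  <h1>{escape(title)}</h1>\n"
        f"  <p>{escape(description)}</p>\n"
        f'  <p><a href="{escape(pdf_url)}">Download the PDF</a></p>\n'
        "</body>\n"
        "</html>\n"
    )


print(landing_page(
    "Fishing Permit Application",
    "How to apply for a fishing permit, including fees and deadlines.",
    ["fishing permit", "fishing license"],
    "/files/fishing-permit-application.pdf",
))
```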
A note about date metadata
If you view the properties of a PDF, you will notice that date fields are not easily modified in the same way that the title, description (“subject”), and keywords are. The dates associated with PDFs include the Created and Modified dates. Created dates reflect the time the PDF was originally produced, and the Modified date reflects the last time changes were saved to a document. Dates can impact ranking; fresher content ranks higher than older content in the results.
Disclaimer: All references to specific brands, products, and/or companies are used only for illustrative purposes and do not imply endorsement by the U.S. federal government or any federal government agency.