If you have ever wondered why one compressor shrinks a file by 80% and another barely touches it, the answer lies in what is actually inside the PDF. Here is what is really happening when you compress.
Text and vectors are already tiny
Selectable text and vector graphics (lines, shapes, logos) are stored as compact instructions, not pixels. A 50-page text report might be a few hundred kilobytes. There is very little to compress, which is why running a text-only PDF through a compressor often does almost nothing.
Images are where the weight is
Embedded images (scans and photos) are usually the bulk of a large PDF. Two properties drive their size:
- Resolution (DPI). A page scanned at 600 DPI has four times as many pixels as the same page at 300 DPI. If you only view it on screen, most of those pixels are invisible detail.
- Encoding quality. JPEG compression discards visual information the eye is least sensitive to. Lower quality, smaller file, up to the point where artefacts appear.
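The resolution arithmetic can be checked with a few lines of Python. This is only an illustrative sketch: the `pixel_count` helper and the 8.5 x 11 inch page size are assumptions chosen for the example, not something taken from a particular tool.

```python
def pixel_count(width_in: float, height_in: float, dpi: int) -> int:
    """Total pixels in a page of the given physical size scanned at `dpi`."""
    return round(width_in * dpi) * round(height_in * dpi)


# Assumed example page: US Letter, 8.5 x 11 inches.
pixels_300 = pixel_count(8.5, 11, 300)
pixels_600 = pixel_count(8.5, 11, 600)

# Doubling the DPI doubles the pixels along each axis, so the total quadruples.
ratio = pixels_600 / pixels_300

print(pixels_300, pixels_600, ratio)
```

Because DPI applies to both width and height, pixel count grows with the square of the resolution, which is why high-DPI scans get heavy so quickly.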
Downsampling: the biggest lever
Downsampling reduces an image's pixel dimensions to match how it is actually displayed. Taking a 600-DPI scan down to 150 DPI for screen use leaves one-sixteenth of the pixels (about 94% fewer) and typically cuts the stored image data by roughly 90%, with no visible change at normal viewing size. This is almost always the single most effective step.
Why safe compressors skip some images
Not every image can be re-encoded safely. Images in CMYK colour, those with transparency masks, or palette-based (indexed) images can be corrupted by naive re-encoding. A careful compressor, like PDFelly's, detects these and leaves them untouched, only re-encoding ordinary RGB and grayscale images, and only keeping the result when it is genuinely smaller. It then verifies the whole file still opens before giving it back.
Putting it to use
For practical, step-by-step instructions, see the guides on how to compress a PDF without losing quality and how to reduce PDF size for email.
A quick mental model of a PDF's weight
Picture a PDF as text instructions plus a gallery of images. The instructions (characters, fonts, vector shapes) are tiny. The gallery is where the megabytes live. So the question "how do I make this smaller" almost always becomes "what can I safely do to the images", and the two answers are: use fewer pixels (downsample) and store each pixel more efficiently (re-encode).
Downsampling, concretely
A document page is only so many inches wide. If an embedded image holds far more pixels than can ever be displayed across that width, the surplus is invisible detail you are paying to store. Downsampling reduces the image to a sensible pixels-per-inch for its use, commonly 150 DPI for screen reading, which can remove the large majority of an image's data with no visible change at normal size.
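As a rough sketch of that calculation, the Python below works out an image's effective DPI from its pixel width and the width it occupies on the page, then computes the dimensions to downsample to. The function names and the 150 DPI default are assumptions for illustration; a real compressor would read the displayed size from the PDF itself and then resample the pixels.

```python
def effective_dpi(pixels_wide: int, displayed_width_in: float) -> float:
    """Pixels per inch the image actually delivers at its placed width."""
    return pixels_wide / displayed_width_in


def downsample_size(pixels_wide: int, pixels_high: int,
                    displayed_width_in: float,
                    target_dpi: float = 150) -> tuple[int, int]:
    """Return new pixel dimensions that bring the image down to target_dpi.

    Images already at or below the target are returned unchanged,
    because upsampling would add bytes without adding detail.
    """
    current = effective_dpi(pixels_wide, displayed_width_in)
    if current <= target_dpi:
        return pixels_wide, pixels_high
    scale = target_dpi / current
    return max(1, round(pixels_wide * scale)), max(1, round(pixels_high * scale))


# Assumed example: a 5100 x 6600 pixel scan placed 8.5 inches wide (600 DPI).
new_w, new_h = downsample_size(5100, 6600, 8.5)
saved = 1 - (new_w * new_h) / (5100 * 6600)  # fraction of pixels removed

print(new_w, new_h, saved)
```

In this example the scan shrinks to a quarter of its width and height, so only one-sixteenth of the original pixels remain.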
Why careful tools skip certain images
Some images cannot be naively re-encoded without risk: CMYK colour intended for print, images with soft-mask transparency, or palette-based indexed colour. Re-encoding these blindly can shift colours or corrupt the image. A trustworthy compressor detects them and leaves them alone, touching only ordinary RGB and grayscale images, keeping a re-encoded result only when it is genuinely smaller, and verifying the whole file still parses before returning it.
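The skip-and-compare rule described here can be sketched as a small decision function. Everything in this example is hypothetical: the `ImageInfo` record, its field names, and the colour-space labels are illustrative assumptions and do not reflect how PDFelly or any particular library represents images.

```python
from dataclasses import dataclass


@dataclass
class ImageInfo:
    """Hypothetical summary of one embedded image."""
    colour_space: str     # e.g. "RGB", "Gray", "CMYK", "Indexed"
    has_soft_mask: bool   # True if the image carries soft-mask transparency
    encoded_size: int     # bytes the image currently occupies in the file


# Only ordinary RGB and grayscale images are treated as safe to re-encode.
SAFE_COLOUR_SPACES = {"RGB", "Gray"}


def is_safe_to_reencode(img: ImageInfo) -> bool:
    """Reject CMYK, indexed, and soft-masked images."""
    return img.colour_space in SAFE_COLOUR_SPACES and not img.has_soft_mask


def choose_result(img: ImageInfo, reencoded_size: int) -> str:
    """Decide what to do with an image given the size of a re-encoded candidate."""
    if not is_safe_to_reencode(img):
        return "skip"            # leave risky images untouched
    if reencoded_size >= img.encoded_size:
        return "keep-original"   # re-encoding did not actually help
    return "use-reencoded"


print(choose_result(ImageInfo("RGB", False, 900_000), 250_000))
print(choose_result(ImageInfo("CMYK", False, 900_000), 250_000))
```

The final integrity check (confirming the whole file still parses) would sit outside this function, after all images have been processed.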
Frequently asked questions
What is downsampling?
Reducing an image's pixel dimensions to match how it is actually displayed. It is usually the most effective way to shrink a PDF.
Why doesn't compressing text-only PDFs help?
Text and vectors are stored as compact instructions, not pixels, so they are already small and there is little to compress.
What DPI should I target?
Around 150 DPI is a good balance for on-screen reading; 300 DPI preserves print quality at the cost of a larger file.
Can compression damage my PDF?
A careful compressor avoids risky images and verifies the output. PDFelly reverts to your original if the result fails its integrity check.
Reading a PDF's size like an engineer
Once you understand that images dominate size, you can predict how compressible a file is before you even try. A text report exported from a word processor will be small and barely compressible, because it is almost all instructions. A slide deck with a few photos sits in the middle. A scanned document is essentially a stack of images and is highly compressible, especially if it was scanned at a high resolution it does not need. A brochure full of full-bleed photography may resist compression because its images are already near their useful limit. This mental model also explains the odd cases: a tiny-looking document that is surprisingly large usually hides one enormous embedded image, and finding and downsampling that single image fixes it. Approaching compression this way turns it from guesswork into diagnosis: identify where the weight is, apply the right lever, and confirm the result still looks right at normal viewing size.
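The diagnosis step can be sketched as a simple tally of where the bytes are. The object list below is invented sample data, and the `diagnose` helper is a hypothetical name; in practice the sizes would come from whatever tool you use to inspect the PDF's contents.

```python
def diagnose(objects: list[tuple[str, str, int]]) -> tuple[float, str]:
    """Return (share of bytes held by images, name of the largest object).

    Each entry is (name, kind, size_in_bytes), where kind is a label
    such as "text", "font", or "image".
    """
    total = sum(size for _, _, size in objects)
    image_bytes = sum(size for _, kind, size in objects if kind == "image")
    largest_name = max(objects, key=lambda obj: obj[2])[0]
    return image_bytes / total, largest_name


# Invented example: a short document hiding one very large photo.
sample = [
    ("body text", "text", 120_000),
    ("fonts", "font", 80_000),
    ("cover photo", "image", 7_800_000),
]

share, heaviest = diagnose(sample)
print(f"{share:.1%} of the file is image data; largest object: {heaviest}")
```

When the image share is high and a single object dominates, downsampling that one image is the lever to try first.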
Related guides
- How to Compress a PDF Without Losing Quality
- How to Flatten a PDF (and Why You Might Need To)
- How to Convert a PDF to Grayscale