TIFF images and image compression
by Kent Phelps
Background
TIFF stands for Tagged Image File Format, a specification
for storing computer raster images in such a way that the
information about how the image is constructed (and thus how
to decode it) is stored in a series of tags, or small record
structures, within the image file itself. Tags describe such things
as image width, height, compression method (if any), author or
copyright, and origin of the image. TIFF originated with a group of
companies working together, including Aldus, Hewlett-Packard,
Silicon Graphics, and Xerox.
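The tag idea can be sketched in a few lines of Python. This is only an illustration, not a full reader: it builds a tiny TIFF header and one tag directory in memory (no pixel data, so it is not a complete image) and reads the tags back. The function names and the sample dimensions are made up for the example; the tag numbers are the ones the TIFF 6.0 specification assigns.

```python
import struct

# A handful of tag numbers from the TIFF 6.0 specification.
TAG_NAMES = {
    256: "ImageWidth",
    257: "ImageLength",
    259: "Compression",
    262: "PhotometricInterpretation",
}

def build_sample_ifd():
    """Hypothetical helper: a little-endian TIFF header plus one tag directory."""
    entries = [
        (256, 3, 1, 1728),  # ImageWidth, type 3 = SHORT (sample value)
        (257, 3, 1, 2200),  # ImageLength (sample value)
        (259, 3, 1, 4),     # Compression = 4, CCITT Group 4
        (262, 3, 1, 0),     # PhotometricInterpretation = 0, WhiteIsZero
    ]
    # Header: byte order "II", magic number 42, offset of the first directory.
    data = struct.pack("<2sHI", b"II", 42, 8)
    data += struct.pack("<H", len(entries))
    for tag, field_type, count, value in entries:
        # Each tag is a 12-byte record; a single SHORT sits in the first
        # two bytes of the 4-byte value field.
        data += struct.pack("<HHIHH", tag, field_type, count, value, 0)
    data += struct.pack("<I", 0)  # offset of the next directory, 0 = none
    return data

def read_tags(data):
    """Read the first tag directory; only inline SHORT/LONG values are handled."""
    if data[:2] == b"II":
        endian = "<"
    elif data[:2] == b"MM":
        endian = ">"
    else:
        raise ValueError("missing TIFF byte-order mark")
    magic, ifd_offset = struct.unpack_from(endian + "HI", data, 2)
    if magic != 42:
        raise ValueError("not a TIFF header")
    (entry_count,) = struct.unpack_from(endian + "H", data, ifd_offset)
    tags = {}
    for i in range(entry_count):
        offset = ifd_offset + 2 + 12 * i
        tag, field_type, count = struct.unpack_from(endian + "HHI", data, offset)
        if count != 1 or field_type not in (3, 4):
            continue  # values stored elsewhere in the file are skipped here
        fmt = "H" if field_type == 3 else "I"
        (value,) = struct.unpack_from(endian + fmt, data, offset + 8)
        tags[TAG_NAMES.get(tag, tag)] = value
    return tags

print(read_tags(build_sample_ifd()))
```

A real reader would also follow value offsets for longer fields and walk the chain of directories, which this sketch leaves out.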
These days people generally expect TIFF images
to be black and white documents compressed with Group IV fax
encoding. While that is a common use, especially for document
imaging systems, there is much more involved than that.
Compression
It is important to separate the idea of the TIFF format for an
image in general from the compression scheme of the image. A TIFF
G4 image is one which is compressed using the CCITT Group IV bi-level
compression technique, and wrapped in a TIFF file whose tags
explain this to a TIFF reader. The image could just as well be
stored without the TIFF format and tags, but the reader would
need to know ahead of time how wide and tall the image is,
and what colors are in it, before decompressing it. Many
"proprietary" image systems do just that: they use a standard
compression type but a nonstandard header to describe it, so
normal TIFF readers (software) cannot show the image.
Typical compression types are CCITT (international
fax standard) Group 3, either one-dimensional (1D) or two-dimensional
(2D); CCITT Group 4 (Group IV); Huffman; RLE (run-length encoding);
and LZW (Lempel-Ziv-Welch, covered by a patent held by Unisys
Corporation).
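As a rough illustration of how a reader learns which scheme was used, the sketch below maps values of the TIFF Compression tag (tag 259) to names. The numeric codes are those listed in the TIFF 6.0 specification; the table is not exhaustive, and the function names are hypothetical. For Group 3, the 1D/2D choice is carried in a separate options tag (T4Options, tag 292), where the lowest bit signals two-dimensional coding.

```python
# Compression tag (259) values from the TIFF 6.0 specification (partial list).
COMPRESSION_NAMES = {
    1: "uncompressed",
    2: "CCITT modified Huffman RLE (1D)",
    3: "CCITT Group 3 fax (T.4)",
    4: "CCITT Group 4 fax (T.6)",
    5: "LZW",
    6: "JPEG (original TIFF 6.0 scheme)",
    32773: "PackBits RLE",
}

def describe_compression(code):
    """Return a readable name for a Compression tag value."""
    return COMPRESSION_NAMES.get(code, f"unknown or private scheme ({code})")

def group3_is_2d(t4_options):
    """Bit 0 of the T4Options tag (292) marks two-dimensional Group 3 coding."""
    return bool(t4_options & 1)

print(describe_compression(4))      # CCITT Group 4 fax (T.6)
print(describe_compression(40000))  # unknown or private scheme (40000)
print(group3_is_2d(1))              # True
```

A reader that meets a code it does not recognize can at least report it, which is more than is possible with a nonstandard header.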
For details of exactly what a TIFF image should
be, get a copy of the TIFF specification from Aldus Developer's
Desk.
Imaging issues
For document imaging, noteworthy issues are the compression
scheme, dots per inch (resolution), photometric interpretation
(black/white), contrast, multi-page formats, and intelligent use
of tags.
- Compression scheme. This is almost always
Group IV, which seems to give the best and smallest image for
black and white business documents. Group 3 is not bad and is
the format used by many fax machines, but there are several
flavors of Group 3 and not all readers can read all Group 3
images, even though the tags tell how to do it. This issue normally
arises when exporting an image so that other systems can read
it. Some document systems keep images in an internal format but
allow export in a common format like Group IV, while others use
Group IV as a native format.
- Dots per inch. The resolution used for business
documents generally is 200 dpi. Laser printers normally print
at 300 dpi. Printing a document scanned at 200 dpi on a 300 dpi
printer does not improve the inherent resolution, but the goal
in an imaging system is usually to strike a good balance between
storage space and image quality. For some uses, 150 dpi would
be acceptable where the documents are not detailed, and for
other uses where there is a lot of fine print 300 dpi is necessary.
Going from 200 dpi to 300 dpi does not increase the number of
pixels by 50% (1.5 times) but by 2.25 times (1.5 x 1.5), since the
image expands in both dimensions. Some fax documents (standard fax)
are created at 200 x 100 dpi and are expanded vertically
during display. This can also save storage space if the documents
are suitable. Color images are big no matter what and increase
the complexity of the compression technique. JPEG (Joint Photographic
Experts Group) compression is effective, but it is a "lossy"
compression. With JPEG there is a trade-off: either some detail
is lost (while colors are largely preserved) to reduce the image
size, or the detail is kept and the image is larger. Once an image
has been stored at a certain detail level, the lost detail for the
most part cannot be recovered later.
- Photometric Interpretation. This refers to
how the pixel values are displayed. Are the 1's black and the 0's
white, or the other way around? Generally, this is not something the
end user has to worry about unless the imaging system will be
exporting images to another product. In that case, both products
need to agree on which value means black and which means white.
- Contrast. It is often possible while scanning
documents, especially poor ones and handwritten ones, to improve
the readability of the resulting image using scanner features.
Some scanners, such as those from Fujitsu, have electronic circuits
which dynamically adjust contrast as the documents are scanned,
evening out variations. These can sometimes even be used to emphasize
or de-emphasize certain parts of the image, like shiny ball-point
pen markings, to create an image more readable than the original.
- Multi-page formats. TIFF (along with GIF
and DCX formats) includes the ability to "package"
several images into one file. Usually in document imaging each
page is a separate image file, but when exporting images, especially
if sending a set of images over the internet or in e-mail, it
is sometimes helpful to be able to group all the images together
so they cannot get lost. Many readers do not understand how
to decompress any but the first image, however.
- Intelligent use of tags. The tags in a TIFF
image normally just store information for the software to read
to display the image, but it is possible to put other
things in tags, such as indexing information. This means that
the index travels with the image and can be recovered even if the
databases controlling it are lost. Timestamps and information
about where and how the image was created can also be permanently
placed in the image. This is not a crucial feature, but nice
to have sometimes.
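Two of the points above lend themselves to a quick check in code: the resolution arithmetic and the black/white convention. The sketch below is a minimal illustration, assuming a letter-size page (8.5 x 11 inches) and 1-bit pixels packed eight to a byte; the function names are made up for the example.

```python
def pixel_count(width_in, height_in, dpi_x, dpi_y):
    """Number of pixels in a page scanned at the given resolution."""
    return round(width_in * dpi_x) * round(height_in * dpi_y)

def invert_bilevel(row):
    """Flip every bit in a packed 1-bit row, swapping black and white."""
    return bytes(b ^ 0xFF for b in row)

at_200 = pixel_count(8.5, 11, 200, 200)
at_300 = pixel_count(8.5, 11, 300, 300)
fax = pixel_count(8.5, 11, 200, 100)

print(at_300 / at_200)  # 2.25, not 1.5
print(fax / at_200)     # 0.5, standard fax holds half the pixels
print(at_200 // 8)      # 467500 bytes before compression at 1 bit per pixel

# An exporter whose reader expects the opposite convention can flip the bits.
print(invert_bilevel(b"\x0f\xf0"))  # b'\xf0\x0f'
```

These figures describe the uncompressed bitmap only; the size of a Group IV file depends on the content of the page.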
copyright © 1995. WitsEnds Software. All rights
reserved.