TextFormatter, Part 1.
Welcome to this, the second series article. Over the next three articles
we will explore text formatting and eventually create a Windows Forms control
capable of fully justified text output.
Before we examine the techniques of text formatting, it's necessary to
understand some basic typography and discover how GDI+ can be used to get
information about text and how it's used to place text on the output device.
When computers were first fitted with video screens for
output, characters were displayed on a fixed grid and it was impossible to
access individual pixels on screen. Therefore, the first text output
systems were mono-spaced, which is to say that all characters were the same
width. The printing industry, however, had been using type for hundreds of years in
which all characters had their own specific widths. It's obvious when you
read a newspaper, for example, that the character "M" is considerably wider than
the character "I". To typeset a column of text so that the edges of
the text lined up neatly, the printer had to carefully pad out each line
with spaces of different sizes so that the white space between the words made
each line a specific width. As computer technology advanced,
particularly through the efforts of Xerox and Apple, typesetting on the computer
became more and more realistic, and today WYSIWYG output is taken for granted.
A modern TrueType font file is a very complex beast. It
contains huge amounts of information about the font as a whole, the characters
within the font and, indeed, the individual glyphs that go to make up each
character. In addition, a font contains a vector-based program for each of
the glyphs, which is "executed" by what is effectively a virtual microprocessor
whenever a new font is created. It is for this reason that a font file is more
properly known as a font program.
A font is described in a terminology that has been passed
down and evolved since 1440 when Johann Gutenberg completed
work on the first movable type press. This may seem somewhat anachronistic when
taken in the context of modern computing but it all seems to fit together well.
Figure 1 illustrates a few of the more important terms.

Figure 1. Font terminology.
Referring to Figure 1, the em-square is the rectangular
measurement of the overall font height and width. Generally, all glyphs in a
font will fit inside this rectangle. Because a font might need to be displayed
at different sizes, the internal measurements of a font are declared in
font units. This is an abstract measure based upon some arbitrary figure and may
differ from font to font. Font units are a numerical division of the height of
the em-square. One font may declare its em-height to be 2048, that is to say
that the basic unit of measure is 1/2048 of the em-height, while another may declare
its em-height as 1440. In all cases, font measurements are scaled appropriately
to arrive at the final point size of the font. Incidentally, "em" comes from "M",
which is traditionally, if not actually, the widest letter in the font. Linear
measurement along a line is sometimes expressed in "ems" or M's, that is, the number of
"M" characters that would fit across it.
A font has a baseline upon which the font characters sit. The
distance above the baseline to which the long uprights of the letters "h" or "l"
reach is called the ascent height. The line below, to which the lower portions
of the "y" or "g" extend, is called the descent.
The distance between the lines is called "leading" and is a
reference to the strips of lead that typesetters placed between lines of type
to space them apart.
A glyph is a part of a character. Individual glyphs such as
the upper and lower marks of an exclamation point "!" are easier to spot than
the separate portions of a lower case "r" which is also often built from two
glyphs.
A serif is a small flourish on the end of a stroke. Serif
fonts such as Georgia and Times New Roman are considered classic
fonts, while sans-serif fonts like Verdana and Tahoma are looked upon as modern.
A monospace font is one in which all the characters are of
the same width. Programmers are very familiar with
Courier New because so many code editors use it. Monospaced fonts are
easy to lay out because the characters fall naturally into columns. In proportional
fonts, each character has its own width.
Finally, a font may be available in styles such as regular, bold, italic,
underline and strikeout.
Need the info.
Although there is a wealth of information in a font program,
the classes and methods provided by .NET are sadly lacking in access to it. The
information that is available however is useful and what's missing can often be
pieced together using other methods.
The FontFamily class encapsulates the font
program and provides four methods: GetEmHeight,
GetCellAscent, GetCellDescent and GetLineSpacing.
Each takes a FontStyle and returns a value in font units: the height of the
em-square, the ascent, the descent and the line spacing (the leading described
above) of the font respectively. Using this information, for example, we can
position text on a given baseline. This is important because GDI+ positions each
character in the font according to the top-left of the em-square. Therefore,
fonts with a different ascent height won't sit on the same baseline. An example
of this is seen in Figure 2, where some fonts are shown side by side.

Figure 2. What baseline?
The fonts shown in Figure 2 were output using the same Y
coordinate by the Graphics.DrawString method. We must therefore
take notice of cell ascents if we want to display fonts on a clearly defined
baseline. These fonts also have the same nominal font height, yet Verdana is
clearly taller on average than Courier. Therefore, to space the lines correctly
in the vertical direction, we must take notice of the leading.
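As a rough sketch of how these four methods are called, the helper below reads the design-unit metrics of a font family for a given style. The names FontFamilyMetricsSketch and Describe are made up for illustration; only the FontFamily calls come from System.Drawing.

```csharp
using System;
using System.Drawing;

// Hypothetical helper for illustration; not part of System.Drawing.
public static class FontFamilyMetricsSketch
{
    // Returns the four design-unit metrics of a family as a readable string.
    // All four values are expressed in font units, not pixels or points.
    public static string Describe(FontFamily family, FontStyle style)
    {
        int emHeight = family.GetEmHeight(style);       // height of the em-square
        int ascent = family.GetCellAscent(style);       // baseline up to the top of the cell
        int descent = family.GetCellDescent(style);     // baseline down to the bottom of the cell
        int lineSpacing = family.GetLineSpacing(style); // baseline to baseline distance

        return string.Format(
            "{0}: em={1}, ascent={2}, descent={3}, lineSpacing={4}",
            family.Name, emHeight, ascent, descent, lineSpacing);
    }
}
```

Calling Describe for each of the families in Figure 2 with FontStyle.Regular would show why they do not share a baseline: the ratio of ascent to em-height differs from family to family.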
What's the point?
The traditional unit of measure used by printers for
font height is the point. A point is roughly equivalent to 1/72 of an inch, so
a 72 point font will be an inch high, a 36 point font is half an inch and so on.
The em-units of a font are arbitrary, so a certain amount of math is
necessary to convert pixels to points or the declared ascent height to points or
inches. To place fonts on a desired baseline, the font must be positioned at
baseline_Y - ascent. The ascent must be converted from font design units to the
unit of measure desired. This can be accomplished by the following formula:
PointSize / em-height * ascent * resolution
Resolution in this case is calculated by converting the
vertical screen resolution to points, the basic unit of measure of the font,
using Graphics.DpiY/72. This yields the number of device pixels
per point, so the result of the formula is the ascent in pixels. Using this
formula with the DrawString method, the text
now sits on the common baseline, as seen in Figure 3.
Figure 3. Fonts oriented to a common baseline.
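The baseline formula can be sketched as a small helper. This is a minimal illustration that assumes the Graphics page unit is pixels; FontMetricsSketch and its method names are hypothetical. The design-unit arguments are expected to come from FontFamily.GetCellAscent and FontFamily.GetEmHeight, the point size from Font.SizeInPoints and the resolution from Graphics.DpiY.

```csharp
using System;

// Hypothetical helper for illustration; not part of System.Drawing.
public static class FontMetricsSketch
{
    // Converts a length in font design units to device pixels:
    // pointSize / emHeight * designUnits * (dpi / 72).
    public static float DesignUnitsToPixels(int designUnits, int emHeight, float pointSize, float dpi)
    {
        float pixelsPerPoint = dpi / 72f; // a point is taken as 1/72 of an inch
        return pointSize / emHeight * designUnits * pixelsPerPoint;
    }

    // Y coordinate to pass to DrawString so that the text sits on baselineY.
    // Assumes DrawString places the top of the cell at the given Y.
    public static float TopForBaseline(float baselineY, int cellAscent, int emHeight, float pointSize, float dpiY)
    {
        return baselineY - DesignUnitsToPixels(cellAscent, emHeight, pointSize, dpiY);
    }
}
```

For example, with an em-height of 2048 and an ascent of 1854 (illustrative figures), a 12 point font at 96 dpi has an ascent of roughly 14.5 pixels, so text meant to sit on a baseline at Y=100 would be drawn at about Y=85.5.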
GDI+ Rendering.
After the five-minute typography groundwork it's time to begin
looking at the process of rendering type in a GDI+ context. The GDI+ rendering
engine tries to be resolution independent but, of necessity, the pixels have to
be rendered to a device with finite resolution. How pixel positions are
calculated affects the output greatly. Figure 4 shows an example of a line of
narrow characters in a small font.

Figure 4. Arial "l" (lower case "L") 8 point
What is shown in Figure 4 is that the character advance width
of the lower case "L" is such that it creates what amounts to a beat frequency
with the pixels of the screen. As a result, the positioning of the characters changes
and the apparent width between them also changes. This makes it exceedingly
difficult to place text accurately. In some circumstances, two identical strings
will not even render reliably on top of one another.
GDI+ provides two ways of improving this placement
reliability. The first is the StringFormat class, which provides
ways of more precisely specifying how text should be treated. The second is the
Graphics.TextRenderingHint property, which enables you to use the hinting built
in to the font to place characters more reliably.
StringFormat has a static property,
GenericTypographic, that can be used as-is or as the basis for custom
settings. The StringFormat class contains a FormatFlags
property which is used to more closely specify the formatting of strings when
output to the screen or printer. If StringFormat.GenericTypographic
needs to be modified in any way, clone it first and modify the clone; otherwise the
changes you make are retained and all subsequent uses of
GenericTypographic are affected.
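A minimal sketch of the clone-then-modify pattern follows. The class and method names are hypothetical, and the MeasureTrailingSpaces flag is chosen purely as an example of a custom setting; it is not something the article prescribes.

```csharp
using System;
using System.Drawing;

// Hypothetical helper for illustration; not part of System.Drawing.
public static class StringFormatSketch
{
    // Returns a private copy of GenericTypographic with one extra flag set,
    // leaving the shared GenericTypographic settings untouched.
    public static StringFormat CreateTypographicFormat()
    {
        // Clone() returns object, so a cast is needed.
        StringFormat format = (StringFormat)StringFormat.GenericTypographic.Clone();

        // Example customization only: include trailing spaces when measuring.
        format.FormatFlags |= StringFormatFlags.MeasureTrailingSpaces;
        return format;
    }
}
```

The caller owns the returned StringFormat and should dispose it when it is no longer needed.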
Graphics.TextRenderingHint enables you to use
position hinting that's built in to TrueType fonts to place
characters more accurately. Small fonts on low resolution devices are difficult to display
because the desired width or height of a character rarely coincides with the
pitch of the pixels, so adjustments are made that shrink, expand or reposition
the dots in the character to the nearest pixel. This is what causes the
disproportionate spacing seen in Figure 4. Grid-fitting enables the font
renderer to more accurately predict the positions and sizes of adjacent
characters so that they are more consistent. Using full resolution
independence or antialiasing provides the best and most consistent positioning.
Antialiasing does, however, have the disadvantage that small fonts can become fuzzy and
indistinct. Figure 5 shows the effects of using grid-fitting and antialiasing.

Figure 5. Using TextRenderingHints
As you can see from Figure 5, character output positions are
consistent and reliable only in full antialias
mode.
Measure up.
With that basic grounding in GDI+ typography out of the way
it's time to look at the actual methods we'll need to use to lay out text. The
most important thing in this process is first measuring the text to see how big
it all is. This can be accomplished with Graphics.MeasureString.
An important point to note is that when measuring text with
GDI+ you must use exactly the same graphics settings in the StringFormat
and TextRenderingHint that you intend to use during drawing.
Failure to do so will mean that sizes measured and sizes output will differ and
so repeatability will fail.
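One way to keep measuring and drawing in step is to route both through a single routine that applies the settings once. The sketch below is an illustration under assumptions: the names ConsistentTextSketch and MeasureAndDraw are hypothetical, and the AntiAlias hint and black brush are arbitrary choices.

```csharp
using System;
using System.Drawing;
using System.Drawing.Text;

// Hypothetical helper for illustration; not part of System.Drawing.
public static class ConsistentTextSketch
{
    // Measures and then draws the text using the same TextRenderingHint and
    // the same StringFormat, and returns the measured size.
    public static SizeF MeasureAndDraw(Graphics g, string text, Font font, PointF origin)
    {
        TextRenderingHint previous = g.TextRenderingHint;
        g.TextRenderingHint = TextRenderingHint.AntiAlias;
        try
        {
            // Work on a clone so the shared GenericTypographic is never altered.
            using (StringFormat format = (StringFormat)StringFormat.GenericTypographic.Clone())
            {
                SizeF size = g.MeasureString(text, font, origin, format);
                g.DrawString(text, font, Brushes.Black, origin, format);
                return size;
            }
        }
        finally
        {
            // Restore the caller's hint so other drawing code is unaffected.
            g.TextRenderingHint = previous;
        }
    }
}
```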
Measuring a string is pretty simple. Just set up the Graphics
object the way you want it, decide which font you'll use, pick a string format
and measure the text. The information is returned as a SizeF which is a size
based upon floating point values, not integers. The image in figure 6 is
produced by the code in listing 1.

Figure 6. Obtaining the size of a string.
Listing 1.
// C#
SizeF sf = e.Graphics.MeasureString("The quick brown fox jumps over the lazy dog", Font,
    new SizeF(300, 200), StringFormat.GenericTypographic);
e.Graphics.DrawString("The quick brown fox jumps over the lazy dog", Font, Brushes.Black, 10, 20,
    StringFormat.GenericTypographic);
e.Graphics.DrawString(sf.ToString(), Font, Brushes.Black, 10, 50,
    StringFormat.GenericTypographic);
e.Graphics.DrawRectangle(Pens.Red, 10, 20, sf.Width, sf.Height);
' Visual Basic
Dim sf As SizeF = e.Graphics.MeasureString("The quick brown fox jumps over the lazy dog", Font, _
    New SizeF(300, 200), StringFormat.GenericTypographic)
e.Graphics.DrawString("The quick brown fox jumps over the lazy dog", Font, Brushes.Black, 10, 20, _
    StringFormat.GenericTypographic)
e.Graphics.DrawString(sf.ToString(), Font, Brushes.Black, 10, 50, _
    StringFormat.GenericTypographic)
e.Graphics.DrawRectangle(Pens.Red, 10, 20, sf.Width, sf.Height)
It's also possible to specify a smaller
rectangle for the measured and output text and have the graphics object format
the text for you and tell you how many lines it spanned. Figure 7 shows the
effect and listing 2 shows the modified code.

Figure 7. Lines filled=2
Listing 2.
// C#
int charsfitted, linesfilled;
SizeF sf = e.Graphics.MeasureString("The quick brown fox jumps over the lazy dog", Font,
    new SizeF(150, 300), StringFormat.GenericTypographic, out charsfitted, out linesfilled);
e.Graphics.DrawString("The quick brown fox jumps over the lazy dog", Font, Brushes.Black,
    new RectangleF(10, 20, 150, 300), StringFormat.GenericTypographic);
e.Graphics.DrawString(string.Format("{0}, charsFitted={1}, linesFilled={2}", sf.ToString(),
    charsfitted, linesfilled), Font, Brushes.Black, 10, 50, StringFormat.GenericTypographic);
e.Graphics.DrawRectangle(Pens.Red, 10, 20, sf.Width, sf.Height);
' Visual Basic
Dim charsfitted As Integer
Dim linesfilled As Integer
Dim sf As SizeF = e.Graphics.MeasureString("The quick brown fox jumps over the lazy dog", Font, _
    New SizeF(150, 300), StringFormat.GenericTypographic, charsfitted, linesfilled)
e.Graphics.DrawString("The quick brown fox jumps over the lazy dog", Font, Brushes.Black, _
    New RectangleF(10, 20, 150, 300), StringFormat.GenericTypographic)
e.Graphics.DrawString(String.Format("{0}, charsFitted={1}, linesFilled={2}", sf.ToString(), _
    charsfitted, linesfilled), Font, Brushes.Black, 10, 50, StringFormat.GenericTypographic)
e.Graphics.DrawRectangle(Pens.Red, 10, 20, sf.Width, sf.Height)
You might say
that this all looks fairly promising. GDI+ enables us to measure text and split
unruly sentences into neatly arranged paragraphs. This of course is true if you
want your text to look all ragged down the right hand side. If however you need
a neatly formed column of text, or even two side-by-side, forget it. That's as
good as System.Drawing does without a bit of help.
Breaking up is not hard to do.
Let's go back again for a moment
to the early days of printing and hot metal, when a column of text was formed by
hand and spaced using as much white space as was needed to make the edges of
the column line up. In those days, the typesetter had to deal with each
word and each line on an individual basis. In effect, this is what
we have to do to persuade GDI+ to format the text in neat columns.
The first task in gaining
complete control over text layout, and the last task for this article, is
to break up the lines of text into individual words and measure them to find out
how much area each one requires. For the moment we can ignore some of the other
considerations such as formatting using tabs or other whitespace. Breaking the
sentence into words can be accomplished using String.Split and
then each word can be measured for area. Listing 3
shows a simple application that does this. Figure 8 shows the application in
action.
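The following is a minimal sketch of the split-and-measure step, not the application from Listing 3. The names WordMeasureSketch and MeasureWords are hypothetical, and the measuring function is passed in as a delegate so that the sketch stays independent of any particular Graphics object. It splits on single spaces only, in keeping with the decision to ignore tabs and other whitespace for now.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper for illustration; not part of System.Drawing.
public static class WordMeasureSketch
{
    // Splits text into words on spaces and pairs each word with whatever
    // the supplied measure function returns for it (for example a SizeF).
    public static List<KeyValuePair<string, TSize>> MeasureWords<TSize>(
        string text, Func<string, TSize> measure)
    {
        List<KeyValuePair<string, TSize>> measured = new List<KeyValuePair<string, TSize>>();

        // RemoveEmptyEntries drops the empty strings produced by runs of spaces.
        string[] words = text.Split(new char[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);

        foreach (string word in words)
        {
            measured.Add(new KeyValuePair<string, TSize>(word, measure(word)));
        }
        return measured;
    }
}
```

With GDI+ the delegate might be `w => g.MeasureString(w, font, PointF.Empty, format)`, which yields a SizeF per word, where g, font and format are the Graphics, Font and StringFormat you intend to draw with.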

Summary
This part of the series has
shown you some of the fundamentals of typography and explained some of the
hurdles that have to be overcome in formatting text. You've seen how GDI+
measures and places strings and had a glimpse of the basic principles upon which
the final formatter control will be built. Later, we'll examine the business
of extracting enough information from the raw materials, the input strings, to
begin laying out text in more conventional ways.
You can find the Source Code files here.