ISO8859 Western Encoding

RFC 2234, 4234, 5234 ABNF for well formed 7bit encoded US-ASCII plaintext:-

SP = %x20
CR = %x0D
LF = %x0A
VCHAR = %x21-7E
CRLF =  CR LF
line = 1*VCHAR *(SP 1*VCHAR) CRLF
line = 1*80(SP / VCHAR) CRLF
paragraph = 1*line CRLF
plaintext = *paragraph 1*line
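
The grammar above can be checked mechanically. The following is a minimal sketch in Python, assuming the length-limited `line` rule (1*80) is the one in force; the name `is_plaintext` is made up for illustration:-

```python
import re

# line = 1*80(SP / VCHAR) CRLF  (the length-limited rule is assumed here)
LINE = rb"[\x20-\x7E]{1,80}\r\n"
# paragraph = 1*line CRLF
PARAGRAPH = rb"(?:" + LINE + rb")+\r\n"
# plaintext = *paragraph 1*line
PLAINTEXT = re.compile(rb"(?:" + PARAGRAPH + rb")*(?:" + LINE + rb")+")


def is_plaintext(data: bytes) -> bool:
    """Return True if the whole octet string matches the plaintext rule."""
    return PLAINTEXT.fullmatch(data) is not None
```

Note that, as the grammar is written, text ending in a blank line is rejected because `plaintext` must end with at least one `line`.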

This can be placed inside HTML <pre>...</pre> if `&´, `<´ and `>´ are escaped. The line length limit for email messages is 78 character positions to ensure presentation on 80 column displays with scroll bars.
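
As a sketch of the escaping step, Python's standard `html.escape` with `quote=False` escapes exactly these three characters and leaves quotation marks alone. The wrapper name `to_pre` is hypothetical:-

```python
import html


def to_pre(plaintext: str) -> str:
    """Escape &, < and > and wrap the result in a <pre> element."""
    return "<pre>" + html.escape(plaintext, quote=False) + "</pre>"
```

For example, `to_pre("a < b")` gives `<pre>a &lt; b</pre>`.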

ABNF for free form US-ASCII listing text:-

SP = %x20
CR = %x0D
LF = %x0A
HTAB = %x09
VCHAR = %x21-7E
WSP = SP / HTAB
CRLF =  CR LF
looseline = 1*132(WSP / VCHAR) CRLF
looseparagraph = 1*looseline CRLF
loosetext = *looseparagraph 1*looseline

This can be placed inside HTML <listing>...</listing> if `&´, `<´ and `>´ are escaped. The limit is actually 132 character positions, not characters. Note that HTAB may move from 1 to 8 character positions. However, the line length limit for SMTP is 998 octets irrespective of presentation.
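
The difference between character positions and octets can be sketched as follows. Tab stops every 8 columns are an assumption (consistent with HTAB moving 1 to 8 positions), the line is taken without its CRLF, and the function names are made up for illustration:-

```python
def positions_used(line: str, tabstop: int = 8) -> int:
    """Count character positions, advancing each HTAB to the next tab stop."""
    col = 0
    for ch in line:
        if ch == "\t":
            col += tabstop - (col % tabstop)  # moves 1 to 8 positions
        else:
            col += 1
    return col


def fits_listing(line: bytes, max_positions: int = 132, max_octets: int = 998) -> bool:
    """Check both limits: presentation width and SMTP line length."""
    return (positions_used(line.decode("ascii")) <= max_positions
            and len(line) <= max_octets)
```

Seventeen HTABs are only 17 octets but occupy 136 positions, so such a line passes the SMTP limit and still fails the presentation limit.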

Changes to ABNF for 8bit encoded ISO8859-1 or 15 text:-

VCHAR = %x21-7E / %xA1-AC / %xAE-FF
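
A minimal sketch of this rule as an octet test; the function name is hypothetical. It excludes %xA0 (non-breaking space) and %xAD (soft hyphen) just as the rule does:-

```python
def is_vchar_8bit(octet: int) -> bool:
    """VCHAR = %x21-7E / %xA1-AC / %xAE-FF"""
    return (0x21 <= octet <= 0x7E
            or 0xA1 <= octet <= 0xAC
            or 0xAE <= octet <= 0xFF)
```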

Notes:-

&nbsp; (&#xa0;)
Space that is part of a word that is not broken. Converted to a real space after reflowing nonplaintext to plaintext. Applicable to HTML <p>...</p>.
&shy; (&#xad;)
Hyphen only visible when word is broken. Changed to hyphen and newline when breaking during conversion of nonplaintext to plaintext. Applicable to HTML <p>...</p>.

These are meaningless in plaintext which has fixed presentation. They are used in Microsoft encoded text where a single paragraph is encoded on one line often exceeding the SMTP 998 octet limit. They may be encoded in Microsoft's own byteswapped variant of UTF16 or use Microsoft's own Windows 1252 variant of ISO8859-1 which conflicts with ECMA48 control characters.
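
The conversion described in the notes can be sketched with Python's `textwrap`, which only breaks on US-ASCII whitespace and so never breaks at a non-breaking space. This is a simplification, not a full implementation: it never breaks at a soft hyphen, so every soft hyphen is simply dropped rather than changed to hyphen and newline. The name `reflow` and the default width of 78 are assumptions:-

```python
import textwrap

NBSP = "\u00a0"  # &nbsp;
SHY = "\u00ad"   # &shy;


def reflow(paragraph: str, width: int = 78) -> list[str]:
    """Reflow one long paragraph into plaintext lines."""
    # Simplification: no break is ever taken at a soft hyphen,
    # so it stays invisible and is removed.
    visible = paragraph.replace(SHY, "")
    lines = textwrap.wrap(visible, width=width)
    # Only after reflowing is the non-breaking space made a real space.
    return [line.replace(NBSP, " ") for line in lines]
```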

Note that UTF8 conflicts with ECMA48 control characters which are not used in HTML, but may still be used in some plaintext data streams. To permit ECMA48 control character code points to be used, we would have:-

VCHAR = %x21-FF

US-ASCII RFC 20 Strict

Dec Hex  Dec Hex  Dec Hex  Dec Hex  Dec Hex  Dec Hex  Dec Hex  Dec Hex
  0 00 ␀   1 01 ␁   2 02 ␂   3 03 ␃   4 04 ␄   5 05 ␅   6 06 ␆   7 07 ␇
  8 08 ␈   9 09 ␉  10 0a ␊  11 0b ␋  12 0c ␌  13 0d ␍  14 0e ␎  15 0f ␏
 16 10 ␐  17 11 ␑  18 12 ␒  19 13 ␓  20 14 ␔  21 15 ␕  22 16 ␖  23 17 ␗
 24 18 ␘  25 19 ␙  26 1a ␚  27 1b ␛  28 1c ␜  29 1d ␝  30 1e ␞  31 1f ␟
 32 20 ␠  33 21 !  34 22 "  35 23 #  36 24 $  37 25 %  38 26 &  39 27 ´
 40 28 (  41 29 )  42 2a *  43 2b +  44 2c ,  45 2d -  46 2e .  47 2f /
 48 30 0  49 31 1  50 32 2  51 33 3  52 34 4  53 35 5  54 36 6  55 37 7
 56 38 8  57 39 9  58 3a :  59 3b ;  60 3c <  61 3d =  62 3e >  63 3f ?
 64 40 @  65 41 A  66 42 B  67 43 C  68 44 D  69 45 E  70 46 F  71 47 G
 72 48 H  73 49 I  74 4a J  75 4b K  76 4c L  77 4d M  78 4e N  79 4f O
 80 50 P  81 51 Q  82 52 R  83 53 S  84 54 T  85 55 U  86 56 V  87 57 W
 88 58 X  89 59 Y  90 5a Z  91 5b [  92 5c \  93 5d ]  94 5e ^  95 5f _
 96 60 `  97 61 a  98 62 b  99 63 c 100 64 d 101 65 e 102 66 f 103 67 g
104 68 h 105 69 i 106 6a j 107 6b k 108 6c l 109 6d m 110 6e n 111 6f o
112 70 p 113 71 q 114 72 r 115 73 s 116 74 t 117 75 u 118 76 v 119 77 w
120 78 x 121 79 y 122 7a z 123 7b { 124 7c | 125 7d } 126 7e ~ 127 7f ␡

"Smart" quotes:-
&#x22;H&#x22; gives "H", &#34;H&#34; gives "H",
&#x60;H&#x27; gives `H', &#96;H&#39; gives `H',
&#x60;H&#xb4; gives `H´, &#96;H&#180; gives `H´,
&#x2018;H&#x2019; gives ‘H’, &#8216;H&#8217; gives ‘H’,
&#x201c;H&#x201d; gives “H”, &#8220;H&#8221; gives “H”.
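
The references above can be checked with Python's standard `html.unescape`, which decodes both the hexadecimal and the decimal forms. A small sketch; the helper name `decode_refs` is made up:-

```python
import html


def decode_refs(text: str) -> str:
    """Decode numeric character references such as &#x60; or &#96;."""
    return html.unescape(text)


# Hexadecimal and decimal forms name the same code points.
same = decode_refs("&#x2018;H&#x2019;") == decode_refs("&#8216;H&#8217;")
```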

Groff -Tlatin1 output uses `H' for single quotes and ``H'' for double quotes,
but this output is only valid for -Tascii. It would be better if these were left
as 'H' and "H" for -Tlatin1.

0/9  `␉´ Horizontal Tabulation (HT)
0/10 `␊´ or `␤´ Line Feed (LF) or New Line (NL)
0/13 `␍´ Carriage Return (CR)
2/0  `␠´ Space (SP) normally Non-Printing
2/2  `"´ Quotation Marks (Diaeresis [2])
2/7  `´´ or `'´ Apostrophe (Closing Single Quotation Mark Acute Accent [2])
2/12 `,´ Comma (Cedilla [2])
2/13 `-´ Hyphen (Minus)
3/12 `<´ Less Than
3/14 `>´ Greater Than
5/14 `^´ Circumflex [2,3]
5/15 `_´ Underline
6/0  ``´ Grave Accent [2,3] (Opening Single Quotation Mark)
7/12 `|´ Vertical Line [3]
7/14 `~´ Overline [3] (Tilde [2]; General Accent [2])
7/15 `␡´ Delete (DEL) [1]

Note that the Apostrophe `'´ is vertical in HTML and does not balance ``´.
This is because HTML and UTF-8 are based on ISO8859-1 which introduced a
duplicate code for `´´. This was fixed later in ISO8859-15 which is a
strict US-ASCII superset.

US-ASCII defined: 1*(SOH heading STX text ETX) EOT
However, we now use: 1*(heading HT text LF)
The EOT is now only used on devices that don't have a logical End of File.
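
The two framings can be compared with a short sketch. Both parsers are illustrative only and their names are hypothetical; they assume headings contain no STX or HT and texts contain no ETX or LF:-

```python
import re

# 1*(SOH heading STX text ETX) EOT
LEGACY = re.compile(r"\x01([^\x02]*)\x02([^\x03]*)\x03")


def parse_legacy(data: str) -> list[tuple[str, str]]:
    """Split SOH/STX/ETX framed records, ignoring a trailing EOT."""
    return LEGACY.findall(data.removesuffix("\x04"))


def parse_records(data: str) -> list[tuple[str, str]]:
    """Split 1*(heading HT text LF) records."""
    records = []
    for line in data.split("\n"):
        if line:  # skip the empty piece after the final LF
            heading, _, text = line.partition("\t")
            records.append((heading, text))
    return records
```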

ISO8859-1, DECMCS, Latin1, Windows1252 or CP1252

Dec Hex  Dec Hex  Dec Hex  Dec Hex  Dec Hex  Dec Hex  Dec Hex  Dec Hex
160 a0   161 a1 ¡ 162 a2 ¢ 163 a3 £ 164 a4 ¤ 165 a5 ¥ 166 a6 ¦ 167 a7 § 
168 a8 ¨ 169 a9 © 170 aa ª 171 ab « 172 ac ¬ 173 ad - 174 ae ® 175 af ¯ 
176 b0 ° 177 b1 ± 178 b2 ² 179 b3 ³ 180 b4 ´ 181 b5 µ 182 b6 ¶ 183 b7 · 
184 b8 ¸ 185 b9 ¹ 186 ba º 187 bb » 188 bc ¼ 189 bd ½ 190 be ¾ 191 bf ¿ 
192 c0 À 193 c1 Á 194 c2 Â 195 c3 Ã 196 c4 Ä 197 c5 Å 198 c6 Æ 199 c7 Ç 
200 c8 È 201 c9 É 202 ca Ê 203 cb Ë 204 cc Ì 205 cd Í 206 ce Î 207 cf Ï 
208 d0 Ð 209 d1 Ñ 210 d2 Ò 211 d3 Ó 212 d4 Ô 213 d5 Õ 214 d6 Ö 215 d7 × 
216 d8 Ø 217 d9 Ù 218 da Ú 219 db Û 220 dc Ü 221 dd Ý 222 de Þ 223 df ß 
224 e0 à 225 e1 á 226 e2 â 227 e3 ã 228 e4 ä 229 e5 å 230 e6 æ 231 e7 ç 
232 e8 è 233 e9 é 234 ea ê 235 eb ë 236 ec ì 237 ed í 238 ee î 239 ef ï 
240 f0 ð 241 f1 ñ 242 f2 ò 243 f3 ó 244 f4 ô 245 f5 õ 246 f6 ö 247 f7 ÷ 
248 f8 ø 249 f9 ù 250 fa ú 251 fb û 252 fc ü 253 fd ý 254 fe þ 255 ff ÿ 

Works on VT220 actual terminals and terminal emulators.
&nbsp; Has non-breaking space which is meaningless in plaintext, duplicates Space.
&curren; Does not have Euro. This can be changed on later terminals.
&brvbar; Duplicates Vertical Line of US-ASCII.
Guillemets duplicate Quotation Marks, Less Than or Greater Than of US-ASCII.
&macr; Duplicates Overline (Macron) of US-ASCII.
&uml; Duplicates Diaeresis (Umlaut) of US-ASCII.
&shy; Has soft-hyphen which is meaningless in plaintext, duplicates Hyphen.
Has support for typing of SI units with `°±²³µ·´.
&acute; &sup1; Duplicates Acute Accent of US-ASCII.
&cedil; Duplicates Cedilla, Prime, Apostrophe of US-ASCII.
US only vulgar fractions.
Ordinals, biol., med., etc.
Spanish, etc.

Macron is not a minus sign but a combining character which requires overstrike
and suitable font. Which is easier to read and type `m/s´ or `m·s¯¹´?

Note that UTF is based on this character set and thus has the same
defect of multiple codes for the same character. Microsoft Windows-1252
also has this defect. UTF also does not collate unless LC_COLLATE=C
which makes it unsuitable for information storage and retrieval.
However, the character sets here collate based on code values and thus
collation is both deterministic and monotonic.
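
Collation by code value can be sketched by sorting on the encoded octets. This only illustrates the ordering described above and makes no claim about linguistic sort order; the name `collate` is made up:-

```python
def collate(words: list[str], encoding: str = "iso8859-1") -> list[str]:
    """Sort strings by the code values of their encoded octets."""
    return sorted(words, key=lambda w: w.encode(encoding))


ordered = collate(["zebra", "Zulu", "\u00e9t\u00e9", "apple", "Apple"])
```

Here capitals sort before lower case and accented letters sort last, because that is the order of their code values.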

ISO8859-15 with Euro and additional languages

Dec Hex  Dec Hex  Dec Hex  Dec Hex  Dec Hex  Dec Hex  Dec Hex  Dec Hex
160 a0   161 a1 ¡ 162 a2 ¢ 163 a3 £ 164 a4 € 165 a5 ¥ 166 a6 Š 167 a7 § 
168 a8 š 169 a9 © 170 aa ª 171 ab « 172 ac ¬ 173 ad - 174 ae ® 175 af ¯ 
176 b0 ° 177 b1 ± 178 b2 ² 179 b3 ³ 180 b4 Ž 181 b5 µ 182 b6 ¶ 183 b7 · 
184 b8 ž 185 b9 ¹ 186 ba º 187 bb » 188 bc Œ 189 bd œ 190 be Ÿ 191 bf ¿ 
192 c0 À 193 c1 Á 194 c2 Â 195 c3 Ã 196 c4 Ä 197 c5 Å 198 c6 Æ 199 c7 Ç 
200 c8 È 201 c9 É 202 ca Ê 203 cb Ë 204 cc Ì 205 cd Í 206 ce Î 207 cf Ï 
208 d0 Ð 209 d1 Ñ 210 d2 Ò 211 d3 Ó 212 d4 Ô 213 d5 Õ 214 d6 Ö 215 d7 × 
216 d8 Ø 217 d9 Ù 218 da Ú 219 db Û 220 dc Ü 221 dd Ý 222 de Þ 223 df ß 
224 e0 à 225 e1 á 226 e2 â 227 e3 ã 228 e4 ä 229 e5 å 230 e6 æ 231 e7 ç 
232 e8 è 233 e9 é 234 ea ê 235 eb ë 236 ec ì 237 ed í 238 ee î 239 ef ï 
240 f0 ð 241 f1 ñ 242 f2 ò 243 f3 ó 244 f4 ô 245 f5 õ 246 f6 ö 247 f7 ÷ 
248 f8 ø 249 f9 ù 250 fa ú 251 fb û 252 fc ü 253 fd ý 254 fe þ 255 ff ÿ 

Adds Euro `€´. Maintains support for typing of SI units.
Adds language support and removes duplicate Vertical Line, Diaeresis, Acute
Accent and Cedilla of US-ASCII and also removes US only vulgar fractions.
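
The positions that changed between the two tables can be listed with Python's standard codecs. A small sketch; the function name is hypothetical:-

```python
def differences() -> list[tuple[int, str, str]]:
    """List (code, ISO8859-1 char, ISO8859-15 char) where the two sets differ."""
    changed = []
    for code in range(0xA0, 0x100):
        octet = bytes([code])
        old = octet.decode("iso8859-1")
        new = octet.decode("iso8859-15")
        if old != new:
            changed.append((code, old, new))
    return changed
```

This reports eight positions: a4, a6, a8, b4, b8, bc, bd and be.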