ASCIIMathML.js (ver 1.4.7): Syntax and List of Constants

The main aims of the ASCIIMathML syntax are:
1. close to standard mathematical notation
2. easy to read
3. easy to type

You can use your favorite editor to write HTML pages that use this JavaScript program. If the page is viewed by a browser that does not support MathML or JavaScript, the ASCII formulas are still quite readable. Most users will not have to read the technicalities on this page. If you type

`x^2` or `a_(mn)` or `a_{mn}` or `(x+1)/y` or `sqrtx`

you pretty much get what you expect: `x^2` or `a_(mn)` or `a_{mn}` or `(x+1)/y` or `sqrtx`. The choice of grouping parentheses is up to you (they don't even have to match). If the displayed expression can be parsed uniquely without them, they are omitted. Printing the table of constant symbols (below) may be helpful, but is not necessary if you know the LaTeX equivalents.

It is hoped that this simple input format for MathML will further encourage its use on the web. The remainder of this page gives a fairly detailed specification of the ASCII syntax. The expressions described here correspond to a well-specified subset of Presentation MathML and behave in a predictable way.

The syntax is very permissive and does not generate syntax errors. This allows mathematically incorrect expressions to be displayed, which is important for teaching purposes. It also causes less frustration when previewing formulas.

The parser uses no operator precedence and only respects the grouping brackets, subscripts, superscripts, fractions and (square) roots. This is done for reasons of efficiency and generality. The resulting MathML code can easily be processed further to enforce any additional syntactic requirements of a particular application.

The grammar: Here is a definition of the grammar used to parse ASCIIMathML expressions. In the Backus-Naur form given below, the letter on the left of the ::= represents a category of symbols that could be one of the possible sequences of symbols listed on the right. The vertical bar | separates the alternatives.

c ::= [A-Za-z] | numbers | greek letters | other constant symbols (see below)
u ::= 'sqrt' | 'text' | 'bb' | other unary symbols for font commands
b ::= 'frac' | 'root' | 'stackrel'        binary symbols
l ::= ( | [ | { | (: | {:                 left brackets
r ::= ) | ] | } | :) | :}                 right brackets
S ::= c | lEr | uS | bSS | "any"          simple expression
E ::= SE | S/S | S_S | S^S | S_S^S        expression (fraction, sub-, super-, subsuperscript)
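
As a rough illustration of how this grammar can be parsed, here is a minimal recursive-descent sketch in Python. It is not the parser in ASCIIMathML.js: it assumes the formula has already been split into tokens, knows only the bracket, unary and binary symbols named in the grammar, and returns nested tuples rather than MathML. The token sets (LEFT, RIGHT, UNARY, BINARY) and all function names are hypothetical. It does reflect two properties stated above: there is no operator precedence (in a+b/c only b/c is grouped, because / combines the neighboring simple expressions), and malformed input such as unbalanced brackets does not raise an error.

```python
# Hypothetical token classes taken from the grammar above (not the real symbol table).
LEFT = {"(", "[", "{", "(:", "{:"}
RIGHT = {")", "]", "}", ":)", ":}"}
UNARY = {"sqrt", "text", "bb"}
BINARY = {"frac", "root", "stackrel"}


def parse_simple(tokens, i):
    """S ::= c | lEr | uS | bSS.  Returns (node, index of the next token)."""
    if i >= len(tokens) or tokens[i] in RIGHT:
        return ("empty",), i  # missing operand: permissive, no error
    tok = tokens[i]
    if tok in LEFT:
        items, i = parse_expr(tokens, i + 1)
        close = ""
        if i < len(tokens):  # parse_expr stopped at a right bracket
            close = tokens[i]
            i += 1
        return ("bracket", tok, items, close), i
    if tok in UNARY:
        arg, i = parse_simple(tokens, i + 1)
        return (tok, arg), i
    if tok in BINARY:
        first, i = parse_simple(tokens, i + 1)
        second, i = parse_simple(tokens, i)
        return (tok, first, second), i
    return ("c", tok), i + 1


def parse_intermediate(tokens, i):
    """One item of E: S/S | S_S | S^S | S_S^S | S."""
    s, i = parse_simple(tokens, i)
    nxt = tokens[i] if i < len(tokens) else None
    if nxt == "/":
        den, i = parse_simple(tokens, i + 1)
        return ("/", s, den), i
    if nxt == "_":
        sub, i = parse_simple(tokens, i + 1)
        if i < len(tokens) and tokens[i] == "^":
            sup, i = parse_simple(tokens, i + 1)
            return ("_^", s, sub, sup), i
        return ("_", s, sub), i
    if nxt == "^":
        sup, i = parse_simple(tokens, i + 1)
        return ("^", s, sup), i
    return s, i


def parse_expr(tokens, i=0):
    """E ::= SE | ...  Collects a flat list of items up to a right bracket or the end."""
    items = []
    while i < len(tokens) and tokens[i] not in RIGHT:
        node, i = parse_intermediate(tokens, i)
        items.append(node)
    return items, i


def parse(tokens):
    items, i = parse_expr(tokens)
    while i < len(tokens):  # stray right bracket at top level: keep it as a constant
        items.append(("c", tokens[i]))
        more, i = parse_expr(tokens, i + 1)
        items.extend(more)
    return items
```

For instance, parse(["a", "+", "b", "/", "c"]) yields [("c", "a"), ("c", "+"), ("/", ("c", "b"), ("c", "c"))]: the + is just another item in the row, with no precedence attached.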

The translation rules: Each terminal symbol is translated into a corresponding MathML node. The constants are mostly converted to their respective Unicode symbols. The other expressions are converted as follows:

l `S` r `to` <mrow>l `S` r</mrow> (any pair of brackets can be used to delimit subexpressions; they don't have to match)
sqrt `S` `to` <msqrt>`S'`</msqrt>
text `S` `to` <mtext>`S'`</mtext>
"any" `to` <mtext>any</mtext>
frac `S_1` `S_2` `to` <mfrac>`S_1'` `S_2'`</mfrac>
root `S_1` `S_2` `to` <mroot>`S_2'` `S_1'`</mroot>
stackrel `S_1` `S_2` `to` <mover>`S_2'` `S_1'`</mover>
`S_1`/`S_2` `to` <mfrac>`S_1'` `S_2'`</mfrac>
`S_1`_`S_2` `to` <msub>`S_1` `S_2'`</msub>
`S_1`^`S_2` `to` <msup>`S_1` `S_2'`</msup>
`S_1`_`S_2`^`S_3` `to` <msubsup>`S_1` `S_2'` `S_3'`</msubsup> or, in some cases, <munderover>`S_1` `S_2'` `S_3'`</munderover>
In the rules above, the expression `S'` is the same as `S`, except that if `S` has an outer level of brackets, then `S'` is the expression inside these brackets.
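
The translation rules and the `S'` convention can be sketched as a recursive function over parse trees written as tuples. In this Python sketch the node shapes, the function names (leaf, unbracket, to_mathml), and the way a single token is classified as <mi>, <mn> or <mo> are assumptions made for illustration; the real script takes that classification from its symbol table. Only some of the rules are covered.

```python
def leaf(tok):
    # Assumed classification: letters -> <mi>, numbers -> <mn>, anything else -> <mo>.
    if tok.isalpha():
        tag = "mi"
    elif tok.replace(".", "", 1).isdigit():
        tag = "mn"
    else:
        tag = "mo"
    return f"<{tag}>{tok}</{tag}>"


def unbracket(node):
    """S': drop one outer level of brackets, keeping the contents grouped in an <mrow>."""
    if node[0] == "bracket":
        return "<mrow>" + "".join(to_mathml(n) for n in node[2]) + "</mrow>"
    return to_mathml(node)


def to_mathml(node):
    kind = node[0]
    if kind == "c":
        return leaf(node[1])
    if kind == "bracket":
        _, left, items, right = node
        inner = "".join(to_mathml(n) for n in items)
        close = f"<mo>{right}</mo>" if right else ""
        return f"<mrow><mo>{left}</mo>{inner}{close}</mrow>"
    if kind == "sqrt":
        return f"<msqrt>{unbracket(node[1])}</msqrt>"
    if kind in ("frac", "/"):
        return f"<mfrac>{unbracket(node[1])}{unbracket(node[2])}</mfrac>"
    if kind == "root":
        # root S_1 S_2: MathML's <mroot> takes the base (S_2) first, then the index (S_1)
        return f"<mroot>{unbracket(node[2])}{unbracket(node[1])}</mroot>"
    if kind == "_":
        # the base S_1 keeps its brackets; only the subscript is unwrapped
        return f"<msub>{to_mathml(node[1])}{unbracket(node[2])}</msub>"
    if kind == "^":
        return f"<msup>{to_mathml(node[1])}{unbracket(node[2])}</msup>"
    raise ValueError(f"node kind not covered by this sketch: {kind}")
```

Applied to the tree for (x+1)/y, it returns <mfrac><mrow><mi>x</mi><mo>+</mo><mn>1</mn></mrow><mi>y</mi></mfrac>: the parentheses only served for grouping, so they do not appear in the output, whereas a bracketed base such as (a)_(mn) keeps its brackets.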

Matrices: A simple syntax for matrices is also recognized:
l(`S_(11)`,...,`S_(1n)`),(...),(`S_(m1)`,...,`S_(mn)`)r     or     l[`S_(11)`,...,`S_(1n)`],[...],[`S_(m1)`,...,`S_(mn)`]r.
Here l and r stand for any of the left and right brackets (just as in the grammar, they do not have to match). Both of these expressions are translated to
<mrow>l<mtable><mtr><mtd>`S_(11)`</mtd>... <mtd>`S_(1n)`</mtd></mtr>... <mtr><mtd>`S_(m1)`</mtd>... <mtd>`S_(mn)`</mtd></mtr></mtable>r</mrow>.
For example, {(S_(11),...,S_(1n)),(vdots,ddots,vdots),(S_(m1),...,S_(mn))] displays as `{(S_(11),...,S_(1n)),(vdots,ddots,vdots),(S_(m1),...,S_(mn))]`.
Note that each row must have the same number of expressions, and there should be at least two rows.
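
A minimal Python sketch of how the matrix syntax could be recognized: split the text inside the outer brackets at commas that are not nested in any bracket, require every piece to be a bracketed row, and accept the result only if there are at least two rows of equal length. This is an illustration under stated assumptions, not the script's own logic: the function names are hypothetical, only the single-character brackets ( [ { ) ] } are recognized, and entries are copied into <mtext> placeholders instead of being translated recursively.

```python
OPEN = "([{"
CLOSE = ")]}"


def split_top_level(s):
    """Split s at commas that are not nested inside any bracket."""
    parts, depth, start = [], 0, 0
    for k, ch in enumerate(s):
        if ch in OPEN:
            depth += 1
        elif ch in CLOSE:
            depth -= 1
        elif ch == "," and depth == 0:
            parts.append(s[start:k])
            start = k + 1
    parts.append(s[start:])
    return parts


def matrix_rows(text):
    """Rows of l(..),..,(..)r or l[..],..,[..]r as lists of entry strings, or None."""
    text = text.strip()
    if len(text) < 2 or text[0] not in OPEN or text[-1] not in CLOSE:
        return None
    rows = []
    for chunk in split_top_level(text[1:-1]):
        chunk = chunk.strip()
        # each row must be wrapped in round or square brackets
        if len(chunk) < 2 or chunk[0] not in "([" or chunk[-1] not in ")]":
            return None
        rows.append([entry.strip() for entry in split_top_level(chunk[1:-1])])
    # at least two rows, all with the same number of entries
    if len(rows) < 2 or len({len(r) for r in rows}) != 1:
        return None
    return rows


def to_mtable(text):
    rows = matrix_rows(text)
    if rows is None:
        return None
    # entries are kept as raw text in <mtext> placeholders instead of being translated
    body = "".join(
        "<mtr>" + "".join(f"<mtd><mtext>{e}</mtext></mtd>" for e in row) + "</mtr>"
        for row in rows
    )
    text = text.strip()
    return f"<mrow><mo>{text[0]}</mo><mtable>{body}</mtable><mo>{text[-1]}</mo></mrow>"
```

Applied to the {(S_(11),...,S_(1n)),(vdots,ddots,vdots),(S_(m1),...,S_(mn))] example, matrix_rows returns three rows of three entries each, while ((1,2,3),(4,5)) and ((a,b)) are rejected (unequal rows and a single row, respectively).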

Tokenization: The input formula is broken into tokens using a "longest matching initial substring search". Suppose the input formula has been processed from left to right up to a fixed position. The longest string from the list of constants (given below) that matches the initial part of the remainder of the formula is the next token. If there is no matching string, then the first character of the remainder is the next token. The symbol table at the top of the ASCIIMathML.js script specifies whether a symbol is a math operator (surrounded by a <mo> tag) or a math identifier (surrounded by a <mi> tag). For single character tokens, letters are treated as math identifiers, and non-alphanumeric characters are treated as math operators. For digits, see "Numbers" below.

Spaces are significant when they separate characters and thus prevent a certain string of characters from matching one of the constants. Multiple spaces and end-of-line characters are equivalent to a single space.
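
The longest-match rule can be sketched in a few lines of Python. The constant list here is a tiny hypothetical excerpt (the actual table in ASCIIMathML.js is much larger and also records whether each symbol becomes <mi> or <mo>), and numbers are not grouped, since they follow the separate rule described under "Numbers" below. Whitespace is skipped but still ends a token, which is how a space keeps a longer constant from matching.

```python
# A tiny, hypothetical excerpt of the constant table.
CONSTANTS = ["sub", "sube", "sin", "in", "int", "!in", "!=", "<=", "->", "|->", "oo", "xx", "alpha"]


def tokenize(formula):
    """Split formula into tokens by longest matching initial substring."""
    tokens = []
    i = 0
    while i < len(formula):
        if formula[i].isspace():
            # whitespace only separates tokens; a run of it acts like one space
            i += 1
            continue
        rest = formula[i:]
        matches = [c for c in CONSTANTS if rest.startswith(c)]
        # longest constant that starts the remainder, else the single next character
        tok = max(matches, key=len) if matches else rest[0]
        tokens.append(tok)
        i += len(tok)
    return tokens
```

With this excerpt, tokenize("sube") returns ["sube"], while tokenize("sub e") returns ["sub", "e"]; likewise "int" is a single token but "in t" is two.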

Now for a complete list of constants (standard LaTeX names also work):

Numbers: A string of digits, optionally preceded by a minus sign and optionally followed by a decimal point (a period) and another string of digits, is parsed as a single token and converted to a MathML number, i.e., enclosed in the <mn> tag. If the minus sign should not be part of the number, insert a space between it and the digits. Thus x-1 is converted to <mi>x</mi><mn>-1</mn>, whereas x - 1 is converted to <mi>x</mi><mo>-</mo><mn>1</mn>.
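
The number rule and the effect of the space can be checked with a small Python sketch. The regular expression follows the description above; the surrounding translator (translate_flat, a hypothetical name) is a simplification that treats every letter as <mi> and every other character as <mo>, ignoring the multi-character constants.

```python
import re

# Optional minus sign, a string of digits, and an optional ".digits" part.
NUMBER = re.compile(r"-?[0-9]+(?:\.[0-9]+)?")


def translate_flat(formula):
    """Hypothetical mini-translator: numbers -> <mn>, letters -> <mi>, anything else -> <mo>."""
    out = []
    i = 0
    while i < len(formula):
        ch = formula[i]
        if ch.isspace():
            i += 1
            continue
        m = NUMBER.match(formula, i)
        if m:
            out.append(f"<mn>{m.group(0)}</mn>")
            i = m.end()
        elif ch.isalpha():
            out.append(f"<mi>{ch}</mi>")
            i += 1
        else:
            out.append(f"<mo>{ch}</mo>")
            i += 1
    return "".join(out)


print(translate_flat("x-1"))    # <mi>x</mi><mn>-1</mn>
print(translate_flat("x - 1"))  # <mi>x</mi><mo>-</mo><mn>1</mn>
```

In "x - 1" the minus sign is followed by a space, so the pattern cannot match there and the minus falls through to <mo>.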

Greek letters: alpha `alpha` beta `beta` chi `chi` delta `delta` Delta `Delta` epsilon `epsilon` varepsilon `varepsilon` eta `eta` gamma `gamma` Gamma `Gamma` iota `iota` kappa `kappa` lambda `lambda` Lambda `Lambda` mu `mu` nu `nu` omega `omega` Omega `Omega` phi `phi` varphi `varphi` Phi `Phi` pi `pi` Pi `Pi` psi `psi` Psi `Psi` rho `rho` sigma `sigma` Sigma `Sigma` tau `tau` theta `theta` vartheta `vartheta` Theta `Theta` upsilon `upsilon` xi `xi` Xi `Xi` zeta `zeta`
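
The earlier statement that constants are mostly converted to Unicode symbols can be illustrated for Greek letters. This Python sketch uses a hypothetical excerpt of a name-to-character table; the code points are the standard Unicode Greek letters, and emitting every letter as <mi> is an assumption. Because the lookup is case-sensitive, delta and Delta give different letters, as in the list above.

```python
# A few entries of a hypothetical name -> character table; the code points
# are the standard Unicode Greek letters. Lookup is case-sensitive.
GREEK = {
    "alpha": "\u03b1",  # α
    "delta": "\u03b4",  # δ
    "Delta": "\u0394",  # Δ
    "pi": "\u03c0",     # π
    "omega": "\u03c9",  # ω
    "Omega": "\u03a9",  # Ω
}


def greek_to_mathml(name):
    """Translate a Greek-letter name to a MathML identifier, or None if unknown."""
    ch = GREEK.get(name)
    # Assumption: every letter is emitted as <mi>; the real symbol table may differ.
    return f"<mi>{ch}</mi>" if ch is not None else None
```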

Operation symbols

Type      See
+         `+`
-         `-`
*         `*`
**        `**`
//        `//`
\\        `\\ `
xx        `xx`
-:        `-:`
@         `@`
o+        `o+`
ox        `ox`
o.        `o.`
sum       `sum`
prod      `prod`
^^        `^^`
^^^       `^^^`
vv        `vv`
vvv       `vvv`
nn        `nn`
nnn       `nnn`
uu        `uu`
uuu       `uuu`

Relation symbols

Type      See
=         `=`
!=        `!=`
<         `<`
>         `>`
<=        `<=`
>=        `>=`
-<        `-<`
>-        `>-`
in        `in`
!in       `notin`
sub       `sub`
sup       `sup`
sube      `sube`
supe      `supe`
-=        `-=`
~=        `~=`
~~        `~~`
prop      `prop`

Logical symbols

Type      See
and       `and`
or        `or`
not       `not`
=>        `=>`
if        `if`
iff       `iff`
AA        `AA`
EE        `EE`
_|_       `_|_`
TT        `TT`
|--       `|--`
|==       `|==`

Grouping brackets

Type      See
(         `(`
)         `)`
[         `[`
]         `]`
{         `{`
}         `}`
(:        `(:`
:)        `:)`
{:        `{:`
:}        `{::}`

Miscellaneous symbols

Type      See
int       `int`
oint      `oint`
del       `del`
grad      `grad`
+-        `+-`
O/        `O/`
oo        `oo`
aleph     `aleph`
/_        `/_`
:.        `:.`
|...|     |`...`|
|cdots|   |`cdots`|
vdots     `vdots`
ddots     `ddots`
|\ |      |`\ `|
|quad|    |`quad`|
diamond   `diamond`
square    `square`
|__       `|__`
__|       `__|`
|~        `|~`
~|        `~|`
CC        `CC`
NN        `NN`
QQ        `QQ`
RR        `RR`
ZZ        `ZZ`

Standard functions

Type      See
sin       `sin`
cos       `cos`
tan       `tan`
csc       `csc`
sec       `sec`
cot       `cot`
sinh      `sinh`
cosh      `cosh`
tanh      `tanh`
log       `log`
ln        `ln`
det       `det`
dim       `dim`
lim       `lim`
mod       `mod`
gcd       `gcd`
lcm       `lcm`
min       `min`
max       `max`

Accents

Type      See
hat x     `hat x`
bar x     `bar x`
ul x      `ul x`
vec x     `vec x`
dot x     `dot x`
ddot x    `ddot x`

Arrows

Type      See
uarr      `uarr`
darr      `darr`
rarr      `rarr`
->        `->`
|->       `|->`
larr      `larr`
harr      `harr`
rArr      `rArr`
lArr      `lArr`
hArr      `hArr`

Font commands

Type      See
bb A      `bb A`
bbb A     `bbb A`
cc A      `cc A`
tt A      `tt A`
fr A      `fr A`
sf A      `sf A`
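
Font commands are unary, so each one applies to the simple expression that follows it. One plausible way to express this in MathML is an <mstyle> element carrying the standard mathvariant attribute. The mapping and the function name font_command below are assumptions about how the six commands correspond to MathML's variant names, not a transcription of what ASCIIMathML.js 1.4.7 actually emits.

```python
# Hypothetical mapping from ASCIIMathML font commands to standard MathML
# mathvariant values; the markup ASCIIMathML.js really produces may differ.
MATHVARIANT = {
    "bb": "bold",
    "bbb": "double-struck",
    "cc": "script",
    "tt": "monospace",
    "fr": "fraktur",
    "sf": "sans-serif",
}


def font_command(cmd, inner_mathml):
    """Wrap already-translated MathML in an <mstyle> that selects the font variant."""
    return f'<mstyle mathvariant="{MATHVARIANT[cmd]}">{inner_mathml}</mstyle>'
```

For example, font_command("bbb", "<mi>A</mi>") gives <mstyle mathvariant="double-struck"><mi>A</mi></mstyle>.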

Of course you may want or need other symbols from the thousands of LaTeX and Unicode symbols. Fortunately, ASCIIMathML.js is very easy to extend, so you can tailor it to your specific needs. (This could be compared to the LaTeX macro files that many users have developed over the years.)


Peter Jipsen, Chapman University, August 2005