HAMEX Handwritten and Audio Dataset of Mathematical Expressions

Information
Categories | Handwriting Recognition |
---|---|
Created on | 16 Feb 2015 |
Information
HAMEX is a new public bimodal database of mathematical expressions. It was created as part of the DEPART project (Documents Ecrits et PAroles – Reconnaissance et Traduction). This project, funded by the Pays de la Loire region of France, aims to build a platform of tools based on two natural modes of human communication: handwriting and speech. The platform is intended to make automatic language processing easier.
The database contains 4350 different expressions of varying complexity.
Examples of mathematical expressions from the HAMEX database
Each mathematical expression is available in both an online handwritten form and an audio form:
Handwritten and audio mathematical expression
The vocabulary of mathematical symbols considered includes 74 different symbols. The symbol ’-’ refers either to the minus sign or to the fraction bar.
Digits | 0 ... 9 |
Latin characters | a ... z W X Y |
Greek characters | \alpha \beta \gamma \phi \pi \theta |
Operators | + - \pm x / \div |
Elastic symbols | \sum \int \sqrt \frac |
Set operators | \in \forall \exist |
Functions | \log \sin \cos |
Braces | ( ) |
Others | . , \rightarrow \infty |
Equality operators | = > < \neq \geq \leq |
58 different writers took part in collecting the handwritten mathematical expressions. Similarly, 58 French speakers uttered the corresponding speech dataset.
In addition to the raw data (ink for the handwriting modality and audio signals for the speech modality), the ground truth of each expression is available for each modality. The ground truth is given at both the symbol level and the level of relationships between symbols (fig. 3), using XML formats: the INKML format for the ink ground truth and Transcriber’s format (*.trs) for the speech ground truth.
Example of annotation of an expression with respect to each modality
The set of collected mathematical expressions is split into a training set and an evaluation set.
Collected set | Number of expressions | Number of hours | Number of writers/speakers |
---|---|---|---|
Training | 2175 | 6h | 29 |
Evaluation | 2175 | 6h | 29 |
The main goal of the HAMEX database is to exploit the complementarity between the two modalities (handwriting and speech) in order to reach higher recognition rates and build more accurate systems. A first attempt using this database was made at the symbol level. As expected, it showed the reliability of such a procedure (table 3). Refer to [3] for more details about this experiment.
Recognition process | Recognition rate |
---|---|
Speech alone | 50.09% |
Handwriting alone | 81.55% |
Fusion of the two modalities | 98.04% |
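The fusion rule behind these figures is not described here (see [3] for the actual method). Purely to illustrate what symbol-level fusion can look like, the sketch below combines per-symbol scores from two recognizers with a weighted sum, which is one common late-fusion scheme. The function name `fuse_scores`, the weight, and every score are hypothetical and are not taken from HAMEX or from [3].

```python
def fuse_scores(handwriting, speech, weight=0.5):
    """Weighted-sum late fusion of two {label: score} dictionaries.

    `weight` is the share given to the handwriting scores; the speech
    scores receive 1 - weight. A label missing from one modality counts
    as score 0.0 for that modality. (Illustrative assumption only.)
    """
    labels = set(handwriting) | set(speech)
    fused = {
        label: weight * handwriting.get(label, 0.0)
        + (1.0 - weight) * speech.get(label, 0.0)
        for label in labels
    }
    best = max(fused, key=fused.get)
    return best, fused


# Made-up scores: the handwriting recognizer hesitates between 'a' and
# '\alpha', while the speech recognizer clearly favours '\alpha'.
handwriting_scores = {"a": 0.48, "\\alpha": 0.52}
speech_scores = {"a": 0.10, "\\alpha": 0.90}

best_label, fused_scores = fuse_scores(handwriting_scores, speech_scores)
print(best_label, fused_scores)
```

In this toy case the two modalities agree only weakly on their own, and the fused score separates the candidates more clearly, which is the kind of complementarity the database is meant to support.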
INKML Data File Format
The digital ink corresponding to each handwritten document is saved in an INKML file. An INKML file mainly contains three kinds of information:
- the ink: a set of traces made of points;
- the symbol-level ground truth: the segmentation and label information of each symbol of the document;
- the document ground truth: the XML structure of the document (e.g. MATHML structure for mathematical expressions).
Furthermore, some general information is added to the file:
- the channels (X and Y, optionally others such as P or T);
- the writer information (identification, handedness, age, gender, etc.), if available;
- the LATEX ground truth (without any reference to the ink, to easily render it).
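As a minimal sketch of how the ink and the general information listed above could be read, the following Python code uses the standard `xml.etree.ElementTree` module. It assumes each trace is a comma-separated list of points whose values follow the declared channel order; the embedded sample, its coordinates, and the function name `parse_ink` are illustrative assumptions, not part of the HAMEX distribution.

```python
import xml.etree.ElementTree as ET

INK_NS = "{http://www.w3.org/2003/InkML}"

# Hypothetical two-stroke sample; the coordinates are made up.
SAMPLE = r"""<ink xmlns="http://www.w3.org/2003/InkML">
<traceFormat><channel name="X" type="decimal"/>
<channel name="Y" type="decimal"/>
</traceFormat>
<annotation type="writer">w123</annotation>
<annotation type="truth">$a&lt;\frac{b}{c}$</annotation>
<trace id="1">985 3317, 1002 3329, 1019 3340</trace>
<trace id="2">1123 3308, 1125 3336, 1127 3365</trace>
</ink>"""


def parse_ink(xml_text):
    """Return (channel names, top-level annotations, traces by id)."""
    root = ET.fromstring(xml_text)
    # Channel order defines the meaning of each value in a point.
    channels = [c.get("name") for c in root.iter(INK_NS + "channel")]
    # Only direct children of <ink>: writer information, LATEX truth, ...
    annotations = {
        a.get("type"): a.text for a in root.findall(INK_NS + "annotation")
    }
    traces = {}
    for trace in root.findall(INK_NS + "trace"):
        points = []
        for sample in trace.text.split(","):
            values = [float(v) for v in sample.split()]
            points.append(dict(zip(channels, values)))
        traces[trace.get("id")] = points
    return channels, annotations, traces


channels, annotations, traces = parse_ink(SAMPLE)
print(channels)
print(annotations)
print(len(traces), "traces")
```

Keeping each point as a dictionary keyed by channel name means the same code still works if optional channels such as P or T are declared.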
The INKML format makes it possible to cross-reference the digital ink of the expression, its segmentation into symbols, and its XML representation. The listing below shows an example of an INKML file for the mathematical expression a < b/c, described in MATHML, containing 5 symbols for a total of 6 strokes (two for the ’a’ and one for each of the other symbols). The traceGroup with identifier xml:id="8" references the 2 strokes of symbol ’a’ as well as the MATHML element with identifier xml:id="A". Thus, the stroke segmentation of a symbol is linked to its MATHML representation.
Some sample files are available to download here...
Example of an INKML file for the expression a < b/c
<ink xmlns="http://www.w3.org/2003/InkML">
<traceFormat><channel name="X" type="decimal"/>
<channel name="Y" type="decimal"/>
</traceFormat>
<annotation type="writer">w123</annotation>
<annotation type="truth">$a&lt;\frac{b}{c}$</annotation>
<annotationXML type="truth" encoding="Content-MathML"><math xmlns="http://www.w3.org/1998/Math/MathML">
<mrow>
<mi xml:id="A">a</mi>
<mrow>
<mo xml:id="B">&lt;</mo>
<mfrac xml:id="C">
<mi xml:id="D">b</mi><mi xml:id="E">c</mi>
</mfrac>
</mrow>
</mrow>
</math>
</annotationXML>
<trace id="1">985 3317, ..., 1019 3340</trace>...
<trace id="6">1123 3308, ..., 1127 3365</trace>
<traceGroup xml:id="7">
<annotation type="truth">Ground truth</annotation>
<traceGroup xml:id="8">
<annotation type="truth">a</annotation><annotationXML href="A"/>
<traceView traceDataRef="1"/>
<traceView traceDataRef="2"/>
</traceGroup>
...
</traceGroup>
</ink>
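The cross-references in the listing above can be followed programmatically. The sketch below, using Python's standard `xml.etree.ElementTree`, maps each symbol-level traceGroup to its stroke identifiers and to the MATHML element named by its `href`. The embedded sample is a shortened, hypothetical variant of the listing: the coordinates of traces 2 and 3 and the traceGroup with xml:id="9" are assumptions made so that the example is complete, and `symbol_segmentation` is an illustrative name.

```python
import xml.etree.ElementTree as ET

INK = "{http://www.w3.org/2003/InkML}"
MATHML = "{http://www.w3.org/1998/Math/MathML}"
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"  # expanded xml:id

SAMPLE = r"""<ink xmlns="http://www.w3.org/2003/InkML">
<annotationXML type="truth" encoding="Content-MathML"><math xmlns="http://www.w3.org/1998/Math/MathML">
<mrow>
<mi xml:id="A">a</mi>
<mrow>
<mo xml:id="B">&lt;</mo>
<mfrac xml:id="C">
<mi xml:id="D">b</mi><mi xml:id="E">c</mi>
</mfrac>
</mrow>
</mrow>
</math>
</annotationXML>
<trace id="1">985 3317, 1019 3340</trace>
<trace id="2">1001 3310, 1003 3342</trace>
<trace id="3">1050 3325, 1070 3330</trace>
<traceGroup xml:id="7">
<annotation type="truth">Ground truth</annotation>
<traceGroup xml:id="8">
<annotation type="truth">a</annotation><annotationXML href="A"/>
<traceView traceDataRef="1"/>
<traceView traceDataRef="2"/>
</traceGroup>
<traceGroup xml:id="9">
<annotation type="truth">&lt;</annotation><annotationXML href="B"/>
<traceView traceDataRef="3"/>
</traceGroup>
</traceGroup>
</ink>"""


def symbol_segmentation(xml_text):
    """List each symbol with its strokes and linked MATHML element."""
    root = ET.fromstring(xml_text)
    # Index every MATHML element that carries an xml:id.
    math_nodes = {
        el.get(XML_ID): el
        for el in root.iter()
        if el.tag.startswith(MATHML) and el.get(XML_ID) is not None
    }
    symbols = []
    outer = root.find(INK + "traceGroup")  # the "Ground truth" group
    for group in outer.findall(INK + "traceGroup"):
        href = group.find(INK + "annotationXML").get("href")
        node = math_nodes.get(href)
        symbols.append({
            "group": group.get(XML_ID),
            "label": group.find(INK + "annotation").text,
            "strokes": [v.get("traceDataRef")
                        for v in group.findall(INK + "traceView")],
            "mathml_tag": node.tag[len(MATHML):] if node is not None else None,
        })
    return symbols


symbols = symbol_segmentation(SAMPLE)
for symbol in symbols:
    print(symbol)
```

Note that the `<` character must be written as `&lt;` inside XML text for the file to parse; the parser returns it as a plain `<` in the label.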