INK_ME ISI Nantes KAIST Mathematical Expression database
Created on 16 Feb 2015
This database merges 4 databases from 3 laboratories :
- Indian Statistical Institute (Kolkata, India) : Prof. Utpal Garain, Computer Vision and Pattern Recognition (CVPR) Unit
- IRCCyN (Nantes, France) : Dr. Harold Mouchère and Prof. Christian Viard-Gaudin, IVC team
- Korea Advanced Institute of Science and Technology (Daejeon, Korea) : Mr. Dae Hwan Kim and Prof. Jin Hyung Kim, Division of Computer Science
The database of the CROHME competition at ICDAR 2011 has been extracted from this merged database. The CROHME parts were selected according to the common grammar. Please visit the CROHME website for more information about it.
All these databases share the same INKML format. This XML format allows segmentation and labeling of symbols at the stroke level. Furthermore, the MathML ground truth of each mathematical expression is embedded in its INKML file.
The next table shows the number of expressions of each dataset with a link to each specific description.
|DB||Lab||#Math. Expr.||#Symbol class||#ME in CROHME Train DB||#ME in CROHME Test DB|
When you use this dataset in a publication, please cite this webpage or one of these papers :
- KME1&KME2 : "Efficient Search Strategy in Structural Analysis for Handwritten Mathematical Expression Recognition", Taik Heon Rhee and Jin Hyung Kim, Pattern Recognition, Vol. 42, No. 12, pp. 3192-3201, December 2009. "Top-Down Search with Bottom-up Evidence for Recognizing Handwritten Mathematical Expression", Dae Hwan Kim and Jin Hyung Kim, The 12th International Conference on Frontiers in Handwriting Recognition (ICFHR), Nov 2010, Kolkata, India. "Conditional Random Field Parsing for Recognizing Online Handwritten Mathematical Expressions", Dae Hwan Kim and Jin Hyung Kim, The 2nd China-Japan-Korea Joint Workshop on Pattern Recognition (CJKPR), Nov 2010, Fukuoka, Japan.
- ISI : EMERS : a tree matching-based performance evaluation of mathematical expression recognition systems, Kunal Sain, Abhishek Dasgupta, Utpal Garain, IJDAR 14(1) : 75-85 (2011)
- AWAL_EM : Towards Handwritten Mathematical Expressions Recognition, Awal A.-M., Mouchère H., Viard-Gaudin C., Proceedings of the 10th International Conference on Document Analysis and Recognition, ICDAR 2009, Spain (2009)
- HAMEX : HAMEX - a Handwritten and Audio Dataset of Mathematical Expressions, Quiniou S., Mouchère H., Peña Saldarriaga S., Viard-Gaudin C., Morin E., Petitrenaud S., Medjkoune S., Proceedings of the 11th International Conference on Document Analysis and Recognition, ICDAR 2011, China (2011)
- CROHME : CROHME2011 : Competition on Recognition of Online Handwritten Mathematical Expressions, Mouchère H., Viard-Gaudin C., Garain U., Kim D. H., Kim J. H., Proceedings of the 11th International Conference on Document Analysis and Recognition, ICDAR 2011, China (2011)
INKML Data File Format
The digital ink corresponding to each handwritten document is saved in an INKML file. An INKML file mainly contains three kinds of information :
- the ink : a set of traces made of points ;
- the symbol level ground truth : the segmentation and label information of each symbol of the document ;
- the document ground truth : the XML structure of the document (e.g. MATHML structure for mathematical expressions)
Furthermore, some general information is added in the file :
- the channels (X and Y, optionally others such as P or T) ;
- the writer information (identification, handedness, age, gender, etc.), if available ;
- the LATEX ground truth (without any reference to the ink, to easily render it).
The INKML format makes it possible to cross-reference the digital ink of an expression, its segmentation into symbols and its XML representation. The listing below shows an example of an INKML file for the mathematical expression a < b/c, described in MATHML, which contains 5 symbols and 6 strokes in total (two for the 'a' and one for each of the other symbols). The traceGroup with identifier xml:id="8" references the 2 strokes of the symbol 'a' as well as the MATHML part with identifier xml:id="A". In this way, the stroke segmentation of a symbol is linked to its MATHML representation.
Some sample files are available to download here...
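The linkage between a symbol, its strokes and its MATHML node can be followed programmatically. The Python sketch below is only an illustration under stated assumptions: it assumes the W3C InkML namespace, `traceView` elements with a `traceDataRef` attribute pointing at strokes, and an `annotationXML` element with an `href` attribute pointing at the MATHML identifier. The embedded sample is a made-up, shortened document (two symbols, invented coordinates), not a file from the database, and the function name `read_symbols` is hypothetical.

```python
import xml.etree.ElementTree as ET

# Assumed namespaces: W3C InkML for the ink, the predefined "xml" prefix for xml:id.
INKML_NS = "{http://www.w3.org/2003/InkML}"
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# Made-up miniature file: coordinates and identifiers are illustrative only.
SAMPLE = """<ink xmlns="http://www.w3.org/2003/InkML">
  <annotationXML type="truth" encoding="Content-MathML">
    <math xmlns="http://www.w3.org/1998/Math/MathML">
      <mrow>
        <mi xml:id="A">a</mi>
        <mo xml:id="B">&lt;</mo>
      </mrow>
    </math>
  </annotationXML>
  <trace id="1">10 20, 15 25</trace>
  <trace id="2">12 18, 18 28</trace>
  <trace id="3">30 20, 25 25, 30 30</trace>
  <traceGroup xml:id="7">
    <traceGroup xml:id="8">
      <annotation type="truth">a</annotation>
      <traceView traceDataRef="1"/>
      <traceView traceDataRef="2"/>
      <annotationXML href="A"/>
    </traceGroup>
    <traceGroup xml:id="9">
      <annotation type="truth">&lt;</annotation>
      <traceView traceDataRef="3"/>
      <annotationXML href="B"/>
    </traceGroup>
  </traceGroup>
</ink>"""


def read_symbols(root):
    """Return one record per symbol-level traceGroup (groups that own traceViews)."""
    symbols = []
    for group in root.iter(INKML_NS + "traceGroup"):
        views = group.findall(INKML_NS + "traceView")  # direct children only
        if not views:
            continue  # container group without strokes of its own
        label = group.find(INKML_NS + "annotation")
        link = group.find(INKML_NS + "annotationXML")
        symbols.append({
            "group_id": group.get(XML_ID),
            "label": label.text if label is not None else None,
            "strokes": [v.get("traceDataRef") for v in views],
            "mathml_id": link.get("href") if link is not None else None,
        })
    return symbols


root = ET.fromstring(SAMPLE)
symbols = read_symbols(root)

# Index the MATHML ground truth by xml:id so each symbol can be resolved to its node.
truth = root.find(INKML_NS + "annotationXML")
mathml_tags = {
    el.get(XML_ID): el.tag.split("}")[-1]
    for el in truth.iter()
    if el.get(XML_ID) is not None
}

for s in symbols:
    print(s["label"], s["strokes"], "->", mathml_tags.get(s["mathml_id"]))
```

Skipping groups without their own `traceView` children is a design choice of this sketch: it keeps the enclosing segmentation group out of the symbol list without relying on any particular label text.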
Example of an INKML file for the expression a < b/c
<channel name="X" type="decimal"/>
<channel name="Y" type="decimal"/>
<annotationXML type="truth" encoding="Content-MathML">
<trace id="1">985 3317, ..., 1019 3340</trace>
<trace id="6">1123 3308, ..., 1127 3365</trace>
<annotation type="truth">Ground truth</annotation>
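The trace encoding used in the listing (points separated by commas, channel values separated by spaces) can be read with a few lines of Python. This is a minimal sketch, assuming the W3C InkML namespace and purely numeric channel values. The sample document and its coordinates are invented for illustration, and `read_traces` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Assumed namespace of the InkML elements.
INKML_NS = "{http://www.w3.org/2003/InkML}"

# Made-up sample: two channels (X, Y) and two short strokes.
SAMPLE = """<ink xmlns="http://www.w3.org/2003/InkML">
  <traceFormat>
    <channel name="X" type="decimal"/>
    <channel name="Y" type="decimal"/>
  </traceFormat>
  <trace id="1">10 20, 15 25, 20 30</trace>
  <trace id="6">40 18, 41 26, 42 35</trace>
</ink>"""


def read_traces(root):
    """Map each trace id to a list of points, one tuple of floats per point."""
    traces = {}
    for trace in root.iter(INKML_NS + "trace"):
        points = []
        for chunk in (trace.text or "").split(","):
            values = chunk.split()
            if values:  # ignore empty chunks, e.g. after a trailing comma
                points.append(tuple(float(v) for v in values))
        traces[trace.get("id")] = points
    return traces


root = ET.fromstring(SAMPLE)
channels = [c.get("name") for c in root.iter(INKML_NS + "channel")]
traces = read_traces(root)

for trace_id, points in traces.items():
    print(trace_id, len(points), "points, channels:", channels)
```

Each tuple has as many values as there are declared channels, so a file that also records P or T would yield longer tuples with the same code.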