1. Introduction
1.1. TEI Lex-0 in a nutshell
TEI Lex-0 is both a technical specification and a set of community-based recommendations for encoding machine-readable dictionaries. It is rooted in the Guidelines of the Text Encoding Initiative (TEI) and delivered as a customization of the TEI schema.
Following the spirit of TEI Analytics, developed in the context of the MONK project (Zillig 2009), TEI Lex-0 aims at establishing a baseline encoding and a target format to facilitate the interoperability of heterogeneously encoded lexical resources. This is important both in the context of building lexical infrastructures as such (Ermolaev and Tasovac 2012) and in the context of developing generic TEI-aware tools such as dictionary viewers and profilers.
For the latest changes, see our revision history.
1.2. The community
Preliminary work for the establishment of TEI Lex-0 started in the Working Group "Retrodigitised Dictionaries" led by Toma Tasovac and Vera Hildenbrandt as part of the COST Action European Network of e-Lexicography (ENeL). Upon the completion of the COST Action, the work on TEI Lex-0 was taken up by the DARIAH Working Group "Lexical Resources". Currently, the work on TEI Lex-0 is also supported by the H2020-funded European Lexicographic Infrastructure (ELEXIS).
1.2.1. DARIAH Working Group
The DARIAH Working Group on Lexical Resources is a self-organized scholarly community working under the auspices of the pan-European Digital Research Infrastructure for Arts and Humanities (DARIAH-EU). The goals of the WG are:
- to explore, assess and recommend standard tools and methods for the creation, application and dissemination of born-digital and retro-digitized lexical resources (dictionaries, lexicons, thesauri, word lists etc.) as well as other, similar kinds of structured data (gazetteers, almanacs, encyclopaedias etc.); and
- to foster, develop and publicize digitally-enabled lexicographic research from a cross-disciplinary and transnational perspective.
The WG focuses on the application and explication of existing standards, both onomasiological (TMF, TBX and SKOS) and semasiological (LMF, TEI, and Ontolex); draws upon the expertise of various DARIAH partners who are active in this field; and collaborates with relevant external projects and associations, such as the European Lexicographic Infrastructure (ELEXIS) and CLARIN, in order to ensure the widest possible reach of the Working Group’s results.
At the same time, the WG pursues a strong research-driven agenda on the diversity of European lexicographic heritage. In addition to investigating pan-European vocabularies and multiple dimensions of lexical borrowing, the working group evaluates current practices and formulates guidelines on data enrichment and mutual linking of existing electronic dictionaries in view of their common European heritage.
WG Chairs
Laurent Romary is Directeur de Recherche at Inria (team ALMAnaCH, France). He received a PhD degree in computational linguistics in 1989 and his Habilitation in 1999. He carries out research on the modelling of semi-structured documents, with a specific emphasis on texts and linguistic resources. He has been active in standardisation activities with ISO, as chair of committee ISO/TC 37/SC 4 (2002-2014) and chair of ISO/TC 37 (2016-), and with the Text Encoding Initiative, as member (2001-2011) and chair (2008-2011) of its Technical Council. He also has a long-standing involvement in open science related activities.
Toma Tasovac is Director of the Belgrade Center for Digital Humanities (BCDH) and DARIAH-EU. He was educated at Harvard University, Princeton University and Trinity College Dublin. His areas of interest include lexicography, data modeling, TEI, digital editions and research infrastructures. He previously served as the National Coordinator of DARIAH-RS and Chair of the National Coordinators' Committee at DARIAH-EU. Under Toma's leadership, BCDH has received funding from various national and international granting bodies, including Erasmus Plus and Horizon 2020.
DigiLex Blog
The working group runs a blog called DigiLex: Legacy Dictionaries Reloaded as a platform for sharing tips, raising questions and discussing methods for the creation of lexical resources.
1.2.2. ELEXIS
ELEXIS is an H2020-funded project which proposes to integrate, extend and harmonise national and regional efforts in the field of lexicography, both modern and historical, with the goal of creating a sustainable infrastructure which will (1) enable efficient access to high-quality lexical data in the digital age, and (2) bridge the gap between more advanced and lesser-resourced scholarly communities working on lexicographic resources.
1.2.3. Contributors
- Piotr Banski
- Jack Bowers
- Jesse de Does
- Katrien Depuydt
- Tomaž Erjavec
- Alexander Geyken
- Axel Herold
- Vera Hildenbrandt
- Mohamed Khemakhem
- Boris Lehečka
- Snežana Petrović
- Laurent Romary
- Ana Salgado
- Toma Tasovac
- Andreas Witt
1.2.4. The Rahtz Prize
In recognition of their work on TEI Lex-0, the DARIAH WG Lexical Resources was awarded the 2020 Rahtz Prize for TEI Ingenuity.
Members of the DARIAH Working Group Lexical Resources have made a valuable contribution to the Dictionaries Chapter of the TEI Guidelines. Their efforts and their expertise have been formidable and highly appreciated by the TEI Community for many years. — Martina Scholger, Chair of the TEI Technical Council
1.2.5. Meetings
The Working Group has organized a number of working meetings dedicated to the development of TEI Lex-0. These include:
- Toward Best Practice Guidelines for Encoding Legacy Dictionaries: An ENeL-DARIAH-PARTHENOS Expert Workshop. Preußische Staatsbibliothek, Berlin (17-19 November 2016).
- Overview of Retrodigitized Dictionaries and Best-Practice Guidelines For Encoding Legacy Dictionaries. ENeL Annual Meeting, Budapest (24 February 2017).
- TEI Lex-0 @DARIAH WG "Lexical Resources". Harnack Haus, Freie Universität Berlin (27 April 2017).
- TEI Lex-0 @DARIAH WG "Lexical Resources". Austrian Center for Digital Humanities, Austrian Academy of Sciences, Vienna (26 June 2017).
- TEI Lex-0: From Best-Practice Guidelines to a TEI Schema. DARIAH-EU Coordination Office, Berlin (2-3 May 2018). Funded by DARIAH-EU's Working Groups Funding Scheme and ELEXIS.
- TEI Lex-0 and Beyond: A Workshop. University of Ljubljana (16 July 2018). Funded by DARIAH-EU's Working Group Funding Scheme and ELEXIS.
- TEI Lex-0 Meeting. DARIAH-EU Coordination Office, Berlin (30 January 2019).
- Joint TEI Lex-0 / Ontolex-Lemon Meeting. Collocated with eLex 2019. Sintra, Portugal (4 October 2019). Funded by ELEXIS.
- Toward a TEI Lex-0 Publisher: A Workshop, DARIAH-EU Coordination Office, Berlin (16-17 December 2019). Funded by the Belgrade Center for Digital Humanities.
1.2.6. Training measures
TEI Lex-0 and best practices in lexical data modeling have been introduced to a large number of young scholars at various training events, including:
- Lexical Data Masterclass 2017. Co-organized by DARIAH, the Berlin Brandenburg Academy of Sciences (BBAW), Inria and the Belgrade Center for Digital Humanities, with the support of the German Ministry of Education and Research (BMBF), CLARIN and DARIAH-DE. For an overview, check out this blog post.
- Lexical Data Masterclass 2018. Co-organized by DARIAH, the Berlin Brandenburg Academy of Sciences (BBAW), Inria and the Belgrade Center for Digital Humanities, with the support of the German Ministry of Education and Research (BMBF), French Ministry for Higher Education, Research and Innovation (MESRI), ELEXIS, CLARIN and DARIAH-DE. For an overview, check out From Àbèsàbèsì to XPath on DigiLex.
- From Print to Screen: The Theory and Practice of Digitizing Dictionaries. Lisbon Summer School in Linguistics (2-6 July 2018).
- Encoding Dictionaries with TEI: A Masterclass. Lisbon Summer School in Linguistics (1-5 July 2019).
- DH Training Workshop: Digital Methods for Linguistic Investigation (13-15 November 2019). Organized by the Seminar für Semitistik und Arabistik, Freie Universität Berlin, with the support of the Alexander von Humboldt Foundation and Syncro Soft.
The European Digital Humanities Masterclass 2020 had to be postponed due to the COVID-19 pandemic.
1.3. The rationale
To what extent can we achieve consistent encoding within a given community of practice by following the TEI Guidelines? The topic is of particular importance for lexical data if we think of the potential wealth of content we could gain from pooling together the information available in the variety of highly structured, historical and contemporary lexical resources. The encoding possibilities offered by the Dictionaries Chapter in the Guidelines are too numerous and too flexible to guarantee sufficient interoperability and a coherent model for searching, visualising or enriching multiple lexical resources.
TEI Lex-0 should not be thought of as a replacement of the Dictionaries Chapter in the TEI Guidelines or as the format that must necessarily be used for editing or managing individual resources, especially in those projects and/or institutions that already have established workflows based on their own flavors of TEI. TEI Lex-0 should be primarily seen as a format that existing TEI dictionaries can be unequivocally transformed to in order to be queried, visualised, or mined in a uniform way. At the same time, however, there is no reason why TEI Lex-0 could not or should not be used as a best-practice example in educational settings or as a foundation of new TEI-based projects. This is especially true considering the fact that TEI Lex-0 aims to stay as aligned as possible with the TEI subset developed in conjunction with the revision of the ISO LMF (Lexical Markup Framework) standard (cf. Romary 2015).
1.4. The guidelines
1.4.1. How to cite these guidelines
Full citation: Toma Tasovac, Laurent Romary, Piotr Banski, Jack Bowers, Jesse de Does, Katrien Depuydt, Tomaž Erjavec, Alexander Geyken, Axel Herold, Vera Hildenbrandt, Mohamed Khemakhem, Boris Lehečka, Snežana Petrović, Ana Salgado and Andreas Witt. 2018. TEI Lex-0: A baseline encoding for lexicographic data. Version 0.9.3. DARIAH Working Group on Lexical Resources. https://dariah-eric.github.io/lexicalresources/pages/TEILex0/TEILex0.html.
Short citation: Toma Tasovac, Laurent Romary et al. 2018. TEI Lex-0: A baseline encoding for lexicographic data. Version 0.9.3. DARIAH Working Group on Lexical Resources. https://dariah-eric.github.io/lexicalresources/pages/TEILex0/TEILex0.html.
1.4.2. Revision history
Changes to the TEI Lex-0 specification up to version 0.8.6 were included in comments inside the ODD file itself. Starting with version 0.9.0, we're listing a summary of the changes in this list for easier reference.
- <catDesc> must contain a <term>
- switch to using the external TEI add-on in oXygen when generating schema and documentation
- fix the mismatch in <usg> types between the specification and documentation (use temporal instead of time)
- require <listBibl> in <sourceDesc> with three suggested type values: dictionaries, corpora and literature
- switch to using oXygen's TEI framework when generating schema and documentation
- allow <list> and <item> because lists feature prominently in dictionary front matter
- introduce model.lexicalInter (based on model.inter), model.lexicalPhrase (based on model.phrase) and macro.lexicalParaContent (based on macro.paraContent) to make it easier to simplify the content model of various dictionary elements
- remove model.listLike from model.lexicalInter
- link version number in the menu to revision history
- allow <abbr> and <expan> so that they can be used in lists of abbreviations in dictionary front matter
- introduced valency as a suggested value in gram[@type="valency"]
- introduced gram[@type="government"] and clarified the difference from gram[@type="colloc"]. See sections on Typology of gram and Collocates
- made @type mandatory on <TEI>
- add <principal> and <affiliation> for more robust metadata in the <teiHeader>
- fix namespace issues in html output
- add new examples to the Header section
- add section on hierarchical usage labels
- allow <taxonomy>, <category> and <catDesc> in <classDecl>
- move the specification to a different webpage for quicker loading
- add section on TEI Header
- correction of various misspellings
- add <monogr> (needed for <biblStruct>)
- add <forename> and <surname> for more fine-grained bibliographic information
- add <editorialDecl>
- add <email> to allow contact information in the header
- require <availability> in <publicationStmt> to provide <licence>
- make <sourceDesc> optional
- allow only <biblStruct> in <sourceDesc>
- make model.publicationStmtPart.agency unbound to allow both <publisher> and <authority> in <publicationStmt>
- add role to <authority> with suggested values: funder, sponsor, rightsHolder
- require <language>, <langUsage> and <profileDesc>
- add role to <language> with a closed list of values: objectLanguage, workingLanguage, sourceLanguage, targetLanguage
2. Header
2.1. General remarks
A lexical resource encoded in TEI Lex-0 must, like any TEI file, start with the root <TEI> element, which, in turn, must contain a <teiHeader> element.
Unlike TEI P5, however, TEI Lex-0 requires the @type attribute on the root <TEI> element with the value "lex-0".
A TEI header contains information about the lexical resource itself, its source(s), its encoding, and its revisions. Proper, structured metadata of this kind is equally important for scholars using the resource, for software processing it, and for cataloguers in libraries and archives.
The TEI header of a lexical resource has five major parts:
- a file description, tagged <fileDesc>, provides a full bibliographic description of the electronic lexical resource itself as well as the source(s), analogue or digital, from which it may have been derived. For details, see section File Description below.
- an encoding description, tagged <encodingDesc>, describes the relationship between the electronic resource and its source(s). It allows for detailed description of whether (or how) the electronic resource was produced, transcribed or normalized, how the encoder resolved ambiguities in the source, what levels of encoding or analysis were applied etc.
- a profile description, tagged <profileDesc>, contains classificatory and contextual information about the lexical resource including its object and working languages.
- a container for external metadata, tagged <xenoData>, contains metadata from non-TEI schemas, for instance Dublin Core, MARCXML or MODS, if available.
- a revision history, tagged <revisionDesc>, contains a list of changes made during the development of the lexical resource, both before and after its official release.
Of these, two elements are required in TEI Lex-0: <fileDesc> and <profileDesc>. It is highly recommended to include additional information in <encodingDesc>. It is also an example of good practice to record changes in <revisionDesc>.
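The requirements above can be sketched as a minimal document skeleton. This is an illustrative outline only: the title, publisher and language are hypothetical placeholders, and the fragment has not been validated against the TEI Lex-0 schema.

```xml
<TEI xmlns="http://www.tei-c.org/ns/1.0" type="lex-0">
  <teiHeader>
    <!-- required: bibliographic description of the resource -->
    <fileDesc>
      <titleStmt>
        <title type="full">A Hypothetical Dictionary</title>
      </titleStmt>
      <publicationStmt>
        <publisher>Hypothetical Publisher</publisher>
        <availability>
          <licence target="https://creativecommons.org/licenses/by/4.0/">CC BY 4.0</licence>
        </availability>
      </publicationStmt>
    </fileDesc>
    <!-- required in TEI Lex-0: languages used in the resource -->
    <profileDesc>
      <langUsage>
        <language ident="en" role="objectLanguage">English</language>
      </langUsage>
    </profileDesc>
    <!-- optional: <encodingDesc>, <xenoData> and <revisionDesc> -->
  </teiHeader>
  <text>
    <body>
      <!-- dictionary entries go here -->
    </body>
  </text>
</TEI>
```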
2.2. File description
The bibliographic description of the given machine-readable lexical resource is absolutely essential for identifying the basic information about the resource itself, its creators and publishers as well as the conditions under which it is made available to the public.
The elements that make up <fileDesc> are:
- titleStmt (title statement) groups information about the title of a work and those responsible for its content.
- editionStmt (edition statement) groups information relating to one edition of a text.
- extent (extent) describes the approximate size of a text stored on some carrier medium or of some other object, digital or non-digital, specified in any convenient units.
- publicationStmt (publication statement) groups information concerning the publication or distribution of an electronic or other text.
- seriesStmt (series statement) groups information about the series, if any, to which a publication belongs.
- sourceDesc (source description) describes the source(s) from which an electronic text was derived or generated, typically a bibliographic description in the case of a digitized text, or a phrase such as "born digital" for a text which has no previous existence.
<fileDesc> is a mandatory element in plain TEI as well, but in TEI Lex-0 there are some additional constraints and recommendations related to the content of this element.
- In <titleStmt>, TEI Lex-0 recommends the use of type on <title> (with values either full or abbr) to record both the full bibliographic title of the lexicographic resource and the preferred abbreviated title for easy reference, should one exist.
<titleStmt>
<title type="full">Lexicon Serbico-Germanico-Latinum</title>
<title type="abbr">LSGL</title>
</titleStmt>
- In <titleStmt>, TEI Lex-0 recommends the use of <persName> and <orgName> to distinguish between the names of persons and organizations. This is especially important since in some cases, the name of an institution is used to take up the collective authorship of a work.
- When using <persName>, TEI Lex-0 recommends further structuring the name with the elements <forename> and <surname>.
- In <publicationStmt>, TEI Lex-0 requires the use of <availability> to record the <licence> of the given lexicographic resource. In other words, a TEI Lex-0 resource must include explicit information on the conditions under which it can be used.
<publicationStmt xml:base="../TEILex0.examples/headers/St%C4%8DS.stripped.xml">
<publisher>Ústav pro jazyk český AV ČR, v. v. i.</publisher>
<pubPlace>Praha</pubPlace>
<availability>
<licence target="https://creativecommons.org/licenses/by/4.0/">Creative Commons Attribution 4.0 International (CC BY 4.0)</licence>
</availability>
</publicationStmt>
StčS (1999-2011)
- In addition to <publisher> and <distributor>, the <publicationStmt> in TEI Lex-0 may include information on any other <authority> responsible for creating or making the resource available.
- If using <authority>, TEI Lex-0 requires the use of role with values funder, sponsor or rightsHolder.
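The recommendations in this list can be combined into a single <fileDesc>, sketched below. All names, titles and organizations are invented for illustration, and the order of elements inside <publicationStmt> is an assumption that has not been checked against the schema.

```xml
<fileDesc>
  <titleStmt>
    <title type="full">A Hypothetical Dictionary of Examples</title>
    <title type="abbr">HDE</title>
    <!-- a person: structured with <forename> and <surname> -->
    <author>
      <persName>
        <forename>Jane</forename>
        <surname>Doe</surname>
      </persName>
    </author>
    <!-- an institution acting as collective editor -->
    <editor>
      <orgName>Hypothetical Institute of Lexicography</orgName>
    </editor>
  </titleStmt>
  <publicationStmt>
    <publisher>Hypothetical University Press</publisher>
    <!-- role takes one of: funder, sponsor, rightsHolder -->
    <authority role="funder">Hypothetical Research Foundation</authority>
    <availability>
      <licence target="https://creativecommons.org/licenses/by/4.0/">CC BY 4.0</licence>
    </availability>
  </publicationStmt>
</fileDesc>
```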
2.2.1. Source description
In TEI Lex-0, <sourceDesc> is an optional element. Born-digital resources or those which cannot be properly sourced do not require a <sourceDesc>.
If a resource is sourced, <sourceDesc> in TEI Lex-0 requires that the sources be grouped in <listBibl> elements:
- <listBibl type="dictionaries"></listBibl> lists all the dictionaries that were used as a source for the given dictionary; if you are retrodigitizing a print dictionary, your <listBibl> may include only one <biblStruct> with the bibliographic information about your print source;
- <listBibl type="literature"></listBibl> groups all the literature: for instance, all the sources used by the dictionary author to illustrate examples;
- <listBibl type="corpora"></listBibl> groups the information on all the corpora that were used in the production of the given lexicographic resource.
TEI Lex-0 requires the use of <biblStruct> for structuring bibliographic information about each individual source. This, too, is a departure from vanilla TEI which is more permissive in this respect.
<sourceDesc xml:base="../TEILex0.examples/headers/VOLP.stripped.xml">
<listBibl type="dictionaries">
<biblStruct>
<monogr>
<title>Vocabulário Ortográfico da Língua Portuguesa</title>
<author>
<orgName>Academia das Ciências</orgName>
</author>
<imprint>
<publisher>Imprensa Nacional de Lisboa</publisher>
<date>1940</date>
</imprint>
<extent>1 volume</extent>
<extent>821 pp.</extent>
</monogr>
</biblStruct>
</listBibl>
</sourceDesc>
VOLP (1940)
<sourceDesc xml:base="../TEILex0.examples/headers/EtymWB-XML.stripped.xml">
<listBibl type="dictionaries">
<biblStruct>
<monogr>
<author>
<persName>
<forename>Wolfgang</forename>
<surname>Pfeifer</surname>
</persName>
</author>
<title>Etymologisches Wörterbuch des Deutschen</title>
<edition>2</edition>
<imprint>
<publisher>Akademie Verlag</publisher>
<pubPlace>Berlin</pubPlace>
<date>1993</date>
<note>with additional notes by the author</note>
</imprint>
</monogr>
</biblStruct>
</listBibl>
</sourceDesc>
EtymWB-XML (2009)
<sourceDesc xml:base="../TEILex0.examples/headers/St%C4%8DS.stripped.xml">
<listBibl type="dictionaries">
<biblStruct>
<monogr>
<title level="m" type="main">Staročeský slovník</title>
<title level="m" type="sub">[Seš.] 1–26: na – při</title>
<editor>
<persName>
<forename>Bohuslav</forename>
<surname>Havránek</surname>
</persName>
</editor>
<editor>
<persName>
<forename>Vladimír</forename>
<surname>Šmilauer</surname>
</persName>
</editor>
<editor>
<persName>
<forename>Václav</forename>
<surname>Křístek</surname>
</persName>
</editor>
<editor>
<persName>
<forename>Jan</forename>
<surname>Petr</surname>
</persName>
</editor>
<editor>
<persName>
<forename>Igor</forename>
<surname>Němec</surname>
</persName>
</editor>
<editor>
<persName>
<forename>Emanuel</forename>
<surname>Michálek</surname>
</persName>
</editor>
<editor>
<persName>
<forename>Jaroslava</forename>
<surname>Pečírková</surname>
</persName>
</editor>
<imprint>
<date>1968–2008</date>
<publisher>Academia</publisher>
<pubPlace>Praha</pubPlace>
</imprint>
</monogr>
</biblStruct>
<biblStruct>
<monogr>
<title level="m" type="main">Slovník staročeský</title>
<title level="m" type="sub">A – J</title>
<author>
<persName>
<forename>Jan</forename>
<surname>Gebauer</surname>
</persName>
</author>
<edition>druhé, nezměněné vydání</edition>
<imprint>
<date>1970</date>
<publisher>Academia</publisher>
<pubPlace>Praha</pubPlace>
</imprint>
</monogr>
</biblStruct>
<biblStruct>
<monogr>
<title level="m" type="main">Slovník staročeský</title>
<title level="m" type="sub">K – N</title>
<author>
<persName>
<forename>Jan</forename>
<surname>Gebauer</surname>
</persName>
</author>
<edition>druhé, nezměněné vydání</edition>
<imprint>
<date>1970</date>
<publisher>Academia</publisher>
<pubPlace>Praha</pubPlace>
</imprint>
</monogr>
</biblStruct>
</listBibl>
</sourceDesc>
StčS (1999-2011)
<sourceDesc xml:base="../TEILex0.examples/headers/Morais.stripped.xml">
<listBibl type="dictionaries">
<biblStruct>
<monogr>
<title level="m" type="main">Diccionario da lingua portugueza composto pelo padre D. Rafael Bluteau, reformado, e accrescentado por Antonio de Moraes Silva, natural do Rio de Janeiro</title>
<title level="m" type="sub">A – K</title>
<author>
<persName>
<forename>António de</forename>
<surname>Morais Silva</surname>
</persName>
</author>
<imprint>
<pubPlace>Lisboa</pubPlace>
<publisher>Officina de Simão Thaddeo Ferreira</publisher>
<pubPlace>Lisboa</pubPlace>
<date when="1789">1789</date>
<note>Com Licença da Real Meza da Comissão Geral, sobre o Exame, e Censura dos Livros.</note>
<note>Vende-ſe na loja de Borel Borel, e Companhia, quaſi defronte da Igreja nova de Noſſa Senhora dos Martyres, na eſquina.</note>
</imprint>
<extent>Tomo primeiro</extent>
<extent>752 pp.</extent>
</monogr>
</biblStruct>
<biblStruct>
<monogr>
<title level="m" type="main">Diccionario da lingua portugueza composto pelo padre D. Rafael Bluteau, reformado, e accrescentado por Antonio de Moraes Silva, natural do Rio de Janeiro</title>
<title level="m" type="sub">L – Z</title>
<author corresp="https://isni.org/isni/0000000083438040">
<persName>
<forename>António de</forename>
<surname>Morais Silva</surname>
</persName>
</author>
<imprint>
<publisher>Officina de Simão Thaddeo Ferreira</publisher>
<pubPlace>Lisboa</pubPlace>
<date>1789</date>
</imprint>
<extent>Tomo segundo</extent>
<extent>541 pp.</extent>
</monogr>
</biblStruct>
</listBibl>
<listBibl type="literature">
<biblStruct>
<monogr corresp="https://purl.pt/29333">
<title>Abecedario Real e Regia Instrucçam dos Principes Lusitanos, composto de 63. discursos Politicos, &amp; Moraes : offerecido ao Serenissimo Principe Dom Joam N.S. / pelo M.R.P. Fr. Joam dos Prazeres, Prègador Gèral, &amp; Chronista mòr da Religiaõ do Principe dos Patriarcas Sam Bento</title>
<author>
<persName>
<surname>Prazeres</surname>
<forename>João dos</forename>
</persName>
</author>
<imprint>
<date>1692</date>
<pubPlace>Lisboa</pubPlace>
<publisher>na Officina de Miguel Deslandes, Impressor de S. Magestade</publisher>
<note>More information found in BND ; 191 p.</note>
</imprint>
</monogr>
</biblStruct>
<biblStruct xml:id="Academ.sing">
<monogr corresp="https://purl.pt/21936">
<title>Academia dos ſingulares de Lisboa dedicadas a Apollo</title>
<author>
<persName>
<surname>Faria</surname>
<forename>André Leitão de</forename>
</persName>
</author>
<imprint>
<date>1665</date>
<pubPlace>Lisboa</pubPlace>
<publisher>na Officina de Henrique Valente de Oliveira</publisher>
<biblScope unit="volume">2 t. em 2 vol.</biblScope>
<note>More information found in BND; 2 vol.</note>
</imprint>
</monogr>
</biblStruct>
<!-- [...] -->
</listBibl>
</sourceDesc>
Silva (1789)
2.3. Encoding description
<encodingDesc> is an optional element, which can be used to document the methods and editorial principles which governed the transcription or encoding of the lexicographic resource in hand and may also include sets of coded definitions used elsewhere in the text.
For an explanation of how to encode a taxonomy of domain labels to be used for encoding usage labels, see section on hierarchical usage labels.
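As a rough sketch of what such a taxonomy declaration might look like inside <encodingDesc>: the revision history above notes that <taxonomy>, <category> and <catDesc> are allowed in <classDecl> and that <catDesc> must contain a <term>. The identifiers and domain labels below are hypothetical, and the section on hierarchical usage labels remains the authoritative description.

```xml
<encodingDesc>
  <classDecl>
    <taxonomy xml:id="hyp.domains">
      <category xml:id="hyp.domain.science">
        <catDesc>
          <term>science</term>
        </catDesc>
        <!-- nested categories express the hierarchy of labels -->
        <category xml:id="hyp.domain.science.botany">
          <catDesc>
            <term>botany</term>
          </catDesc>
        </category>
      </category>
    </taxonomy>
  </classDecl>
</encodingDesc>
```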
2.4. Profile description
In plain TEI, <profileDesc> is an optional element, whereas in TEI Lex-0, it is required. This is because the nature of lexicographic resources is such that it is essential to identify and record the language(s) used as part of the resource metadata.
That's why <profileDesc> requires <langUsage> and <langUsage> requires at least one <language> element.
Regarding the use of the required attribute role and its possible values (objectLanguage, workingLanguage, sourceLanguage or targetLanguage), see the specification details for <language>.
2.5. Revision description
<revisionDesc> is optional in both TEI and TEI Lex-0. The element is used to document the revision history of the given file. For each recorded revision, one should use the <change> element, together with the appropriate attributes: when to indicate the date of the implemented change, resp to assign responsibility and n to assign a number to the particular change.
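A minimal sketch of a revision history using these three attributes follows. The dates, descriptions and the pointer #JD are invented; the pointer assumes that an element with xml:id="JD" identifying the responsible person is declared elsewhere in the file.

```xml
<revisionDesc>
  <!-- most recent change first; n numbers the changes -->
  <change when="2020-03-15" resp="#JD" n="2">Corrected typos in entries A to C.</change>
  <change when="2020-01-10" resp="#JD" n="1">Initial conversion to TEI Lex-0.</change>
</revisionDesc>
```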
3. Entries
3.1. General remarks
An <entry> is a basic reference unit in a dictionary: it groups together all the information related to a particular lemma. For instance:
<entry xml:id="OALD.competitor" type="mainEntry" xml:lang="en"
xml:base="../TEILex0.examples/examples.stripped.xml">
<form type="lemma">
<orth>competitor</orth>
<hyph>com|peti|tor</hyph>
<pron>k@m"petit@(r)</pron>
</form>
<gramGrp>
<gram type="pos">n</gram>
</gramGrp>
<sense xml:id="OALD.competitor.1">
<def>person who competes.</def>
</sense>
</entry>
OALD (1974)
<entry xml:id="MM.RSSKJ.круна" xml:lang="sr"
xml:base="../TEILex0.examples/examples.stripped.xml">
<form type="lemma">
<orth>кру̏на</orth>
</form>
<etym>(<cit type="etymon" xml:lang="de">
<lang norm="de" xml:lang="sr">нем.</lang>
<form>
<orth>Krone</orth>
</form>
</cit>
<pc>,</pc>
<cit type="etymon" xml:lang="la">
<lbl xml:lang="sr">из</lbl>
<lang expand="латински" norm="la">лат.</lang>
</cit>)</etym>
<sense xml:id="MM.RSSKJ.круна.1">
<num>1.</num>
<sense xml:id="MM.RSSKJ.круна.1a">
<num>а)</num>
<def>украс на глави као знак владарске власти;</def>
</sense>
<sense xml:id="MM.RSSKJ.круна.1b">
<num>б)</num>
<usg type="meaningType" expand="фигуративно" norm="figurative">фиг.</usg>
<def>владар.</def>
</sense>
</sense>
<sense xml:id="MM.RSSKJ.круна.2">
<num>2.</num>
<def>новчана јединица у неким европским земљама, разне вредности.</def>
</sense>
<sense xml:id="MM.RSSKJ.круна.3">
<num>3.</num>
<def>део лиснатог дрвета изнад стабле (гране и лшће);</def>
<xr type="synonymy">
<lbl>син.</lbl>
<ref type="sense">крошња</ref>
<pc>.</pc>
</xr>
</sense>
<sense xml:id="MM.RSSKJ.круна.4">
<num>4.</num>
<usg type="meaningType" expand="фигуративно" norm="figurative">фиг.</usg>
<def>врхунац, највиши домет неког рада, забаве.</def>
</sense>
</entry>
Московљевић (1990)
3.2. Mandatory attributes
The TEI Lex-0 schema prescribes two mandatory attributes on <entry>:
- xml:id uniquely identifies the element it is associated with;
- xml:lang identifies the object language of the element it is associated with.
In XML, xml:lang is inherited from the immediately enclosing element or from its closest ancestor that has this attribute. This means that in XML not every element needs to have the xml:lang attribute.
TEI Lex-0 recommends that xml:lang be attached to so-called container elements (such as <entry> and <cit>) rather than individual <form> elements.
In addition, TEI Lex-0 privileges <entry> as the dictionary’s central textual component by requiring both a unique identifier (xml:id) and xml:lang.
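The inheritance rule and the preference for container elements can be illustrated with a small, invented entry. The identifiers are hypothetical and the etymology is deliberately simplified.

```xml
<entry xml:id="hyp.crown" xml:lang="en">
  <!-- no xml:lang here: the form inherits "en" from <entry> -->
  <form type="lemma">
    <orth>crown</orth>
  </form>
  <etym>
    <!-- xml:lang on the container <cit> overrides "en" for its content -->
    <cit type="etymon" xml:lang="la">
      <!-- this form inherits "la" from the enclosing <cit> -->
      <form>
        <orth>corona</orth>
      </form>
    </cit>
  </etym>
  <sense xml:id="hyp.crown.1">
    <def>an ornamental headdress worn by a monarch.</def>
  </sense>
</entry>
```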
xml:lang identifies the object language of the element it is associated with. The language ‘tag’ (i.e. the value of this attribute) must follow IETF BCP 47, the Internet Engineering Task Force's best-practice document outlining standard identifiers for labeling language content. To learn more about what language tag is appropriate for your project, check out W3C's useful resource on choosing language tags.
If the language or language variety you are working on is not covered by BCP 47, make sure to follow the syntax of Private Use Tags described in BCP 47 Section 2.2.7 when creating one. Do this only if you are absolutely certain that no standard tag exists for your object language.
If you have created a "private" language tag, you can validate it (in terms of its structural well-formedness and validity) using the BCP 47 validator.
Language tags containing private-use subtags should be documented in the TEI header, specifically using one or more <language> elements grouped under <langUsage> inside <profileDesc>:
<profileDesc>
<langUsage>
<language ident="mix" role="objectLanguage">Mixtepec Mixtec</language>
<language ident="mix-x-YCNY" role="objectLanguage">Yucanany Mixtec</language>
</langUsage>
</profileDesc>
3.3. Grammatical properties
3.3.1. General remarks
Grammatical properties of lexical entries should be specified in entry/gramGrp/gram. This <gram> element will typically specify the part of speech of the entry:
<entry xml:lang="en" type="mainEntry" xml:id="on">
<form type="lemma">
<orth>on</orth>
</form>
<gramGrp>
<gram type="pos">prep</gram>
</gramGrp>
<!--...-->
</entry>
Notes:
- Grammatical properties of the entry as a whole should not be specified in entry/form[@type="lemma"]/gramGrp. entry/form/gramGrp should be used only if a particular form (a dialectal variant, for instance) has different grammatical properties from the lemma; or to indicate the grammatical properties of the inflected form which clearly deviate from the lemma.
- For entries which group grammatical homonyms inside single entries (e.g. in English dictionaries which do not have separate entries for conversion pairs of nouns and verbs, such as run or aid), see the discussion under Nested entries vs. multiple-senses.
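The first note can be sketched with an invented entry in which only one form carries its own <gramGrp>. The value "inflected" for @type on <form> and the identifiers are assumptions made for illustration.

```xml
<entry xml:id="hyp.child" xml:lang="en">
  <form type="lemma">
    <orth>child</orth>
  </form>
  <!-- grammatical property of this one form only: inside the <form> -->
  <form type="inflected">
    <orth>children</orth>
    <gramGrp>
      <gram type="number">pl</gram>
    </gramGrp>
  </form>
  <!-- grammatical property of the entry as a whole: directly under <entry> -->
  <gramGrp>
    <gram type="pos">n</gram>
  </gramGrp>
  <sense xml:id="hyp.child.1">
    <def>a young human being.</def>
  </sense>
</entry>
```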
3.3.2. Typology of gram
The TEI Guidelines define:
- seven specific elements which can be used to mark up particular grammatical properties: <case>, <gen> (for gender), <iType> (for inflection type), <mood>, <number>, <per> (for person) and <tns> (for tense); and
- one general element (<gram>) which can be used to encode different kinds of grammatical properties.
The Guidelines themselves do not explain the reasoning behind having two different mechanisms for encoding the same kind of information. The two mechanisms are treated as fully interchangeable: see, for instance, the first two examples in Section 9.3.2.
While it is perfectly understandable why marking up grammatical information using a number of specific, granular elements can be considered desirable, the current situation is less than perfect:
- if both <pos>prep</pos> and <gram type="pos">prep</gram> are possible, and if both mean exactly the same thing, the choice about how to encode grammatical information will always be partially arbitrary;
- the specific grammatical elements in TEI cover some important grammatical categories, but are certainly not exhaustive: for instance, Slavic dictionaries will, as a rule, indicate aspect (imperfective or perfective) as the defining grammatical property of verbs, yet there is no specific <aspect> element in TEI;
- if there are no specific elements for every possible grammatical category, mixing specific and general elements (for instance <pos>v.</pos> and <gram type="aspect">imperf.</gram>) within the same entry and/or dictionary will most likely further complicate data processing and data interoperability.
Considering the goals of TEI Lex-0 to serve as a common baseline and target format for transforming and comparing different lexical resources, we have decided to do away with the specific elements for grammatical properties. Instead, we recommend the use of typed <gram> elements. This is a decision that wasn't taken lightly and one which prompted a great deal of discussion. It goes without saying that TEI itself will continue to support both mechanisms and that an XSLT transformation from <pos>prep</pos> to <gram type="pos">prep</gram>, for those who want to convert their dictionaries to TEI Lex-0, would be easily accomplished.
The following table shows a mapping between the specific TEI elements and the typed <gram> elements in TEI Lex-0:
TEI | TEI Lex-0 |
---|---|
<pos>n.</pos> | <gram type="pos">n.</gram> |
<case>acc.</case> | <gram type="case">acc.</gram> |
<gen>f.</gen> | <gram type="gender">f.</gram> |
<iType>7</iType> | <gram type="inflectionType">7</gram> |
<mood>indic.</mood> | <gram type="mood">indic.</gram> |
<number>sg.</number> | <gram type="number">sg.</gram> |
<per>3rd</per> | <gram type="person">3rd</gram> |
<tns>aorist</tns> | <gram type="tense">aorist</gram> |
<colloc>de</colloc> | <gram type="collocate">de</gram> |
- | <gram type="aspect">imperf.</gram> |
- | <gram type="valency">intr.</gram> |
- | <gram type="government">[+conj.]</gram> |
Note: See also next section on Collocates.
TEI P5 is missing specific elements for encoding the grammatical aspect of verbs (for values such as perfective, imperfective) and valency (for values such as transitive, intransitive, reflexive, and impersonal). TEI Lex-0 is therefore introducing two suggested grammatical types, gram[@type="aspect"] and gram[@type="valency"], for encoding such values in dictionaries.
The attribute values for gram[@type] are a semi-closed list: this means that we will discuss and adopt additional values as demonstrated by examples from dictionaries that are encoded by members of our community.
If your dictionary has grammatical labels that do not fit into the above categories, do let us know by filing a ticket on GitHub.
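The mapping table in this section can be applied mechanically. The text mentions XSLT for this purpose; the following is instead a minimal Python sketch using only the standard library, offered as an illustration rather than an official conversion tool. It assumes un-namespaced fragments like the examples in this document (real TEI files live in the TEI namespace and would need qualified tag names), and the function name `to_typed_gram` is hypothetical.

```python
import xml.etree.ElementTree as ET

# Specific TEI element -> gram/@type value, following the mapping table.
# "collocate" is the value used in the Collocates section (3.3.3).
SPECIFIC_TO_TYPE = {
    "pos": "pos",
    "case": "case",
    "gen": "gender",
    "iType": "inflectionType",
    "mood": "mood",
    "number": "number",
    "per": "person",
    "tns": "tense",
    "colloc": "collocate",
}


def to_typed_gram(root):
    """Rename specific grammatical elements to typed <gram> elements, in place.

    Assumption: the source elements carry no @type of their own;
    an existing @type would be overwritten.
    """
    for el in root.iter():
        if el.tag in SPECIFIC_TO_TYPE:
            el.set("type", SPECIFIC_TO_TYPE[el.tag])
            el.tag = "gram"
    return root


sample = "<gramGrp><pos>n.</pos><gen>f.</gen><tns>aorist</tns></gramGrp>"
converted = ET.tostring(to_typed_gram(ET.fromstring(sample)), encoding="unicode")
print(converted)
```

Labels without a specific TEI element (aspect, valency, government) have no source element to rename, so they would still need to be typed by hand or by dictionary-specific rules.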
3.3.3. Collocates
<entry>
<form>
<orth>médire</orth>
</form>
<gramGrp>
<colloc>de</colloc>
</gramGrp>
</entry>
TEI Lex-0 uses <gram type="collocate"></gram> to encode these phenomena, i.e.:
<entry xml:lang="fr" xml:id="DDLF.médire">
<form type="lemma">
<orth>médire</orth>
</form>
<gramGrp>
<gram type="collocate">de</gram>
</gramGrp>
</entry>
<gram type="government"></gram>
<gramGrp>
<gram type="government">[+ conj.]</gram>
</gramGrp>
3.4. Deprecated entry-like elements
The current TEI Guidelines define five different container elements that may serve as grouping devices for entry-level lexical information:
- <entry>: contains a single structured entry in any kind of lexical resource, such as a dictionary or lexicon.
- <entryFree>: contains a single unstructured entry in any kind of lexical resource, such as a dictionary or lexicon.
- <superEntry>: groups a sequence of entries within any kind of lexical resource, such as a dictionary or lexicon which function as a single unit, for example a set of homographs.
- <re>: (related entry) contains a dictionary entry for a lexical item related to the headword, such as a compound phrase or derived form, embedded inside a larger entry.
- <hom>: (homograph) groups information relating to one homograph within an entry
These five elements can be used to distinguish different types of entries along two conceptual axes:
- Structured vs. unstructured entries, i.e. entries that can readily be represented (in the lexical view) in the spirit of the TEI Guidelines' Dictionary Chapter (<entry>, <re>) vs. entries that for some reason violate the generic content model of <entry> or <re> and thus have to be represented more freely (<entryFree>). A third category in this respect are entries that exhibit a highly reduced amount of lexical content while this content is still of essentially entry-like nature (<superEntry>).
- Containing vs. contained entries: entries may contain additional lexical information that can be conceived as an additional dictionary entry in its own right. Specifically, <superEntry> may contain <entry>, and <entry> in turn may contain <re> to represent the embedding of lexical entries on three distinct levels. Because <re> may be used recursively, the number of levels for representing entry-like lexical information inside other such blocks is effectively unrestricted. At the same time, two different mechanisms can be used to create homographic entries: <superEntry> containing multiple <entry> elements; or <entry> containing multiple <hom> elements.
3.4.1. hom
Making a clear difference between a situation where an entry has to be split into two or more homonyms and one where these differences correspond to a semantic alternation is lexicographically difficult. Still, the main danger in keeping both possibilities in the representation of a lexical entry in a digital lexicon is to introduce a systematic structural ambiguity as to where the appropriate information is to be found. We thus deprecate <hom> altogether in the present recommendation and have this element replaced by the nested <entry> construct.
For instance, the following example from the TEI Guidelines:
<entry>
<form>
<orth>bray</orth>
<pron>breI</pron>
</form>
<hom>
<gramGrp>
<gram type="pos">n</gram>
</gramGrp>
<sense>
<def>cry of an ass; sound of a trumpet.</def>
</sense>
</hom>
<hom>
<gramGrp>
<gram type="pos">vt</gram>
<subc>VP2A</subc>
</gramGrp>
<sense>
<def>make a cry or sound of this kind.</def>
</sense>
</hom>
</entry>
would in TEI Lex-0 be represented as:
<entry type="mainEntry" xml:id="bray" xml:lang="en"
xml:base="../TEILex0.examples/examples.stripped.xml">
<form type="lemma">
<orth>bray</orth>
<pron>breI</pron>
</form>
<entry xml:id="bray_n" xml:lang="en" type="homonymicEntry">
<gramGrp>
<gram type="pos">n</gram>
</gramGrp>
<sense xml:id="bray_n.1">
<def>cry of an ass</def>
</sense>
<pc>;</pc>
<sense xml:id="bray_n.2">
<def>sound of a trumpet</def>
</sense>
<pc>.</pc>
</entry>
<entry xml:id="bray_vt" xml:lang="en" type="homonymicEntry">
<gramGrp>
<gram type="pos">vt</gram>
<gram type="inflectionType">VP2A</gram>
</gramGrp>
<sense xml:id="bray_vt.1">
<def>make a cry or sound of this kind</def>
</sense>
<pc>.</pc>
</entry>
</entry>
In a similar fashion, consider this entry from the Dictionary of the Portuguese Language by Morais:
<entry xml:id="MORAIS.1.DLP.JANTAR" type="mainEntry" xml:lang="pt"
xml:base="../TEILex0.examples/examples.stripped.xml">
<entry xml:id="MORAIS.1.DLP.JANTAR-vt" type="homonymicEntry" xml:lang="pt">
<form type="lemma">
<orth>JANTAR</orth>
</form>
<metamark function="lemmaDelimiter">,</metamark>
<gramGrp>
<gram type="pos" norm="VERB">v.</gram>
<gram type="voice">at.</gram>
</gramGrp>
<sense xml:id="MORAIS.1.DLP.JANTAR.s.1">
<def>comer ao meio dia , ou comer depois de almoçar.</def>
</sense>
</entry>
<entry xml:id="MORAIS.1.DLP.JANTAR-n" type="homonymicEntry" xml:lang="pt">
<form type="lemma">
<orth>JANTAR</orth>
</form>
<metamark function="lemmaDelimiter">,</metamark>
<gramGrp>
<gram type="pos" norm="NOUN">ſ.</gram>
<gram type="gender">m.</gram>
</gramGrp>
<sense xml:id="MORAIS.1.DLP.JANTAR.s.2">
<def>a ſegunda das tres comidas regulares do dia, entre o almoço , e aceia , ou antes da merenda.</def>
</sense>
<pc>.</pc>
<metamark function="senseDelimiter">§</metamark>
<sense xml:id="MORAIS.1.DLP.JANTAR.s.3">
<def>Porção de dinheiro , que as Villas , e Cidades davão aos Reis , quando hião de correição para ſuſtento de ſua comitiva</def>
</sense>
<pc>.</pc>
<bibl type="attestation" source="#M._L._Monarchia_Luſitana">
<title>M. Luſ.</title>
<citedRange unit="volume">t. 5</citedRange>
<citedRange unit="folium">f. 53</citedRange>
<citedRange unit="chapter">cap. 27</citedRange>
</bibl>
</entry>
</entry>
Silva (1789)
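The structural core of the <hom> deprecation described in this section, replacing each <hom> with a nested <entry>, can be sketched in a few lines. This is a minimal Python illustration under stated assumptions, not a complete converter: it works on un-namespaced fragments like the examples here, the function name `hom_to_nested_entry` is hypothetical, and it does not generate xml:id values, add xml:lang, or split compound definitions into separate senses as the hand-encoded examples do.

```python
import xml.etree.ElementTree as ET


def hom_to_nested_entry(entry):
    """Turn each <hom> child of `entry` into a nested homonymic <entry>, in place."""
    converted_any = False
    for child in list(entry):
        if child.tag == "hom":
            child.tag = "entry"
            child.set("type", "homonymicEntry")
            converted_any = True
    # Mark the container as the main entry unless it is already typed.
    if converted_any and entry.get("type") is None:
        entry.set("type", "mainEntry")
    return entry


sample = """<entry>
  <form><orth>bray</orth></form>
  <hom>
    <gramGrp><gram type="pos">n</gram></gramGrp>
    <sense><def>cry of an ass; sound of a trumpet.</def></sense>
  </hom>
  <hom>
    <gramGrp><gram type="pos">vt</gram></gramGrp>
    <sense><def>make a cry or sound of this kind.</def></sense>
  </hom>
</entry>"""

result = hom_to_nested_entry(ET.fromstring(sample))
nested_types = [e.get("type") for e in result.findall("entry")]
print(result.get("type"), nested_types)
```

Because TEI Lex-0 requires an xml:id on every <sense>, the output of such a step would still need identifiers assigned before it validates.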
3.4.2. superEntry
By making <entry> recursive, TEI Lex-0 has eliminated the need for grouping entries with <superEntry>.
This is especially important for traditional root-based dictionaries, which start with the root as the main headword, followed by full-fledged lexicographic entries of derived headwords.
<entry type="wordFamily" xml:lang="ar" xml:id="syj"
xml:base="../TEILex0.examples/examples.stripped.xml">
<form type="root">
<orth>سيج</orth>
</form>
<pc>:</pc>
<!-- To fence (verb) -->
<entry type="mainEntry" xml:lang="ar" xml:id="syj1">
<form type="lemma">
<orth>سيّج</orth>
</form>
<sense xml:id="syj1_sense1">
<cit type="example">
<quote>الكرم</quote>
</cit>
<pc>:</pc>
<def>جعل له سياجا</def>
</sense>
<pc>٠</pc>
</entry>
<!-- A fence (noun) -->
<entry type="mainEntry" xml:lang="ar" xml:id="syj2">
<form type="lemma">
<orth>السياج</orth>
</form>
<form type="inflected">
<gramGrp>
<gram type="number" value="plural">ج</gram>
</gramGrp>
<form type="variant">
<orth>سيَاجات</orth>
</form>
<lbl>و</lbl>
<form type="variant">
<orth>أسْوِجة</orth>
</form>
<lbl>و</lbl>
<form type="variant">
<orth>أَسْوِجة</orth>
</form>
<lbl>و</lbl>
<form type="variant">
<orth>سُوج</orth>
</form>
</form>
<pc>:</pc>
<sense xml:id="syj2_sense1">
<def>الحائط</def>
</sense>
<pc>||</pc>
<sense xml:id="syj2_sense2">
<def>ما أُحيط بهِ على شيءٍ كالكرم و النخل</def>
</sense>
</entry>
<pc>٠</pc>
<!-- A kind of fish -->
<entry type="mainEntry" xml:lang="ar" xml:id="syj3">
<form type="lemma">
<orth>السيْجان</orth>
</form>
<pc>(</pc>
<usg type="domain" value="animal">ح</usg>
<pc>)</pc>
<pc>:</pc>
<sense xml:id="syj3_sense1">
<def>نوع من السمك</def>
</sense>
</entry>
</entry>
Almonjid (2014)
<entry type="wordFamily" xml:lang="ar" xml:id="shahama"
xml:base="../TEILex0.examples/examples.stripped.xml">
<form type="root">
<orth>شهم</orth>
</form>
<pc>:</pc>
<entry type="wordFamily" xml:lang="ar" xml:id="shahama1">
<num>١ــ</num>
<entry type="mainEntry" xml:lang="ar" xml:id="shahama1_1">
<form type="lemma">
<orth>شَهَمَ</orth>
</form>
<form type="scheme">
<orth>ـَ</orth>
</form>
<form type="inflected">
<form type="variant">
<orth>شَهْمًا</orth>
</form>
<lbl>و</lbl>
<form type="variant">
<orth>شُهُمًا</orth>
</form>
</form>
<sense xml:id="shahama1_1_sense1">
<cit type="example">
<quote>الفرسَ</quote>
</cit>
<pc>:</pc>
<def>زجره</def>
</sense>
<pc>||</pc>
<lbl>و</lbl>
<sense xml:id="shahama1_1_sense2">
<cit type="example">
<quote>ــ الرجُل</quote>
</cit>
<pc>:</pc>
<def>افزعه</def>
</sense>
</entry>
<pc>٠</pc>
<entry type="mainEntry" xml:lang="ar" xml:id="shahama1_2">
<form type="lemma">
<orth>اَلمشْهوم</orth>
</form>
<pc>٠:</pc>
<sense xml:id="shahama1_2_sense1">
<def>المذعور</def>
</sense>
</entry>
</entry>
<entry type="wordFamily" xml:lang="ar" xml:id="shahama2">
<num>٢٠ ــ</num>
<entry type="mainEntry" xml:lang="ar" xml:id="shahama2_1">
<form type="lemma">
<orth>شَهُم</orth>
</form>
<form type="scheme">
<orth>ـُـ</orth>
</form>
<form type="inflected">
<form type="variant">
<orth>شَهَامةً</orth>
</form>
<lbl>و</lbl>
<form type="variant">
<orth>شُهُومَةُُ</orth>
</form>
</form>
<lbl>:</lbl>
<sense xml:id="shahama2_1_sense1">
<def> كان شهْمًا</def>
</sense>
</entry>
<pc>٠</pc>
<entry type="mainEntry" xml:lang="ar" xml:id="shahama2_2">
<form type="lemma">
<orth>الشَهْم</orth>
</form>
<form type="inflected">
<gramGrp>
<gram type="number" value="plural">ج</gram>
</gramGrp>
<orth>شِهام</orth>
</form>
<pc>:</pc>
<sense xml:id="shahama2_2_sense1">
<def>الذكيّ الفؤاد</def>
</sense>
<pc>||</pc>
<sense xml:id="shahama2_2_sense2">
<def>السيِّد النافذ الحكم</def>
</sense>
<pc>||</pc>
<sense xml:id="shahama2_2_sense3">
<lbl>وــ</lbl>
<form type="inflected">
<gramGrp>
<gram type="number" value="plural">ج</gram>
</gramGrp>
<orth>شُهُم</orth>
</form>
<pc>:</pc>
<def>الفرس النشيط السريع القويّ</def>
</sense>
</entry>
<pc>٠</pc>
<entry type="mainEntry" xml:lang="ar" xml:id="shahama2_3">
<form type="lemma">
<orth>اَلمَشْهُوم</orth>
</form>
<pc>*:</pc>
<sense xml:id="shahama2_3_sense1">
<def>الذكيّ الفؤاد</def>
</sense>
</entry>
</entry>
<entry type="wordFamily" xml:lang="ar" xml:id="shahama3">
<num>٠٣ ــ</num>
<entry type="mainEntry" xml:lang="ar" xml:id="shahama3_1">
<form type="lemma">
<orth>الشَيْهَم</orth>
</form>
<form type="inflected">
<gramGrp>
<gram type="number" value="plural">ج</gram>
</gramGrp>
<orth>شَيَهِم</orth>
</form>
<pc>(</pc>
<usg type="domain" value="animal">ح</usg>
<pc>)</pc>
<sense xml:id="shahama3_1_sense1">
<def>ذَكَر القنافذ</def>
</sense>
</entry>
<pc>٠</pc>
<entry type="mainEntry" xml:lang="ar" xml:id="shahama3_2">
<form type="lemma">
<orth>الشَيْهَمَة</orth>
</form>
<pc>:</pc>
<sense xml:id="shahama3_2_sense1">
<def>العجوز</def>
</sense>
</entry>
</entry>
</entry>
Almonjid (2014)
See also Section on grammatical properties in senses.
4. Forms
The current TEI Guidelines allow for an extremely wide range of encoding possibilities for written and spoken forms. In the discussion which follows, we suggest ways in which the elements, in particular <form>, can be constrained. We give examples of use types not covered by the Guidelines, and propose some extensions.
4.1. A note on inheritance
We assume that, in order to determine the complete properties of an element inside the entry tree, the principle of default inheritance applies: e.g. the grammatical properties of a form are determined by collecting the sibling <gramGrp> of the ancestor-or-self of the focus element, where the superordinate grammatical properties can be overwritten by the lower-level properties. This principle is relatively straightforward in the case of grammatical properties, but more complex for the word paradigm, especially in cases of variant forms. For more information, cf. Ide et al. (2000) and Erjavec et al. (2000).
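Default inheritance of grammatical properties can be made concrete with a short sketch. The Python below is one possible reading of the principle, not a normative algorithm: it assumes un-namespaced fragments, it gathers the <gramGrp> children of every element on the path from the root down to the focus element (the focus element included), and it lets values found lower in the tree replace values of the same gram/@type found higher up. The function name `inherited_gram` is hypothetical.

```python
import xml.etree.ElementTree as ET


def inherited_gram(root, focus):
    """Return {gram/@type: label} for `focus`; lower levels override higher ones."""
    # ElementTree has no parent pointers, so build a child -> parent map first.
    parent_of = {child: parent for parent in root.iter() for child in parent}
    # Chain from the focus element up to the root.
    chain = [focus]
    while chain[-1] in parent_of:
        chain.append(parent_of[chain[-1]])
    props = {}
    # Walk top-down so that later (lower) assignments overwrite earlier ones.
    for node in reversed(chain):
        for grp in node.findall("gramGrp"):
            for gram in grp.findall("gram"):
                label = (gram.text or gram.get("value") or "").strip()
                props[gram.get("type")] = label
    return props


sample = """<entry>
  <gramGrp><gram type="pos">verb</gram><gram type="number">sg</gram></gramGrp>
  <form type="paradigm">
    <gramGrp><gram type="tense">present</gram></gramGrp>
    <form type="inflected">
      <orth>pierden</orth>
      <gramGrp><gram type="number">pl</gram><gram type="person">3</gram></gramGrp>
    </form>
  </form>
</entry>"""

root = ET.fromstring(sample)
focus = root.find(".//form[@type='inflected']")
result = inherited_gram(root, focus)
print(result)
```

In this made-up sample the inflected form inherits the part of speech from the entry and the tense from the enclosing paradigm, while its own number replaces the entry-level one. As the text notes, paradigms with variant forms would need more elaborate rules than this.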
4.2. Lemmas
The form element should always be qualified by its type. The lemma (i.e. headword) form should be encoded as form[@type="lemma"].
If it is necessary to specify the grammatical properties of the lemma form itself (as opposed to the grammatical properties of the entry), this is described by entry/form[@type="lemma"]/gramGrp.
4.3. Inflected forms
Dictionaries often include additional forms next to the lemma. In English, these are used to specify irregular forms, such as “corpus / corpora” or “take / took”, whereas in inflectionally rich languages they are often used to help the user determine the correct paradigm of the word.
Such inflected forms should be encoded in entry/form[@type="inflected"], e.g.:
<entry xml:lang="en" xml:id="CH.go1"
xml:base="../TEILex0.examples/examples.stripped.xml">
<form type="lemma">
<orth>go</orth>
<pron>gō</pron>
</form>
<lbl rend="sup">1</lbl>
<gramGrp>
<gram type="pos">vi</gram>
</gramGrp>
<pc>(</pc>
<form type="inflected">
<gramGrp>
<gram type="participle">prp</gram>
</gramGrp>
<orth>gō'ing</orth>
</form>
<pc>;</pc>
<form type="inflected">
<gramGrp>
<gram type="participle">pap</gram>
</gramGrp>
<orth>gone</orth>
<pron>gon</pron>
<note>(see separate entries)</note>
</form>
<pc>;</pc>
<form type="inflected">
<gramGrp>
<gram type="participle">pat</gram>
</gramGrp>
<orth>went</orth>
<note>(supplied from <xr type="related">
<ref type="entry">wend</ref>
</xr>)</note>
</form>
<pc>;</pc>
<form type="inflected">
<gramGrp>
<gram type="person">3rd pers</gram>
<gram type="number">sing</gram>
<gram type="tense">pres</gram>
<gram type="mood">indicative</gram>
</gramGrp>
<orth>goes</orth>
</form>
<pc>;</pc>
<!--...-->
</entry>
Chambers (2011)
Or take this example: abeceda, -y. In Czech, "-y" is a genitive singular suffix for feminine nouns. We can mark up the grammatical properties of the suffix, while providing the full form of the noun as well:
<entry type="mainEntry" xml:lang="cs" xml:id="en000008"
xml:base="../TEILex0.examples/examples.stripped.xml">
<form type="lemma" xml:id="en000008.hw1">
<orth>abeceda</orth>
</form>
<pc>,</pc>
<form type="inflected">
<gramGrp>
<gram type="case" value="genitive"/>
<gram type="number" value="singular"/>
<gram type="gender" value="feminine"/>
</gramGrp>
<orth extent="suffix" expand="abecedy">-y</orth>
</form>
<!--...-->
</entry>
4.4. Paradigms
When several inflected forms can be present next to the lemma, these can be embedded into entry/form[@type="paradigm"]. The decision on whether to use this extra element depends on the particular dictionary and language.
The other use case for paradigms is when the full inflectional paradigm of the word is embedded in the entry, i.e. when the dictionary also includes all the word-forms of the words covered, which can be useful for example in machine processing.
An entry may contain several paradigms, e.g. a partial one for humans and a full one for machines, or one for each stem of a verb. Each paradigm type should be distinguished by the subtype attribute.
<entry xml:id="perder" xml:lang="es"
xml:base="../TEILex0.examples/examples.stripped.xml">
<form type="lemma">
<orth>perder</orth>
</form>
<gramGrp>
<gram type="pos">verb</gram>
</gramGrp>
<form type="paradigm" subtype="present">
<form type="inflected">
<orth>pierdo</orth>
<gramGrp>
<gram type="person">1</gram>
<gram type="number">sg</gram>
<gram type="mood">indic</gram>
<gram type="voice">active</gram>
</gramGrp>
</form>
<!-- other inflected forms (of present indicative) here -->
<gramGrp>
<gram type="tense">present</gram>
</gramGrp>
</form>
<form type="paradigm" subtype="preteritum">
<form type="inflected">
<orth>perdí</orth>
<gramGrp>
<gram type="person">1</gram>
<gram type="number">sg</gram>
<gram type="mood">indic</gram>
<gram type="voice">active</gram>
</gramGrp>
</form>
<gramGrp>
<gram type="tense">preteritum</gram>
</gramGrp>
</form>
<!--... -->
</entry>
4.5. Variants
The representation of variation within a form depends heavily on which features vary and how they vary. However, as a general principle, variation may be encoded as form[@type="variant"] and embedded within the parent element for which a subordinate feature exhibits variation.
4.5.1. Orthographic variation
Several kinds of orthographic variation may be distinguished. Below, we present some of the options with the corresponding examples.
Spelling variation due to a change in the language's orthographic conventions:
<entry xml:id="Flussschifffahrt" xml:lang="de" type="compound"
xml:base="../TEILex0.examples/examples.stripped.xml">
<form type="lemma">
<orth type="segmented">
<seg>Fluss</seg>
<seg>schifffahrt</seg>
</orth>
<form type="variant">
<orth>
<seg>Fluss</seg>
<pc>-</pc>
<seg>Schifffahrt</seg>
</orth>
</form>
<form type="variant">
<orth notAfter="1996">
<seg>Fluß</seg>
<seg>schiffahrt</seg>
</orth>
<usg type="temporal">Vor 1996 Rechtschreibung Reform</usg>
</form>
<gramGrp>
<gram type="pos">noun</gram>
</gramGrp>
</form>
<!--...-->
</entry>
The following example is from American English: because there are no official conventions for transliterating Arabic orthography into the English (Latin) script, the initial vowel in the name ‘Osama Bin Laden’ varies between ‘O’ and ‘U’:
<entry xml:id="Osama" xml:lang="en"
xml:base="../TEILex0.examples/examples.stripped.xml">
<form type="lemma">
<pron notation="ipa">
<seg xml:id="ousma" corresp="#usma #osma">ow."sa.ma</seg>
<seg>bɪn</seg>
<seg>ˈlaːdn̹</seg>
</pron>
<form type="variant">
<orth type="transliterated">
<seg xml:id="osma" corresp="#usma #ousma">Osama</seg>
<seg>Bin</seg>
<seg>Laden</seg>
</orth>
</form>
<form type="variant">
<orth type="transliterated">
<seg xml:id="usma" corresp="#osma #ousma">Usama</seg>
<seg>Bin</seg>
<seg>Laden</seg>
</orth>
</form>
</form>
<!--...-->
</entry>
4.5.2. Phonetic variation
In this example, the entry contains the single orthographic form as a direct child of the lemma and phonetic transcriptions of the two roughly equally used variant pronunciations of the word 'caramel' from American English.
<entry xml:id="caramel-en" xml:lang="en-US"
xml:base="../TEILex0.examples/examples.stripped.xml">
<form type="lemma">
<orth>caramel</orth>
<form type="variant">
<pron notation="ipa">'keɹə"mɛl</pron>
</form>
<form type="variant">
<pron notation="ipa">'kaɹmɫ̩</pron>
</form>
</form>
<gramGrp>
<gram type="pos">noun</gram>
</gramGrp>
<!-- ... -->
</entry>
In the example above, one could have chosen to mark up two different pronunciations using two <pron> elements inside the form[@type="lemma"]. Considering, however, that each individual pronunciation could, in theory, be further qualified, for instance, by a <usg> note, indicating the geographic area in which the said pronunciation is used, TEI Lex-0 recommends that multiple variants, whether orthographic or orthoepic, be contained each in its own <form> element.
4.5.3. Regional or dialectal variation
In the following example from Mixtepec-Mixtec, there is variation in the form of the word for the city of Oaxaca between speakers from the village of Yucanany and the rest of the speakers. Since the Yucanany variety makes up only a small portion of the speakers of the language, this case of variation is represented as an embedded form[@type="variant"] within the lemma. Note the use of usg[@type="geographic"]/placeName to explicitly specify this feature in addition to the use of the private language subtag (@xml:lang="mix-x-YCNY") as per BCP 47.
<entry xml:id="Oaxaca-MIX" xml:lang="mix" type="compound"
xml:base="../TEILex0.examples/examples.stripped.xml">
<form type="lemma">
<orth>
<seg>Ñuu</seg>
<seg>Ntua</seg>
</orth>
<pron notation="ipa">
<seg>ɲùù</seg>
<seg>nd̪ùá</seg>
</pron>
<form type="variant" xml:lang="mix-x-YCNY">
<orth>Ntua</orth>
<pron notation="ipa">nd̪ùá</pron>
<usg type="geographic">
<placeName>Yucanany</placeName>
</usg>
</form>
</form>
<gramGrp>
<gram type="pos">locationNoun</gram>
</gramGrp>
<!--...-->
</entry>
4.6. Multiword expressions
The Dictionary Chapter of the TEI Guidelines is very sparse when it comes to recommendations for encoding polylexical units. The only mention of the adjective “multi-word” appears in the definition of the element <term>: “contains a single-word, multi-word, or symbolic designation which is regarded as a technical term” but this is not relevant for the encoding of polylexical units in general-purpose dictionaries.
TEI includes an element <colloc> (collocate), which is defined as containing “any sequence of words that co-occur with the headword with significant frequency” but, in a different example, “colloc” is used as an attribute value for the element <usg> (usage). It is precisely this type of ambiguity that TEI Lex-0 is trying to resolve.
The TEI Guidelines recommend the use of <re> (related entry) to encode “related entries for direct derivatives or inflected forms of the entry word, or for compound words, phrases, collocations, and idioms containing the entry word” with barely any useful examples, or discussion of how to encode different types of polylexical units. TEI Lex-0, on the other hand, does not include <re>. In TEI Lex-0, <entry> was made recursive in order to account for nestable entry-like structures without the need to resort to <re>, a differently named element whose content model would be indistinguishable from <entry> itself. Eventually, the new content model of <entry>, which allows nesting, was adopted by TEI itself (Tasovac 2020).
TODO: explain different types of MWEs from a dictionary-model perspective (referring to Tasovac 2020)
4.6.1. Collocations
TODO: explain "lexicographically transparent"
<entry xml:id="DLPC.descalçar" xml:lang="pt"
xml:base="../TEILex0.examples/examples.stripped.xml">
<!--etc.-->
<sense xml:id="DLPC.descalçar.1">
<!--etc.-->
<form type="collocations">
<form type="collocation">
<orth>
<ref type="form" scope="currentEntry" value="descalçar">
<lbl>+</lbl>
</ref>
<seg>as botas</seg>
</orth>
<gramGrp>
<gram type="mwe" value="co-ocorrente_privilegiado"/>
</gramGrp>
</form>
<pc>,</pc>
<form type="collocation">
<orth>
<ref type="form" scope="currentEntry" value="descalçar"/>
<seg>as luvas</seg>
</orth>
<gramGrp>
<gram type="mwe" value="co-ocorrente_privilegiado"/>
</gramGrp>
</form>
<pc>,</pc>
<form type="collocation">
<orth>
<ref type="form" scope="currentEntry" value="descalçar"/>
<seg>as meias</seg>
</orth>
<gramGrp>
<gram type="mwe" value="co-ocorrente_privilegiado"/>
</gramGrp>
</form>
</form>
<pc>;</pc>
<form type="collocations">
<form type="collocation">
<orth>
<ref type="form" scope="currentEntry" value="descalçar">
<lbl>+</lbl>
</ref>
<seg>os sapatos</seg>
</orth>
<gramGrp>
<gram type="mwe" value="co-ocorrente_privilegiado"/>
</gramGrp>
</form>
</form>
<pc>.</pc>
</sense>
</entry>
DLPC (2001)
4.6.2. Idiomatic expressions
TODO text ("lexicographically non-transparent")
<entry xml:lang="pt" xml:id="DLPC.bombeiro" type="mainEntry"
xml:base="../TEILex0.examples/examples.stripped.xml">
<form type="lemma">
<orth>bombeiro</orth>
</form>
<!--etc. -->
<sense xml:id="bombeiro.1">
<!--etc. -->
<entry xml:id="DLPC.bombeiro_voluntario" xml:lang="pt" type="relatedEntry">
<form type="lemma">
<orth>bombeiro voluntário</orth>
</form>
<gramGrp>
<gram type="mwe" value="combinatória_fixa"/>
</gramGrp>
<pc>,</pc>
<sense xml:id="DLPC.bombeiro_voluntario.1">
<def>o que pertence a uma corporação com a obrigatoriedade de acudir a
incêndios, acidentes, unicamente por filantropia</def>
<pc>.</pc>
</sense>
</entry>
<entry xml:id="DLPC.corpo_de_bombeiros" xml:lang="pt" type="relatedEntry">
<form type="lemma">
<orth>
<ref type="entry" scope="currentEntry">
<seg>corpo</seg>
<lbl rend="sup">+</lbl>
</ref>
<seg>de bombeiros</seg>
</orth>
</form>
<pc>.</pc>
</entry>
</sense>
<!--etc.-->
</entry>
DLPC (2001)
5. Senses
5.1. General remarks
In the current TEI Dictionary Chapter, the content model of <entry> allows one to have sense-related information directly within <entry>. TEI Lex-0 prescribes a stricter use of these elements so that sense-related information is grouped within the <sense> element, in accordance with the underlying semasiological model implemented in the TEI Guidelines.
<sense> should therefore be considered mandatory for any dictionary entry that actually provides sense information for the headword. Further in this document, we consider some additional specific cases, e.g. “referencing” entries (entries that simply point to other entries) and inflectional lexica (dictionaries that describe word forms only), where <sense> is not a mandatory child of <entry>.
As a consequence of making the use of <sense> more systematic within <entry>, we have seen (see section on <entry>) that some elements are no longer allowed as children of <entry>. We provide here a specific background for each of them:
- <def> is clearly intended to provide a prose description of a meaning within a <sense> element and should not appear in any other context;
- In the same way, it is recommended that <cit> be used exclusively as a child of <sense>, or when necessary within <dictScrap>;
- The case of <hom> is peculiar since it provides a subordinate organization to an entry which is redundant in relation to what <sense> allows one to represent. <hom> is not allowed in TEI Lex-0.
Note: When one has to deal with information that does not fit a <sense>-based organization, for instance in the process of retro-digitizing an existing dictionary source, the use of <dictScrap> is recommended. A further encoding step may then lead to a more precise encoding in a second phase.
In TEI Lex-0, <sense> has a mandatory xml:id.
5.2. Limiting contexts for def
In the current TEI Guidelines, <def> is allowed within the following elements:
- Module core: <cit>
- Module dictionaries: <dictScrap>, <entry>, <entryFree>, <etym>, <hom>, <re>, <sense>
- Module namesdates: <nym>
TEI Lex-0 allows the use of <def> in <sense> only. All other existing contexts would be implemented by embedding <def> within a <sense>.
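Two of the constraints stated so far (section 5.1 makes xml:id mandatory on <sense>, and <def> is allowed in <sense> only) are easy to spot-check before schema validation. The Python below is a minimal sketch under stated assumptions: un-namespaced fragments as in this document's examples, and a hypothetical function name `check_senses`. It does not replace validating against the TEI Lex-0 schema itself.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes xml:id under the predefined XML namespace.
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"


def check_senses(root):
    """Return a list of human-readable problems found under `root`."""
    problems = []
    parent_of = {child: parent for parent in root.iter() for child in parent}
    # Every <sense> needs an xml:id.
    for sense in root.iter("sense"):
        if sense.get(XML_ID) is None:
            problems.append("sense without xml:id")
    # Every <def> must be a direct child of <sense>.
    for definition in root.iter("def"):
        parent = parent_of.get(definition)
        if parent is None or parent.tag != "sense":
            problems.append("def outside sense")
    return problems


sample = """<entry xml:id="e1">
  <def>misplaced definition</def>
  <sense><def>first meaning</def></sense>
  <sense xml:id="e1.2"><def>second meaning</def></sense>
</entry>"""

problems = check_senses(ET.fromstring(sample))
print(problems)
```

The sample entry is invented for the check and deliberately breaks both rules once.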
5.3. Glosses
5.3.1. Gloss vs. definition?
In the lexicographic literature, gloss is a rather amorphous category. Zgusta, in his classic Manual of Lexicography (1971), defines it as "any descriptive or explanatory note within the entry" which includes "short comments, explanatory remarks, semantic characteristics or qualifications" (270). Atkins and Rundell (2008) see the gloss as "a more informal explanation of the meaning of a multiword expression or example (or even part of one) in the entry,[...] chiefly used in monolingual dictionaries for learners, to help understanding" (209). While one could argue about the statement that this type of lexicographic construct is used "chiefly... in monolingual dictionaries for learners", it is certainly the case that glosses are expected to help users better understand or more easily locate the particular meaning of a word that they are looking up.
- fugitive (of persons)
- fugitive (verses)
<entry xml:id="ED.fugitive" xml:lang="en">
<form type="lemma">
<orth>fugitive</orth>
</form>
<sense n="1">
<gloss>(of persons)</gloss>
</sense>
<sense n="2">
<gloss>(verses)</gloss>
</sense>
</entry>
<entry xml:id="ED.fugitive" xml:lang="en">
<form type="lemma">
<orth>fugitive</orth>
</form>
<sense n="1">
<gloss>(of persons)</gloss>
<def>given to, or in the act of, running away from a place, especially to avoid arrest or persecution.</def>
</sense>
<sense n="2">
<gloss>(verses)</gloss>
<def>concerned or dealing with subjects of passing interest; ephemeral, occasional.</def>
</sense>
</entry>
On sense-distinguishing grammatical properties, see the section Grammatical properties in senses.
5.3.2. Glossing examples
Semantic glosses can occur at different levels of the entry hierarchy. In the previous section, we saw examples in which glosses were used as a kind of semantic shorthand for an individual sense. They can, however, be used to further qualify individual examples in the entry. Take, for instance, this entry from the Longman Dictionary of Contemporary English (2003):
living /... / adj 1 alive now [...] | The sun affects all living things (=people, animals, and plants). | A living language (=one that people still use) [….]
In TEI Lex-0, this entry would be represented as:
<entry xml:id="LDOCE.living" xml:lang="en" type="mainEntry"
xml:base="../TEILex0.examples/examples.stripped.xml">
<form type="lemma">
<orth>living</orth>
</form>
<gramGrp>
<gram type="pos">adj</gram>
</gramGrp>
<sense n="1" xml:id="LDOCE.living.1">
<num>1</num>
<def>alive now
<!--[...] -->
</def>
<metamark>|</metamark>
<cit type="example">
<quote>The sun affects all <ref type="entry" scope="currentEntry">living</ref>
things <gloss>(=people, animals, and plants)</gloss>.</quote>
</cit>
<metamark>|</metamark>
<cit type="example">
<quote>A <ref type="entry" scope="currentEntry">living</ref> language <gloss>(=one
that people still use)</gloss>
<!--[….] -->
</quote>
</cit>
</sense>
</entry>
Gadsby (ed.) (2003)
5.4. Grammatical properties
In some dictionaries, individual dictionary senses may be associated with grammatical properties, such as part of speech or gender, that differ from the rest of the entry: for instance, a particular sense of a countable noun may be used only in the plural. In such cases, <gramGrp> will naturally be placed inside the given <sense>.
Consider, for instance, the second sense of this entry:
<sense xml:id="DLPC.antepassado_b_2" n="2"
xml:base="../TEILex0.examples/examples.stripped.xml" xml:lang="pt">
<gramGrp>
<gram type="number">pl.</gram>
</gramGrp>
<def>Pessoas anteriormente ao momento actual.</def>
<xr type="synonymy">
<ref type="sense">antecessores</ref>
</xr>
<xr type="antonymy">
<ref type="sense">vindouros</ref>
</xr>
<cit type="example">
<quote>Hérdamos estes costumes dos nossos antepassados.</quote>
</cit>
<cit type="example">
<quote>Culto dos antepassados.</quote>
</cit>
</sense>
DLPC (2001)
5.4.1. Grammatical glosses?
Zgusta also uses "gloss" to describe "grammatical indications in the broadest sense of the word" (1971, 240), using an example familiar from Latin (and many other) dictionaries:
- petere aliquid ab aliquo [to ask for something from somebody]
- petere Romam [to rush to Rome]
In theory, one could choose to encode such phenomena using <gloss>, but TEI Lex-0 recommends a clear separation of roles: <gloss> should be used for semantic or pragmatic information, whereas grammatical information should be encoded using the familiar gramGrp/gram constructs:
<sense n="1" xml:id="LD.peto.1">
<gramGrp>
<gram type="rection">aliquid ab aliquo</gram>
</gramGrp>
</sense>
<sense n="2" xml:id="LD.peto.2">
<gramGrp>
<gram type="rection">Romam</gram>
</gramGrp>
</sense>
Here, too, it is important to note the possibility of ambiguity: unlike "petere aliquid ab aliquo", "petere Romam" could be interpreted as an example. The decision on such ambiguous cases should never be taken in isolation: editors of a digital edition need to consider the conventions of the dictionary as a whole before advising encoders on how to mark up such ambiguous cases.
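If the editors conclude, after surveying the conventions of the dictionary as a whole, that "petere Romam" functions as an example rather than as a statement of rection, the same sense could be sketched along the following lines. This is an illustrative alternative only, not an encoding taken from a source dictionary: the xml:id is reused from the example above, and the English rendering is the bracketed gloss quoted from Zgusta.

```xml
<sense xml:id="LD.peto.2">
  <!-- hypothetical alternative: the phrase is treated as a usage example,
       not as grammatical (rection) information -->
  <cit type="example">
    <quote>petere Romam</quote>
    <cit type="translation" xml:lang="en">
      <quote>to rush to Rome</quote>
    </cit>
  </cit>
</sense>
```

Whichever reading is chosen, it should be applied consistently to all comparable cases in the same dictionary.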
5.4.2. Nested entries vs. multiple senses
While TEI Lex-0 has been created to simplify the choices available for encoding various lexicographic components, certain levels of ambiguity remain, often due to the highly condensed nature of dictionary content.
Consider, for instance, this entry:
Is this an entry with two senses? Or are these two entries that were merged into one on account of typographic density?
The answer is as much in the eyes of the beholder, as it is in the eyes of the lexicographers behind the dictionary that the entry stems from, in this case The Chambers Dictionary. Both the encoder and lexicographers, however, are influenced by lexicographic and linguistic traditions in which they operate. For an overview of the homonymy-polysemy dilemma, see, for instance, Zöfgen 1989.
It can't be stressed enough that the goal of dictionary encoding is not to resolve linguistic disputes or evaluate lexicographic traditions but rather to create consistent, if abstracted, representations of lexicographic architectures.
So, what can we do in this particular case? Should we encode gash as an entry consisting of senses, each with a different part of speech, like this:
<entry xml:id="CHDOEL.gash2" xml:lang="en"
xml:base="../TEILex0.examples/examples.stripped.xml">
<!--this, as we'll explain later, is valid but not the preferred encoding-->
<form type="lemma">
<orth>gash</orth>
<pron>gash</pron>
</form>
<lbl type="homNum" rend="sup">2</lbl>
<sense xml:id="CHDOEL.gash2.1">
<pc>(</pc>
<usg type="socioCultural" expand="slang">sl</usg>
<pc>)</pc>
<gramGrp>
<gram type="pos">adj</gram>
</gramGrp>
<def>spare, extra</def>
<pc>.</pc>
</sense>
<metamark function="senseSeparator">◆</metamark>
<sense xml:id="CHDOEL.gash2.2">
<gramGrp>
<gram type="pos">n</gram>
</gramGrp>
<pc>(</pc>
<usg type="temporal" expand="originally">orig</usg>
<lbl>and esp</lbl>
<usg type="domain" expand="nautical">naut</usg>
<pc>)</pc>
<def>rubbish, waste</def>
<pc>.</pc>
</sense>
</entry>
This is surely valid TEI Lex-0. There is conceptually nothing wrong with this encoding: it adequately represents the structure implied by the source text.
We should, however, try to look at the issue at hand from a broader, comparative, perspective.
- In the Portuguese polysemous entry antepassado above, we had a case in which one particular sense (used in plural only) deviated from the other senses (which are used in both singular and plural). Since the senses were numbered in the original, there was never any doubt about how we would encode this. It was clear from the outset:
- that the semantic information in that entry was grouped by a construct called <sense>;
- that senses inherited grammatical properties from the entry as a whole (i.e. entry/gramGrp);
- that, implicitly, we could assume that each sense can be used with the noun in both singular and plural; and
- that the plural-only sense was grammatically exceptional (hence entry/sense/gramGrp).
- The English example is different: gash as an adjective and as a noun are grammatical homonyms. If we encode them, as we did above, as two senses within one entry, we end up with an entry in which there is no inheritance (of grammatical properties) and only exceptions (at each sense-level).
Because TEI Lex-0 is aimed at creating a baseline encoding to facilitate data exchange and comparison between different dictionaries, we recommend encoding grammatical homonyms in TEI Lex-0 as nested entries and using <gramGrp> in <sense> constructs to mark up sense-specific deviations from the rule of grammatical inheritance.
For that reason, our preferred encoding of gash as an adjective and a noun would be:
<entry xml:id="CH.gash2" xml:lang="en"
xml:base="../TEILex0.examples/examples.stripped.xml">
<form type="lemma">
<orth>gash</orth>
<pron>gash</pron>
</form>
<lbl type="homNum" rend="sup">2</lbl>
<entry xml:id="CH.gash2.1" xml:lang="en" type="homonymicEntry">
<sense xml:id="CH.gash2.1.1">
<pc>(</pc>
<usg type="socioCultural" expand="slang">sl</usg>
<pc>)</pc>
<gramGrp>
<gram type="pos">adj</gram>
</gramGrp>
<def>spare, extra</def>
<pc>.</pc>
</sense>
</entry>
<metamark function="entrySeparator">◆</metamark>
<entry xml:id="CH.gash2.2" xml:lang="en" type="homonymicEntry">
<gramGrp>
<gram type="pos">n</gram>
</gramGrp>
<sense xml:id="CH.gash2.2.1">
<pc>(</pc>
<usg type="temporal" expand="originally">orig</usg>
<lbl>and esp</lbl>
<usg type="domain" expand="nautical">naut</usg>
<pc>)</pc>
<def>rubbish, waste</def>
<pc>.</pc>
</sense>
</entry>
</entry>
For an example in which the grammatical homonyms themselves have multiple senses, one of which is grammatically constrained, see:
<entry xml:id="ED.aid" xml:lang="en"
xml:base="../TEILex0.examples/examples.stripped.xml">
<form type="lemma">
<orth>aid</orth>
<pron>/ed/</pron>
</form>
<entry xml:id="ED.aid_n" xml:lang="en" type="homonymicEntry">
<gramGrp>
<gram type="pos">noun</gram>
</gramGrp>
<sense xml:id="ED.aid_n.1" n="1">
<num>1.</num>
<gramGrp>
<gram type="number" value="singularia tantum"/>
</gramGrp>
<def>help, especially money, food or other gifts given to people living in
difficult conditions</def>
<metamark function="exampleMarker">○</metamark>
<cit type="example">
<quote>aid to the earth-quake zone</quote>
</cit>
<cit type="example">
<quote>an aid worker</quote>
</cit>
<note>(NOTE: This meaning of aid has no plural.)</note>
<metamark function="relatedEntryMarker">○</metamark>
<entry type="relatedEntry" xml:id="ED.aid_n.1.in_aid_of" xml:lang="en">
<form type="lemma">
<orth>in aid of</orth>
</form>
<sense xml:id="ED.aid_n.1.in_aid_of.1">
<def>in order to help</def>
<metamark function="exampleMarker">○</metamark>
<cit type="example">
<quote>We give money in aid of the Red Cross.</quote>
</cit>
<metamark function="exampleMarker">○</metamark>
<cit type="example">
<quote>They are collecting money in aid of refugees.</quote>
</cit>
</sense>
</entry>
</sense>
<sense xml:id="ED.aid_n.2" n="2">
<num>2.</num>
<def>thing which helps you to do something</def>
<metamark function="exampleMarker">○</metamark>
<cit type="example">
<quote>kitchen aids</quote>
</cit>
</sense>
</entry>
<metamark function="subentryMarker">■</metamark>
<entry xml:id="ED.aid_v" xml:lang="en" type="homonymicEntry">
<gramGrp>
<gram type="pos">verb</gram>
</gramGrp>
<sense xml:id="ED.aid.v.1" n="1">
<num>1.</num>
<def>to help something to happen</def>
</sense>
<sense xml:id="ED.aid.v.2" n="2">
<num>2.</num>
<def>to help someone</def>
</sense>
</entry>
</entry>
6. Translations
6.1. Translation equivalents
TEI Guidelines:
<entry>
<form>
<orth>horrifier</orth>
</form>
<gramGrp>
<gram type="pos">v</gram>
</gramGrp>
<cit type="translation" xml:lang="en">
<quote>to horrify</quote>
</cit>
<cit type="example">
<quote>elle était horrifiée par la dépense</quote>
<cit type="translation" xml:lang="en">
<quote>she was horrified at the expense.</quote>
</cit>
</cit>
</entry>
TEI Lex-0:
<entry xml:id="horrifier" type="mainEntry" xml:lang="fr"
xml:base="../TEILex0.examples/examples.stripped.xml">
<form type="lemma">
<orth>horrifier</orth>
</form>
<gramGrp>
<gram type="pos">v</gram>
</gramGrp>
<sense xml:id="horrifier.1">
<cit type="translationEquivalent" xml:lang="en">
<form>
<orth>horrify</orth>
</form>
</cit>
<cit type="example">
<quote>elle était horrifiée par la dépense</quote>
<cit type="translation" xml:lang="en">
<quote>she was horrified at the expense</quote>
</cit>
</cit>
</sense>
</entry>
<entry type="mainEntry" xml:lang="en" xml:id="aid"
xml:base="../TEILex0.examples/examples.stripped.xml">
<form type="lemma">
<orth>Aid</orth>
</form>
<pc>,</pc>
<sense xml:id="aid.1">
<gramGrp>
<gram type="pos">v.a.</gram>
</gramGrp>
<cit type="translationEquivalent" xml:lang="fr">
<form>
<orth>aider</orth>
</form>
</cit>
<pc>,</pc>
<cit type="translationEquivalent" xml:lang="fr">
<form>
<orth>assister</orth>
</form>
</cit>
<pc>,</pc>
<cit type="translationEquivalent" xml:lang="fr">
<form>
<orth>secourir</orth>
</form>
</cit>
</sense>
<pc>;</pc>
<sense xml:id="aid.2">
<gramGrp>
<gram type="pos">s.</gram>
</gramGrp>
<cit type="translationEquivalent" xml:lang="fr">
<form>
<orth>aide</orth>
</form>
</cit>
<pc>,</pc>
<cit type="translationEquivalent" xml:lang="fr">
<form>
<orth>assistance</orth>
<pc>,</pc>
<gramGrp>
<gram type="gen">f.</gram>
</gramGrp>
</form>
</cit>
<pc>,</pc>
<cit type="translationEquivalent" xml:lang="fr">
<form>
<orth>secours</orth>
<pc>,</pc>
<gramGrp>
<gram type="gen">m.</gram>
</gramGrp>
</form>
</cit>
</sense>
<pc>;</pc>
<sense xml:id="aid.3">
<cit type="translationEquivalent" xml:lang="fr">
<form>
<orth>sub-side</orth>
</form>
<pc>,</pc>
<gramGrp>
<gram type="gender">m.</gram>
</gramGrp>
</cit>
</sense>
<pc>;</pc>
<sense xml:id="aid.4">
<gloss>(pers)</gloss>
<cit type="translationEquivalent" xml:lang="fr">
<form>
<orth>aide</orth>
</form>
<pc>,</pc>
<gramGrp>
<gram type="gen">m.</gram>
<gram type="gen">f.</gram>
</gramGrp>
</cit>
</sense>
<entry type="relatedEntry" xml:lang="en" xml:id="by_the_aid_of">
<form type="lemma">
<orth>By the <ref type="oRef">_</ref> of</orth>
</form>
<pc>,</pc>
<sense xml:id="by_the_aid_of.1">
<cit type="translationEquivalent" xml:lang="fr">
<form>
<orth>à l'aide de</orth>
</form>
</cit>
</sense>
</entry>
<pc>.</pc>
<entry type="relatedEntry" xml:lang="en" xml:id="in_aid_of">
<form>
<orth>In <ref type="oRef">_</ref> of</orth>
</form>
<pc>,</pc>
<sense xml:id="in_aid_of.1">
<gloss>(of performances)</gloss>
<cit type="translationEquivalent" xml:lang="fr">
<form>
<orth>au profit de</orth>
</form>
</cit>
<pc>,</pc>
<cit type="translationEquivalent" xml:lang="fr">
<form>
<orth>au bénéfice de</orth>
</form>
</cit>
</sense>
</entry>
<pc>.</pc>
<entry type="derived" xml:lang="en" xml:id="aidless">
<form type="lemma">
<orth>_less</orth>
<pc>,</pc>
<gramGrp>
<gram type="pos">adj.</gram>
</gramGrp>
</form>
<sense xml:id="aidless.1">
<cit type="translationEquivalent" xml:lang="fr">
<form>
<orth>sans aide</orth>
</form>
</cit>
<pc>,</pc>
<cit type="translationEquivalent" xml:lang="fr">
<form>
<orth>sans secours</orth>
</form>
</cit>
</sense>
<pc>;</pc>
<sense xml:id="aidless.2">
<cit type="translationEquivalent" xml:lang="fr">
<form>
<orth>abandonné</orth>
</form>
</cit>
<pc>,</pc>
<cit type="translationEquivalent" xml:lang="fr">
<form>
<orth>délaissé</orth>
</form>
</cit>
</sense>
</entry>
</entry>
7. Cross-references
7.1. General remarks
The current TEI Guidelines provide several mechanisms by means of which one item of lexical information can refer to another, e.g.:
- <gloss> for the provision of simple (non refined) translation equivalents of the head word
- <usg type="synonym"/> for synonym references
- <cit type="translation"><quote><!--...--></quote></cit> for translation equivalents in bilingual or translation dictionaries
- <oRef> and <pRef> for the resolution of "~" headword placeholders in quotations and other dictionary text
- <xr> and <ref> as a general cross-referencing mechanism
- <ptr/> as a pointer to another location
- <link/> element
- <mentioned/> in the etymology section
- <term/> for mentions of technical terms
In keeping with the approach of the TEI Lex-0, and considering that links/relations between lexical data elements are an essential part of the core lexical data model rather than mere convenience pointers for dictionary users, we need a more unified and more constrained mechanism for lexical references, whether they point to an existing lexical entity in some dictionary or lexicon, or in a more general way to lexical objects without a target reference.
The proposed mechanism has the following properties:
- It applies only to references with a clear linguistic meaning.
- The number of arbitrary (or context-dependent) choices for the encoder is minimal; the semantics of the reference should not depend on context.
- The relation between representing dictionary content and the underlying/implied lexical data model should be as transparent as possible.
- No drastic changes to the TEI Guidelines are needed.
In the following section, we first present the recommended encoding, and then explain how existing alternatives can be replaced accordingly.
7.2. xr vs. ref
In TEI Lex-0, we use <ref> as the general element for a lexical reference and <xr> as the enclosing element that groups all information related to this reference, including explicit labels such as "Syn.", "Cf.", "See also" etc. The reference may be internal to a dictionary or pointing to an external source, even when the actual target lexical object is not explicitly known. In the latter case, <ref> can be used without an explicit pointing attribute. Furthermore, the intended target of the reference can be a full entry, but, sometimes, also a specific sense.
For all such uses, the following attributes may be used on <xr> and <ref>:
- type is a mandatory attribute on <xr> for a lexical reference. Its default value is "related". This attribute can be used to indicate the lexical relation between the headword of the entry and the object referred to (see next section).
- ref/@type is required; it indicates the target object category (entry, sense); the type attribute on <ref> is also needed to distinguish lexicographic from bibliographic references.
- xml:lang on <xr> is required when <ref> contains an explicit lexical form in a language which is different from the source language.
- ref/@target to point to the URI of a lexical object. The value of this attribute is a machine-readable link to your cross-reference.
- ref/@notation indicates, like we currently do on <orth> or <pron>, the notation used for the explicit lexical form, where applicable.
Explicit dictionary labels which indicate the type of relationship between the current lexical item and the cross-reference should be encoded as <lbl> inside of <xr>.
7.2.1. Values of ref/@target
- If the reference has no explicit target, no target is used.
- As per TEI pointing mechanisms, the value of target must be a URI reference.
- For internal references (references to the same dictionary), TEI Lex-0 enforces the use of explicit pointers to the xml:id of an element being pointed to, preceded by #. See Section "Pointing Locally" in the TEI Guidelines.
- TEI pointers should not be used in TEI Lex-0.
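The three situations described above can be sketched as follows. The fragment is purely illustrative: the xml:id value, the lexical items and the external address are hypothetical and do not come from any dictionary cited in this document.

```xml
<!-- no explicit target: the referenced lexical object is not identified -->
<xr type="synonymy">
  <lbl>Syn.</lbl>
  <ref type="entry">torch</ref>
</xr>

<!-- internal reference: "#" followed by the xml:id of an element
     in the same dictionary (hypothetical id) -->
<xr type="synonymy">
  <lbl>Syn.</lbl>
  <ref type="entry" target="#XY.torch">torch</ref>
</xr>

<!-- external reference: a full URI pointing to a sense in another
     resource (hypothetical address) -->
<xr type="related">
  <lbl>Cf.</lbl>
  <ref type="sense" target="https://example.org/dictionary#torch.1">torch</ref>
</xr>
```

In all three cases the type attribute on <ref> states whether the intended target is an entry or a sense, independently of whether a target is given.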
7.3. Cross-reference typology
7.3.1. Related
The default reference to another lexical unit when no more granular information about the type of relationship is available.
In TEI Lex-0, cross-references are by default encoded as <xr type="related"></xr>.
<entry xml:lang="nl" xml:id="borcht"
xml:base="../TEILex0.examples/examples.stripped.xml">
<form type="lemma">
<orth>borcht</orth>
</form>
<xr type="related">
<lbl>Cf.</lbl>
<ref target="#M012340" type="entry">burcht</ref>
</xr>
</entry>
7.3.2. Synonymy
Relation between two lexical units X and Y which are syntactically identical and have the property that any declarative sentence S containing X has equivalent truth conditions to another sentence S’ which is identical to S, except that X is replaced by Y. (Adapted from Cruse 1986.)
Synonymy is the linguistic parallel of the identity relation between classes. Synonyms differ in peripheral traits, related for example to stylistic, dialectal or diachronic variations.
Examples: [de] {Hund, Köter}, [en] {flashlight, torch}, [en] {glad, joyful, happy}, [en] {violin, fiddle} [en] He plays the violin very well/He plays the fiddle very well.
In TEI Lex-0, synonyms are encoded inside <xr type="synonymy"></xr>.
<entry xml:id="arbeitsunfähig" xml:lang="de" type="mainEntry"
xml:base="../TEILex0.examples/examples.stripped.xml">
<form type="lemma">
<orth>arbeitsunfähig</orth>
</form>
<sense xml:id="arbeitsunfähig.1">
<xr type="synonymy">
<ref type="entry">bettlägerig</ref>
</xr>
<pc>,</pc>
<xr type="synonymy">
<ref type="entry">krank</ref>
</xr>
<pc>,</pc>
<xr type="synonymy">
<ref type="entry">unpässlich</ref>
</xr>
<pc>;</pc>
</sense>
<sense xml:id="arbeitsunfähig.2">
<pc>(</pc>
<usg type="domain">bildungsspr.</usg>
<pc>):</pc>
<xr type="synonymy">
<ref type="entry">indisponiert</ref>
</xr>
</sense>
<sense xml:id="arbeitsunfähig.3">
<xr type="synonymy">
<pc>(</pc>
<lbl>oft</lbl>
<usg type="attitude">emotional</usg>
<pc>):</pc>
<ref type="entry">malade</ref>
</xr>
<pc>.</pc>
</sense>
</entry>
Duden (2007)
7.3.3. Hyperonymy
Relation between lexical heads X and Y characterised by the property that the sentence This is a(n) Y entails, but is not entailed by the sentence This is a(n) X. (Adapted from Cruse 1986.)
Hyperonymy is the converse of hyponymy.
Example: dog/animal (animal is a hypernym of dog)
In TEI Lex-0, hyperonyms are encoded inside <xr type="hypernymy"></xr>.
<entry xml:id="XY.dog" xml:lang="en" type="mainEntry"
xml:base="../TEILex0.examples/examples.stripped.xml">
<form type="lemma">
<orth>dog</orth>
</form>
<gramGrp>
<gram type="pos">n</gram>
</gramGrp>
<xr type="hypernymy">
<ref type="entry">mammal</ref>
</xr>
</entry>
7.3.4. Hyponymy
Relation between lexical units X and Y characterised by the property that the sentence This is a(n) X entails, but is not entailed by the sentence This is a(n) Y. (Adapted from Cruse 1986.)
Hyponymy and its converse hypernymy are the linguistic parallels of the relation of inclusion between two classes.
Examples: [en] animal/dog, red/scarlet, to kill/to murder
In TEI Lex-0, hyponyms are encoded inside <xr type="hyponymy"></xr>.
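A minimal sketch of such an encoding, mirroring the dog/mammal example given for the converse relation, might look as follows. The entry and its xml:id are hypothetical, and the sketch assumes that the type of <xr> names the relation which the referenced item bears to the headword (here, dog is a hyponym of animal).

```xml
<entry xml:id="XY.animal" xml:lang="en" type="mainEntry">
  <form type="lemma">
    <orth>animal</orth>
  </form>
  <gramGrp>
    <gram type="pos">n</gram>
  </gramGrp>
  <!-- the referenced word is a hyponym of the headword -->
  <xr type="hyponymy">
    <ref type="entry">dog</ref>
  </xr>
</entry>
```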
7.3.5. Meronymy
An inclusion relation between lexical heads X and Y which reflect a potential part-whole relation between their referents in discourse. (Adapted from Cruse 2011, p. 140)
Example: finger:hand (finger is said to be a meronym of hand, and hand is said to be the holonym of finger).
In TEI Lex-0, meronyms are encoded inside <xr type="meronymy"></xr>.
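Using the finger:hand pair from the definition, a meronymic reference could be sketched as below. The entry and its xml:id are hypothetical, and the direction of the reference is an assumption made by analogy with the other relation types: the referenced item (finger) is taken to be the meronym of the headword (hand).

```xml
<entry xml:id="XY.hand" xml:lang="en" type="mainEntry">
  <form type="lemma">
    <orth>hand</orth>
  </form>
  <gramGrp>
    <gram type="pos">n</gram>
  </gramGrp>
  <!-- the referenced word denotes a part of what the headword denotes -->
  <xr type="meronymy">
    <ref type="entry">finger</ref>
  </xr>
</entry>
```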
7.3.6. Antonymy
Relation between lexical units of opposite meaning.
In TEI Lex-0, antonyms are encoded inside <xr type="antonymy"></xr>.
<sense xml:id="DLPC.antepassado_a_1"
xml:base="../TEILex0.examples/examples.stripped.xml" xml:lang="pt">
<def>Que pertence ou viveu numa época anterior.</def>
<xr type="synonymy">
<ref type="sense">antecessor</ref>
</xr>
<xr type="synonymy">
<ref type="sense">sucessor</ref>
</xr>
<xr type="antonymy">
<ref type="sense">descendente</ref>
</xr>
<xr type="antonymy">
<ref type="sense">sucessor</ref>
</xr>
</sense>
7.4. Cross-references in definitions
In TEI, it is impossible to have a cross-reference inside a definition, yet some dictionaries do use this mechanism. In TEI Lex-0, <xr> is allowed within <def>:
<entry xml:id="VSK.SR.грдомајчић" xml:lang="sr"
xml:base="../TEILex0.examples/examples.stripped.xml">
<form type="lemma">
<orth>грдо́ма̑јчић</orth>
</form>
<pc>,</pc>
<gramGrp>
<gram type="pos">м</gram>
</gramGrp>
<usg type="geographic">
<pc>(</pc>у Ц.г.<pc>)</pc>
</usg>
<sense xml:id="VSK.SR.грдомајчић.1">
<def>као укор или поруга, и ваља да значи: којему је <xr type="related">
<ref type="entry" target="#VSK.SR.мајка">мајка</ref>
</xr> била <xr type="related">
<ref type="entry" target="#VSK.SR.грдан2">грдна</ref>
</xr>
</def>
<pc>,</pc>
<cit type="translationEquivalent" xml:lang="de">
<form type="lemma">
<orth>ein Schimpfwort</orth>
</form>
</cit>
<pc>,</pc>
<cit type="translationEquivalent" xml:lang="la">
<form type="lemma">
<orth>convicium in mulierem</orth>
</form>
</cit>
<pc>.</pc>
</sense>
</entry>
7.5. Further examples
7.5.1. More complex example including quotations
<entry xml:id="dog" xml:lang="en"
xml:base="../TEILex0.examples/examples.stripped.xml">
<form type="lemma">
<orth>dog</orth>
</form>
<sense xml:id="dog.1">
<gramGrp>
<gram type="gen" value="m">Male or unknown gender</gram>
</gramGrp>
<cit type="translationEquivalent" xml:lang="fr">
<form>
<orth>chien</orth>
</form>
</cit>
<cit type="example" xml:lang="fr">
<quote> Le matin j'ouvre au <ref type="oRef">chien</ref> et je lui fais manger sa
soupe. Le soir je lui siffle de venir se coucher</quote>
<bibl>RENARD, Poil de Carotte, 1894, p. 102.</bibl>
<cit type="translation" xml:lang="en">
<!-- included in the french cit, otherwise relation is lost -->
<quote>In the morning, I open the door for the dog, and I
<!--...-->
</quote>
</cit>
</cit>
</sense>
<sense xml:id="dog.2">
<gramGrp>
<gram type="gen" value="f">Female</gram>
</gramGrp>
<cit type="translationEquivalent" xml:lang="fr">
<form type="lemma">
<orth>chienne</orth>
</form>
</cit>
<cit type="example" xml:lang="fr">
<quote>6. Les fleuristes, murmura Lorilleux, toutes des Marie-couche-toi-là. Eh
bien! Et moi? reprit la grande veuve, les lèvres pincées. Vous êtes galant.
Vous savez, je ne suis pas une <ref type="oRef">chienne</ref>, je ne me mets
pas les pattes en l'air, quand on siffle! </quote>
<bibl>ZOLA, L'Assommoir, 1877, p. 681.</bibl>
<cit type="translation" xml:lang="en">
<quote>
<!--...-->
</quote>
</cit>
</cit>
</sense>
</entry>
7.5.2. Antepassado
<entry xml:lang="pt" xml:id="DLPC.antepassado_a"
xml:base="../TEILex0.examples/examples.stripped.xml">
<form type="lemma">
<orth>antepassado</orth>
<pron>ɐ̃tɨpɐsˈadu</pron>
</form>
<form type="inflected">
<orth>antepassado</orth>
<gramGrp>
<gram type="gen">m.</gram>
</gramGrp>
</form>
<form type="inflected">
<orth>antepassada</orth>
<gramGrp>
<gram type="gen">f.</gram>
</gramGrp>
<pron>ɐ̃tɨpɐsˈadɐ</pron>
<lbl>:1</lbl>
</form>
<gramGrp>
<gram type="pos" norm="ADJ">adj.</gram>
</gramGrp>
<etym type="grammaticalization">
<seg type="desc">De</seg>
<cit type="etymon">
<form>
<orth extent="pref">ante-</orth>
</form>
</cit>
<lbl>+</lbl>
<cit type="etymon">
<form>
<orth>passado</orth>
</form>
</cit>
</etym>
<sense xml:id="DLPC.antepassado_a_1">
<def>Que pertence ou viveu numa época anterior.</def>
<xr type="synonymy">
<ref type="sense">antecessor</ref>
</xr>
<xr type="synonymy">
<ref type="sense">sucessor</ref>
</xr>
<xr type="antonymy">
<ref type="sense">descendente</ref>
</xr>
<xr type="antonymy">
<ref type="sense">sucessor</ref>
</xr>
</sense>
</entry>
7.5.3. Cross-references inside definitions
Allowed in TEI Lex-0. See this issue on GitHub.
8. Usage
Usage labelling is a procedure which indicates that “a certain lexical item deviates in a certain respect from the main bulk of items described in a dictionary and that its use is subject to some kind of restriction”.
In the current TEI Guidelines, <usg> is defined as an element which marks up “usage information in a dictionary entry”. Prototypically, usage information is a label which can be attached at various points in the entry hierarchy in order to signal restrictions in terms of geographic regions, domains of specialized language or stylistic properties for the particular lexical item that it is attached to.
8.1. Label-like vs. narrative usage descriptions
Usage information can be provided in dictionaries both in the form of label-like descriptors (often abbreviated) and as fuller narrative expressions.
Consider, for instance, the following senses taken from a German entry for Pflaume “plum” where usage information is provided by labels taken from fixed sets of values for stylistic and diatopic properties:
<entry xml:id="pflaume" xml:lang="de" type="mainEntry"
xml:base="../TEILex0.examples/examples.stripped.xml">
<form type="lemma">
<orth>Pflaume</orth>
</form>
<sense n="1" xml:id="pflaume.1">
<def xml:lang="de">Frucht des Pflaumenbaums</def>
<def xml:lang="en">fruit of the plum tree</def>
</sense>
<sense n="2" xml:id="pflaume.2">
<usg type="socioCultural" norm="colloquial">ugs.</usg>
<def xml:lang="de">Pflaumenbaum</def>
<def xml:lang="en">plum tree</def>
</sense>
<sense n="3" xml:id="pflaume.3">
<usg type="socioCultural" norm="casual">salopp</usg>
<usg type="socioCultural" norm="expletive">Schimpfwort</usg>
<def xml:lang="de">ungeschickter, untauglicher Mensch</def>
<def xml:lang="en">awkward, ineligible person</def>
</sense>
<sense n="4" xml:id="pflaume.4">
<usg type="geographic" norm="regional">landsch.</usg>
<usg type="socioCultural" norm="casual">salopp</usg>
<def xml:lang="de">anzügliche, leicht boshafte Bemerkung</def>
<def xml:lang="en">offensive, slightly mischievous remark</def>
</sense>
</entry>
In contrast to the example above, the following sample features an occurrence of a more verbose usage description that does not rely on a fixed vocabulary. The sample is taken from a Serbian dialect dictionary. The quote in the dialect is further qualified by a usage hint: “(said by a peasant woman in the field in hot weather)” which provides a particular context in which the quote was recorded.
<cit type="example" xml:base="../TEILex0.examples/examples.stripped.xml"
xml:lang="sr">
<quote>„Ду́ни, ве́тре, се́јче леб да пе́че”</quote>
<usg type="hint">(рекла сељанка на њиви за време врућине)</usg>
<bibl>(<placeName>Дубница</placeName>).</bibl>
</cit>
Златановић (2017)
8.2. Types of usage
In TEI Lex-0, <usg> is a typed element and type is a mandatory attribute. The default value is <usg type="hint"></usg>. The default attribute value should be used when it is not possible to otherwise classify the usage label. The type of a <usg> should be thought of as a conceptual axis (independent from other types) along which the given value of the element is located.
The following list of label types and their definitions is adapted from Salgado et al. 2019b:
- temporal label: marker which identifies the use of a given lexical unit on a scale from old to new. Syn: diachronic marking; diachronic information; time label.
<usg type="temporal"/>
- geographic label: marker which identifies the place or region where a lexical unit is mainly used. Some dictionaries do not identify a specific place but identify that the word is not used generally in every geographic area (e.g., regionalismo in Portuguese, or покр. (abbrev. for покрајински) in Serbian). Syn: diatopic marking; diatopic information; region label.
<usg type="geographic"/>
- domain label: marker which identifies the specialized field of knowledge in which a lexical unit is mainly used. Syn: diatechnical marking; domain label; field label; subject field label; topic label.
<usg type="domain"/>
- frequency label: marker which identifies the relative rate of occurrence of a lexical unit in a given textual context. Syn: diafrequential marking; diafrequential information.
<usg type="frequency"/>
- textType label: marker which identifies the typical use of a lexical unit in a particular discourse type or genre. Syn: diatextual information.
<usg type="textType"/>
- attitude label: marker which identifies the speaker’s subjective point of view, positive or negative, regarding the object referred to by a given lexical unit. Syn: diaevaluative marking; diaevaluative information.
<usg type="attitude"/>
- socioCultural label: marker which identifies the use of a given lexical unit by particular social groups and/or in certain types of communicative situations depending on their level of formality. Syn: diaphasic marking; diaphasic information.
<usg type="socioCultural"/>
- meaningType label: marker which identifies a semantic extension of the sense of a given lexical unit.
<usg type="meaningType"/>
- normativity label: marker which identifies the use of a given lexical unit which is in some aspect considered to be non-standard or incorrect.
<usg type="normativity"/>
The TEI Guidelines offer a range of sample values for types to illustrate potential uses of <usg>, but not all of them have been carried over to TEI Lex-0. The following table shows the differences between suggested values of type in TEI and the required values of type in TEI Lex-0:
TEI P5 (suggested types) | TEI Lex-0 (required types) | Example values |
time | temporal | archaic, old |
geo | geographic | AmE., dial. |
dom | domain | Med., Biol., Phys. |
plev | frequency | rare, occas. |
- | textType | bibl., poet., admin., journalese |
- | attitude | derog., euph. |
reg | socioCultural | slang, vulgar, formal |
style | meaningType | fig. (=figurative), lit. (= literal) |
- | normativity | non-standard, incorrect |
lang | - | |
gram | - | |
syn | - | |
hyper | - | |
colloc | - | |
comp | - | |
obj | - | |
subj | - | |
verb | - | |
hint | hint | |
In TEI Lex-0:
- The type attribute is made mandatory.
- The element <usg> is used in a narrower sense than is currently the case in the TEI Guidelines.
- The norm attribute is encouraged.
Justification:
- Without type attribute, <usg> would be an underspecified element. Usage labels describe a wide range of linguistic phenomena. Classifying them should be considered a good practice.
- Currently, the TEI Guidelines overuse <usg> for describing phenomena that could be covered by alternative, more narrowly defined TEI elements. It should be considered a good practice to use the most specific TEI element available. See the table above and the next section, Restricting the scope of <usg>.
- It is good practice to normalize the values of the <usg> elements because dictionaries are not always consistent in the way they use their usage labels. For instance, abbreviated and unabbreviated labels can appear in the same dictionary: they should be normalized to a single value. Normalization should be restricted to a single dictionary. A global normalization effort is currently beyond the scope of TEI Lex-0.
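As a small sketch of the normalization point, the fragment below shows an abbreviated and an unabbreviated label mapped to the same norm value. The abbreviated label "ugs." and the norm value "colloquial" are taken from the Pflaume example above; the spelled-out form "umgangssprachlich" is assumed here for illustration and is not attested in that example.

```xml
<!-- abbreviated form of the label -->
<usg type="socioCultural" norm="colloquial">ugs.</usg>

<!-- unabbreviated form of the same label elsewhere in the dictionary
     (assumed for illustration), normalized to the same value -->
<usg type="socioCultural" norm="colloquial">umgangssprachlich</usg>
```

With both variants carrying the same norm value, a query for colloquial usage within this dictionary does not need to know every surface form of the label.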
8.3. Restricting the scope of usg
- Do not use <usg type="lang"> to mark up the name of a language in an etymological or other discussion. The recommended way to encode this information is to use the <lang> element within <etym>.
INCORRECT
<entryFree xml:id="MZ.RGJS.сајдисльк_1">
<form type="lemma">
<orth>сајдисль́к</orth>
</form>
<gramGrp>
<gram type="pos">м</gram>
</gramGrp>
<usg type="lang">тур.</usg>
<sense>
<def>уважавање.</def>
…
</sense>
</entryFree>
CORRECT
<entry xml:id="MZ.RGJS.сајдисльк_2" xml:lang="sr"
xml:base="../TEILex0.examples/examples.stripped.xml">
<form type="lemma">
<orth>сајдисль́к</orth>
</form>
<gramGrp>
<gram type="pos">м</gram>
</gramGrp>
<etym>
<lang value="tr" expand="турцизам" norm="tr">*</lang>
</etym>
<!--...-->
<sense xml:id="MZ.RGJS.сајдисльк_2.1">
<def>уважавање.</def>
<!--...-->
</sense>
</entry>
- Do not use <usg type="hyper"></usg> or <usg type="syn"/> to mark lexical relations such as hyperonymy or synonymy. The recommended way to encode lexical relations in TEI Lex-0 is the reference mechanism provided by <xr>. See the section on the typology of cross-references.
- Do not use <usg type="colloc"></usg> or, for that matter, "comp.", "obj.", "subj.", "verb" etc. to encode collocations or rection information. See TODO.
- <usg type="hint"></usg> should be used as a fallback for cases where the usage information does not fall into one of the recognized cases discussed above, or as an intermediate solution during the process of encoding the dictionary automatically.
- Frequency information on lexicographic entities may differ from other types of usage information in that it often cannot be interpreted without further context. In phrases such as “mostly biology” or “rarely used in American English” it serves the purpose of a modifier (quantifier) to another piece of usage information (or other lexical information). Such use calls for modeling the frequency information as an attribute of the <usg> element being modified. For frequency information provided explicitly (e.g. corpus frequencies), a separate element should be introduced. TODO
8.4. Hierarchical usage labels
Usage labels tend to be described in dictionaries as flat lists: the list of all labels usually appears in the front matter, often as part of a list of abbreviations that mixes different types of content, i.e. not only usage labels but also other types of abbreviations (grammatical, etymological etc.). This is less than ideal from a data-modeling point of view, especially when more generic usage labels (such as sport) appear together with more specific labels (such as football, basketball or volleyball).
To overcome the deficiency of flat representation of labels in general-language dictionaries, TEI Lex-0 recommends that canonical, possibly multilingual, labels be defined, when needed, in the <encodingDesc> section of the <teiHeader>, and then pointed to from the individual entries or senses in which these labels are used. This is possible in both TEI P5 and TEI Lex-0 but has not been documented until now as a solution for representing usage labels.
A <taxonomy> is encoded within a <classDecl> using <category> and <catDesc> elements. TEI Lex-0 is stricter than TEI P5 because it requires the use of <term> within <catDesc>. The definition of a given <term> can be optionally provided as a <gloss>.
The following example shows the recommended way of encoding two superordinate domains, Earth Sciences and Sport, together with some of their subdomains:
<encodingDesc xml:base="../TEILex0.examples/headers/DLP.stripped.xml">
<classDecl>
<taxonomy xml:id="domain">
<category xml:id="domain.earth_sciences">
<catDesc xml:lang="en">
<term>Earth Sciences</term>
<gloss>
<!--Definition of the term would go here.-->
</gloss>
</catDesc>
<catDesc xml:lang="pt">
<term>Ciências da Terra</term>
</catDesc>
<catDesc xml:lang="es">
<term>Ciencias de la Tierra</term>
</catDesc>
<catDesc xml:lang="fr">
<term>sciences de la Terre</term>
</catDesc>
<category xml:id="domain.earth_sciences.geology">
<catDesc xml:lang="en">
<term>Geology</term>
</catDesc>
<catDesc xml:lang="pt">
<term>Geologia</term>
</catDesc>
<catDesc xml:lang="es">
<term>Geología</term>
</catDesc>
<catDesc xml:lang="fr">
<term>Géologie</term>
</catDesc>
<category xml:id="domain.earth_sciences.geology.mineralogy">
<catDesc xml:lang="en">
<term>Mineralogy</term>
</catDesc>
<catDesc xml:lang="pt">
<term>Mineralogia</term>
</catDesc>
<catDesc xml:lang="es">
<term>Mineralogía</term>
</catDesc>
<catDesc xml:lang="fr">
<term>Minéralogie</term>
</catDesc>
</category>
</category>
</category>
<category xml:id="domain.sports">
<catDesc xml:lang="en">
<term>Sport</term>
</catDesc>
<catDesc xml:lang="pt">
<term>Desporto</term>
</catDesc>
<catDesc xml:lang="es">
<term>Deporte</term>
</catDesc>
<catDesc xml:lang="fr">
<term>Sport</term>
</catDesc>
<category xml:id="domain.sports.football">
<catDesc xml:lang="en">
<term>Football</term>
</catDesc>
<catDesc xml:lang="pt">
<term>Futebol</term>
</catDesc>
<catDesc xml:lang="es">
<term>Fútbol</term>
</catDesc>
<catDesc xml:lang="fr">
<term>Football</term>
</catDesc>
</category>
</category>
</taxonomy>
</classDecl>
</encodingDesc>
To apply a domain label in an entry, use the <usg> element with a valueDatcat attribute pointing to the xml:id of the appropriate category in the taxonomy.
<entry type="mainEntry" xml:lang="pt" xml:id="DLPC.cristalografia"
xml:base="../TEILex0.examples/headers/DLP.stripped.xml">
<form type="lemma">
<orth>cristalografia</orth>
<pron>kriʃtɐluɡrɐˈfiɐ</pron>
</form>
<gramGrp>
<gram type="pos" norm="NOUN">n.</gram>
<gram type="gen">f.</gram>
</gramGrp>
<sense xml:id="DLPC.cristalografia_1">
<usg type="domain" valueDatcat="#domain.earth_sciences.geology.mineralogy">Mineralogia</usg>
<def>ciência que estuda e descreve a forma e a estrutura dos cristais, bem como as leis que regem a sua formação</def>
</sense>
<!--etc.-->
</entry>
9. Etymology
This section needs to be transferred from Jack's and Laurent's paper.
10. Patterns
10.1. Inheritance of xml:lang
Some elements in TEI Lex-0, such as <entry>, have a required xml:lang attribute; others, such as <form> or <quote>, do not. In general, TEI Lex-0, unlike TEI, recommends attaching xml:lang to so-called container elements (for instance, <entry> and <cit>) rather than to individual word forms or textual segments.
TODO: Add some examples
So how can we extract all orthographic forms in a particular language? We can use an XPath expression like this: //orth[ancestor-or-self::*[@xml:lang][1][@xml:lang='en']].
This XPath expression identifies:
- each orth element, regardless of where it is in the document (//)
- but only if it itself or one of its ancestors has the @xml:lang attribute ([ancestor-or-self::*[@xml:lang]])
- when looking for ancestors with the @xml:lang attribute, we stop at the first such ancestor, i.e. the nearest one ([1])
- finally, we keep only those elements for which that nearest @xml:lang attribute has the value 'en'
If your dictionary uses multiple language tags for one language (as in 'en', 'en-GB' and 'en-US') and you want to capture all language varieties with one XPath expression, you can use the XPath lang() function as in: //orth[ancestor-or-self::*[@xml:lang][1][lang('en')]].
While the predicate [@xml:lang='en'] will match only those elements whose xml:lang is exactly equal to 'en', the predicate with the function [lang('en')] will match all the elements whose language is tagged as either English (i.e. 'en') or one of its 'sublanguages' such as 'en-GB'.
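To make the inheritance pattern concrete, here is a minimal sketch of an entry in which xml:lang is set only on container elements. The entry, its identifiers and its content are hypothetical. The use of <form> and <orth> inside a translation-equivalent <cit> is an assumption, and the fragment has not been validated against the schema:

```xml
<!-- Hypothetical entry: xml:lang appears on <entry> and <cit> only;
     the <orth> elements inherit their language from the nearest container. -->
<entry xml:id="example.colour" xml:lang="en-GB">
  <form type="lemma">
    <orth>colour</orth>
  </form>
  <sense xml:id="example.colour.1">
    <def>the property of an object that depends on the light it reflects</def>
    <cit type="translationEquivalent" xml:lang="fr">
      <form>
        <orth>couleur</orth>
      </form>
    </cit>
  </sense>
</entry>
```

Against this fragment (namespace handling aside), the predicate [@xml:lang='en'] selects neither <orth>, because the nearest xml:lang values are 'en-GB' and 'fr'. The [lang('en')] variant selects only colour, and the same expression with lang('fr') selects only couleur.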
If you are new to XPath, you can check out the DARIAH-Campus tutorial "XPath for Dictionary Nerds".
11. Bibliography
- Almonjid. 2014. The Dictionary of [Arabic] Language and Proper Nouns. Dar el-Machreq: Beirut.
- Atkins, B. T. S. and Michael Rundell. 2008. The Oxford Guide to Practical Lexicography. Oxford University Press: Oxford; New York. ISBN: 9780199277711.
- Chambers. 2011. The Chambers Dictionary. 12th Edition. Chambers Harrap Publishers: London. ISBN: 9780550102379.
- Cruse, D. A. 1986. Lexical semantics. Cambridge University Press: Cambridge and New York. ISBN: 9780521276436.
- Cruse, D. A. 2011. Meaning in language: an introduction to semantics and pragmatics. 3rd ed. Oxford University Press: Oxford. ISBN: 9780199559466.
- DLPC. 2001. Dicionário da Língua Portuguesa Contemporânea. Editorial Verbo: Lisboa.
- Du Cange, Charles. 1688. Glossarium ad Scriptores Mediae et Infimae Graecitatis. Apud Amissonios: Lugduni.
- Duden. 2007. Das Synonymwörterbuch. Dudenverlag: Mannheim.
- Erjavec, Tomaž, Roger Evans, Nancy Ide and Adam Kilgarriff. 2000. "The CONCEDE Model for Lexical Databases." Proceedings of the Second Language Resources and Evaluation Conference (LREC), 355-62.
- Ermolaev, Natalia and Toma Tasovac. 2012. "Building a Lexicographic Infrastructure for Serbian Digital Libraries." Libraries in the Digital Age (LIDA) Proceedings.
- EtymWB-XML. 2009. Wörterbuch des Deutschen: Die XML-Edition. Berlin-Brandenburgische Akademie der Wissenschaften: Berlin.
- Ide, Nancy, Adam Kilgarriff and Laurent Romary. 2000. "A Formal Model of Dictionary Structure and Content." Proceedings of Euralex 2000, 113-126. arxiv: 0707.3270.
- LDOCE. 2003. Longman Dictionary of Contemporary English. 4th Edition. Longman: Harlow. ISBN: 0582776465.
- OALD. 1974. Oxford Advanced Learner's Dictionary of Current English. Oxford University Press: Oxford.
- Romary, Laurent. 2015. "TEI and LMF crosswalks." Journal for language technology and computational linguistics. HAL: hal-00762664.
- Romary, Laurent and Toma Tasovac. 2018. "TEI Lex-0: A Target Format for TEI-Encoded Dictionaries and Lexical Resources." TEI Conference.
- Salgado, Ana, Rute Costa, Toma Tasovac and Alberto Simões. 2019. "TEI Lex-0 In Action: Improving the Encoding of the Dictionary of the Academia das Ciências de Lisboa." eLex 2019, 417-433.
- Salgado, Ana, Rute Costa and Toma Tasovac. 2019. "Improving the Consistency of Usage Labelling in Dictionaries with TEI Lex-0." Lexicography 6: 133–156. DOI: 10.1007/s40607-019-00061-x.
- Silva, Antônio de Morais. 1789. Diccionario da lingua portugueza. Na Officina de Simão Thaddeo Ferreira: Lisboa.
- StčS. 1999-2011. Staročeský slovník. Ústav pro jazyk český AV ČR, v. v. i.: Praha.
- Svensén, Bo. 2009. A handbook of lexicography: the theory and practice of dictionary-making. Cambridge University Press: New York. ISBN: 9780521881807.
- Tasovac, Toma, Ana Salgado and Rute Costa. 2020. "Encoding Polylexical Units with TEI Lex-0: A Case Study." Slovenščina 2.0.
- VOLP. 1940. Vocabulário Ortográfico da Língua Portuguesa [em linha]. Academia das Ciências de Lisboa/Imprensa Nacional de Lisboa: Lisboa.
- Zgusta, Ladislav. 1971. Manual of Lexicography. Academia: Prague. ISBN: 9783111980461.
- Zillig, Brian L Pytlik. 2009. "TEI Analytics: converting documents into a TEI format for cross-collection text analysis." Literary and Linguistic Computing 24: 187–192. DOI: 10.1093/llc/fqp005.
- Zöfgen, Ekkehard. 1989. "Homonymie und Polysemie im allgemeinen einsprachigen Wörterbuch." Wörterbücher. Ein internationales Handbuch zur Lexikographie. I: 425-464.
- Златановић, Момчило. 2017. Речник говора јужне Србије: електронско издање. Институт за српски језик САНУ и Центар за дигиталне хуманистичке науке: Београд.
- Московљевић, Милош С. 1990. Речник савременог српскохрватског књижевног језика с књижевним саветником. Аполон: Београд.
12. Specification
12.1. Elements
12.1.1. <TEI>
<TEI> (TEI document) contains a single TEI-conformant document, combining a single TEI header with one or more members of the model.resource class. Multiple <TEI> elements may be combined within a <TEI> (or <teiCorpus>) element. [4. Default Text Structure 15.1. Varieties of Composite Text] | |
Module | textstructure — Specification |
Attributes | att.global (@xml:id, @n, @xml:lang, @xml:base) (att.global.rendition (@rend, @style, @rendition)) (att.global.linking (@corresp, @synch, @sameAs, @copyOf, @next, @prev, @exclude, @select)) (att.global.analytic (@ana)) (att.global.facs (@facs)) (att.global.change (@change)) (att.global.responsibility (@cert, @resp)) (att.global.source (@source)) att.typed (type, @subtype)
|
Contained by | textstructure: TEI |
May contain | |
Note | This element is required. It is customary to specify the TEI namespace http://www.tei-c.org/ns/1.0 on it, for example: <TEI version="4.4.0" xml:lang="it" xmlns="http://www.tei-c.org/ns/1.0">. |
Example |
|
Example |
|
Schematron |
|
Schematron |
|
Content model |
|
Schema Declaration |
|
12.1.2. <abbr>
<abbr> (abbreviation) contains an abbreviation of any sort. [3.6.5. Abbreviations and Their Expansions] | |
Module | core — Specification |
Attributes | att.global (@xml:id, @n, @xml:lang, @xml:base) (att.global.rendition (@rend, @style, @rendition)) (att.global.linking (@corresp, @synch, @sameAs, @copyOf, @next, @prev, @exclude, @select)) (att.global.analytic (@ana)) (att.global.facs (@facs)) (att.global.change (@change)) (att.global.responsibility (@cert, @resp)) (att.global.source (@source)) att.typed (type, @subtype)
|
Member of | |
Contained by | |
May contain | |
Note | If abbreviations are expanded silently, this practice should be documented in the <editorialDecl>, either with a <normalization> element or a <p>. |
Example |
|
Example |
|
Content model |
|
Schema Declaration |
|
12.1.3. <affiliation>
<affiliation> (affiliation) contains an informal description of a person's present or past affiliation with some organization, for example an employer or sponsor. [15.2.2. The Participant Description] | |
Module | namesdates — Specification |
Attributes | att.global (@xml:id, @n, @xml:lang, @xml:base) (att.global.rendition (@rend, @style, @rendition)) (att.global.linking (@corresp, @synch, @sameAs, @copyOf, @next, @prev, @exclude, @select)) (att.global.analytic (@ana)) (att.global.facs (@facs)) (att.global.change (@change)) (att.global.responsibility (@cert, @resp)) (att.global.source (@source)) att.editLike (@evidence, @instant) att.datable (@calendar, @period) (att.datable.w3c (@when, @notBefore, @notAfter, @from, @to)) (att.datable.iso (@when-iso, @notBefore-iso, @notAfter-iso, @from-iso, @to-iso)) (att.datable.custom (@when-custom, @notBefore-custom, @notAfter-custom, @from-custom, @to-custom, @datingPoint, @datingMethod)) att.naming (@role, @nymRef) (att.canonical (@key, @ref)) att.typed (type, @subtype)
|
Member of | |
Contained by | |
May contain | |
Note | If included, the name of an organization may be tagged using either the <name> element as above, or the more specific <orgName> element. |
Example |
|
Example | This example indicates that the person was affiliated with the Australian Journalists Association at some point between the dates listed.
|
Example | This example indicates that the person was affiliated with Mount Holyoke College throughout the entire span of the date range listed.
|
Content model |
|
Schema Declaration |
|
12.1.4. <analytic>
<analytic> (analytic level) contains bibliographic elements describing an item (e.g. an article or poem) published within a monograph or journal and not as an independent publication. [3.12.2.1. Analytic, Monographic, and Series Levels] | |
Module | core — Specification |
Attributes | att.global (@xml:id, @n, @xml:lang, @xml:base) (att.global.rendition (@rend, @style, @rendition)) (att.global.linking (@corresp, @synch, @sameAs, @copyOf, @next, @prev, @exclude, @select)) (att.global.analytic (@ana)) (att.global.facs (@facs)) (att.global.change (@change)) (att.global.responsibility (@cert, @resp)) (att.global.source (@source)) |
Contained by | core: biblStruct |
May contain | |
Note | May contain titles and statements of responsibility (author, editor, or other), in any order. The <analytic> element may only occur within a <biblStruct>, where its use is mandatory for the description of an analytic level bibliographic item. |
Example |
|
Content model |
|
Schema Declaration |
|
12.1.5. <appInfo>
<appInfo> (application information) records information about an application which has edited the TEI file. [2.3.11. The Application Information Element] | |
Module | header — Specification |
Attributes | att.global (@xml:id, @n, @xml:lang, @xml:base) (att.global.rendition (@rend, @style, @rendition)) (att.global.linking (@corresp, @synch, @sameAs, @copyOf, @next, @prev, @exclude, @select)) (att.global.analytic (@ana)) (att.global.facs (@facs)) (att.global.change (@change)) (att.global.responsibility (@cert, @resp)) (att.global.source (@source)) |
Member of | |
Contained by | header: encodingDesc |
May contain | Empty element |
Example |
|
Content model |
|
Schema Declaration |
|
12.1.6. <author>
<author> (author) in a bibliographic reference, contains the name(s) of an author, personal or corporate, of a work; for example in the same form as that provided by a recognized bibliographic name authority. [3.12.2.2. Titles, Authors, and Editors 2.2.1. The Title Statement] | |
Module | core — Specification |
Attributes | att.global (@xml:id, @n, @xml:lang, @xml:base) (att.global.rendition (@rend, @style, @rendition)) (att.global.linking (@corresp, @synch, @sameAs, @copyOf, @next, @prev, @exclude, @select)) (att.global.analytic (@ana)) (att.global.facs (@facs)) (att.global.change (@change)) (att.global.responsibility (@cert, @resp)) (att.global.source (@source)) att.naming (@role, @nymRef) (att.canonical (@key, @ref)) att.datable (@calendar, @period) (att.datable.w3c (@when, @notBefore, @notAfter, @from, @to)) (att.datable.iso (@when-iso, @notBefore-iso, @notAfter-iso, @from-iso, @to-iso)) (att.datable.custom (@when-custom, @notBefore-custom, @notAfter-custom, @from-custom, @to-custom, @datingPoint, @datingMethod)) |
Member of | |
Contained by | header: editionStmt titleStmt |
May contain | |
Note | Particularly where cataloguing is likely to be based on the content of the header, it is advisable to use a generally recognized name authority file to supply the content for this element. The attributes key or ref may also be used to reference canonical information about the author(s) intended from any appropriate authority, such as a library catalogue or online resource. In the case of a broadcast, use this element for the name of the company or network responsible for making the broadcast. Where an author is unknown or unspecified, this element may contain text such as Unknown or Anonymous. When the appropriate TEI modules are in use, it may also contain detailed tagging of the names used for people, organizations or places, in particular where multiple names are given. |
Example |
|
Content model |
|
Schema Declaration |
|
12.1.7. <authority>
<authority> (release authority) supplies the name of a person or other agency responsible for making a work available, other than a publisher or distributor. [2.2.4. Publication, Distribution, Licensing, etc.] | |
Module | header — Specification |
Attributes | att.global (@xml:id, @n, @xml:lang, @xml:base) (att.global.rendition (@rend, @style, @rendition)) (att.global.linking (@corresp, @synch, @sameAs, @copyOf, @next, @prev, @exclude, @select)) (att.global.analytic (@ana)) (att.global.facs (@facs)) (att.global.change (@change)) (att.global.responsibility (@cert, @resp)) (att.global.source (@source)) att.canonical (@key, @ref)
|
Member of | |
Contained by | core: monogr header: publicationStmt |
May contain | |
Example |
|
Content model |
|
Schema Declaration |
|
12.1.8. <availability>
<availability> (availability) supplies information about the availability of a text, for example any restrictions on its use or distribution, its copyright status, any licence applying to it, etc. [2.2.4. Publication, Distribution, Licensing, etc.] | |
Module | header — Specification |
Attributes | att.global (@xml:id, @n, @xml:lang, @xml:base) (att.global.rendition (@rend, @style, @rendition)) (att.global.linking (@corresp, @synch, @sameAs, @copyOf, @next, @prev, @exclude, @select)) (att.global.analytic (@ana)) (att.global.facs (@facs)) (att.global.change (@change)) (att.global.responsibility (@cert, @resp)) (att.global.source (@source)) att.declarable (@default)
|
Member of | |
Contained by | header: publicationStmt |
May contain | |
Note | A consistent format should be adopted. |
Example |
|
Example |
|
Content model |
|
Schema Declaration |
|
12.1.9. <back>
<back> (back matter) contains any appendixes, etc. following the main part of a text. [4.7. Back Matter 4. Default Text Structure] | |
Module | textstructure — Specification |
Attributes | att.global (@xml:id, @n, @xml:lang, @xml:base) (att.global.rendition (@rend, @style, @rendition)) (att.global.linking (@corresp, @synch, @sameAs, @copyOf, @next, @prev, @exclude, @select)) (att.global.analytic (@ana)) (att.global.facs (@facs)) (att.global.change (@change)) (att.global.responsibility (@cert, @resp)) (att.global.source (@source)) |
Contained by | textstructure: text |
May contain | |
Note | Because cultural conventions differ as to which elements are grouped as back matter and which as front matter, the content models for the <back> and <front> elements are identical. |
Example |
|
Content model |
|
Schema Declaration |
|
12.1.10. <bibl>
<bibl> (bibliographic citation) contains a loosely-structured bibliographic citation of which the sub-components may or may not be explicitly tagged. [3.12.1. Methods of Encoding Bibliographic References and Lists of References 2.2.7. The Source Description 15.3.2. Declarable Elements] | |
Module | core — Specification |
Attributes | att.global (@xml:id, @n, @xml:lang, @xml:base) (att.global.rendition (@rend, @style, @rendition)) (att.global.linking (@corresp, @synch, @sameAs, @copyOf, @next, @prev, @exclude, @select)) (att.global.analytic (@ana)) (att.global.facs (@facs)) (att.global.change (@change)) (att.global.responsibility (@cert, @resp)) (att.global.source (@source)) att.declarable (@default) att.typed (@type, @subtype) att.sortable (@sortKey) att.docStatus (@status) |
Member of | |
Contained by | |
May contain | |
Note | Contains phrase-level elements, together with any combination of elements from the model.biblPart class |
Example |
|
Example |
|
Example |
|
Content model |
|
Schema Declaration |
|
12.1.11. <biblScope>
<biblScope> (scope of bibliographic reference) defines the scope of a bibliographic reference, for example as a list of page numbers, or a named subdivision of a larger work. [3.12.2.5. Scopes and Ranges in Bibliographic Citations] | |
Module | core — Specification |
Attributes | att.global (@xml:id, @n, @xml:lang, @xml:base) (att.global.rendition (@rend, @style, @rendition)) (att.global.linking (@corresp, @synch, @sameAs, @copyOf, @next, @prev, @exclude, @select)) (att.global.analytic (@ana)) (att.global.facs (@facs)) (att.global.change (@change)) (att.global.responsibility (@cert, @resp)) (att.global.source (@source)) att.citing (@unit, @from, @to) |
Member of | |
Contained by | header: seriesStmt |
May contain | |
Note | When a single page is being cited, use the from and to attributes with an identical value. When no clear endpoint is provided, the from attribute may be used without to; for example a citation such as ‘p. 3ff’ might be encoded <biblScope from="3">p. 3ff</biblScope>. It is now considered good practice to supply this element as a sibling (rather than a child) of <imprint>, since it supplies information which does not constitute part of the imprint. |
Example |
|
Content model |
|
Schema Declaration |
|
12.1.12. <biblStruct>
<biblStruct> (structured bibliographic citation) contains a structured bibliographic citation, in which only bibliographic sub-elements appear and in a specified order. [3.12.1. Methods of Encoding Bibliographic References and Lists of References 2.2.7. The Source Description 15.3.2. Declarable Elements] | |
Module | core — Specification |
Attributes | att.global (@xml:id, @n, @xml:lang, @xml:base) (att.global.rendition (@rend, @style, @rendition)) (att.global.linking (@corresp, @synch, @sameAs, @copyOf, @next, @prev, @exclude, @select)) (att.global.analytic (@ana)) (att.global.facs (@facs)) (att.global.change (@change)) (att.global.responsibility (@cert, @resp)) (att.global.source (@source)) att.declarable (@default) att.typed (@type, @subtype) att.sortable (@sortKey) att.docStatus (@status) |
Member of | |
Contained by | |
May contain | core: analytic citedRange monogr note ref |
Example |
|
Example |
|
Content model |
|
Schema Declaration |
|
12.1.13. <body>
<body> (text body) contains the whole body of a single unitary text, excluding any front or back matter. [4. Default Text Structure] | |
Module | textstructure — Specification |
Attributes | att.global (@xml:id, @n, @xml:lang, @xml:base) (att.global.rendition (@rend, @style, @rendition)) (att.global.linking (@corresp, @synch, @sameAs, @copyOf, @next, @prev, @exclude, @select)) (att.global.analytic (@ana)) (att.global.facs (@facs)) (att.global.change (@change)) (att.global.responsibility (@cert, @resp)) (att.global.source (@source)) |
Contained by | textstructure: text |
May contain | |
Example |
|
Content model |
|
Schema Declaration |
|
12.1.14. <c>
<c> (character) represents a character. [17.1. Linguistic Segment Categories] | |
Module | analysis — Specification |
Attributes | att.global (@xml:id, @n, @xml:lang, @xml:base) (att.global.rendition (@rend, @style, @rendition)) (att.global.linking (@corresp, @synch, @sameAs, @copyOf, @next, @prev, @exclude, @select)) (att.global.analytic (@ana)) (att.global.facs (@facs)) (att.global.change (@change)) (att.global.responsibility (@cert, @resp)) (att.global.source (@source)) att.segLike (@function) (att.datcat (@datcat, @valueDatcat, @targetDatcat)) (att.fragmentable (@part)) att.typed (@type, @subtype) att.notated (@notation) |
Member of | |
Contained by | |
May contain | gaiji: g character data |
Note | Contains a single character, a <g> element, or a sequence of graphemes to be treated as a single character. The type attribute is used to indicate the function of this segmentation, taking values such as letter, punctuation, or digit etc. |
Example |
|
Content model |
|
Schema Declaration |
|
12.1.15. <catDesc>
<catDesc> (category description) describes some category within a taxonomy or text typology, either in the form of a brief prose description or in terms of the situational parameters used by the TEI formal <textDesc>. [2.3.7. The Classification Declaration] | |
Module | header — Specification |
Attributes | att.global (@xml:id, @n, @xml:lang, @xml:base) (att.global.rendition (@rend, @style, @rendition)) (att.global.linking (@corresp, @synch, @sameAs, @copyOf, @next, @prev, @exclude, @select)) (att.global.analytic (@ana)) (att.global.facs (@facs)) (att.global.change (@change)) (att.global.responsibility (@cert, @resp)) (att.global.source (@source)) att.canonical (@key, @ref) |
Contained by | header: category |
May contain | |
Example |
|
Example |
|
Content model |
|
Schema Declaration |
|
12.1.16. <category>
<category> (category) contains an individual descriptive category, possibly nested within a superordinate category, within a user-defined taxonomy. [2.3.7. The Classification Declaration] | |
Module | header — Specification |
Attributes | att.datcat (@datcat, @valueDatcat, @targetDatcat) att.global (@xml:id, @n, @xml:lang, @xml:base) (att.global.rendition (@rend, @style, @rendition)) (att.global.linking (@corresp, @synch, @sameAs, @copyOf, @next, @prev, @exclude, @select)) (att.global.analytic (@ana)) (att.global.facs (@facs)) (att.global.change (@change)) (att.global.responsibility (@cert, @resp)) (att.global.source (@source)) |
Contained by | |
May contain | |
Example |
|
Example |
|
Example |
|
Content model |
|
Schema Declaration |
|
12.1.17. <change>
<change> (change) documents a change or set of changes made during the production of a source document, or during the revision of an electronic file. [2.6. The Revision Description 2.4.1. Creation 11.7. Identifying Changes and Revisions] | |
Module | header — Specification |
Attributes | att.ascribed (@who) att.datable (@calendar, @period) (att.datable.w3c (@when, @notBefore, @notAfter, @from, @to)) (att.datable.iso (@when-iso, @notBefore-iso, @notAfter-iso, @from-iso, @to-iso)) (att.datable.custom (@when-custom, @notBefore-custom, @notAfter-custom, @from-custom, @to-custom, @datingPoint, @datingMethod)) att.docStatus (@status) att.global (@xml:id, @n, @xml:lang, @xml:base) (att.global.rendition (@rend, @style, @rendition)) (att.global.linking (@corresp, @synch, @sameAs, @copyOf, @next, @prev, @exclude, @select)) (att.global.analytic (@ana)) (att.global.facs (@facs)) (att.global.change (@change)) (att.global.responsibility (@cert, @resp)) (att.global.source (@source)) att.typed (@type, @subtype)
|
Contained by | header: revisionDesc |
May contain | |
Note | The who attribute may be used to point to any other element, but will typically specify a <respStmt> or <person> element elsewhere in the header, identifying the person responsible for the change and their role in making it. It is recommended that changes be recorded with the most recent first. The status attribute may be used to indicate the status of a document following the change documented. |
Example |
|
Example |
|
Content model |
|
Schema Declaration |
|
12.1.18. <char>
<char> (character) provides descriptive information about a character. [5.2. Markup Constructs for Representation of Characters and Glyphs] | |
Module | gaiji — Specification |
Attributes | att.global (@xml:id, @n, @xml:lang, @xml:base) (att.global.rendition (@rend, @style, @rendition)) (att.global.linking (@corresp, @synch, @sameAs, @copyOf, @next, @prev, @exclude, @select)) (att.global.analytic (@ana)) (att.global.facs (@facs)) (att.global.change (@change)) (att.global.responsibility (@cert, @resp)) (att.global.source (@source)) |
Contained by | gaiji: charDecl |
May contain | |
Example |
|
Content model |
|
Schema Declaration |
|
12.1.19. <charDecl>
<charDecl> (character declarations) provides information about nonstandard characters and glyphs. [5.2. Markup Constructs for Representation of Characters and Glyphs] | |
Module | gaiji — Specification |
Attributes | att.global (@xml:id, @n, @xml:lang, @xml:base) (att.global.rendition (@rend, @style, @rendition)) (att.global.linking (@corresp, @synch, @sameAs, @copyOf, @next, @prev, @exclude, @select)) (att.global.analytic (@ana)) (att.global.facs (@facs)) (att.global.change (@change)) (att.global.responsibility (@cert, @resp)) (att.global.source (@source)) |
Member of | |
Contained by | header: encodingDesc |
May contain | |
Example |
|
Content model |
|
Schema Declaration |
|
12.1.20. <cit>
<cit> (cited quotation) contains a quotation from some other document, together with a bibliographic reference to its source. In a dictionary it may contain an example text with at least one occurrence of the word form, used in the sense being described, or a translation of the headword, or an example. [3.3.3. Quotation 4.3.1. Grouped Texts 9.3.5.1. Examples] | |
Module | core — Specification |
Attributes | att.global (@xml:id, @n, @xml:lang, @xml:base) (att.global.rendition (@rend, @style, @rendition)) (att.global.linking (@corresp, @synch, @sameAs, @copyOf, @next, @prev, @exclude, @select)) (att.global.analytic (@ana)) (att.global.facs (@facs)) (att.global.change (@change)) (att.global.responsibility (@cert, @resp)) (att.global.source (@source)) att.typed (type, @subtype)
|
Member of | |
Contained by | |
May contain | |
Example |
|
Example |
|
Example |
|
Content model |
|
Schema Declaration |
|
12.1.21. <citedRange>
<citedRange> (cited range) defines the range of cited content, often represented by pages or other units [3.12.2.5. Scopes and Ranges in Bibliographic Citations] | |
Module | core — Specification |
Attributes | att.global (@xml:id, @n, @xml:lang, @xml:base) (att.global.rendition (@rend, @style, @rendition)) (att.global.linking (@corresp, @synch, @sameAs, @copyOf, @next, @prev, @exclude, @select)) (att.global.analytic (@ana)) (att.global.facs (@facs)) (att.global.change (@change)) (att.global.responsibility (@cert, @resp)) (att.global.source (@source)) att.pointing (@targetLang, @target, @evaluate) att.citing (@unit, @from, @to) |
Member of | |
Contained by | core: bibl biblStruct |
May contain | |
Note | When a single page is being cited, use the from and to attributes with an identical value. When no clear endpoint is provided, the from attribute may be used without to; for example a citation such as ‘p. 3ff’ might be encoded <citedRange from="3">p. 3ff</citedRange>. |
Example |
|
Example |
|
Content model |
|
Schema Declaration |
|
12.1.22. <classDecl>
<classDecl> (classification declarations) contains one or more taxonomies defining any classificatory codes used elsewhere in the text. [2.3.7. The Classification Declaration 2.3. The Encoding Description] | |
Module | header — Specification |
Attributes | att.global (@xml:id, @n, @xml:lang, @xml:base) (att.global.rendition (@rend, @style, @rendition)) (att.global.linking (@corresp, @synch, @sameAs, @copyOf, @next, @prev, @exclude, @select)) (att.global.analytic (@ana)) (att.global.facs (@facs)) (att.global.change (@change)) (att.global.responsibility (@cert, @resp)) (att.global.source (@source)) |
Member of | |
Contained by | header: encodingDesc |
May contain | header: taxonomy |
Example |
|
Content model |
|
Schema Declaration |
|
12.1.23. <date>
<date> (date) contains a date in any format. [3.6.4. Dates and Times 2.2.4. Publication, Distribution, Licensing, etc. 2.6. The Revision Description 3.12.2.4. Imprint, Size of a Document, and Reprint Information 15.2.3. The Setting Description 13.4. Dates] | |
Module | core — Specification |
Attributes | att.global (@xml:id, @n, @xml:lang, @xml:base) (att.global.rendition (@rend, @style, @rendition)) (att.global.linking (@corresp, @synch, @sameAs, @copyOf, @next, @prev, @exclude, @select)) (att.global.analytic (@ana)) (att.global.facs (@facs)) (att.global.change (@change)) (att.global.responsibility (@cert, @resp)) (att.global.source (@source)) att.canonical (@key, @ref) att.datable (@calendar, @period) (att.datable.w3c (@when, @notBefore, @notAfter, @from, @to)) (att.datable.iso (@when-iso, @notBefore-iso, @notAfter-iso, @from-iso, @to-iso)) (att.datable.custom (@when-custom, @notBefore-custom, @notAfter-custom, @from-custom, @to-custom, @datingPoint, @datingMethod)) att.editLike (@evidence, @instant) att.dimensions (@unit, @quantity, @extent, @precision, @scope) (att.ranging (@atLeast, @atMost, @min, @max, @confidence)) att.typed (@type, @subtype) |
Member of | |
Contained by | |
May contain | |
Example |
|
Example |
|
Example |
|
Content model |
|
Schema Declaration |
|
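As a minimal sketch of how the att.datable.w3c attributes listed above might be used, the fragment below pairs a date written in free form with a normalized value. The dates themselves are arbitrary illustrations.

```xml
<!-- A precise date: the element content keeps the source wording, when holds the normalized form -->
<date when="1755-04-15">15 April 1755</date>

<!-- An imprecise date bounded by notBefore and notAfter -->
<date notBefore="1880" notAfter="1889">the 1880s</date>
```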
12.1.24. <def>
<def> (definition) contains definition text in a dictionary entry. [9.3.3.1. Definitions] | |
Module | dictionaries — Specification |
Attributes | att.global (@xml:id, @n, @xml:lang, @xml:base) (att.global.rendition (@rend, @style, @rendition)) (att.global.linking (@corresp, @synch, @sameAs, @copyOf, @next, @prev, @exclude, @select)) (att.global.analytic (@ana)) (att.global.facs (@facs)) (att.global.change (@change)) (att.global.responsibility (@cert, @resp)) (att.global.source (@source)) att.lexicographic (@expand, @split, @value, @location, @mergedIn, @opt) (att.datcat (@datcat, @valueDatcat, @targetDatcat)) (att.lexicographic.normalized (@norm, @orig)) |
Member of | |
Contained by | |
May contain | |
Example |
|
Content model |
|
Schema Declaration |
|
12.1.25. <dictScrap>
<dictScrap> (dictionary scrap) encloses a part of a dictionary entry in which other phrase-level dictionary elements are freely combined. [9.1. Dictionary Body and Overall Structure 9.2. The Structure of Dictionary Entries] | |
Module | dictionaries — Specification |
Attributes | att.global (@xml:id, @n, @xml:lang, @xml:base) (att.global.rendition (@rend, @style, @rendition)) (att.global.linking (@corresp, @synch, @sameAs, @copyOf, @next, @prev, @exclude, @select)) (att.global.analytic (@ana)) (att.global.facs (@facs)) (att.global.change (@change)) (att.global.responsibility (@cert, @resp)) (att.global.source (@source)) |
Member of | |
Contained by | dictionaries: entry |
May contain | |
Note | May contain any dictionary elements in any combination. This element is used to mark part of a dictionary entry in which lower level dictionary elements appear, but which does not itself form an identifiable structural unit. |
Example |
|
Content model |
|
Schema Declaration |
|
12.1.26. <distributor>
<distributor> (distributor) supplies the name of a person or other agency responsible for the distribution of a text. [2.2.4. Publication, Distribution, Licensing, etc.] | |
Module | header — Specification |
Attributes | att.global (@xml:id, @n, @xml:lang, @xml:base) (att.global.rendition (@rend, @style, @rendition)) (att.global.linking (@corresp, @synch, @sameAs, @copyOf, @next, @prev, @exclude, @select)) (att.global.analytic (@ana)) (att.global.facs (@facs)) (att.global.change (@change)) (att.global.responsibility (@cert, @resp)) (att.global.source (@source)) att.canonical (@key, @ref) |
Member of | |
Contained by | header: publicationStmt |
May contain | |
Example |
|
Content model |
|
Schema Declaration |
|
12.1.27. <div>
<div> (text division) contains a subdivision of the front, body, or back of a text. [4.1. Divisions of the Body] | |
Module | textstructure — Specification |
Attributes | att.global (@xml:id, @n, @xml:lang, @xml:base) (att.global.rendition (@rend, @style, @rendition)) (att.global.linking (@corresp, @synch, @sameAs, @copyOf, @next, @prev, @exclude, @select)) (att.global.analytic (@ana)) (att.global.facs (@facs)) (att.global.change (@change)) (att.global.responsibility (@cert, @resp)) (att.global.source (@source)) att.typed (@type, @subtype) att.written (@hand) |
Member of | |
Contained by | |
May contain | |
Example |
|
Schematron |
|
Schematron |
|
Content model |
|
Schema Declaration |
|
12.1.28. <edition>
<edition> (edition) describes the particularities of one edition of a text. [2.2.2. The Edition Statement] | |
Module | header — Specification |
Attributes | att.global (@xml:id, @n, @xml:lang, @xml:base) (att.global.rendition (@rend, @style, @rendition)) (att.global.linking (@corresp, @synch, @sameAs, @copyOf, @next, @prev, @exclude, @select)) (att.global.analytic (@ana)) (att.global.facs (@facs)) (att.global.change (@change)) (att.global.responsibility (@cert, @resp)) (att.global.source (@source)) |
Member of | |
Contained by | header: editionStmt |
May contain | |
Example |
|
Content model |
|
Schema Declaration |
|
12.1.29. <editionStmt>
<editionStmt> (edition statement) groups information relating to one edition of a text. [2.2.2. The Edition Statement 2.2. The File Description] | |
Module | header — Specification |
Attributes | att.global (@xml:id, @n, @xml:lang, @xml:base) (att.global.rendition (@rend, @style, @rendition)) (att.global.linking (@corresp, @synch, @sameAs, @copyOf, @next, @prev, @exclude, @select)) (att.global.analytic (@ana)) (att.global.facs (@facs)) (att.global.change (@change)) (att.global.responsibility (@cert, @resp)) (att.global.source (@source)) |
Contained by | header: fileDesc |
May contain | |
Example |
|
Example |
|
Content model |
|
Schema Declaration |
|
12.1.30. <editor>
<editor> contains a secondary statement of responsibility for a bibliographic item, for example the name of an individual, institution or organization (or of several such) acting as editor, compiler, translator, etc. [3.12.2.2. Titles, Authors, and Editors] | |
Module | core — Specification |
Attributes | att.global (@xml:id, @n, @xml:lang, @xml:base) (att.global.rendition (@rend, @style, @rendition)) (att.global.linking (@corresp, @synch, @sameAs, @copyOf, @next, @prev, @exclude, @select)) (att.global.analytic (@ana)) (att.global.facs (@facs)) (att.global.change (@change)) (att.global.responsibility (@cert, @resp)) (att.global.source (@source)) att.naming (@role, @nymRef) (att.canonical (@key, @ref)) att.datable (@calendar, @period) (att.datable.w3c (@when, @notBefore, @notAfter, @from, @to)) (att.datable.iso (@when-iso, @notBefore-iso, @notAfter-iso, @from-iso, @to-iso)) (att.datable.custom (@when-custom, @notBefore-custom, @notAfter-custom, @from-custom, @to-custom, @datingPoint, @datingMethod)) |
Member of | |
Contained by | header: editionStmt seriesStmt titleStmt |
May contain | |
Note | A consistent format should be adopted. Particularly where cataloguing is likely to be based on the content of the header, it is advisable to use generally recognized authority lists for the exact form of personal names. |
Example |
|
Content model |
|
Schema Declaration |
|
12.1.31. <editorialDecl>
<editorialDecl> (editorial practice declaration) provides details of editorial principles and practices applied during the encoding of a text. [2.3.3. The Editorial Practices Declaration 2.3. The Encoding Description 15.3.2. Declarable Elements] | |
Module | header — Specification |
Attributes | att.global (@xml:id, @n, @xml:lang, @xml:base) (att.global.rendition (@rend, @style, @rendition)) (att.global.linking (@corresp, @synch, @sameAs, @copyOf, @next, @prev, @exclude, @select)) (att.global.analytic (@ana)) (att.global.facs (@facs)) (att.global.change (@change)) (att.global.responsibility (@cert, @resp)) (att.global.source (@source)) att.declarable (@default) |
Member of | |
Contained by | header: encodingDesc |
May contain | core: p |
Example |
|
Content model |
|
Schema Declaration |
|
12.1.32. <email>
<email> (electronic mail address) contains an email address identifying a location to which email messages can be delivered. [3.6.2. Addresses] | |
Module | core — Specification |
Attributes | att.global (@xml:id, @n, @xml:lang, @xml:base) (att.global.rendition (@rend, @style, @rendition)) (att.global.linking (@corresp, @synch, @sameAs, @copyOf, @next, @prev, @exclude, @select)) (att.global.analytic (@ana)) (att.global.facs (@facs)) (att.global.change (@change)) (att.global.responsibility (@cert, @resp)) (att.global.source (@source)) |
Member of | |
Contained by | |
May contain | |
Note | The format of a modern Internet email address is defined in RFC 2822. |
Example |
|
Content model |
|
Schema Declaration |
|
12.1.33. <encodingDesc>
<encodingDesc> (encoding description) documents the relationship between an electronic text and the source or sources from which it was derived. [2.3. The Encoding Description 2.1.1. The TEI Header and Its Components] | |
Module | header — Specification |
Attributes | att.global (@xml:id, @n, @xml:lang, @xml:base) (att.global.rendition (@rend, @style, @rendition)) (att.global.linking (@corresp, @synch, @sameAs, @copyOf, @next, @prev, @exclude, @select)) (att.global.analytic (@ana)) (att.global.facs (@facs)) (att.global.change (@change)) (att.global.responsibility (@cert, @resp)) (att.global.source (@source)) |
Contained by | header: teiHeader |
May contain | |
Example |
|
Content model |
|
Schema Declaration |
|
12.1.34. <entry>
<entry> (entry) contains a single structured entry in any kind of lexical resource, such as a dictionary or lexicon. [9.1. Dictionary Body and Overall Structure 9.2. The Structure of Dictionary Entries] | |||||||||||||||||||||||
Module | dictionaries — Specification | ||||||||||||||||||||||
Attributes | att.sortable (@sortKey) att.global (xml:id, xml:lang, @n, @xml:base) att.global.rendition (@rend, @style, @rendition) att.global.linking (@corresp, @synch, @sameAs, @copyOf, @next, @prev, @exclude, @select) att.global.analytic (@ana) att.global.facs (@facs) att.global.change (@change) att.global.responsibility (@cert, @resp) att.global.source (@source)
| ||||||||||||||||||||||
Member of | |||||||||||||||||||||||
Contained by | |||||||||||||||||||||||
May contain | |||||||||||||||||||||||
Note | Like all elements, <entry> inherits an xml:id attribute from the class global. No restrictions are placed on the method used to construct xml:ids; one convenient method is to use the orthographic form of the headword, appending a disambiguating number where necessary. Identification codes are sometimes included on machine-readable tapes of dictionaries for in-house use. It is recommended to use the <sense> element even for an entry that has only one sense, so that all parts of the definition relating to that sense are grouped together; this leads to more consistent encoding across entries. | ||||||||||||||||||||||
Example |
| ||||||||||||||||||||||
Content model |
| ||||||||||||||||||||||
Schema Declaration |
|
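The two recommendations in the note (an xml:id built from the headword, and a <sense> wrapper even for a single sense) might look as follows in practice. This is a sketch only: the headword, the identifier scheme and the definition are invented, and the type values lemma and pos are assumed from common TEI Lex-0 usage rather than taken from this section.

```xml
<!-- Hypothetical entry: xml:id derived from the orthographic form of the headword -->
<entry xml:id="school" xml:lang="en">
  <form type="lemma">
    <orth>school</orth>
  </form>
  <gramGrp>
    <gram type="pos">noun</gram>
  </gramGrp>
  <!-- Only one sense, but it is still wrapped in <sense> for consistency -->
  <sense xml:id="school.1">
    <def>an institution for educating children</def>
  </sense>
</entry>
```

A second homograph could then receive a disambiguating number, for example an identifier such as school-2, but the exact scheme is left to the encoder.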
12.1.35. <etym>
<etym> (etymology) encloses the etymological information in a dictionary entry. [9.3.4. Etymological Information] | |||||||
Module | dictionaries — Specification | ||||||
Attributes | att.global (@xml:id, @n, @xml:lang, @xml:base) (att.global.rendition (@rend, @style, @rendition)) (att.global.linking (@corresp, @synch, @sameAs, @copyOf, @next, @prev, @exclude, @select)) (att.global.analytic (@ana)) (att.global.facs (@facs)) (att.global.change (@change)) (att.global.responsibility (@cert, @resp)) (att.global.source (@source)) att.lexicographic (@expand, @split, @value, @location, @mergedIn, @opt) (att.datcat (@datcat, @valueDatcat, @targetDatcat)) (att.lexicographic.normalized (@norm, @orig)) att.typed (type, @subtype)
| ||||||
Member of | |||||||
Contained by | |||||||
May contain | |||||||
Note | May contain character data mixed with any other elements defined in the dictionary tag set. There is no consensus on the internal structure of etymologies, or even on whether such a structure is appropriate. The <etym> element accordingly simply contains prose, within which names of languages, cited words, or parts of words, glosses, and examples will typically be prominent. The tagging of such internal objects is optional. | ||||||
Example |
| ||||||
Example |
| ||||||
Content model |
| ||||||
Schema Declaration |
|
12.1.36. <expan>
<expan> (expansion) contains the expansion of an abbreviation. [3.6.5. Abbreviations and Their Expansions] | |
Module | core — Specification |
Attributes | att.global (@xml:id, @n, @xml:lang, @xml:base) (att.global.rendition (@rend, @style, @rendition)) (att.global.linking (@corresp, @synch, @sameAs, @copyOf, @next, @prev, @exclude, @select)) (att.global.analytic (@ana)) (att.global.facs (@facs)) (att.global.change (@change)) (att.global.responsibility (@cert, @resp)) (att.global.source (@source)) att.editLike (@evidence, @instant) |
Member of | |
Contained by | |
May contain | |
Note | The content of this element should be the expanded abbreviation, usually (but not always) a complete word or phrase. The <ex> element provided by the transcr module may be used to mark up sequences of letters supplied within such an expansion. If abbreviations are expanded silently, this practice should be documented in the <editorialDecl>, either with a <normalization> element or a <p>. |
Example |
|
Example |
|
Content model |
|
Schema Declaration |
|
12.1.37. <extent>
<extent> (extent) describes the approximate size of a text stored on some carrier medium or of some other object, digital or non-digital, specified in any convenient units. [2.2.3. Type and Extent of File 2.2. The File Description 3.12.2.4. Imprint, Size of a Document, and Reprint Information 10.7.1. Object Description] | |
Module | header — Specification |
Attributes | att.global (@xml:id, @n, @xml:lang, @xml:base) (att.global.rendition (@rend, @style, @rendition)) (att.global.linking (@corresp, @synch, @sameAs, @copyOf, @next, @prev, @exclude, @select)) (att.global.analytic (@ana)) (att.global.facs (@facs)) (att.global.change (@change)) (att.global.responsibility (@cert, @resp)) (att.global.source (@source)) |
Member of | |
Contained by | |
May contain | |
Example |
|
Example | The <measure> element may be used to supply normalized or machine-tractable versions of the size or sizes concerned.
|
Content model |
|
Schema Declaration |
|
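A minimal sketch of the <measure> usage described above, assuming the unit and quantity attributes of <measure> are available in the schema in use; the figures are illustrative.

```xml
<extent>
  <!-- quantity and unit give the normalized size; the content keeps the human-readable wording -->
  <measure unit="MiB" quantity="4.2">About four megabytes</measure>
</extent>
```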
12.1.38. <figDesc>
<figDesc> (description of figure) contains a brief prose description of the appearance or content of a graphic figure, for use when documenting an image without displaying it. [14.4. Specific Elements for Graphic Images] | |
Module | figures — Specification |
Attributes | att.global (@xml:id, @n, @xml:lang, @xml:base) (att.global.rendition (@rend, @style, @rendition)) (att.global.linking (@corresp, @synch, @sameAs, @copyOf, @next, @prev, @exclude, @select)) (att.global.analytic (@ana)) (att.global.facs (@facs)) (att.global.change (@change)) (att.global.responsibility (@cert, @resp)) (att.global.source (@source)) |
Contained by | figures: figure |
May contain | |
Note | This element is intended for use as an alternative to the content of its parent <figure> element; for example, to display when the image is required but the equipment in use cannot display graphic images. It may also be used for indexing or documentary purposes. |
Example |
|
Content model |
|
Schema Declaration |
|
12.1.39. <figure>
<figure> (figure) groups elements representing or containing graphic information such as an illustration, formula, or figure. [14.4. Specific Elements for Graphic Images] | |
Module | figures — Specification |
Attributes | att.global (@xml:id, @n, @xml:lang, @xml:base) (att.global.rendition (@rend, @style, @rendition)) (att.global.linking (@corresp, @synch, @sameAs, @copyOf, @next, @prev, @exclude, @select)) (att.global.analytic (@ana)) (att.global.facs (@facs)) (att.global.change (@change)) (att.global.responsibility (@cert, @resp)) (att.global.source (@source)) att.placement (@place) att.typed (@type, @subtype) att.written (@hand) |
Member of | |
Contained by | core: abbr author bibl biblScope cit citedRange date editor email expan gloss head hi imprint item list name note p pubPlace publisher quote ref resp term title dictionaries: def dictScrap entry etym form gram gramGrp hyph lang lbl orth pron sense stress syll usg xr figures: figure linking: seg transcr: metamark |
May contain | |
Example |
|
Content model |
|
Schema Declaration |
|
12.1.40. <fileDesc>
<fileDesc> (file description) contains a full bibliographic description of an electronic file. [2.2. The File Description 2.1.1. The TEI Header and Its Components] | |
Module | header — Specification |
Attributes | att.global (@xml:id, @n, @xml:lang, @xml:base) (att.global.rendition (@rend, @style, @rendition)) (att.global.linking (@corresp, @synch, @sameAs, @copyOf, @next, @prev, @exclude, @select)) (att.global.analytic (@ana)) (att.global.facs (@facs)) (att.global.change (@change)) (att.global.responsibility (@cert, @resp)) (att.global.source (@source)) |
Contained by | header: teiHeader |
May contain | |
Note | The major source of information for those seeking to create a catalogue entry or bibliographic citation for an electronic file. As such, it provides a title and statements of responsibility together with details of the publication or distribution of the file, of any series to which it belongs, and detailed bibliographic notes for matters not addressed elsewhere in the header. It also contains a full bibliographic description for the source or sources from which the electronic text was derived. |
Example |
|
Content model |
|
Schema Declaration |
|
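The components named in the note (title, publication or distribution details, and a description of the source) can be sketched as below. All names and wording are hypothetical, and a given TEI Lex-0 schema may require further children (for instance licensing information) that are not shown here.

```xml
<fileDesc>
  <titleStmt>
    <!-- Title of the electronic file -->
    <title>A Sample Dictionary: a digital edition</title>
  </titleStmt>
  <publicationStmt>
    <!-- Agency responsible for distributing the file (invented name) -->
    <distributor>Hypothetical Lexicography Lab</distributor>
  </publicationStmt>
  <sourceDesc>
    <!-- Source from which the electronic text was derived -->
    <p>Transcribed from a printed edition.</p>
  </sourceDesc>
</fileDesc>
```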
12.1.41. <forename>
<forename> (forename) contains a forename, given or baptismal name. [13.2.1. Personal Names] | |
Module | namesdates — Specification |
Attributes | att.global (@xml:id, @n, @xml:lang, @xml:base) (att.global.rendition (@rend, @style, @rendition)) (att.global.linking (@corresp, @synch, @sameAs, @copyOf, @next, @prev, @exclude, @select)) (att.global.analytic (@ana)) (att.global.facs (@facs)) (att.global.change (@change)) (att.global.responsibility (@cert, @resp)) (att.global.source (@source)) att.personal (@full, @sort) (att.naming (@role, @nymRef) (att.canonical (@key, @ref)) ) att.typed (@type, @subtype) |
Member of | |
Contained by | |
May contain | |
Example |
|
Content model |
|
Schema Declaration |
|
12.1.42. <form>
<form> (form information group) groups all the information on the written and spoken forms of one headword. [9.3.1. Information on Written and Spoken Forms] | |||||||||||
Module | dictionaries — Specification | ||||||||||
Attributes | att.global (@xml:id, @n, @xml:lang, @xml:base) (att.global.rendition (@rend, @style, @rendition)) (att.global.linking (@corresp, @synch, @sameAs, @copyOf, @next, @prev, @exclude, @select)) (att.global.analytic (@ana)) (att.global.facs (@facs)) (att.global.change (@change)) (att.global.responsibility (@cert, @resp)) (att.global.source (@source)) att.lexicographic (@expand, @split, @value, @location, @mergedIn, @opt) (att.datcat (@datcat, @valueDatcat, @targetDatcat)) (att.lexicographic.normalized (@norm, @orig)) att.typed (type, @subtype)
| ||||||||||
Member of | |||||||||||
Contained by | |||||||||||
May contain | |||||||||||
Example | (from TLFi) | ||||||||||
Content model |
| ||||||||||
Schema Declaration |
|
12.1.43. <front>
<front> (front matter) contains any prefatory matter (headers, abstracts, title page, prefaces, dedications, etc.) found at the start of a document, before the main body. [4.6. Title Pages 4. Default Text Structure] | |
Module | textstructure — Specification |
Attributes | att.global (@xml:id, @n, @xml:lang, @xml:base) (att.global.rendition (@rend, @style, @rendition)) (att.global.linking (@corresp, @synch, @sameAs, @copyOf, @next, @prev, @exclude, @select)) (att.global.analytic (@ana)) (att.global.facs (@facs)) (att.global.change (@change)) (att.global.responsibility (@cert, @resp)) (att.global.source (@source)) |
Contained by | textstructure: text |
May contain | |
Note | Because cultural conventions differ as to which elements are grouped as front matter and which as back matter, the content models for the <front> and <back> elements are identical. |
Example |
|
Example |
|
Example |
|
Content model |
|
Schema Declaration |
|
12.1.44. <g>
<g> (character or glyph) represents a glyph, or a non-standard character. [5. Characters, Glyphs, and Writing Modes] | |||||||
Module | gaiji — Specification | ||||||
Attributes | att.global (@xml:id, @n, @xml:lang, @xml:base) (att.global.rendition (@rend, @style, @rendition)) (att.global.linking (@corresp, @synch, @sameAs, @copyOf, @next, @prev, @exclude, @select)) (att.global.analytic (@ana)) (att.global.facs (@facs)) (att.global.change (@change)) (att.global.responsibility (@cert, @resp)) (att.global.source (@source)) att.typed (@type, @subtype)
| ||||||
Member of | |||||||
Contained by | |||||||
May contain | Character data only | ||||||
Note | The name g is short for gaiji, which is the Japanese term for a non-standardized character or glyph. | ||||||
Example | This example points to a <glyph> element with the identifier ctlig like the following:
| ||||||
Example | The medieval brevigraph per could similarly be considered as an individual glyph, defined in a <glyph> element with the identifier per-glyph as follows:
| ||||||
Content model |
| ||||||
Schema Declaration |
|
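The pointing mechanism described in the two examples above can be sketched as follows. The ref attribute of <g> is defined in the TEI Guidelines although it is not shown in the attribute list here; the glyph descriptions are left as comments because their internal structure depends on the schema in use.

```xml
<!-- In the header: declare the glyphs -->
<charDecl>
  <glyph xml:id="ctlig">
    <!-- description of the particular ct ligature intended -->
  </glyph>
  <glyph xml:id="per-glyph">
    <!-- description of the medieval brevigraph per -->
  </glyph>
</charDecl>

<!-- In the text: each <g> points to its declaration by identifier -->
<g ref="#ctlig">ct</g>
<g ref="#per-glyph">per</g>
```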
12.1.45. <gloss>
<gloss> (gloss) identifies a phrase or word used to provide a gloss or definition for some other word or phrase. [3.4.1. Terms and Glosses 22.4.1. Description of Components] | |
Module | core — Specification |
Attributes | att.global (@xml:id, @n, @xml:lang, @xml:base) (att.global.rendition (@rend, @style, @rendition)) (att.global.linking (@corresp, @synch, @sameAs, @copyOf, @next, @prev, @exclude, @select)) (att.global.analytic (@ana)) (att.global.facs (@facs)) (att.global.change (@change)) (att.global.responsibility (@cert, @resp)) (att.global.source (@source)) att.typed (@type, @subtype) att.pointing (@targetLang, @target, @evaluate) att.cReferencing (@cRef) |
Member of | |
Contained by | core: abbr author bibl biblScope cit citedRange date editor email expan gloss head hi item name note p pubPlace publisher quote ref resp term title figures: figDesc header: authority catDesc category change distributor edition extent licence principal rendition tagUsage taxonomy linking: seg transcr: metamark |
May contain | |
Note | The target and cRef attributes are mutually exclusive. |
Example |
|
Content model |
|
Schema Declaration |
|
12.1.46. <glyph>
<glyph> (character glyph) provides descriptive information about a character glyph. [5.2. Markup Constructs for Representation of Characters and Glyphs] | |
Module | gaiji — Specification |
Attributes | att.global (@xml:id, @n, @xml:lang, @xml:base) (att.global.rendition (@rend, @style, @rendition)) (att.global.linking (@corresp, @synch, @sameAs, @copyOf, @next, @prev, @exclude, @select)) (att.global.analytic (@ana)) (att.global.facs (@facs)) (att.global.change (@change)) (att.global.responsibility (@cert, @resp)) (att.global.source (@source)) |
Contained by | gaiji: charDecl |
May contain | |
Example |
|
Content model |
|
Schema Declaration |
|
12.1.47. <gram>
<gram> (grammatical information) within an entry in a dictionary or a terminological data file, contains grammatical information relating to a term, word, or form. [9.3.2. Grammatical Information] | |||||||
Module | dictionaries — Specification | ||||||
Attributes | att.global (@xml:id, @n, @xml:lang, @xml:base) (att.global.rendition (@rend, @style, @rendition)) (att.global.linking (@corresp, @synch, @sameAs, @copyOf, @next, @prev, @exclude, @select)) (att.global.analytic (@ana)) (att.global.facs (@facs)) (att.global.change (@change)) (att.global.responsibility (@cert, @resp)) (att.global.source (@source)) att.lexicographic (@expand, @split, @value, @location, @mergedIn, @opt) (att.datcat (@datcat, @valueDatcat, @targetDatcat)) (att.lexicographic.normalized (@norm, @orig)) att.typed (type, @subtype)
| ||||||
Member of | |||||||
Contained by | |||||||
May contain | |||||||
Example |
| ||||||
Content model |
| ||||||
Schema Declaration |
|
12.1.48. <gramGrp>
<gramGrp> (grammatical information group) groups morpho-syntactic information about a lexical item, e.g. <pos>, <gen>, <number>, <case>, or <iType> (inflectional class). [9.3.2. Grammatical Information] | |
Module | dictionaries — Specification |
Attributes | att.global (@xml:id, @n, @xml:lang, @xml:base) (att.global.rendition (@rend, @style, @rendition)) (att.global.linking (@corresp, @synch, @sameAs, @copyOf, @next, @prev, @exclude, @select)) (att.global.analytic (@ana)) (att.global.facs (@facs)) (att.global.change (@change)) (att.global.responsibility (@cert, @resp)) (att.global.source (@source)) att.lexicographic (@expand, @split, @value, @location, @mergedIn, @opt) (att.datcat (@datcat, @valueDatcat, @targetDatcat)) (att.lexicographic.normalized (@norm, @orig)) att.typed (@type, @subtype) |
Member of | |
Contained by | |
May contain | |
Example |
|
Content model |
|
Schema Declaration |
|
12.1.49. <graphic>
<graphic> (graphic) indicates the location of a graphic or illustration, either forming part of a text, or providing an image of it. [3.10. Graphics and Other Non-textual Components 11.1. Digital Facsimiles] | |
Module | core — Specification |
Attributes | att.global (@xml:id, @n, @xml:lang, @xml:base) (att.global.rendition (@rend, @style, @rendition)) (att.global.linking (@corresp, @synch, @sameAs, @copyOf, @next, @prev, @exclude, @select)) (att.global.analytic (@ana)) (att.global.facs (@facs)) (att.global.change (@change)) (att.global.responsibility (@cert, @resp)) (att.global.source (@source)) att.media (@width, @height, @scale) (att.internetMedia (@mimeType)) att.resourced (@url) att.typed (@type, @subtype) |
Member of | |
Contained by | |
May contain | Empty element |
Note | The mimeType attribute should be used to supply the MIME media type of the image specified by the url attribute. Within the body of a text, a <graphic> element indicates the presence of a graphic component in the source itself. Within the context of a <facsimile> or <sourceDoc> element, however, a <graphic> element provides an additional digital representation of some part of the source being encoded. |
Example |
|
Example |
|
Example |
|
Content model |
|
Schema Declaration |
|
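As a sketch of the note on mimeType and url, the fragment below places a <graphic> inside a <figure> together with a <figDesc>. The file path and the description are invented for illustration.

```xml
<figure>
  <!-- url locates the image; mimeType states its MIME media type -->
  <graphic url="images/plate-03.png" mimeType="image/png"/>
  <figDesc>Engraved plate showing the parts of a plough.</figDesc>
</figure>
```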
12.1.50. <head>
<head> (heading) contains any type of heading, for example the title of a section, or the heading of a list, glossary, manuscript description, etc. [4.2.1. Headings and Trailers] | |
Module | core — Specification |
Attributes | att.global (@xml:id, @n, @xml:lang, @xml:base) (att.global.rendition (@rend, @style, @rendition)) (att.global.linking (@corresp, @synch, @sameAs, @copyOf, @next, @prev, @exclude, @select)) (att.global.analytic (@ana)) (att.global.facs (@facs)) (att.global.change (@change)) (att.global.responsibility (@cert, @resp)) (att.global.source (@source)) att.typed (@type, @subtype) att.placement (@place) att.written (@hand) |
Member of | |
Contained by | |
May contain | |
Note | The <head> element is used for headings at all levels; software which treats (e.g.) chapter headings, section headings, and list titles differently must determine the proper processing of a <head> element based on its structural position. A <head> occurring as the first element of a list is the title of that list; one occurring as the first element of a <div1> is the title of that chapter or section. |
Example | The most common use for the <head> element is to mark the headings of sections. In older writings, the headings or incipits may be rather longer than usual in modern works. If a section has an explicit ending as well as a heading, it should be marked as a <trailer>, as in this example:
|
Example | When headings are not inline with the running text (see e.g. the heading "Secunda conclusio"), they may nevertheless be encoded as if they were. The actual placement in the source document can be captured with the place attribute.
|
Example | The <head> element is also used to mark headings of other units, such as lists:
|
Content model |
|
Schema Declaration |
|
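To illustrate the point that a <head> takes its meaning from its structural position, here is a minimal sketch of a heading used as the title of a list. The list content is invented.

```xml
<list>
  <!-- As the first child of <list>, this <head> is the title of the list -->
  <head>Abbreviations used in this dictionary</head>
  <item>adj. adjective</item>
  <item>n. noun</item>
</list>
```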
12.1.51. <hi>
<hi> (highlighted) marks a word or phrase as graphically distinct from the surrounding text, for reasons concerning which no claim is made. [3.3.2.2. Emphatic Words and Phrases 3.3.2. Emphasis, Foreign Words, and Unusual Language] | |
Module | core — Specification |
Attributes | att.global (@xml:id, @n, @xml:lang, @xml:base) (att.global.rendition (@rend, @style, @rendition)) (att.global.linking (@corresp, @synch, @sameAs, @copyOf, @next, @prev, @exclude, @select)) (att.global.analytic (@ana)) (att.global.facs (@facs)) (att.global.change (@change)) (att.global.responsibility (@cert, @resp)) (att.global.source (@source)) att.written (@hand) |
Member of | |
Contained by | |
May contain | |
Example |
|
Content model |
|
Schema Declaration |
|
12.1.52. <hyph>
<hyph> (hyphenation) contains a hyphenated form of a dictionary headword, or hyphenation information in some other form. [9.3.1. Information on Written and Spoken Forms] | |
Module | dictionaries — Specification |
Attributes | att.global (@xml:id, @n, @xml:lang, @xml:base) (att.global.rendition (@rend, @style, @rendition)) (att.global.linking (@corresp, @synch, @sameAs, @copyOf, @next, @prev, @exclude, @select)) (att.global.analytic (@ana)) (att.global.facs (@facs)) (att.global.change (@change)) (att.global.responsibility (@cert, @resp)) (att.global.source (@source)) att.lexicographic (@expand, @split, @value, @location, @mergedIn, @opt) (att.datcat (@datcat, @valueDatcat, @targetDatcat)) (att.lexicographic.normalized (@norm, @orig)) att.notated (@notation) |
Member of | |
Contained by | |
May contain | |
Example |
|
Content model |
|
Schema Declaration |
|
12.1.53. <idno>
<idno> (identifier) supplies any form of identifier used to identify some object, such as a bibliographic item, a person, a title, an organization, etc. in a standardized way. [13.3.1. Basic Principles 2.2.4. Publication, Distribution, Licensing, etc. 2.2.5. The Series Statement 3.12.2.4. Imprint, Size of a Document, and Reprint Information] | |
Module | header — Specification |
Attributes | att.global (@xml:id, @n, @xml:lang, @xml:base) (att.global.renditi |