1. Introduction
1.1. TEI Lex-0 in a nutshell
TEI Lex-0 is both a technical specification and a set of community-based recommendations for encoding machine-readable dictionaries. It is rooted in the Guidelines of the Text Encoding Initiative (TEI) and delivered as a customization of the TEI schema.
Following the spirit of TEI Analytics, developed in the context of the MONK project (Zillig 2009), TEI Lex-0 aims at establishing a baseline encoding and a target format to facilitate the interoperability of heterogeneously encoded lexical resources. This is important both in the context of building lexical infrastructures as such (Ermolaev and Tasovac 2012) and in the context of developing generic TEI-aware tools such as dictionary viewers and profilers.
For the latest changes, see our revision history.
1.2. The community
Preliminary work for the establishment of TEI Lex-0 started in the Working Group "Retrodigitised Dictionaries" led by Toma Tasovac and Vera Hildenbrandt as part of the COST Action European Network of e-Lexicography (ENeL). Upon the completion of the COST Action, the work on TEI Lex-0 was taken up by the DARIAH Working Group "Lexical Resources". Currently, the work on TEI Lex-0 is also supported by the H2020-funded European Lexicographic Infrastructure (ELEXIS).
1.2.1. DARIAH Working Group
The DARIAH Working Group on Lexical Resources is a self-organized scholarly community working under the auspices of the pan-European Digital Research Infrastructure for Arts and Humanities (DARIAH-EU). The goals of the WG are:
- to explore, assess and recommend standard tools and methods for the creation, application and dissemination of born-digital and retro-digitized lexical resources (dictionaries, lexicons, thesauri, word lists etc.) as well as other, similar kinds of structured data (gazetteers, almanacs, encyclopaedias etc.); and
- to foster, develop and publicize digitally-enabled lexicographic research from a cross-disciplinary and transnational perspective.
The WG focuses on the application and explication of existing standards, both onomasiological (TMF, TBX and SKOS) and semasiological (LMF, TEI, and Ontolex); draws upon the expertise of various DARIAH partners who are active in this field; and collaborates with relevant external projects and associations, such as the European Lexicographic Infrastructure (ELEXIS) and CLARIN, in order to ensure the widest possible reach of the Working Group’s results.
At the same time, the WG pursues a strong research-driven agenda on the diversity of European lexicographic heritage. In addition to investigating pan-European vocabularies and multiple dimensions of lexical borrowing, the working group evaluates current practices and formulates guidelines on data enrichment and mutual linking of existing electronic dictionaries in view of their common European heritage.
WG Chairs
Laurent Romary is Directeur de Recherche at Inria, France (team ALMAnaCH). He received a PhD degree in computational linguistics in 1989 and his Habilitation in 1999. He carries out research on the modelling of semi-structured documents, with a specific emphasis on texts and linguistic resources. He has been active in standardisation activities with ISO, as chair of committee ISO/TC 37/SC 4 (2002-2014) and chair of ISO/TC 37 (2016-), and with the Text Encoding Initiative, as member (2001-2011) and chair (2008-2011) of its Technical Council. He also has a long-standing involvement in open science activities.
Toma Tasovac is Director of the Belgrade Center for Digital Humanities (BCDH) and DARIAH-EU. He was educated at Harvard University, Princeton University and Trinity College Dublin. His areas of interest include lexicography, data modeling, TEI, digital editions and research infrastructures. He previously served as the National Coordinator of DARIAH-RS and Chair of the National Coordinators' Committee at DARIAH-EU. Under Toma's leadership, BCDH has received funding from various national and international granting bodies, including Erasmus Plus and Horizon 2020.
DigiLex Blog
The working group runs a blog called DigiLex: Legacy Dictionaries Reloaded as a platform for sharing tips, raising questions and discussing methods for the creation of lexical resources.
1.2.2. ELEXIS
ELEXIS is an H2020-funded project which proposes to integrate, extend and harmonise national and regional efforts in the field of lexicography, both modern and historical, with the goal of creating a sustainable infrastructure which will (1) enable efficient access to high-quality lexical data in the digital age, and (2) bridge the gap between more advanced and lesser-resourced scholarly communities working on lexicographic resources.
1.2.3. Contributors
- Piotr Banski
- Jack Bowers
- Jesse de Does
- Katrien Depuydt
- Tomaž Erjavec
- Alexander Geyken
- Axel Herold
- Vera Hildenbrandt
- Mohamed Khemakhem
- Boris Lehečka
- Snežana Petrović
- Laurent Romary
- Ana Salgado
- Toma Tasovac
- Andreas Witt
1.2.4. The Rahtz Prize
In recognition of their work on TEI Lex-0, the DARIAH WG Lexical Resources was awarded the 2020 Rahtz Prize for TEI Ingenuity.
Members of the DARIAH Working Group Lexical Resources have made a valuable contribution to the Dictionaries Chapter of the TEI Guidelines. Their efforts and their expertise have been formidable and highly appreciated by the TEI Community for many years. — Martina Scholger, Chair of the TEI Technical Council
1.2.5. Meetings
The Working Group has organized a number of working meetings dedicated to the development of TEI Lex-0. These include:
- Toward Best Practice Guidelines for Encoding Legacy Dictionaries: An ENeL-DARIAH-PARTHENOS Expert Workshop. Preußische Staatsbibliothek, Berlin (17-19 November 2016).
- Overview of Retrodigitized Dictionaries and Best-Practice Guidelines For Encoding Legacy Dictionaries. ENeL Annual Meeting, Budapest (24 February 2017).
- TEI Lex-0 @DARIAH WG "Lexical Resources". Harnack Haus, Freie Universität Berlin (27 April 2017).
- TEI Lex-0 @DARIAH WG "Lexical Resources". Austrian Center for Digital Humanities, Austrian Academy of Sciences, Vienna (26 June 2017).
- TEI Lex-0: From Best-Practice Guidelines to a TEI Schema. DARIAH-EU Coordination Office, Berlin (2-3 May 2018). Funded by DARIAH-EU's Working Groups Funding Scheme and ELEXIS.
- TEI Lex-0 and Beyond: A Workshop. University of Ljubljana (16 July 2018). Funded by DARIAH-EU's Working Group Funding Scheme and ELEXIS.
- TEI Lex-0 Meeting. DARIAH-EU Coordination Office, Berlin (30 January 2019).
- Joint TEI Lex-0 / Ontolex-Lemon Meeting. Collocated with eLex 2019. Sintra, Portugal (4 October 2019). Funded by ELEXIS.
- Toward a TEI Lex-0 Publisher: A Workshop, DARIAH-EU Coordination Office, Berlin (16-17 December 2019). Funded by the Belgrade Center for Digital Humanities.
1.2.6. Training measures
TEI Lex-0 and best practices in lexical data modeling have been introduced to a large number of young scholars at various training events, including:
- Lexical Data Masterclass 2017. Co-organized by DARIAH, the Berlin Brandenburg Academy of Sciences (BBAW), Inria and the Belgrade Center for Digital Humanities, with the support of the German Ministry of Education and Research (BMBF), CLARIN and DARIAH-DE. For an overview, check out this blog post.
- Lexical Data Masterclass 2018. Co-organized by DARIAH, the Berlin Brandenburg Academy of Sciences (BBAW), Inria and the Belgrade Center for Digital Humanities, with the support of the German Ministry of Education and Research (BMBF), French Ministry for Higher Education, Research and Innovation (MESRI), ELEXIS, CLARIN and DARIAH-DE. For an overview, check out From Àbèsàbèsì to XPath on DigiLex.
- From Print to Screen: The Theory and Practice of Digitizing Dictionaries. Lisbon Summer School in Linguistics (2-6 July 2018).
- Encoding Dictionaries with TEI: A Masterclass. Lisbon Summer School in Linguistics (1-5 July 2019).
- DH Training Workshop: Digital Methods for Linguistic Investigation (13-15 November 2019). Organized by the Seminar für Semitistik und Arabistik, Freie Universität Berlin, with the support of the Alexander von Humboldt Foundation and Syncro Soft.
The European Digital Humanities Masterclass 2020 had to be postponed due to the Corona pandemic.
1.3. The rationale
To what extent can we achieve consistent encoding within a given community of practice by following the TEI Guidelines? The topic is of particular importance for lexical data if we think of the potential wealth of content we could gain from pooling together the information available in the variety of highly structured, historical and contemporary lexical resources. The encoding possibilities offered by the Dictionaries Chapter in the Guidelines are too numerous and too flexible to guarantee sufficient interoperability and a coherent model for searching, visualising or enriching multiple lexical resources.
TEI Lex-0 should not be thought of as a replacement of the Dictionaries Chapter in the TEI Guidelines or as the format that must be necessarily used for editing or managing individual resources, especially in those projects and/or institutions that already have established workflows based on their own flavors of TEI. TEI Lex-0 should be primarily seen as a format that existing TEI dictionaries can be unequivocally transformed to in order to be queried, visualised, or mined in a uniform way. At the same time, however, there is no reason why TEI Lex-0 could not or should not be used as a best-practice example in educational settings or as a foundation of new TEI-based projects. This is especially true considering the fact that TEI Lex-0 aims to stay as aligned as possible with the TEI subset developed in conjunction with the revision of the ISO LMF (Lexical Markup Framework) standard (cf. Romary 2015).
1.4. The guidelines
1.4.1. How to cite these guidelines
Full citation: Toma Tasovac, Laurent Romary, Piotr Banski, Jack Bowers, Jesse de Does, Katrien Depuydt, Tomaž Erjavec, Alexander Geyken, Axel Herold, Vera Hildenbrandt, Mohamed Khemakhem, Boris Lehečka, Snežana Petrović, Ana Salgado and Andreas Witt. 2018. TEI Lex-0: A baseline encoding for lexicographic data. Version 0.9.3. DARIAH Working Group on Lexical Resources. https://dariah-eric.github.io/lexicalresources/pages/TEILex0/TEILex0.html.
Short citation: Toma Tasovac, Laurent Romary et al. 2018. TEI Lex-0: A baseline encoding for lexicographic data. Version 0.9.3. DARIAH Working Group on Lexical Resources. https://dariah-eric.github.io/lexicalresources/pages/TEILex0/TEILex0.html.
1.4.2. Revision history
Changes to the TEI Lex-0 specification up to version 0.8.6 were included in comments inside the ODD file itself. Starting with version 0.9.0, we're listing a summary of the changes in this list for easier reference.
- <catDesc> must contain a <term>
- switch to using the external TEI add-on in oXygen when generating schema and documentation
- fix the mismatch in <usg> types between the specification and documentation (use temporal instead of time)
- require <listBibl> in <sourceDesc> with three suggested type values: dictionaries, corpora and literature
- switch to using oXygen's TEI framework when generating schema and documentation
- allow <list> and <item> because lists feature prominently in dictionary front matter
- introduce model.lexicalInter (based on model.inter), model.lexicalPhrase (based on model.phrase) and macro.lexicalParaContent (based on macro.paraContent) to make it easier to simplify the content model of various dictionary elements
- remove model.listLike from model.lexicalInter
- link version number in the menu to revision history
- allow <abbr> and <expan> so that they can be used in lists of abbreviations in dictionary front matter
- introduced valency as a suggested value in gram[@type="valency"]
- introduced gram[@type="government"] and clarified the difference from gram[@type="colloc"]. See sections on Typology of gram and Collocates
- made @type mandatory on <TEI>
- add <principal> and <affiliation> for more robust metadata in the <teiHeader>
- fix namespace issues in html output
- add new examples to the Header section
- add section on hierarchical usage labels
- allow <taxonomy>, <category> and <catDesc> in <classDecl>
- move the specification to a different webpage for quicker loading
- add section on TEI Header
- correction of various misspellings
- add <monogr> (needed for <biblStruct>)
- add <forename> and <surname> for more fine-grained bibliographic information
- add <editorialDecl>
- add <email> to make possible contact information in the header
- require <availability> in <publicationStmt> to provide <licence>
- make <sourceDesc> optional
- allow only <biblStruct> in <sourceDesc>
- make model.publicationStmtPart.agency unbound to allow both <publisher> and <authority> in <publicationStmt>
- add role to <authority> with suggested values: funder, sponsor, rightsHolder
- require <language>, <langUsage> and <profileDesc>
- add role to <language> with a closed list of values: objectLanguage, workingLanguage, sourceLanguage, targetLanguage
2. Header
2.1. General remarks
A lexical resource encoded in TEI Lex-0 must, like any TEI file, start with the root <TEI> element, which, in turn, must contain a <teiHeader> element.
TEI Lex-0, unlike TEI P5, however, requires the @type attribute on the root <TEI> element with the value "lex-0".
A TEI header contains information about the lexical resource itself, its source(s), its encoding, and its revisions. Proper, structured metadata of this kind is equally important for scholars using the resource, for software processing it, and for cataloguers in libraries and archives.
The TEI header of a lexical resource has five major parts:
- a file description, tagged <fileDesc>, provides a full bibliographic description of the electronic lexical resource itself as well as the source(s), analogue or digital, from which it may have been derived. For details, see section File Description below.
- an encoding description, tagged <encodingDesc>, describes the relationship between the electronic resource and its source(s). It allows for detailed description of whether (or how) the electronic resource was produced, transcribed or normalized, how the encoder resolved ambiguities in the source, what levels of encoding or analysis were applied etc.
- a profile description, tagged <profileDesc>, contains classificatory and contextual information about the lexical resource including its object and working languages.
- a container for external metadata, tagged <xenoData>, contains metadata from non-TEI schemas, for instance Dublin Core, MARCXML or MODS, if available.
- a revision history, tagged <revisionDesc>, contains a list of changes made during the development of the lexical resource, both before and after its official release.
Of these, two elements are required in TEI Lex-0: <fileDesc> and <profileDesc>. It is highly recommended to include additional information in <encodingDesc>. It is also an example of good practice to record changes in <revisionDesc>.
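As a rough orientation, the parts listed above could fit together as in the following skeleton. This is only a sketch under stated assumptions: the title, publisher, licence and language are placeholders, the comments mark where the optional parts would go, and the empty <body> is left unfilled for brevity, so the file would not validate as it stands. The exact content models should be checked against the specification.

```xml
<TEI xmlns="http://www.tei-c.org/ns/1.0" type="lex-0">
  <teiHeader>
    <!-- required: bibliographic description of the resource -->
    <fileDesc>
      <titleStmt>
        <title type="full">Placeholder Dictionary Title</title>
      </titleStmt>
      <publicationStmt>
        <publisher>Placeholder Publisher</publisher>
        <availability>
          <licence target="https://creativecommons.org/licenses/by/4.0/">Creative Commons Attribution 4.0 International (CC BY 4.0)</licence>
        </availability>
      </publicationStmt>
      <!-- <sourceDesc> is optional in TEI Lex-0 and omitted here -->
    </fileDesc>
    <!-- optional but highly recommended: <encodingDesc> would go here -->
    <!-- required: the languages of the resource -->
    <profileDesc>
      <langUsage>
        <language ident="en" role="objectLanguage">English</language>
      </langUsage>
    </profileDesc>
    <!-- optional: <xenoData> and <revisionDesc> would follow here -->
  </teiHeader>
  <text>
    <body>
      <!-- dictionary entries go here -->
    </body>
  </text>
</TEI>
```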
2.2. File description
The bibliographic description of the given machine-readable lexical resource is absolutely essential for identifying the basic information about the resource itself, its creators and publishers as well as the conditions under which it is made available to the public.
The elements that make up <fileDesc> are:
- titleStmt (title statement) groups information about the title of a work and those responsible for its content.
- editionStmt (edition statement) groups information relating to one edition of a text.
- extent (extent) describes the approximate size of a text stored on some carrier medium or of some other object, digital or non-digital, specified in any convenient units.
- publicationStmt (publication statement) groups information concerning the publication or distribution of an electronic or other text.
- seriesStmt (series statement) groups information about the series, if any, to which a publication belongs.
- sourceDesc (source description) describes the source(s) from which an electronic text was derived or generated, typically a bibliographic description in the case of a digitized text, or a phrase such as "born digital" for a text which has no previous existence.
<fileDesc> is a mandatory element in plain TEI as well, but in TEI Lex-0 there are some additional constraints and recommendations related to the content of this element.
- In <titleStmt>, TEI Lex-0 recommends the use of type on <title> (with values either full or abbr) to record both the full bibliographic title of the lexicographic resource and the preferred abbreviated title for easy reference, should one exist.
<titleStmt>
<title type="full">Lexicon Serbico-Germanico-Latinum</title>
<title type="abbr">LSGL</title>
</titleStmt>
- In <titleStmt>, TEI Lex-0 recommends the use of <persName> and <orgName> to distinguish between the names of persons and organizations. This is especially important since in some cases, the name of an institution is used to take up the collective authorship of a work.
- When using <persName>, TEI Lex-0 recommends further structuring the name with the elements <forename> and <surname>.
- In <publicationStmt>, TEI Lex-0 requires the use of <availability> to record the <licence> of the given lexicographic resource. In other words, a TEI Lex-0 file must include explicit information on the conditions under which the given resource can be used.
<publicationStmt xml:base="../TEILex0.examples/headers/St%C4%8DS.stripped.xml">
<publisher>Ústav pro jazyk český AV ČR, v. v. i.</publisher>
<pubPlace>Praha</pubPlace>
<availability>
<licence target="https://creativecommons.org/licenses/by/4.0/">Creative Commons Attribution 4.0 International (CC BY 4.0)</licence>
</availability>
</publicationStmt>
StčS (1999-2011)
- In addition to <publisher> and <distributor>, the <publicationStmt> in TEI Lex-0 may include information on any other <authority> responsible for creating or making the resource available.
- If using <authority>, TEI Lex-0 requires the use of role with values funder, sponsor or rightsHolder.
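Taken together, the constraints and recommendations in this list might produce a <fileDesc> along the following lines. Every name in this sketch (the title, the person, the organizations and the funder) is an invented placeholder, and the choice of <author> and <editor> as wrappers for the names is an assumption made for illustration only.

```xml
<fileDesc>
  <titleStmt>
    <title type="full">Placeholder Dictionary of Example Words</title>
    <title type="abbr">PDEW</title>
    <author>
      <!-- a person: the name is structured with forename and surname -->
      <persName>
        <forename>Jane</forename>
        <surname>Doe</surname>
      </persName>
    </author>
    <editor>
      <!-- an organization taking up collective responsibility -->
      <orgName>Example Academy of Sciences</orgName>
    </editor>
  </titleStmt>
  <publicationStmt>
    <publisher>Example Publisher</publisher>
    <!-- role takes one of: funder, sponsor, rightsHolder -->
    <authority role="funder">Example Funding Agency</authority>
    <availability>
      <licence target="https://creativecommons.org/licenses/by/4.0/">Creative Commons Attribution 4.0 International (CC BY 4.0)</licence>
    </availability>
  </publicationStmt>
</fileDesc>
```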
2.2.1. Source description
In TEI Lex-0, <sourceDesc> is an optional element. Born-digital resources or those which cannot be properly sourced do not require a <sourceDesc>.
If a resource is sourced, <sourceDesc> in TEI Lex-0 requires that the sources be grouped in <listBibl> elements:
- <listBibl type="dictionaries"></listBibl> lists all the dictionaries that were used as a source for the given dictionary; if you are retrodigitizing a print dictionary, your <listBibl> may include only one <biblStruct> with the bibliographic information about your print source;
- <listBibl type="literature"></listBibl> groups all the literature: for instance, all the sources used by the dictionary author to illustrate examples;
- <listBibl type="corpora"></listBibl> groups the information on all the corpora that were used in the production of the given lexicographic resource.
TEI Lex-0 requires the use of <biblStruct> for structuring bibliographic information about each individual source. This, too, is a departure from vanilla TEI which is more permissive in this respect.
<sourceDesc xml:base="../TEILex0.examples/headers/VOLP.stripped.xml">
<listBibl type="dictionaries">
<biblStruct>
<monogr>
<title>Vocabulário Ortográfico da Língua Portuguesa</title>
<author>
<orgName>Academia das Ciências</orgName>
</author>
<imprint>
<publisher>Imprensa Nacional de Lisboa</publisher>
<date>1940</date>
</imprint>
<extent>1 volume</extent>
<extent>821 pp.</extent>
</monogr>
</biblStruct>
</listBibl>
</sourceDesc>
VOLP (1940)
<sourceDesc xml:base="../TEILex0.examples/headers/EtymWB-XML.stripped.xml">
<listBibl type="dictionaries">
<biblStruct>
<monogr>
<author>
<persName>
<forename>Wolfgang</forename>
<surname>Pfeifer</surname>
</persName>
</author>
<title>Etymologisches Wörterbuch des Deutschen</title>
<edition>2</edition>
<imprint>
<publisher>Akademie Verlag</publisher>
<pubPlace>Berlin</pubPlace>
<date>1993</date>
<note>with additional notes by the author</note>
</imprint>
</monogr>
</biblStruct>
</listBibl>
</sourceDesc>
EtymWB-XML (2009)
<sourceDesc xml:base="../TEILex0.examples/headers/St%C4%8DS.stripped.xml">
<listBibl type="dictionaries">
<biblStruct>
<monogr>
<title level="m" type="main">Staročeský slovník</title>
<title level="m" type="sub">[Seš.] 1–26: na – při</title>
<editor>
<persName>
<forename>Bohuslav</forename>
<surname>Havránek</surname>
</persName>
</editor>
<editor>
<persName>
<forename>Vladimír</forename>
<surname>Šmilauer</surname>
</persName>
</editor>
<editor>
<persName>
<forename>Václav</forename>
<surname>Křístek</surname>
</persName>
</editor>
<editor>
<persName>
<forename>Jan</forename>
<surname>Petr</surname>
</persName>
</editor>
<editor>
<persName>
<forename>Igor</forename>
<surname>Němec</surname>
</persName>
</editor>
<editor>
<persName>
<forename>Emanuel</forename>
<surname>Michálek</surname>
</persName>
</editor>
<editor>
<persName>
<forename>Jaroslava</forename>
<surname>Pečírková</surname>
</persName>
</editor>
<imprint>
<date>1968–2008</date>
<publisher>Academia</publisher>
<pubPlace>Praha</pubPlace>
</imprint>
</monogr>
</biblStruct>
<biblStruct>
<monogr>
<title level="m" type="main">Slovník staročeský</title>
<title level="m" type="sub">A – J</title>
<author>
<persName>
<forename>Jan</forename>
<surname>Gebauer</surname>
</persName>
</author>
<edition>druhé, nezměněné vydání</edition>
<imprint>
<date>1970</date>
<publisher>Academia</publisher>
<pubPlace>Praha</pubPlace>
</imprint>
</monogr>
</biblStruct>
<biblStruct>
<monogr>
<title level="m" type="main">Slovník staročeský</title>
<title level="m" type="sub">K – N</title>
<author>
<persName>
<forename>Jan</forename>
<surname>Gebauer</surname>
</persName>
</author>
<edition>druhé, nezměněné vydání</edition>
<imprint>
<date>1970</date>
<publisher>Academia</publisher>
<pubPlace>Praha</pubPlace>
</imprint>
</monogr>
</biblStruct>
</listBibl>
</sourceDesc>
StčS (1999-2011)
<sourceDesc xml:base="../TEILex0.examples/headers/Morais.stripped.xml">
<listBibl type="dictionaries">
<biblStruct>
<monogr>
<title level="m" type="main">Diccionario da lingua portugueza composto pelo padre D. Rafael Bluteau, reformado, e accrescentado por Antonio de Moraes Silva, natural do Rio de Janeiro</title>
<title level="m" type="sub">A – K</title>
<author>
<persName>
<forename>António de</forename>
<surname>Morais Silva</surname>
</persName>
</author>
<imprint>
<pubPlace>Lisboa</pubPlace>
<publisher>Officina de Simão Thaddeo Ferreira</publisher>
<pubPlace>Lisboa</pubPlace>
<date when="1789">1789</date>
<note>Com Licença da Real Meza da Comissão Geral, sobre o Exame, e Censura dos Livros.</note>
<note>Vende-ſe na loja de Borel Borel, e Companhia, quaſi defronte da Igreja nova de Noſſa Senhora dos Martyres, na eſquina.</note>
</imprint>
<extent>Tomo primeiro</extent>
<extent>752 pp.</extent>
</monogr>
</biblStruct>
<biblStruct>
<monogr>
<title level="m" type="main">Diccionario da lingua portugueza composto pelo padre D. Rafael Bluteau, reformado, e accrescentado por Antonio de Moraes Silva, natural do Rio de Janeiro</title>
<title level="m" type="sub">L – Z</title>
<author corresp="https://isni.org/isni/0000000083438040">
<persName>
<forename>António de</forename>
<surname>Morais Silva</surname>
</persName>
</author>
<imprint>
<publisher>Officina de Simão Thaddeo Ferreira</publisher>
<pubPlace>Lisboa</pubPlace>
<date>1789</date>
</imprint>
<extent>Tomo segundo</extent>
<extent>541 pp.</extent>
</monogr>
</biblStruct>
</listBibl>
<listBibl type="literature">
<biblStruct>
<monogr corresp="https://purl.pt/29333">
<title>Abecedario Real e Regia Instrucçam dos Principes Lusitanos, composto de 63. discursos Politicos, &amp; Moraes : offerecido ao Serenissimo Principe Dom Joam N.S. / pelo M.R.P. Fr. Joam dos Prazeres, Prègador Gèral, &amp; Chronista mòr da Religiaõ do Principe dos Patriarcas Sam Bento</title>
<author>
<persName>
<surname>Prazeres</surname>
<forename>João dos</forename>
</persName>
</author>
<imprint>
<date>1692</date>
<pubPlace>Lisboa</pubPlace>
<publisher>na Officina de Miguel Deslandes, Impressor de S. Magestade</publisher>
<note>More information found in BND ; 191 p.</note>
</imprint>
</monogr>
</biblStruct>
<biblStruct xml:id="Academ.sing">
<monogr corresp="https://purl.pt/21936">
<title>Academia dos ſingulares de Lisboa dedicadas a Apollo</title>
<author>
<persName>
<surname>Faria</surname>
<forename>André Leitão de</forename>
</persName>
</author>
<imprint>
<date>1665</date>
<pubPlace>Lisboa</pubPlace>
<publisher>na Officina de Henrique Valente de Oliveira</publisher>
<biblScope unit="volume">2 t. em 2 vol.</biblScope>
<note>More information found in BND; 2 vol.</note>
</imprint>
</monogr>
</biblStruct>
<!-- [...] -->
</listBibl>
</sourceDesc>
Silva (1789)
2.3. Encoding description
<encodingDesc> is an optional element, which can be used to document the methods and editorial principles which governed the transcription or encoding of the lexicographic resource in hand and may also include sets of coded definitions used elsewhere in the text.
For an explanation of how to encode a taxonomy of domain labels to be used for encoding usage labels, see section on hierarchical usage labels.
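As a rough illustration of such coded definitions, a small two-level taxonomy of domain labels might be declared as follows. The identifiers and the categories are invented for this sketch; the only constraints taken from these guidelines are that <taxonomy>, <category> and <catDesc> are allowed inside <classDecl> and that each <catDesc> must contain a <term>. The section on hierarchical usage labels remains the authoritative description.

```xml
<encodingDesc>
  <classDecl>
    <taxonomy xml:id="domains">
      <category xml:id="domain.science">
        <catDesc>
          <term>science</term>
        </catDesc>
        <!-- nested category: a narrower domain within the broader one -->
        <category xml:id="domain.science.botany">
          <catDesc>
            <term>botany</term>
          </catDesc>
        </category>
      </category>
    </taxonomy>
  </classDecl>
</encodingDesc>
```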
2.4. Profile description
In plain TEI, <profileDesc> is an optional element, whereas in TEI Lex-0, it is required. This is because the nature of lexicographic resources is such that it is essential to identify and record the language(s) used as part of the resource metadata.
That's why <profileDesc> requires <langUsage> and <langUsage> requires at least one <language> element.
Regarding the use of the required attribute role and its possible values (objectLanguage, workingLanguage, sourceLanguage or targetLanguage), see the specification details for <language>.
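By way of illustration, the language metadata of a trilingual dictionary such as the Lexicon Serbico-Germanico-Latinum might look like the sketch below. It assumes that sourceLanguage marks the language of the headwords and targetLanguage the languages of the equivalents; this reading of the role values is an assumption and should be verified against the specification for <language>.

```xml
<profileDesc>
  <langUsage>
    <!-- assumed: language of the headwords -->
    <language ident="sr" role="sourceLanguage">Serbian</language>
    <!-- assumed: languages of the equivalents -->
    <language ident="de" role="targetLanguage">German</language>
    <language ident="la" role="targetLanguage">Latin</language>
  </langUsage>
</profileDesc>
```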
2.5. Revision description
<revisionDesc> is optional in both TEI and TEI Lex-0. The element is used to document the revision history of the given file. For each recorded revision, one should use the <change> element, together with the appropriate attributes: when to indicate the date of the implemented change, resp to assign responsibility, and n to assign a number to the particular change.
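A revision history using these three attributes could be sketched as follows. The dates, the descriptions and the pointer #ed1 (which is assumed to refer to a person identified elsewhere in the header) are hypothetical.

```xml
<revisionDesc>
  <!-- listing the most recent change first is a common convention, not a requirement stated here -->
  <change when="2019-02-11" resp="#ed1" n="2">Corrected typos in the definitions.</change>
  <change when="2019-01-30" resp="#ed1" n="1">Initial conversion to TEI Lex-0.</change>
</revisionDesc>
```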
3. Entries
3.1. General remarks
An <entry> is a basic reference unit in a dictionary: it groups together all the information related to a particular lemma. For instance:
<entry xml:id="OALD.competitor" type="mainEntry" xml:lang="en"
xml:base="../TEILex0.examples/examples.stripped.xml">
<form type="lemma">
<orth>competitor</orth>
<hyph>com|peti|tor</hyph>
<pron>k@m"petit@(r)</pron>
</form>
<gramGrp>
<gram type="pos">n</gram>
</gramGrp>
<sense xml:id="OALD.competitor.1">
<def>person who competes.</def>
</sense>
</entry>
OALD (1974)
<entry xml:id="MM.RSSKJ.круна" xml:lang="sr"
xml:base="../TEILex0.examples/examples.stripped.xml">
<form type="lemma">
<orth>кру̏на</orth>
</form>
<etym>(<cit type="etymon" xml:lang="de">
<lang norm="de" xml:lang="sr">нем.</lang>
<form>
<orth>Krone</orth>
</form>
</cit>
<pc>,</pc>
<cit type="etymon" xml:lang="la">
<lbl xml:lang="sr">из</lbl>
<lang expand="латински" norm="la">лат.</lang>
</cit>)</etym>
<sense xml:id="MM.RSSKJ.круна.1">
<num>1.</num>
<sense xml:id="MM.RSSKJ.круна.1a">
<num>а)</num>
<def>украс на глави као знак владарске власти;</def>
</sense>
<sense xml:id="MM.RSSKJ.круна.1b">
<num>б)</num>
<usg type="meaningType" expand="фигуративно" norm="figurative">фиг.</usg>
<def>владар.</def>
</sense>
</sense>
<sense xml:id="MM.RSSKJ.круна.2">
<num>2.</num>
<def>новчана јединица у неким европским земљама, разне вредности.</def>
</sense>
<sense xml:id="MM.RSSKJ.круна.3">
<num>3.</num>
<def>део лиснатог дрвета изнад стабле (гране и лшће);</def>
<xr type="synonymy">
<lbl>син.</lbl>
<ref type="sense">крошња</ref>
<pc>.</pc>
</xr>
</sense>
<sense xml:id="MM.RSSKJ.круна.4">
<num>4.</num>
<usg type="meaningType" expand="фигуративно" norm="figurative">фиг.</usg>
<def>врхунац, највиши домет неког рада, забаве.</def>
</sense>
</entry>
Московљевић (1990)
3.2. Mandatory attributes
The TEI Lex-0 schema prescribes two mandatory attributes on <entry>:
- xml:id uniquely identifies the element it is associated with;
- xml:lang identifies the object language of the element it is associated with.
In XML, xml:lang is inherited from the immediately enclosing element or from its closest ancestor that has this attribute. This means that in XML not every element needs to have the xml:lang attribute.
TEI Lex-0 recommends that xml:lang be attached to so-called container elements (such as <entry> and <cit>) rather than individual <form> elements.
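The inheritance mechanism can be pictured with a reduced variant of the Serbian entry shown earlier. The identifiers prefixed with demo. are invented for this sketch, and the comments describe how a conforming XML processor would resolve the language of each element.

```xml
<entry xml:id="demo.kruna" xml:lang="sr">
  <form type="lemma">
    <!-- no xml:lang here: "sr" is inherited from <entry> -->
    <orth>круна</orth>
  </form>
  <etym>
    <!-- the container <cit> overrides the inherited value -->
    <cit type="etymon" xml:lang="de">
      <form>
        <!-- no xml:lang here: "de" is inherited from <cit> -->
        <orth>Krone</orth>
      </form>
    </cit>
  </etym>
  <sense xml:id="demo.kruna.1">
    <!-- back to "sr", inherited from <entry> -->
    <def>новчана јединица у неким европским земљама, разне вредности.</def>
  </sense>
</entry>
```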
In addition, TEI Lex-0 privileges <entry> as the dictionary’s central textual component by requiring both a unique identifier (xml:id) and xml:lang.
xml:lang identifies the object language of the element it is associated with. The language ‘tag’ (i.e. the value of this attribute) must follow IETF BCP 47, the Internet Engineering Task Force's best-practice document outlining standard identifiers for labeling language content. To learn more about what language tag is appropriate for your project, check out W3C's useful resource on choosing language tags.
If the language or language variety you are working on is not covered by BCP 47, make sure to follow the syntax of Private Use Tags described in BCP 47 Section 2.2.7 when creating one. Do this only if you are absolutely certain that no standard tag exists for your object language.
If you have created a "private" language tag, you can validate it (in terms of its structural well-formedness and validity) using the BCP 47 validator.
Language tags containing private-use subtags should be documented in the TEI header, specifically using one or more <language> elements grouped under <langUsage> inside <profileDesc>:
<profileDesc>
<langUsage>
<language ident="mix" role="objectLanguage">Mixtepec Mixtec</language>
<language ident="mix-x-YCNY" role="objectLanguage">Yucanany Mixtec</language>
</langUsage>
</profileDesc>
3.3. Grammatical properties
3.3.1. General remarks
Grammatical properties of lexical entries should be specified in entry/gramGrp/gram. This <gram> element will typically specify the part-of-speech of the entry:
<entry xml:lang="en" type="mainEntry" xml:id="on">
<form type="lemma">
<orth>on</orth>
</form>
<gramGrp>
<gram type="pos">prep</gram>
</gramGrp>
<!--...-->
</entry>
Notes:
- Grammatical properties of the entry as a whole should not be specified in entry/form[@type="lemma"]/gramGrp. entry/form/gramGrp should be used only if a particular form (a dialectal variant, for instance) has different grammatical properties from the lemma; or to indicate the grammatical properties of the inflected form which clearly deviate from the lemma.
- For entries which group grammatical homonyms inside single entries (e.g. in English dictionaries which do not have separate entries for conversion pairs of nouns and verbs, such as run or aid), see the discussion under Nested entries vs. multiple-senses.
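The distinction drawn in the first note might be encoded as in the following sketch, where the part of speech applies to the entry as a whole and the number applies only to one inflected form. The entry is invented for illustration, and the value "inflected" for the type attribute on <form> is an assumption that should be checked against the specification.

```xml
<entry xml:id="demo.mouse" xml:lang="en" type="mainEntry">
  <form type="lemma">
    <orth>mouse</orth>
  </form>
  <form type="inflected">
    <orth>mice</orth>
    <!-- holds only for this particular form -->
    <gramGrp>
      <gram type="number">pl</gram>
    </gramGrp>
  </form>
  <!-- holds for the entry as a whole -->
  <gramGrp>
    <gram type="pos">n</gram>
  </gramGrp>
  <sense xml:id="demo.mouse.1">
    <def>small rodent.</def>
  </sense>
</entry>
```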
3.3.2. Typology of gram
The TEI Guidelines define:
- seven specific elements which can be used to mark up particular grammatical properties: <case>, <gen> (for gender), <iType> (for inflection type), <mood>, <number>, <per> (for person) and <tns> (for tense); and
- one general element (<gram>) which can be used to encode different kinds of grammatical properties.
The Guidelines themselves do not explain the reasoning behind having two different mechanisms for encoding the same kind of information. The two mechanisms are treated as fully interchangeable: see, for instance, the first two examples in Section 9.3.2.
While it is perfectly understandable why marking up grammatical information using a number of specific, granular elements can be considered desirable, the current situation is less than perfect:
- if both <pos>prep</pos> and <gram type="pos">prep</gram> are possible, and if both mean exactly the same thing, the choice about how to encode grammatical information will always be partially arbitrary;
- the specific grammatical elements in TEI cover some important grammatical categories, but are certainly not exhaustive: for instance, Slavic dictionaries will, as a rule, indicate aspect (imperfective or perfective) as the defining grammatical property of verbs, yet there is no specific <aspect> element in TEI;
- if there are no specific elements for every possible grammatical category, mixing specific and general elements (for instance <pos>v.</pos> and <gram type="aspect">imperf.</gram>) within the same entry and/or dictionary will most likely further complicate data processing and data interoperability.
Considering the goals of TEI Lex-0 to serve as a common baseline and target format for transforming and comparing different lexical resources, we have decided to do away with the specific elements for grammatical properties. Instead, we recommend the use of typed <gram> elements. This decision was not taken lightly and it prompted a great deal of discussion. It goes without saying that TEI itself will continue to support both mechanisms and that an XSLT transformation from <pos>prep</pos> to <gram type="pos">prep</gram> would be easily accomplished by those who want to convert their dictionaries to TEI Lex-0.
The following table shows a mapping between the specific TEI elements and the typed <gram> elements in TEI Lex-0:
TEI | TEI Lex-0 |
---|---|
<pos>n.</pos> | <gram type="pos">n.</gram> |
<case>acc.</case> | <gram type="case">acc.</gram> |
<gen>f.</gen> | <gram type="gender">f.</gram> |
<iType>7</iType> | <gram type="inflectionType">7</gram> |
<mood>indic.</mood> | <gram type="mood">indic.</gram> |
<number>sg.</number> | <gram type="number">sg.</gram> |
<per>3rd</per> | <gram type="person">3rd</gram> |
<tns>aorist</tns> | <gram type="tense">aorist</gram> |
<colloc>de</colloc> | <gram type="collocate">de</gram> |
- | <gram type="aspect">imperf.</gram> |
- | <gram type="valency">intr.</gram> |
- | <gram type="government">[+conj.]</gram> |
Note: See also next section on Collocates.
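The mapping in the table above is mechanical, so the conversion can be scripted in any XML-aware language, not only XSLT. The following is a minimal sketch in Python using only the standard library. It assumes un-namespaced input (real TEI files normally declare the TEI namespace), and the function name `to_typed_gram` and the dictionary `GRAM_TYPES` are illustrative names, not part of TEI Lex-0.

```python
import xml.etree.ElementTree as ET

# Specific TEI element name -> value of gram/@type, following the table above.
# "collocate" is the value used in the section on Collocates.
GRAM_TYPES = {
    "pos": "pos",
    "case": "case",
    "gen": "gender",
    "iType": "inflectionType",
    "mood": "mood",
    "number": "number",
    "per": "person",
    "tns": "tense",
    "colloc": "collocate",
}


def to_typed_gram(root):
    """Rename specific grammatical elements to typed <gram> elements, in place.

    Assumption: element names carry no namespace prefix or URI.
    """
    for elem in root.iter():
        if elem.tag in GRAM_TYPES:
            elem.set("type", GRAM_TYPES[elem.tag])  # set @type before renaming
            elem.tag = "gram"
    return root


sample = "<entry><gramGrp><pos>prep</pos><gen>f.</gen></gramGrp></entry>"
converted = to_typed_gram(ET.fromstring(sample))
print(ET.tostring(converted, encoding="unicode"))
```

The sketch renames matching elements wherever they occur; a production conversion would probably restrict itself to descendants of <gramGrp> and <form>.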
TEI P5 is missing specific elements for encoding the grammatical aspect of verbs (for values such as perfective, imperfective) and valency (for values such as transitive, intransitive, reflexive, and impersonal). TEI Lex-0 is therefore introducing two suggested grammatical types, gram[@type="aspect"] and gram[@type="valency"], for encoding such values in dictionaries.
The attribute values for gram[@type] are a semi-closed list: this means that we will discuss and adopt additional values as demonstrated by examples from dictionaries that are encoded by members of our community.
If your dictionary has grammatical labels that do not fit into the above categories, do let us know by filing a ticket on GitHub.
3.3.3. Collocates
<entry>
<form>
<orth>médire</orth>
</form>
<gramGrp>
<colloc>de</colloc>
</gramGrp>
</entry>
TEI Lex-0 uses <gram type="collocate"></gram> to encode these phenomena, i.e.:
<entry xml:lang="fr" xml:id="DDLF.médire">
<form type="lemma">
<orth>médire</orth>
</form>
<gramGrp>
<gram type="collocate">de</gram>
</gramGrp>
</entry>
<gram type="government"></gram>
<gramGrp>
<gram type="government">[+ conj.]</gram>
</gramGrp>
3.4. Deprecated entry-like elements
The current TEI Guidelines define five different container elements that may serve as grouping devices for entry-level lexical information:
- <entry>: contains a single structured entry in any kind of lexical resource, such as a dictionary or lexicon.
- <entryFree>: contains a single unstructured entry in any kind of lexical resource, such as a dictionary or lexicon.
- <superEntry>: groups a sequence of entries within any kind of lexical resource, such as a dictionary or lexicon which function as a single unit, for example a set of homographs.
- <re>: (related entry) contains a dictionary entry for a lexical item related to the headword, such as a compound phrase or derived form, embedded inside a larger entry.
- <hom>: (homograph) groups information relating to one homograph within an entry
These five elements can be used to distinguish different types of entries along two conceptual axes:
- Structured vs. unstructured entries, i.e. entries that can readily be represented (in the lexical view) in the spirit of the Dictionary Chapter of the TEI Guidelines (<entry>, <re>) vs. entries that for some reason violate the generic content model of <entry> or <re> and thus have to be represented more freely (<entryFree>). A third category in this respect consists of entries that exhibit a highly reduced amount of lexical content while this content is still of an essentially entry-like nature (<superEntry>).
- Containing vs. contained entries: entries may contain additional lexical information that can be conceived as an additional dictionary entry in its own right. Specifically, <superEntry> may contain <entry>, and <entry> in turn may contain <re> to represent the embedding of lexical entries on three distinct levels. Because <re> may be used recursively, the number of levels for representing entry-like lexical information inside other such blocks is effectively unrestricted. At the same time, two different mechanisms can be used to create homographic entries: <superEntry> containing multiple <entry> elements; or <entry> containing multiple <hom> elements.
3.4.1. hom
Drawing a clear line between a situation where an entry has to be split into two or more homonyms and one where the differences correspond to a semantic alternation is lexicographically difficult. Still, the main danger in keeping both possibilities in the representation of a lexical entry in a digital lexicon is that it introduces a systematic structural ambiguity as to where the appropriate information is to be found. We thus deprecate <hom> altogether in the present recommendation and replace it with the nested <entry> construct.
For instance, the following example from the TEI Guidelines:
<entry>
<form>
<orth>bray</orth>
<pron>breI</pron>
</form>
<hom>
<gramGrp>
<gram type="pos">n</gram>
</gramGrp>
<sense>
<def>cry of an ass; sound of a trumpet.</def>
</sense>
</hom>
<hom>
<gramGrp>
<gram type="pos">vt</gram>
<subc>VP2A</subc>
</gramGrp>
<sense>
<def>make a cry or sound of this kind.</def>
</sense>
</hom>
</entry>
would in TEI Lex-0 be represented as:
<entry type="mainEntry" xml:id="bray" xml:lang="en"
xml:base="../TEILex0.examples/examples.stripped.xml">
<form type="lemma">
<orth>bray</orth>
<pron>breI</pron>
</form>
<entry xml:id="bray_n" xml:lang="en" type="homonymicEntry">
<gramGrp>
<gram type="pos">n</gram>
</gramGrp>
<sense xml:id="bray_n.1">
<def>cry of an ass</def>
</sense>
<pc>;</pc>
<sense xml:id="bray_n.2">
<def>sound of a trumpet</def>
</sense>
<pc>.</pc>
</entry>
<entry xml:id="bray_vt" xml:lang="en" type="homonymicEntry">
<gramGrp>
<gram type="pos">vt</gram>
<gram type="inflectionType">VP2A</gram>
</gramGrp>
<sense xml:id="bray_vt.1">
<def>make a cry or sound of this kind</def>
</sense>
<pc>.</pc>
</entry>
</entry>
In a similar fashion, consider this entry from the Dictionary of the Portuguese Language by Morais:
<entry xml:id="MORAIS.1.DLP.JANTAR" type="mainEntry" xml:lang="pt"
xml:base="../TEILex0.examples/examples.stripped.xml">
<entry xml:id="MORAIS.1.DLP.JANTAR-vt" type="homonymicEntry" xml:lang="pt">
<form type="lemma">
<orth>JANTAR</orth>
</form>
<metamark function="lemmaDelimiter">,</metamark>
<gramGrp>
<gram type="pos" norm="VERB">v.</gram>
<gram type="voice">at.</gram>
</gramGrp>
<sense xml:id="MORAIS.1.DLP.JANTAR.s.1">
<def>comer ao meio dia , ou comer depois de almoçar.</def>
</sense>
</entry>
<entry xml:id="MORAIS.1.DLP.JANTAR-n" type="homonymicEntry" xml:lang="pt">
<form type="lemma">
<orth>JANTAR</orth>
</form>
<metamark function="lemmaDelimiter">,</metamark>
<gramGrp>
<gram type="pos" norm="NOUN">ſ.</gram>
<gram type="gender">m.</gram>
</gramGrp>
<sense xml:id="MORAIS.1.DLP.JANTAR.s.2">
<def>a ſegunda das tres comidas regulares do dia, entre o almoço , e aceia , ou antes da merenda.</def>
</sense>
<pc>.</pc>
<metamark function="senseDelimiter">§</metamark>
<sense xml:id="MORAIS.1.DLP.JANTAR.s.3">
<def>Porção de dinheiro , que as Villas , e Cidades davão aos Reis , quando hião de correição para ſuſtento de ſua comitiva</def>
</sense>
<pc>.</pc>
<bibl type="attestation" source="#M._L._Monarchia_Luſitana">
<title>M. Luſ.</title>
<citedRange unit="volume">t. 5</citedRange>
<citedRange unit="folium">f. 53</citedRange>
<citedRange unit="chapter">cap. 27</citedRange>
</bibl>
</entry>
</entry>
Silva (1789)
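The structural core of the <hom>-to-nested-<entry> rewrite shown in this section can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: the input is un-namespaced, and the hypothetical helper `hom_to_nested_entry` does not assign xml:id or xml:lang values, split definitions into separate senses, or convert <subc>, all of which the hand-encoded examples above also do.

```python
import xml.etree.ElementTree as ET


def hom_to_nested_entry(entry):
    """Turn each <hom> child of an <entry> into a nested homonymic <entry>.

    Assumption: un-namespaced elements; identifiers are left to the encoder.
    """
    homs = entry.findall("hom")
    for hom in homs:
        hom.tag = "entry"
        hom.set("type", "homonymicEntry")
    if homs:
        # The outer entry now only groups the homonymic entries.
        entry.set("type", "mainEntry")
    return entry


source = """<entry>
  <form><orth>bray</orth></form>
  <hom><gramGrp><gram type="pos">n</gram></gramGrp></hom>
  <hom><gramGrp><gram type="pos">vt</gram></gramGrp></hom>
</entry>"""
print(ET.tostring(hom_to_nested_entry(ET.fromstring(source)), encoding="unicode"))
```

Entries without any <hom> children are returned unchanged, so the helper can be applied to every entry in a dictionary.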
3.4.2. superEntry
By making <entry> recursive, TEI Lex-0 has eliminated the need for grouping entries with <superEntry>.
This is especially important for traditional root-based dictionaries, which start with the root as the main headword, followed by full-fledged lexicographic entries of derived headwords.
<entry type="wordFamily" xml:lang="ar" xml:id="syj"
xml:base="../TEILex0.examples/examples.stripped.xml">
<form type="root">
<orth>سيج</orth>
</form>
<pc>:</pc>
<!-- To fence (verb) -->
<entry type="mainEntry" xml:lang="ar" xml:id="syj1">
<form type="lemma">
<orth>سيّج</orth>
</form>
<sense xml:id="syj1_sense1">
<cit type="example">
<quote>الكرم</quote>
</cit>
<pc>:</pc>
<def>جعل له سياجا</def>
</sense>
<pc>٠</pc>
</entry>
<!-- A fence (noun) -->
<entry type="mainEntry" xml:lang="ar" xml:id="syj2">
<form type="lemma">
<orth>السياج</orth>
</form>
<form type="inflected">
<gramGrp>
<gram type="number" value="plural">ج</gram>
</gramGrp>
<form type="variant">
<orth>سيَاجات</orth>
</form>
<lbl>و</lbl>
<form type="variant">
<orth>أسْوِجة</orth>
</form>
<lbl>و</lbl>
<form type="variant">
<orth>أَسْوِجة</orth>
</form>
<lbl>و</lbl>
<form type="variant">
<orth>سُوج</orth>
</form>
</form>
<pc>:</pc>
<sense xml:id="syj2_sense1">
<def>الحائط</def>
</sense>
<pc>||</pc>
<sense xml:id="syj2_sense2">
<def>ما أُحيط بهِ على شيءٍ كالكرم و النخل</def>
</sense>
</entry>
<pc>٠</pc>
<!-- A kind of fish -->
<entry type="mainEntry" xml:lang="ar" xml:id="syj3">
<form type="lemma">
<orth>السيْجان</orth>
</form>
<pc>(</pc>
<usg type="domain" value="animal">ح</usg>
<pc>)</pc>
<pc>:</pc>
<sense xml:id="syj3_sense1">
<def>نوع من السمك</def>
</sense>
</entry>
</entry>
Almonjid (2014)
<entry type="wordFamily" xml:lang="ar" xml:id="shahama"
xml:base="../TEILex0.examples/examples.stripped.xml">
<form type="root">
<orth>شهم</orth>
</form>
<pc>:</pc>
<entry type="wordFamily" xml:lang="ar" xml:id="shahama1">
<num>١ــ</num>
<entry type="mainEntry" xml:lang="ar" xml:id="shahama1_1">
<form type="lemma">
<orth>شَهَمَ</orth>
</form>
<form type="scheme">
<orth>ـَ</orth>
</form>
<form type="inflected">
<form type="variant">
<orth>شَهْمًا</orth>
</form>
<lbl>و</lbl>
<form type="variant">
<orth>شُهُمًا</orth>
</form>
</form>
<sense xml:id="shahama1_1_sense1">
<cit type="example">
<quote>الفرسَ</quote>
</cit>
<pc>:</pc>
<def>زجره</def>
</sense>
<pc>||</pc>
<lbl>و</lbl>
<sense xml:id="shahama1_1_sense2">
<cit type="example">
<quote>ــ الرجُل</quote>
</cit>
<pc>:</pc>
<def>افزعه</def>
</sense>
</entry>
<pc>٠</pc>
<entry type="mainEntry" xml:lang="ar" xml:id="shahama1_2">
<form type="lemma">
<orth>اَلمشْهوم</orth>
</form>
<pc>٠:</pc>
<sense xml:id="shahama1_2_sense1">
<def>المذعور</def>
</sense>
</entry>
</entry>
<entry type="wordFamily" xml:lang="ar" xml:id="shahama2">
<num>٢٠ ــ</num>
<entry type="mainEntry" xml:lang="ar" xml:id="shahama2_1">
<form type="lemma">
<orth>شَهُم</orth>
</form>
<form type="scheme">
<orth>ـُـ</orth>
</form>
<form type="inflected">
<form type="variant">
<orth>شَهَامةً</orth>
</form>
<lbl>و</lbl>
<form type="variant">
<orth>شُهُومَةُُ</orth>
</form>
</form>
<lbl>:</lbl>
<sense xml:id="shahama2_1_sense1">
<def> كان شهْمًا</def>
</sense>
</entry>
<pc>٠</pc>
<entry type="mainEntry" xml:lang="ar" xml:id="shahama2_2">
<form type="lemma">
<orth>الشَهْم</orth>
</form>
<form type="inflected">
<gramGrp>
<gram type="number" value="plural">ج</gram>
</gramGrp>
<orth>شِهام</orth>
</form>
<pc>:</pc>
<sense xml:id="shahama2_2_sense1">
<def>الذكيّ الفؤاد</def>
</sense>
<pc>||</pc>
<sense xml:id="shahama2_2_sense2">
<def>السيِّد النافذ الحكم</def>
</sense>
<pc>||</pc>
<sense xml:id="shahama2_2_sense3">
<lbl>وــ</lbl>
<form type="inflected">
<gramGrp>
<gram type="number" value="plural">ج</gram>
</gramGrp>
<orth>شُهُم</orth>
</form>
<pc>:</pc>
<def>الفرس النشيط السريع القويّ</def>
</sense>
</entry>
<pc>٠</pc>
<entry type="mainEntry" xml:lang="ar" xml:id="shahama2_3">
<form type="lemma">
<orth>اَلمَشْهُوم</orth>
</form>
<pc>*:</pc>
<sense xml:id="shahama2_3_sense1">
<def>الذكيّ الفؤاد</def>
</sense>
</entry>
</entry>
<entry type="wordFamily" xml:lang="ar" xml:id="shahama3">
<num>٠٣ ــ</num>
<entry type="mainEntry" xml:lang="ar" xml:id="shahama3_1">
<form type="lemma">
<orth>الشَيْهَم</orth>
</form>
<form type="inflected">
<gramGrp>
<gram type="number" value="plural">ج</gram>
</gramGrp>
<orth>شَيَهِم</orth>
</form>
<pc>(</pc>
<usg type="domain" value="animal">ح</usg>
<pc>)</pc>
<sense xml:id="shahama3_1_sense1">
<def>ذَكَر القنافذ</def>
</sense>
</entry>
<pc>٠</pc>
<entry type="mainEntry" xml:lang="ar" xml:id="shahama3_2">
<form type="lemma">
<orth>الشَيْهَمَة</orth>
</form>
<pc>:</pc>
<sense xml:id="shahama3_2_sense1">
<def>العجوز</def>
</sense>
</entry>
</entry>
</entry>
Almonjid (2014)
See also Section on grammatical properties in senses.
4. Forms
The current TEI Guidelines allow for an extremely wide range of encoding possibilities for written and spoken forms. In the discussion which follows, we suggest ways in which the elements, in particular <form>, can be constrained. We give examples of use types not covered by the Guidelines, and propose some extensions.
4.1. A note on inheritance
We assume that, in order to determine the complete properties of an element inside the entry tree, the principle of default inheritance applies: e.g. the grammatical properties of a form are determined by collecting the sibling <gramGrp> of the ancestor-or-self of the focus element, where the superordinate grammatical properties can be overwritten by the lower-level properties. This principle is relatively straightforward in the case of grammatical properties, but more complex for the word paradigm, especially in cases of variant forms. For more information, cf. Ide et al. (2000) and Erjavec et al. (2000).
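To make the inheritance principle concrete, here is a minimal Python sketch of one possible reading of it: for a given <form>, the <gramGrp> children of the form itself and of each of its ancestors are collected from the outermost level inwards, so that lower-level values overwrite superordinate ones. The function name `inherited_gram` is hypothetical, the input is assumed to be un-namespaced, and the word paradigm and variant forms, which the paragraph above flags as more complex, are not handled.

```python
import xml.etree.ElementTree as ET


def inherited_gram(root, focus):
    """Return {gram type: value} for `focus`, inner levels overriding outer ones."""
    # ElementTree keeps no parent pointers, so build a child -> parent map first.
    parent = {child: par for par in root.iter() for child in par}
    chain = [focus]
    while chain[-1] in parent:
        chain.append(parent[chain[-1]])
    props = {}
    for node in reversed(chain):  # outermost ancestor first
        for gram in node.findall("gramGrp/gram"):
            # Prefer a normalized @value where present, else the element text.
            props[gram.get("type")] = gram.get("value") or gram.text
    return props


doc = ET.fromstring(
    "<entry>"
    "<gramGrp><gram type='pos'>verb</gram></gramGrp>"
    "<form type='paradigm'>"
    "<gramGrp><gram type='tense'>present</gram></gramGrp>"
    "<form type='inflected'><orth>pierdo</orth>"
    "<gramGrp><gram type='person'>1</gram></gramGrp></form>"
    "</form>"
    "</entry>"
)
print(inherited_gram(doc, doc.find("form/form")))
```

For the inflected form in this sample, the result combines the part of speech from the entry, the tense from the paradigm and the person from the form itself.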
4.2. Lemmas
The form element should always be qualified by its type. The lemma (i.e. headword) form should be encoded as form[@type="lemma"].
If it is necessary to specify the grammatical properties of the lemma form itself (as opposed to the grammatical properties of the entry), this is described by entry/form[@type="lemma"]/gramGrp.
4.3. Inflected forms
Dictionaries often include additional forms next to the lemma. In English, these are used to specify irregular forms, such as “corpus / corpora” or “take / took”, whereas in inflectionally rich languages they are often used to help the user determine the correct paradigm of the word.
Such inflected forms should be encoded in entry/form[@type="inflected"], e.g.:
<entry xml:lang="en" xml:id="CH.go1"
xml:base="../TEILex0.examples/examples.stripped.xml">
<form type="lemma">
<orth>go</orth>
<pron>gō</pron>
</form>
<lbl rend="sup">1</lbl>
<gramGrp>
<gram type="pos">vi</gram>
</gramGrp>
<pc>(</pc>
<form type="inflected">
<gramGrp>
<gram type="participle">prp</gram>
</gramGrp>
<orth>gō'ing</orth>
</form>
<pc>;</pc>
<form type="inflected">
<gramGrp>
<gram type="participle">pap</gram>
</gramGrp>
<orth>gone</orth>
<pron>gon</pron>
<note>(see separate entries)</note>
</form>
<pc>;</pc>
<form type="inflected">
<gramGrp>
<gram type="participle">pat</gram>
</gramGrp>
<orth>went</orth>
<note>(supplied from <xr type="related">
<ref type="entry">wend</ref>
</xr>)</note>
</form>
<pc>;</pc>
<form type="inflected">
<gramGrp>
<gram type="person">3rd pers</gram>
<gram type="number">sing</gram>
<gram type="tense">pres</gram>
<gram type="mood">indicative</gram>
</gramGrp>
<orth>goes</orth>
</form>
<pc>;</pc>
<!--...-->
</entry>
Chambers (2011)
Or take this example: abeceda, -y. In Czech, "-y" is a genitive singular suffix for feminine nouns. We can mark up the grammatical properties of the suffix, while providing the full form of the noun as well:
<entry type="mainEntry" xml:lang="cs" xml:id="en000008"
xml:base="../TEILex0.examples/examples.stripped.xml">
<form type="lemma" xml:id="en000008.hw1">
<orth>abeceda</orth>
</form>
<pc>,</pc>
<form type="inflected">
<gramGrp>
<gram type="case" value="genitive"/>
<gram type="number" value="singular"/>
<gram type="gender" value="feminine"/>
</gramGrp>
<orth extent="suffix" expand="abecedy">-y</orth>
</form>
<!--...-->
</entry>
4.4. Paradigms
When several inflected forms can be present next to the lemma, these can be embedded into entry/form[@type="paradigm"]. The decision on whether to use this extra element depends on the particular dictionary and language.
The other use case for paradigms is when the full inflectional paradigm of the word is embedded in the entry, i.e. when the dictionary also includes all the word-forms of the words covered, which can be useful for example in machine processing.
An entry may contain several paradigms, e.g. a partial one for humans and a full one for machines, or one for each stem of a verb. Each paradigm type should be distinguished by the subtype attribute.
<entry xml:id="perder" xml:lang="es"
xml:base="../TEILex0.examples/examples.stripped.xml">
<form type="lemma">
<orth>perder</orth>
</form>
<gramGrp>
<gram type="pos">verb</gram>
</gramGrp>
<form type="paradigm" subtype="present">
<form type="inflected">
<orth>pierdo</orth>
<gramGrp>
<gram type="person">1</gram>
<gram type="number">sg</gram>
<gram type="mood">indic</gram>
<gram type="voice">active</gram>
</gramGrp>
</form>
<!-- other inflected forms (of present indicative) here -->
<gramGrp>
<gram type="tense">present</gram>
</gramGrp>
</form>
<form type="paradigm" subtype="preteritum">
<form type="inflected">
<orth>perdí</orth>
<gramGrp>
<gram type="person">1</gram>
<gram type="number">sg</gram>
<gram type="mood">indic</gram>
<gram type="voice">active</gram>
</gramGrp>
</form>
<gramGrp>
<gram type="tense">preteritum</gram>
</gramGrp>
</form>
<!--... -->
</entry>
4.5. Variants
The representation of variation within a form is highly dependent upon the specific features that vary and the way in which they vary. However, as a general principle, variation may be encoded as form[@type="variant"] and embedded within the parent element for which a subordinate feature exhibits variation.
4.5.1. Orthographic variation
Several kinds of orthographic variation may be distinguished. Below, we present some of the options with the corresponding examples.
Spelling variation due to a change in a language’s orthographic conventions:
<entry xml:id="Flussschifffahrt" xml:lang="de" type="compound"
xml:base="../TEILex0.examples/examples.stripped.xml">
<form type="lemma">
<orth type="segmented">
<seg>Fluss</seg>
<seg>schifffahrt</seg>
</orth>
<form type="variant">
<orth>
<seg>Fluss</seg>
<pc>-</pc>
<seg>Schifffahrt</seg>
</orth>
</form>
<form type="variant">
<orth notAfter="1996">
<seg>Fluß</seg>
<seg>schiffahrt</seg>
</orth>
<usg type="temporal">Vor 1996 Rechtschreibung Reform</usg>
</form>
<gramGrp>
<gram type="pos">noun</gram>
</gramGrp>
</form>
<!--...-->
</entry>
The following example is from American English. Because there are no official conventions for transliterating Arabic orthography into the English (Latin) script, the initial vowel of the name ‘Osama Bin Laden’ varies between ‘O’ and ‘U’:
<entry xml:id="Osama" xml:lang="en"
xml:base="../TEILex0.examples/examples.stripped.xml">
<form type="lemma">
<pron notation="ipa">
<seg xml:id="ousma" corresp="#usma #osma">ow."sa.ma</seg>
<seg>bɪn</seg>
<seg>ˈlaːdn̹</seg>
</pron>
<form type="variant">
<orth type="transliterated">
<seg xml:id="osma" corresp="#usma #ousma">Osama</seg>
<seg>Bin</seg>
<seg>Laden</seg>
</orth>
</form>
<form type="variant">
<orth type="transliterated">
<seg xml:id="usma" corresp="#osma #ousma">Usama</seg>
<seg>Bin</seg>
<seg>Laden</seg>
</orth>
</form>
</form>
<!--...-->
</entry>
4.5.2. Phonetic variation
In this example, the entry contains the single orthographic form as a direct child of the lemma and phonetic transcriptions of the two roughly equally used variant pronunciations of the word 'caramel' from American English.
<entry xml:id="caramel-en" xml:lang="en-US"
xml:base="../TEILex0.examples/examples.stripped.xml">
<form type="lemma">
<orth>caramel</orth>
<form type="variant">
<pron notation="ipa">'keɹə"mɛl</pron>
</form>
<form type="variant">
<pron notation="ipa">'kaɹmɫ̩</pron>
</form>
</form>
<gramGrp>
<gram type="pos">noun</gram>
</gramGrp>
<!-- ... -->
</entry>
In the example above, one could have chosen to mark up two different pronunciations using two <pron> elements inside the form[@type="lemma"]. Considering, however, that each individual pronunciation could, in theory, be further qualified, for instance, by a <usg> note, indicating the geographic area in which the said pronunciation is used, TEI Lex-0 recommends that multiple variants, whether orthographic or orthoepic, be contained each in its own <form> element.
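For data that already lists several <pron> elements directly inside the lemma form, the recommended structure can be produced mechanically. The following Python sketch is only an illustration: the helper name `wrap_pron_variants` is hypothetical, the input is assumed to be un-namespaced, and a lemma with a single <pron> is deliberately left untouched.

```python
import xml.etree.ElementTree as ET


def wrap_pron_variants(lemma):
    """Wrap each of several <pron> children in its own <form type="variant">."""
    prons = lemma.findall("pron")
    if len(prons) < 2:
        return lemma  # a single pronunciation is not a case of variation
    for pron in prons:
        position = list(lemma).index(pron)
        lemma.remove(pron)
        variant = ET.Element("form", {"type": "variant"})
        variant.append(pron)
        lemma.insert(position, variant)  # keep the original document order
    return lemma


lemma = ET.fromstring(
    "<form type='lemma'><orth>caramel</orth>"
    "<pron notation='ipa'>A</pron><pron notation='ipa'>B</pron></form>"
)
print(ET.tostring(wrap_pron_variants(lemma), encoding="unicode"))
```

Each resulting <form type="variant"> can then receive its own <usg> or other qualifiers, which is the reason for the recommendation above.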
4.5.3. Regional or dialectal variation
In the following example from Mixtepec-Mixtec, there is variation in the form of the word for the city of Oaxaca between speakers from the village of Yucanany and the rest of the speakers. Since the Yucanany variety makes up only a small portion of the speakers of the language, this case of variation is represented as an embedded form[@type="variant"] within the lemma. Note the use of usg[@type="geographic"]/placeName to explicitly specify this feature in addition to the use of the private-use language subtag (@xml:lang="mix-x-YCNY") as per BCP 47.
<entry xml:id="Oaxaca-MIX" xml:lang="mix" type="compound"
xml:base="../TEILex0.examples/examples.stripped.xml">
<form type="lemma">
<orth>
<seg>Ñuu</seg>
<seg>Ntua</seg>
</orth>
<pron notation="ipa">
<seg>ɲùù</seg>
<seg>nd̪ùá</seg>
</pron>
<form type="variant" xml:lang="mix-x-YCNY">
<orth>Ntua</orth>
<pron notation="ipa">nd̪ùá</pron>
<usg type="geographic">
<placeName>Yucanany</placeName>
</usg>
</form>
</form>
<gramGrp>
<gram type="pos">locationNoun</gram>
</gramGrp>
<!--...-->
</entry>
4.6. Multiword expressions
The Dictionary Chapter of the TEI Guidelines is very sparse when it comes to recommendations for encoding polylexical units. The only mention of the adjective “multi-word” appears in the definition of the element <term>: “contains a single-word, multi-word, or symbolic designation which is regarded as a technical term” but this is not relevant for the encoding of polylexical units in general-purpose dictionaries.
TEI includes an element <colloc> (collocate), which is defined as containing “any sequence of words that co-occur with the headword with significant frequency” but, in a different example, “colloc” is used as an attribute value for the element <usg> (usage). It is precisely this type of ambiguity that TEI Lex-0 is trying to resolve.
The TEI Guidelines recommend the use of <re> (related entry) to encode “related entries for direct derivatives or inflected forms of the entry word, or for compound words, phrases, collocations, and idioms containing the entry word” with barely any useful examples, or discussion of how to encode different types of polylexical units. TEI Lex-0, on the other hand, does not include <re>. In TEI Lex-0, <entry> was made recursive in order to account for nestable entry-like structures without the need to resort to <re>, a differently named element whose content model would be indistinguishable from <entry> itself. Eventually, the new content model of <entry>, which allows nesting, was adopted by TEI itself (Tasovac 2020).
TODO: explain different types of MWEs from a dictionary-model perspective (referring to Tasovac 2020)
4.6.1. Collocations
TODO: explain "lexicographically transparent"
<entry xml:id="DLPC.descalçar" xml:lang="pt"
xml:base="../TEILex0.examples/examples.stripped.xml">
<!--etc.-->
<sense xml:id="DLPC.descalçar.1">
<!--etc.-->
<form type="collocations">
<form type="collocation">
<orth>
<ref type="form" scope="currentEntry" value="descalçar">
<lbl>+</lbl>
</ref>
<seg>as botas</seg>
</orth>
<gramGrp>
<gram type="mwe" value="co-ocorrente_privilegiado"/>
</gramGrp>
</form>
<pc>,</pc>
<form type="collocation">
<orth>
<ref type="form" scope="currentEntry" value="descalçar"/>
<seg>as luvas</seg>
</orth>
<gramGrp>
<gram type="mwe" value="co-ocorrente_privilegiado"/>
</gramGrp>
</form>
<pc>,</pc>
<form type="collocation">
<orth>
<ref type="form" scope="currentEntry" value="descalçar"/>
<seg>as meias</seg>
</orth>
<gramGrp>
<gram type="mwe" value="co-ocorrente_privilegiado"/>
</gramGrp>
</form>
</form>
<pc>;</pc>
<form type="collocations">
<form type="collocation">
<orth>
<ref type="form" scope="currentEntry" value="descalçar">
<lbl>+</lbl>
</ref>
<seg>os sapatos</seg>
</orth>
<gramGrp>
<gram type="mwe" value="co-ocorrente_privilegiado"/>
</gramGrp>
</form>
</form>
<pc>.</pc>
</sense>
</entry>
DLPC (2001)
4.6.2. Idiomatic expressions
TODO text ("lexicographically non-transparent")
<entry xml:lang="pt" xml:id="DLPC.bombeiro" type="mainEntry"
xml:base="../TEILex0.examples/examples.stripped.xml">
<form type="lemma">
<orth>bombeiro</orth>
</form>
<!--etc. -->
<sense xml:id="bombeiro.1">
<!--etc. -->
<entry xml:id="DLPC.bombeiro_voluntario" xml:lang="pt" type="relatedEntry">
<form type="lemma">
<orth>bombeiro voluntário</orth>
</form>
<gramGrp>
<gram type="mwe" value="combinatória_fixa"/>
</gramGrp>
<pc>,</pc>
<sense xml:id="DLPC.bombeiro_voluntario.1">
<def>o que pertence a uma corporação com a obrigatoriedade de acudir a
incêndios, acidentes, unicamente por filantropia</def>
<pc>.</pc>
</sense>
</entry>
<entry xml:id="DLPC.corpo_de_bombeiros" xml:lang="pt" type="relatedEntry">
<form type="lemma">
<orth>
<ref type="entry" scope="currentEntry">
<seg>corpo</seg>
<lbl rend="sup">+</lbl>
</ref>
<seg>de bombeiros</seg>
</orth>
</form>
<pc>.</pc>
</entry>
</sense>
<!--etc.-->
</entry>
DLPC (2001)
5. Senses
5.1. General remarks
In the current TEI Dictionary Chapter, the content model of <entry> allows one to have sense-related information directly within <entry>. TEI Lex-0 prescribes a stricter use of these elements so that sense-related information is grouped within the <sense> element, in accordance with the underlying semasiological model implemented in the TEI Guidelines.
<sense> should therefore be considered mandatory for any dictionary entry that actually provides sense information for the headword. Further in this document, we consider some additional specific cases, e.g. “referencing” entries (entries that simply point to other entries) and inflectional lexica (dictionaries that describe word forms only), where <sense> is not a mandatory child of <entry>.
As a consequence of making the use of <sense> more systematic within <entry>, we have seen (see section on <entry>) that some elements are no longer allowed as children of <entry>. We provide here a specific background for each of them:
- <def> is clearly intended to provide a prose description of a meaning within a <sense> element and should not appear in any other context;
- In the same way, it is recommended that <cit> be used exclusively as a child of <sense>, or when necessary within <dictScrap>;
- The case of <hom> is peculiar since it provides a subordinate organization to an entry which is redundant in relation to what <sense> allows one to represent. <hom> is not allowed in TEI Lex-0.
Note: When one has to deal with information that does not fit a <sense>-based organization, for instance in the process of retro-digitizing an existing dictionary source, the use of <dictScrap> is recommended. A further step in the encoding of the lexical content may lead to a more precise encoding in a second phase.
In TEI Lex-0, <sense> has a mandatory xml:id.
5.2. Limiting contexts for def
In the current TEI Guidelines, <def> is allowed within the following elements:
- Module core: <cit>
- Module dictionaries: <dictScrap>, <entry>, <entryFree>, <etym>, <hom>, <re>, <sense>
- Module namesdates: <nym>
TEI Lex-0 allows the use of <def> in <sense> only. All other existing contexts would be implemented by embedding <def> within a <sense>.
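Both constraints stated so far in this chapter (<def> only as a child of <sense>, and a mandatory xml:id on every <sense>) are normally enforced by validating against the TEI Lex-0 schema. Purely as an illustration of what they mean, the following Python sketch reports violations in an un-namespaced fragment. The function name `check_senses` and the wording of the messages are hypothetical.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes xml:id under the reserved XML namespace URI.
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"


def check_senses(root):
    """List <def> elements outside <sense> and <sense> elements without xml:id."""
    problems = []
    for parent in root.iter():
        for child in parent:
            if child.tag == "def" and parent.tag != "sense":
                problems.append(f"<def> inside <{parent.tag}>")
    for sense in root.iter("sense"):
        if sense.get(XML_ID) is None:
            problems.append("<sense> without xml:id")
    return problems


fragment = ET.fromstring(
    "<entry><def>stray definition</def>"
    "<sense><def>definition in place</def></sense></entry>"
)
for message in check_senses(fragment):
    print(message)
```

A check of this kind can be useful during conversion, before the data is ready for full schema validation.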
5.3. Glosses
5.3.1. Gloss vs. definition?
In the lexicographic literature, gloss is a rather amorphous category. Zgusta, in his classic Manual of Lexicography (1971), defines it as "any descriptive or explanatory note within the entry" which includes "short comments, explanatory remarks, semantic characteristics or qualifications" (270). Atkins and Rundell (2008) see the gloss as "a more informal explanation of the meaning of a multiword expression or example (or even part of one) in the entry,[...] chiefly used in monolingual dictionaries for learners, to help understanding" (209). While one could argue about the statement that this type of lexicographic construct is used "chiefly... in monolingual dictionaries for learners", it is certainly the case that glosses are expected to help users better understand or more easily locate the particular meaning of a word that they are looking up.
- fugitive (of persons)
- fugitive (verses)
<entry xml:id="ED.fugitive" xml:lang="en">
<form type="lemma">
<orth>fugitive</orth>
</form>
<sense n="1" xml:id="ED.fugitive.1">
<gloss>(of persons)</gloss>
</sense>
<sense n="2" xml:id="ED.fugitive.2">
<gloss>(verses)</gloss>
</sense>
</entry>
<entry xml:id="ED.fugitive" xml:lang="en">
<form type="lemma">
<orth>fugitive</orth>
</form>
<sense n="1" xml:id="ED.fugitive.1">
<gloss>(of persons)</gloss>
<def>given to, or in the act of, running away from a place, especially to avoid arrest or persecution.</def>
</sense>
<sense n="2" xml:id="ED.fugitive.2">
<gloss>(verses)</gloss>
<def>concerned or dealing with subjects of passing interest; ephemeral, occasional.</def>
</sense>
</entry>
On sense-distinguishing grammatical properties, see the section Grammatical properties in senses.
5.3.2. Glossing examples
Semantic glosses can occur at different levels of the entry hierarchy. In the previous section, we saw examples in which glosses were used as a kind of semantic shorthand for an individual sense. They can, however, be used to further qualify individual examples in the entry. Take, for instance, this entry from the Longman Dictionary of Contemporary English (2003):
living /... / adj 1 alive now [...] | The sun affects all living things (=people, animals, and plants). | A living language (=one that people still use) [….]
In TEI Lex-0, this entry would be represented as:
<entry xml:id="LDOCE.living" xml:lang="en" type="mainEntry"
xml:base="../TEILex0.examples/examples.stripped.xml">
<form type="lemma">
<orth>living</orth>
</form>
<gramGrp>
<gram type="pos">adj</gram>
</gramGrp>
<sense n="1" xml:id="LDOCE.living.1">
<num>1</num>
<def>alive now
<!--[...] -->
</def>
<metamark>|</metamark>
<cit type="example">
<quote>The sun affects all <ref type="entry" scope="currentEntry">living</ref>
things <gloss>(=people, animals, and plants)</gloss>.</quote>
</cit>
<metamark>|</metamark>
<cit type="example">
<quote>A <ref type="entry" scope="currentEntry">living</ref> language <gloss>(=one
that people still use)</gloss>
<!--[….] -->
</quote>
</cit>
</sense>
</entry>
Gadsby (ed.) (2003)
5.4. Grammatical properties
In some dictionaries, individual senses may be associated with grammatical properties, such as part of speech or gender, that differ from the rest of the entry: for instance, a particular sense of a countable noun may be used only in the plural. In such cases, <gramGrp> will naturally be placed inside the given <sense>.
Consider, for instance, the second sense of this entry:
<sense xml:id="DLPC.antepassado_b_2" n="2"
xml:base="../TEILex0.examples/examples.stripped.xml" xml:lang="pt">
<gramGrp>
<gram type="number">pl.</gram>
</gramGrp>
<def>Pessoas anteriormente ao momento actual.</def>
<xr type="synonymy">
<ref type="sense">antecessores</ref>
</xr>
<xr type="antonymy">
<ref type="sense">vindouros</ref>
</xr>
<cit type="example">
<quote>Hérdamos estes costumes dos nossos antepassados.</quote>
</cit>
<cit type="example">
<quote>Culto dos antepassados.</quote>
</cit>
</sense>
DLPC (2001)
5.4.1. Grammatical glosses?
Zgusta also uses "gloss" to describe "grammatical indications in the broadest sense of the word" (1971, 240), using an example familiar from Latin (and many other) dictionaries:
- petere aliquid ab aliquo [to ask for something from somebody]
- petere Romam [to rush to Rome]
In theory, one could choose to encode such phenomena using <gloss>, but TEI Lex-0 recommends a clear separation of roles: <gloss> should be used for semantic or pragmatic information, whereas grammatical information should be encoded using the familiar gramGrp/gram constructs:
<sense n="1" xml:id="LD.peto.1">
<gramGrp>
<gram type="rection">aliquid ab aliquo</gram>
</gramGrp>
</sense>
<sense n="2" xml:id="LD.peto.2">
<gramGrp>
<gram type="rection">Romam</gram>
</gramGrp>
</sense>
Here, too, it is important to note the possibility of ambiguity: unlike "petere aliquid ab aliquo", "petere Romam" could be interpreted as an example. The decision on such ambiguous cases should never be taken in isolation: editors of a digital edition need to consider the conventions of the dictionary as a whole before advising encoders on how to mark up such ambiguous cases.
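To make the ambiguity concrete: if the conventions of the dictionary as a whole suggested that "petere Romam" is an example rather than a rection pattern, the second fragment could instead be encoded along the following lines. This is only a sketch of the alternative reading, not a recommendation; the nested translation follows the pattern that TEI Lex-0 uses for translated examples, and the xml:lang value is an assumption:

```xml
<sense xml:id="LD.peto.2">
  <!-- alternative reading: the phrase is treated as an example, not as rection -->
  <cit type="example">
    <quote>petere Romam</quote>
    <cit type="translation" xml:lang="en">
      <quote>to rush to Rome</quote>
    </cit>
  </cit>
</sense>
```

Whichever reading is chosen, it should be applied consistently to all structurally similar cases in the same dictionary.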
5.4.2. Nested entries vs. multiple senses
While TEI Lex-0 has been created to simplify the choices available for encoding various lexicographic components, certain levels of ambiguity remain, often due to the highly condensed nature of dictionary content.
Consider, for instance, this entry:
Is this an entry with two senses? Or are these two entries that were merged into one for the sake of typographic density?
The answer is as much in the eye of the beholder as it is in the eyes of the lexicographers behind the dictionary that the entry stems from, in this case The Chambers Dictionary. Both the encoder and the lexicographers, however, are influenced by the lexicographic and linguistic traditions in which they operate. For an overview of the homonymy-polysemy dilemma, see, for instance, Zöfgen 1989.
It can't be stressed enough that the goal of dictionary encoding is not to resolve linguistic disputes or evaluate lexicographic traditions but rather to create consistent, if abstracted, representations of lexicographic architectures.
So, what can we do in this particular case? Should we encode gash as an entry consisting of senses, each with a different part of speech, like this:
<entry xml:id="CHDOEL.gash2" xml:lang="en"
xml:base="../TEILex0.examples/examples.stripped.xml">
<!--this, as we'll explain later, is valid but not the preferred encoding-->
<form type="lemma">
<orth>gash</orth>
<pron>gash</pron>
</form>
<lbl type="homNum" rend="sup">2</lbl>
<sense xml:id="CHDOEL.gash2.1">
<pc>(</pc>
<usg type="socioCultural" expand="slang">sl</usg>
<pc>)</pc>
<gramGrp>
<gram type="pos">adj</gram>
</gramGrp>
<def>spare, extra</def>
<pc>.</pc>
</sense>
<metamark function="senseSeparator">◆</metamark>
<sense xml:id="CHDOEL.gash2.2">
<gramGrp>
<gram type="pos">n</gram>
</gramGrp>
<pc>(</pc>
<usg type="temporal" expand="originally">orig</usg>
<lbl>and esp</lbl>
<usg type="domain" expand="nautical">naut</usg>
<pc>)</pc>
<def>rubbish, waste</def>
<pc>.</pc>
</sense>
</entry>
This is surely valid TEI Lex-0. There is conceptually nothing wrong with this encoding: it adequately represents the structure implied by the source text.
We should, however, try to look at the issue at hand from a broader, comparative, perspective.
- In the Portuguese polysemous entry antepassado above, we had a case in which one particular sense (used in plural only) deviated from the other senses (which are used in both singular and plural). Since the senses were numbered in the original, there was never any doubt about how we would encode this. It was clear from the outset:
- that the semantic information in that entry was grouped by a construct called <sense>;
- that senses inherited grammatical properties from the entry as a whole (i.e. entry/gramGrp);
- that, implicitly, we could assume that each sense can be used with the noun in both singular and plural; and
- that the plural-only sense was grammatically exceptional (hence entry/sense/gramGrp).
- The English example is different: gash as an adjective and as a noun are grammatical homonyms. If we encode them, as we did above, as two senses within one entry, we end up with an entry in which there is no inheritance (of grammatical properties) and only exceptions (at each sense level).
Because TEI Lex-0 aims to create a baseline encoding that facilitates data exchange and comparison between different dictionaries, we recommend encoding grammatical homonyms in TEI Lex-0 as nested entries and using <gramGrp> in <sense> constructs to mark up sense-specific deviations from the rule of grammatical inheritance.
For that reason, our preferred encoding of gash as an adjective and a noun would be:
<entry xml:id="CH.gash2" xml:lang="en"
xml:base="../TEILex0.examples/examples.stripped.xml">
<form type="lemma">
<orth>gash</orth>
<pron>gash</pron>
</form>
<lbl type="homNum" rend="sup">2</lbl>
<entry xml:id="CH.gash2.1" xml:lang="en" type="homonymicEntry">
<sense xml:id="CH.gash2.1.1">
<pc>(</pc>
<usg type="socioCultural" expand="slang">sl</usg>
<pc>)</pc>
<gramGrp>
<gram type="pos">adj</gram>
</gramGrp>
<def>spare, extra</def>
<pc>.</pc>
</sense>
</entry>
<metamark function="entrySeparator">◆</metamark>
<entry xml:id="CH.gash2.2" xml:lang="en" type="homonymicEntry">
<gramGrp>
<gram type="pos">n</gram>
</gramGrp>
<sense xml:id="CH.gash2.2.1">
<pc>(</pc>
<usg type="temporal" expand="originally">orig</usg>
<lbl>and esp</lbl>
<usg type="domain" expand="nautical">naut</usg>
<pc>)</pc>
<def>rubbish, waste</def>
<pc>.</pc>
</sense>
</entry>
</entry>
For an example in which the grammatical homonyms themselves have multiple senses, one of which is grammatically constrained, see, for instance:
<entry xml:id="ED.aid" xml:lang="en"
xml:base="../TEILex0.examples/examples.stripped.xml">
<form type="lemma">
<orth>aid</orth>
<pron>/ed/</pron>
</form>
<entry xml:id="ED.aid_n" xml:lang="en" type="homonymicEntry">
<gramGrp>
<gram type="pos">noun</gram>
</gramGrp>
<sense xml:id="ED.aid_n.1" n="1">
<num>1.</num>
<gramGrp>
<gram type="number" value="singularia tantum"/>
</gramGrp>
<def>help, especially money, food or other gifts given to people living in
difficult conditions</def>
<metamark function="exampleMarker">○</metamark>
<cit type="example">
<quote>aid to the earth-quake zone</quote>
</cit>
<cit type="example">
<quote>an aid worker</quote>
</cit>
<note>(NOTE: This meaning of aid has no plural.)</note>
<metamark function="relatedEntryMarker">○</metamark>
<entry type="relatedEntry" xml:id="ED.aid_n.1.in_aid_of" xml:lang="en">
<form type="lemma">
<orth>in aid of</orth>
</form>
<sense xml:id="ED.aid_n.1.in_aid_of.1">
<def>in order to help</def>
<metamark function="exampleMarker">○</metamark>
<cit type="example">
<quote>We give money in aid of the Red Cross.</quote>
</cit>
<metamark function="exampleMarker">○</metamark>
<cit type="example">
<quote>They are collecting money in aid of refugees.</quote>
</cit>
</sense>
</entry>
</sense>
<sense xml:id="ED.aid_n.2" n="2">
<num>2.</num>
<def>thing which helps you to do something</def>
<metamark function="exampleMarker">○</metamark>
<cit type="example">
<quote>kitchen aids</quote>
</cit>
</sense>
</entry>
<metamark function="subentryMarker">■</metamark>
<entry xml:id="ED.aid_v" xml:lang="en" type="homonymicEntry">
<gramGrp>
<gram type="pos">verb</gram>
</gramGrp>
<sense xml:id="ED.aid_v.1" n="1">
<num>1.</num>
<def>to help something to happen</def>
</sense>
<sense xml:id="ED.aid_v.2" n="2">
<num>2.</num>
<def>to help someone</def>
</sense>
</entry>
</entry>
6. Translations
6.1. Translation equivalents
TEI Guidelines:
<entry>
<form>
<orth>horrifier</orth>
</form>
<gramGrp>
<gram type="pos">v</gram>
</gramGrp>
<cit type="translation" xml:lang="en">
<quote>to horrify</quote>
</cit>
<cit type="example">
<quote>elle était horrifiée par la dépense</quote>
<cit type="translation" xml:lang="en">
<quote>she was horrified at the expense.</quote>
</cit>
</cit>
</entry>
TEI Lex-0:
<entry xml:id="horrifier" type="mainEntry" xml:lang="fr"
xml:base="../TEILex0.examples/examples.stripped.xml">
<form type="lemma">
<orth>horrifier</orth>
</form>
<gramGrp>
<gram type="pos">v</gram>
</gramGrp>
<sense xml:id="horrifier.1">
<cit type="translationEquivalent" xml:lang="en">
<form>
<orth>horrify</orth>
</form>
</cit>
<cit type="example">
<quote>elle était horrifiée par la dépense</quote>
<cit type="translation" xml:lang="en">
<quote>she was horrified at the expense</quote>
</cit>
</cit>
</sense>
</entry>
<entry type="mainEntry" xml:lang="en" xml:id="aid"
xml:base="../TEILex0.examples/examples.stripped.xml">
<form type="lemma">
<orth>Aid</orth>
</form>
<pc>,</pc>
<sense xml:id="aid.1">
<gramGrp>
<gram type="pos">v.a.</gram>
</gramGrp>
<cit type="translationEquivalent" xml:lang="fr">
<form>
<orth>aider</orth>
</form>
</cit>
<pc>,</pc>
<cit type="translationEquivalent" xml:lang="fr">
<form>
<orth>assister</orth>
</form>
</cit>
<pc>,</pc>
<cit type="translationEquivalent" xml:lang="fr">
<form>
<orth>secourir</orth>
</form>
</cit>
</sense>
<pc>;</pc>
<sense xml:id="aid.2">
<gramGrp>
<gram type="pos">s.</gram>
</gramGrp>
<cit type="translationEquivalent" xml:lang="fr">
<form>
<orth>aide</orth>
</form>
</cit>
<pc>,</pc>
<cit type="translationEquivalent" xml:lang="fr">
<form>
<orth>assistance</orth>
<pc>,</pc>
<gramGrp>
<gram type="gender">f.</gram>
</gramGrp>
</form>
</cit>
<pc>,</pc>
<cit type="translationEquivalent" xml:lang="fr">
<form>
<orth>secours</orth>
<pc>,</pc>
<gramGrp>
<gram type="gender">m.</gram>
</gramGrp>
</form>
</cit>
</sense>
<pc>;</pc>
<sense xml:id="aid.3">
<cit type="translationEquivalent" xml:lang="fr">
<form>
<orth>sub-side</orth>
</form>
<pc>,</pc>
<gramGrp>
<gram type="gender">m.</gram>
</gramGrp>
</cit>
</sense>
<pc>;</pc>
<sense xml:id="aid.4">
<gloss>(pers)</gloss>
<cit type="translationEquivalent" xml:lang="fr">
<form>
<orth>aide</orth>
</form>
<pc>,</pc>
<gramGrp>
<gram type="gender">m.</gram>
<gram type="gender">f.</gram>
</gramGrp>
</cit>
</sense>
<entry type="relatedEntry" xml:lang="en" xml:id="by_the_aid_of">
<form type="lemma">
<orth>By the <ref type="oRef">_</ref> of</orth>
</form>
<pc>,</pc>
<sense xml:id="by_the_aid_of.1">
<cit type="translationEquivalent" xml:lang="fr">
<form>
<orth>à l'aide de</orth>
</form>
</cit>
</sense>
</entry>
<pc>.</pc>
<entry type="relatedEntry" xml:lang="en" xml:id="in_aid_of">
<form type="lemma">
<orth>In <ref type="oRef">_</ref> of</orth>
</form>
<pc>,</pc>
<sense xml:id="in_aid_of.1">
<gloss>(of performances)</gloss>
<cit type="translationEquivalent" xml:lang="fr">
<form>
<orth>au profit de</orth>
</form>
</cit>
<pc>,</pc>
<cit type="translationEquivalent" xml:lang="fr">
<form>
<orth>au bénéfice de</orth>
</form>
</cit>
</sense>
</entry>
<pc>.</pc>
<entry type="derived" xml:lang="en" xml:id="aidless">
<form type="lemma">
<orth>_less</orth>
<pc>,</pc>
<gramGrp>
<gram type="pos">adj.</gram>
</gramGrp>
</form>
<sense xml:id="aidless.1">
<cit type="translationEquivalent" xml:lang="fr">
<form>
<orth>sans aide</orth>
</form>
</cit>
<pc>,</pc>
<cit type="translationEquivalent" xml:lang="fr">
<form>
<orth>sans secours</orth>
</form>
</cit>
</sense>
<pc>;</pc>
<sense xml:id="aidless.2">
<cit type="translationEquivalent" xml:lang="fr">
<form>
<orth>abandonné</orth>
</form>
</cit>
<pc>,</pc>
<cit type="translationEquivalent" xml:lang="fr">
<form>
<orth>délaissé</orth>
</form>
</cit>
</sense>
</entry>
</entry>
7. Cross-references
7.1. General remarks
The current TEI Guidelines provide several mechanisms by means of which one item of lexical information can refer to another, e.g.:
- <gloss> for the provision of simple (non-refined) translation equivalents of the headword
- <usg type="synonym"/> for synonym references
- <cit type="translation"><quote><!--...--></quote></cit> for translation equivalents in bilingual or translation dictionaries
- <oRef> and <pRef> for the resolution of “~” headword placeholders in quotations and other dictionary text
- <xr> and <ref> as a general cross-referencing mechanism
- <ptr/> as a pointer to another location
- <link/> element
- <mentioned/> in the etymology section
- <term/> for mentions of technical terms
In keeping with the approach of the TEI Lex-0, and considering that links/relations between lexical data elements are an essential part of the core lexical data model rather than mere convenience pointers for dictionary users, we need a more unified and more constrained mechanism for lexical references, whether they point to an existing lexical entity in some dictionary or lexicon, or in a more general way to lexical objects without a target reference.
The proposed mechanism has the following properties:
- It applies only to references with a clear linguistic meaning.
- The number of arbitrary (or context-dependent) choices for the encoder is minimal; the semantics of the reference should not depend on context.
- The relation between the representation of dictionary content and the underlying/implied lexical data model should be as transparent as possible.
- No drastic changes to the TEI Guidelines are needed.
In the following section, we first present the recommended encoding, and then explain how existing alternatives can be replaced accordingly.
7.2. xr vs. ref
In TEI Lex-0, we use <ref> as the general element for a lexical reference and <xr> as the enclosing element that groups all information related to this reference, including explicit labels such as "Syn.", "Cf.", "See also" etc. The reference may be internal to a dictionary or pointing to an external source, even when the actual target lexical object is not explicitly known. In the latter case, <ref> can be used without an explicit pointing attribute. Furthermore, the intended target of the reference can be a full entry, but, sometimes, also a specific sense.
For all such uses, the following attributes may be used on <xr> and <ref>:
- type is a mandatory attribute on <xr> for a lexical reference. Its default value is "related". This attribute can be used to indicate the lexical relation between the headword of the entry and the object referred to (see next section).
- ref/@type is required; it indicates the target object category (entry, sense); the type attribute on <ref> is also needed to distinguish lexicographic from bibliographic references.
- xml:lang on <xr> is required when <ref> contains an explicit lexical form in a language which is different from the source language.
- ref/@target points to the URI of a lexical object. The value of this attribute is a machine-readable link to your cross-reference.
- ref/@notation indicates, as we currently do on <orth> or <pron>, the notation used for the explicit lexical form, where applicable.
Explicit dictionary labels which indicate the type of relationship between the current lexical item and the cross-reference should be encoded as <lbl> inside of <xr>.
7.2.1. Values of ref/@target
- If the reference has no explicit target, no target is used.
- As per TEI pointing mechanisms, the value of target must be a URI reference.
- For internal references (references to the same dictionary), TEI Lex-0 enforces the use of explicit pointers to the xml:id of the element being pointed to, preceded by #. See Section "Pointing Locally" in the TEI Guidelines.
- TEI pointers should not be used in TEI Lex-0.
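The rules above can be illustrated with a minimal, hypothetical sketch. The entry identifiers (XY.torch, XY.flashlight), the labels and the external address are invented for illustration and do not come from any real dictionary:

```xml
<entry xml:id="XY.torch" xml:lang="en" type="mainEntry">
  <form type="lemma">
    <orth>torch</orth>
  </form>
  <sense xml:id="XY.torch.1">
    <!-- internal reference: "#" followed by the xml:id of the target entry -->
    <xr type="synonymy">
      <lbl>Syn.</lbl>
      <ref type="entry" target="#XY.flashlight">flashlight</ref>
    </xr>
    <!-- external reference: a full URI (hypothetical address) -->
    <xr type="related">
      <lbl>Cf.</lbl>
      <ref type="entry" target="https://example.org/dictionary/lantern">lantern</ref>
    </xr>
    <!-- no explicit target known: the target attribute is simply omitted -->
    <xr type="related">
      <lbl>See also</lbl>
      <ref type="entry">lamp</ref>
    </xr>
  </sense>
</entry>
```

In all three cases, the type on <xr> names the lexical relation, while the type on <ref> names the category of the object referred to.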
7.3. Cross-reference typology
7.3.1. Related
The default reference to another lexical unit when no more granular information about the type of relationship is available.
In TEI Lex-0, cross-references are by default encoded as <xr type="related"></xr>.
<entry xml:lang="nl" xml:id="borcht"
xml:base="../TEILex0.examples/examples.stripped.xml">
<form type="lemma">
<orth>borcht</orth>
</form>
<xr type="related">
<lbl>Cf.</lbl>
<ref target="#M012340" type="entry">burcht</ref>
</xr>
</entry>
7.3.2. Synonymy
Relation between two lexical units X and Y which are syntactically identical and have the property that any declarative sentence S containing X has equivalent truth conditions to another sentence S’ which is identical to S, except that X is replaced by Y. (Adapted from Cruse 1986.)
Synonymy is the linguistic parallel of the identity relation between classes. Synonyms differ in peripheral traits, related for example to stylistic, dialectal or diachronic variations.
Examples: [de] {Hund, Köter}, [en] {flashlight, torch}, [en] {glad, joyful, happy}, [en] {violin, fiddle} [en] He plays the violin very well/He plays the fiddle very well.
In TEI Lex-0, synonyms are encoded inside <xr type="synonymy"></xr>
<entry xml:id="arbeitsunfähig" xml:lang="de" type="mainEntry"
xml:base="../TEILex0.examples/examples.stripped.xml">
<form type="lemma">
<orth>arbeitsunfähig</orth>
</form>
<sense xml:id="arbeitsunfähig.1">
<xr type="synonymy">
<ref type="entry">bettlägerig</ref>
</xr>
<pc>,</pc>
<xr type="synonymy">
<ref type="entry">krank</ref>
</xr>
<pc>,</pc>
<xr type="synonymy">
<ref type="entry">unpässlich</ref>
</xr>
<pc>;</pc>
</sense>
<sense xml:id="arbeitsunfähig.2">
<pc>(</pc>
<usg type="domain">bildungsspr.</usg>
<pc>):</pc>
<xr type="synonymy">
<ref type="entry">indisponiert</ref>
</xr>
</sense>
<sense xml:id="arbeitsunfähig.3">
<xr type="synonymy">
<pc>(</pc>
<lbl>oft</lbl>
<usg type="attitude">emotional</usg>
<pc>):</pc>
<ref type="entry">malade</ref>
</xr>
<pc>.</pc>
</sense>
</entry>
Duden (2007)
7.3.3. Hyperonymy
Relation between lexical heads X and Y characterised by the property that the sentence This is a(n) Y entails, but is not entailed by the sentence This is a(n) X. (Adapted from Cruse 1986.)
Hyperonymy is the converse of hyponymy.
Example: dog/animal (animal is a hypernym of dog)
In TEI Lex-0, hyperonyms are encoded inside <xr type="hypernymy"></xr>.
<entry xml:id="XY.dog" xml:lang="en" type="mainEntry"
xml:base="../TEILex0.examples/examples.stripped.xml">
<form type="lemma">
<orth>dog</orth>
</form>
<gramGrp>
<gram type="pos">n</gram>
</gramGrp>
<xr type="hypernymy">
<ref type="entry">mammal</ref>
</xr>
</entry>
7.3.4. Hyponymy
Relation between lexical units X and Y characterised by the property that the sentence This is a(n) X entails, but is not entailed by the sentence This is a(n) Y. (Adapted from Cruse 1986.)
Hyponymy and its converse hypernymy are the linguistic parallels of the relation of inclusion between two classes.
Examples: [en] animal/dog, red/scarlet, to kill/to murder
In TEI Lex-0, hyponyms are encoded inside <xr type="hyponymy"></xr>.
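By analogy with the hyperonymy example above, a hyponymy reference might be encoded as in the following sketch. The entry and its identifier are hypothetical; the word pair is taken from the examples listed above:

```xml
<entry xml:id="XY.animal" xml:lang="en" type="mainEntry">
  <form type="lemma">
    <orth>animal</orth>
  </form>
  <gramGrp>
    <gram type="pos">n</gram>
  </gramGrp>
  <!-- the referenced item (dog) is a hyponym of the headword (animal) -->
  <xr type="hyponymy">
    <ref type="entry">dog</ref>
  </xr>
</entry>
```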
7.3.5. Meronymy
An inclusion relation between lexical heads X and Y which reflects a potential part-whole relation between their referents in discourse. (Adapted from Cruse 2011, p. 140)
Example: finger:hand (finger is said to be a meronym of hand, and hand is said to be the holonym of finger).
In TEI Lex-0, meronyms are encoded inside <xr type="meronymy"></xr>.
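A meronymy reference might be encoded as in the following sketch, which reuses the finger:hand pair from the definition. The entry and its identifier are hypothetical:

```xml
<entry xml:id="XY.hand" xml:lang="en" type="mainEntry">
  <form type="lemma">
    <orth>hand</orth>
  </form>
  <gramGrp>
    <gram type="pos">n</gram>
  </gramGrp>
  <!-- the referenced item (finger) is a meronym of the headword (hand) -->
  <xr type="meronymy">
    <ref type="entry">finger</ref>
  </xr>
</entry>
```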
7.3.6. Antonymy
Relation between lexical units of opposite meaning.
In TEI Lex-0, antonyms are encoded inside <xr type="antonymy"></xr>.
<sense xml:id="DLPC.antepassado_a_1"
xml:base="../TEILex0.examples/examples.stripped.xml" xml:lang="pt">
<def>Que pertence ou viveu numa época anterior.</def>
<xr type="synonymy">
<ref type="sense">antecessor</ref>
</xr>
<xr type="synonymy">
<ref type="sense">sucessor</ref>
</xr>
<xr type="antonymy">
<ref type="sense">descendente</ref>
</xr>
<xr type="antonymy">
<ref type="sense">sucessor</ref>
</xr>
</sense>
7.4. Cross-references in definitions
In TEI P5, <xr> is not allowed inside <def>, yet some dictionaries do place cross-references within their definitions. In TEI Lex-0, <xr> is allowed within <def>:
<entry xml:id="VSK.SR.грдомајчић" xml:lang="sr"
xml:base="../TEILex0.examples/examples.stripped.xml">
<form type="lemma">
<orth>грдо́ма̑јчић</orth>
</form>
<pc>,</pc>
<gramGrp>
<gram type="pos">м</gram>
</gramGrp>
<usg type="geographic">
<pc>(</pc>у Ц.г.<pc>)</pc>
</usg>
<sense xml:id="VSK.SR.грдомајчић.1">
<def>као укор или поруга, и ваља да значи: којему је <xr type="related">
<ref type="entry" target="#VSK.SR.мајка">мајка</ref>
</xr> била <xr type="related">
<ref type="entry" target="#VSK.SR.грдан2">грдна</ref>
</xr>
</def>
<pc>,</pc>
<cit type="translationEquivalent" xml:lang="de">
<form type="lemma">
<orth>ein Schimpfwort</orth>
</form>
</cit>
<pc>,</pc>
<cit type="translationEquivalent" xml:lang="la">
<form type="lemma">
<orth>convicium in mulierem</orth>
</form>
</cit>
<pc>.</pc>
</sense>
</entry>
7.5. Further examples
7.5.1. More complex example including quotations
<entry xml:id="dog" xml:lang="en"
xml:base="../TEILex0.examples/examples.stripped.xml">
<form type="lemma">
<orth>dog</orth>
</form>
<sense xml:id="dog.1">
<gramGrp>
<gram type="gender" value="m">Male or unknown gender</gram>
</gramGrp>
<cit type="translationEquivalent" xml:lang="fr">
<form>
<orth>chien</orth>
</form>
</cit>
<cit type="example" xml:lang="fr">
<quote> Le matin j'ouvre au <ref type="oRef">chien</ref> et je lui fais manger sa
soupe. Le soir je lui siffle de venir se coucher</quote>
<bibl>RENARD, Poil de Carotte, 1894, p. 102.</bibl>
<cit type="translation" xml:lang="en">
<!-- included in the French cit, otherwise relation is lost -->
<quote>In the morning, I open the door for the dog, and I
<!--...-->
</quote>
</cit>
</cit>
</sense>
<sense xml:id="dog.2">
<gramGrp>
<gram type="gender" value="f">Female</gram>
</gramGrp>
<cit type="translationEquivalent" xml:lang="fr">
<form type="lemma">
<orth>chienne</orth>
</form>
</cit>
<cit type="example" xml:lang="fr">
<quote>6. Les fleuristes, murmura Lorilleux, toutes des Marie-couche-toi-là. Eh
bien! Et moi? reprit la grande veuve, les lèvres pincées. Vous êtes galant.
Vous savez, je ne suis pas une <ref type="oRef">chienne</ref>, je ne me mets
pas les pattes en l'air, quand on siffle! </quote>
<bibl>ZOLA, L'Assommoir, 1877, p. 681.</bibl>
<cit type="translation" xml:lang="en">
<quote>
<!--...-->
</quote>
</cit>
</cit>
</sense>
</entry>
7.5.2. Antepassado
<entry xml:lang="pt" xml:id="DLPC.antepassado_a"
xml:base="../TEILex0.examples/examples.stripped.xml">
<form type="lemma">
<orth>antepassado</orth>
<pron>ɐ̃tɨpɐsˈadu</pron>
</form>
<form type="inflected">
<orth>antepassado</orth>
<gramGrp>
<gram type="gender">m.</gram>
</gramGrp>
</form>
<form type="inflected">
<orth>antepassada</orth>
<gramGrp>
<gram type="gender">f.</gram>
</gramGrp>
<pron>ɐ̃tɨpɐsˈadɐ</pron>
<lbl>:1</lbl>
</form>
<gramGrp>
<gram type="pos" norm="ADJ">adj.</gram>
</gramGrp>
<etym type="grammaticalization">
<seg type="desc">De</seg>
<cit type="etymon">
<form>
<orth extent="pref">ante-</orth>
</form>
</cit>
<lbl>+</lbl>
<cit type="etymon">
<form>
<orth>passado</orth>
</form>
</cit>
</etym>
<sense xml:id="DLPC.antepassado_a_1">
<def>Que pertence ou viveu numa época anterior.</def>
<xr type="synonymy">
<ref type="sense">antecessor</ref>
</xr>
<xr type="synonymy">
<ref type="sense">sucessor</ref>
</xr>
<xr type="antonymy">
<ref type="sense">descendente</ref>
</xr>
<xr type="antonymy">
<ref type="sense">sucessor</ref>
</xr>
</sense>
</entry>
7.5.3. Cross-references inside definitions
Allowed in TEI Lex-0. See this issue on GitHub.
8. Usage
Usage labelling is a procedure which indicates that “a certain lexical item deviates in a certain respect from the main bulk of items described in a dictionary and that its use is subject to some kind of restriction”.
In the current TEI Guidelines, <usg> is defined as an element which marks up “usage information in a dictionary entry”. Prototypically, usage information is a label which can be attached at various points in the entry hierarchy in order to signal restrictions in terms of geographic regions, domains of specialized language or stylistic properties for the particular lexical item that it is attached to.
8.1. Label-like vs. narrative usage descriptions
Usage information can be provided in dictionaries both in the form of label-like descriptors (often abbreviated) and as fuller narrative expressions.
Consider, for instance, the following senses taken from a German entry for Pflaume “plum” where usage information is provided by labels taken from fixed sets of values for stylistic and diatopic properties:
<entry xml:id="pflaume" xml:lang="de" type="mainEntry"
xml:base="../TEILex0.examples/examples.stripped.xml">
<form type="lemma">
<orth>Pflaume</orth>
</form>
<sense n="1" xml:id="pflaume.1">
<def xml:lang="de">Frucht des Pflaumenbaums</def>
<def xml:lang="en">fruit of the plum tree</def>
</sense>
<sense n="2" xml:id="pflaume.2">
<usg type="socioCultural" norm="colloquial">ugs.</usg>
<def xml:lang="de">Pflaumenbaum</def>
<def xml:lang="en">plum tree</def>
</sense>
<sense n="3" xml:id="pflaume.3">
<usg type="socioCultural" norm="casual">salopp</usg>
<usg type="socioCultural" norm="expletive">Schimpfwort</usg>
<def xml:lang="de">ungeschickter, untauglicher Mensch</def>
<def xml:lang="en">awkward, ineligible person</def>
</sense>
<sense n="4" xml:id="pflaume.4">
<usg type="geographic" norm="regional">landsch.</usg>
<usg type="socioCultural" norm="casual">salopp</usg>
<def xml:lang="de">anzügliche, leicht boshafte Bemerkung</def>
<def xml:lang="en">offensive, slightly mischievous remark</def>
</sense>
</entry>
In contrast to the example above, the following sample features an occurrence of a more verbose usage description that does not rely on a fixed vocabulary. The sample is taken from a Serbian dialect dictionary. The quote in the dialect is further qualified by a usage hint: “(said by a peasant woman in the field in hot weather)” which provides a particular context in which the quote was recorded.
<cit type="example" xml:base="../TEILex0.examples/examples.stripped.xml"
xml:lang="sr">
<quote>„Ду́ни, ве́тре, се́јче леб да пе́че”</quote>
<usg type="hint">(рекла сељанка на њиви за време врућине)</usg>
<bibl>(<placeName>Дубница</placeName>).</bibl>
</cit>
Златановић (2017)
8.2. Types of usage
In TEI Lex-0, <usg> is a typed element and type is a mandatory attribute. The default value is: <usg type="hint"></usg>. The default attribute value should be used when it is not possible to otherwise classify the usage label. The type of a <usg> should be thought of as a conceptual axis (independent from other types) along which the given value of the element is located.
The following list of label types and their definitions is adapted from Salgado et al. 2019b:
- temporal label: marker which identifies the use of a given lexical unit on a scale from old to new. Syn: diachronic marking; diachronic information; time label.
<usg type="temporal"/>
- geographic label: marker which identifies the place or region where a lexical unit is mainly used. Some dictionaries do not identify a specific place but identify that the word is not used generally in every geographic area (e.g., regionalismo in Portuguese, or покр. (abbrev. for покрајински) in Serbian). Syn: diatopic marking; diatopic information; region label.
<usg type="geographic"/>
- domain label: marker which identifies the specialized field of knowledge in which a lexical unit is mainly used. Syn: diatechnical marking; domain label; field label; subject field label; topic label.
<usg type="domain"/>
- frequency label: marker which identifies the relative rate of occurrence of a lexical unit in a given textual context. Syn: diafrequential marking; diafrequential information
<usg type="frequency"/>
- textType label: marker which identifies the typical use of a lexical unit in a particular discourse type or genre Syn: diatextual information.
<usg type="textType"/>
- attitude label: marker which identifies the speaker’s subjective point of view, positive or negative, regarding the object referred to by a given lexical unit. Syn: diaevaluative marking; diaevaluative information.
<usg type="attitude"/>
- socioCultural label: marker which identifies the use of a given lexical unit by particular social groups and/or in certain types of communicative situations depending on their level of formality Syn: diaphasic marking; diaphasic information.
<usg type="socioCultural"/>
- meaningType label: marker which identifies a semantic extension of the sense of a given lexical unit.
<usg type="meaningType"/>
- normativity label: marker which identifies the use of a given lexical unit which is in some aspect considered to be non-standard or incorrect.
<usg type="normativity"/>
The TEI Guidelines offer a range of sample values for type to illustrate potential uses of <usg>, but not all of them have been carried over to TEI Lex-0. The following table shows the differences between the suggested values of type in TEI and the required values of type in TEI Lex-0:
TEI P5 (suggested types) | TEI Lex-0 (required types) | Example values |
time | temporal | archaic, old |
geo | geographic | AmE., dial. |
dom | domain | Med., Biol., Phys. |
plev | frequency | rare, occas. |
- | textType | bibl., poet., admin., journalese |
- | attitude | derog., euph. |
reg | socioCultural | slang, vulgar, formal |
style | meaningType | fig. (=figurative), lit. (= literal) |
- | normativity | non-standard, incorrect |
lang | - | |
gram | - | |
syn | - | |
hyper | - | |
colloc | - | |
comp | - | |
obj | - | |
subj | - | |
verb | - | |
hint | hint | |
In TEI Lex-0:
- The type attribute is made mandatory.
- The element <usg> is used in a narrower sense than is currently the case in the TEI Guidelines.
- The norm attribute is encouraged.
Justification:
- Without a type attribute, <usg> would be an underspecified element. Usage labels describe a wide range of linguistic phenomena. Classifying them should be considered good practice.
- Currently, the TEI Guidelines overuse <usg> for describing phenomena that could be covered by alternative, more narrowly defined TEI elements. It should be considered good practice to use the most specific TEI element available. See the table above and the next section, Restricting the scope of <usg>.
- It is good practice to normalize the values of <usg> elements because dictionaries are not always consistent in the way they use their usage labels. For instance, abbreviated and unabbreviated labels can appear in the same dictionary: they should be normalized to a single value. Normalization should be restricted to a single dictionary; a global normalization effort is currently beyond the scope of TEI Lex-0.
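As a hedged illustration of the last point, suppose a dictionary labels one sense with the abbreviation "ugs." and another with the full form "umgangssprachlich". Both can be normalized to the same dictionary-internal value with norm. The sense identifiers and the unabbreviated label are hypothetical; the norm value follows the Pflaume example above:

```xml
<!-- abbreviated label -->
<sense xml:id="XY.example.1">
  <usg type="socioCultural" norm="colloquial">ugs.</usg>
  <def><!--...--></def>
</sense>

<!-- unabbreviated label elsewhere in the same dictionary, same norm value -->
<sense xml:id="XY.example.2">
  <usg type="socioCultural" norm="colloquial">umgangssprachlich</usg>
  <def><!--...--></def>
</sense>
```

The text of each label is preserved as printed, while a query on the norm value retrieves both senses.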
8.3. Restricting the scope of usg
Do not use <usg type="lang"> to mark up the name of a language in an etymological or other discussion. The recommended way to encode this information is to use the <lang> element within <etym>.
INCORRECT
<entryFree xml:id="MZ.RGJS.сајдисльк_1">
<form type="lemma">
<orth>сајдисль́к</orth>
</form>
<gramGrp>
<gram type="pos">м</gram>
</gramGrp>
<usg type="lang">тур.</usg>
<sense>
<def>уважавање.</def>
…
</sense>
</entryFree>
CORRECT
<entry xml:id="MZ.RGJS.сајдисльк_2" xml:lang="sr"
xml:base="../TEILex0.examples/examples.stripped.xml">
<form type="lemma">
<orth>сајдисль́к</orth>
</form>
<gramGrp>
<gram type="pos">м</gram>
</gramGrp>
<etym>
<lang value="tr" expand="турцизам" norm="tr">*</lang>
</etym>
<!--...-->
<sense xml:id="MZ.RGJS.сајдисльк_2.1">
<def>уважавање.</def>
<!--...-->
</sense>
</entry>
- Do not use <usg type="hyper"></usg> or <usg type="syn"/> to mark lexical relations such as hyperonymy or synonymy. The recommended way to encode lexical relations in TEI Lex-0 is the reference mechanism provided by <xr>. See the section on the typology of cross-references.
- Do not use <usg type="colloc"></usg> or, for that matter, "comp.", "obj.", "subj.", "verb" etc., to encode collocations or rection information. See TODO.
- <usg type="hint"></usg> should be used as a fallback for cases where the usage information does not fall into one of the recognized cases discussed above, or as an intermediate solution during the process of encoding the dictionary automatically.
- Frequency information on lexicographic entities may differ from other types of usage information in that it often cannot be interpreted without further context. In phrases such as “mostly biology” or “rarely used in American English”, it serves the purpose of a modifier (quantifier) to another piece of usage information (or other lexical information). Such use calls for modeling the frequency information as an attribute on the <usg> element being modified. For frequency information provided explicitly (e.g. corpus frequencies), a separate element should be introduced. TODO
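For the lexical-relation case, the difference can be sketched as follows. Both fragments are hypothetical: the identifiers are invented, and the word pair is borrowed from the synonymy examples earlier in this document rather than from a real dictionary entry:

```xml
<!-- not recommended: a lexical relation marked up as usage information -->
<sense xml:id="XY.violin_1.1">
  <usg type="syn">fiddle</usg>
</sense>

<!-- recommended: the same information as a typed cross-reference -->
<sense xml:id="XY.violin_2.1">
  <xr type="synonymy">
    <ref type="entry">fiddle</ref>
  </xr>
</sense>
```

The second form keeps <usg> free for genuine usage restrictions and makes the relation available to the same processing as all other cross-references.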
8.4. Hierarchical usage labels
Usage labels tend to be described in dictionaries as flat lists: the list of all labels usually appears in the front matter, often as part of a list of abbreviations that may include different types of content, i.e. not only usage labels but also other types of abbreviations (grammatical, etymological etc.). This is less than ideal from a data-modeling point of view, especially when more generic usage labels (such as sport) appear together with more specific labels (such as football, basketball or volleyball).
To overcome the deficiency of flat representation of labels in general-language dictionaries, TEI Lex-0 recommends that canonical, possibly multilingual, labels be defined, when needed, in the <encodingDesc> section of the <teiHeader>, and then pointed to from the individual entries or senses in which these labels are used. This is possible in both TEI P5 and TEI Lex-0 but has not been documented until now as a solution for representing usage labels.
A <taxonomy> is encoded within a <classDecl> using <category> and <catDesc> elements. TEI Lex-0 is stricter than TEI P5 because it requires the use of <term> within <catDesc>. The definition of a given <term> can be optionally provided as a <gloss>.
The following example shows the recommended way of encoding two superdomains, Earth Sciences and Sport, together with some of their subdomains:
<encodingDesc xml:base="../TEILex0.examples/headers/DLP.stripped.xml">
<classDecl>
<taxonomy xml:id="domain">
<category xml:id="domain.earth_sciences">
<catDesc xml:lang="en">
<term>Earth Sciences</term>
<gloss>
<!--Definition of the term would go here.-->
</gloss>
</catDesc>
<catDesc xml:lang="pt">
<term>Ciências da Terra</term>
</catDesc>
<catDesc xml:lang="es">
<term>Ciencias de la Tierra</term>
</catDesc>
<catDesc xml:lang="fr">
<term>sciences de la Terre</term>
</catDesc>
<category xml:id="domain.earth_sciences.geology">
<catDesc xml:lang="en">
<term>Geology</term>
</catDesc>
<catDesc xml:lang="pt">
<term>Geologia</term>
</catDesc>
<catDesc xml:lang="es">
<term>Geología</term>
</catDesc>
<catDesc xml:lang="fr">
<term>Géologie</term>
</catDesc>
<category xml:id="domain.earth_sciences.geology.mineralogy">
<catDesc xml:lang="en">
<term>Mineralogy</term>
</catDesc>
<catDesc xml:lang="pt">
<term>Mineralogia</term>
</catDesc>
<catDesc xml:lang="es">
<term>Mineralogía</term>
</catDesc>
<catDesc xml:lang="fr">
<term>Minéralogie</term>
</catDesc>
</category>
</category>
</category>
<category xml:id="domain.sports">
<catDesc xml:lang="en">
<term>Sport</term>
</catDesc>
<catDesc xml:lang="pt">
<term>Desporto</term>
</catDesc>
<catDesc xml:lang="es">
<term>Deporte</term>
</catDesc>
<catDesc xml:lang="fr">
<term>Sport</term>
</catDesc>
<category xml:id="domain.sports.football">
<catDesc xml:lang="en">
<term>Football</term>
</catDesc>
<catDesc xml:lang="pt">
<term>Futebol</term>
</catDesc>
<catDesc xml:lang="es">
<term>Fútbol</term>
</catDesc>
<catDesc xml:lang="fr">
<term>Football</term>
</catDesc>
</category>
</category>
</taxonomy>
</classDecl>
</encodingDesc>
To apply a domain label in an entry, use the <usg> element with a valueDatcat attribute pointing to the xml:id of the appropriate category in the taxonomy:
<entry type="mainEntry" xml:lang="pt" xml:id="DLPC.cristalografia"
xml:base="../TEILex0.examples/headers/DLP.stripped.xml">
<form type="lemma">
<orth>cristalografia</orth>
<pron>kriʃtɐluɡrɐˈfiɐ</pron>
</form>
<gramGrp>
<gram type="pos" norm="NOUN">n.</gram>
<gram type="gen">f.</gram>
</gramGrp>
<sense xml:id="DLPC.cristalografia_1">
<usg type="domain" valueDatcat="#domain.earth_sciences.geology.mineralogy">Mineralogia</usg>
<def>ciência que estuda e descreve a forma e a estrutura dos cristais, bem como as leis que regem a sua formação</def>
</sense>
<!--etc.-->
</entry>
9. Etymology
This section needs to be transferred from Jack's and Laurent's paper.
10. Patterns
10.1. Inheritance of xml:lang
Some elements in TEI Lex-0, such as <entry>, have a required xml:lang attribute; others, such as <form> or <quote>, do not. In general, TEI Lex-0, unlike TEI, recommends that xml:lang be attached to so-called container elements (for instance, <entry> and <cit>) rather than to individual word forms or textual segments.
TODO: Add some examples
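As a minimal sketch of this pattern, consider a hypothetical French entry with an English translation equivalent. The xml:id values are invented for illustration, and the type value and content of <cit> are assumptions about how translation equivalents are encoded elsewhere in these guidelines.

```xml
<entry xml:id="example.chien" xml:lang="fr">
  <form type="lemma">
    <!-- no xml:lang here: the form inherits "fr" from <entry> -->
    <orth>chien</orth>
  </form>
  <sense xml:id="example.chien.1">
    <!-- the container <cit> switches the language for everything inside it -->
    <cit type="translationEquivalent" xml:lang="en">
      <form>
        <!-- nearest ancestor with xml:lang is <cit>, so this form is "en" -->
        <orth>dog</orth>
      </form>
    </cit>
  </sense>
</entry>
```

Neither <orth> carries its own xml:lang. The language of each is determined by the nearest enclosing element that does.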
So how can we extract all orthographic forms in a particular language? We can use an XPath expression like this: //orth[ancestor-or-self::*[@xml:lang][1][@xml:lang='en']].
This XPath expression identifies:
- each orth element, regardless of where it is in the document (//)
- but only if it itself or one of its ancestors has the @xml:lang attribute ([ancestor-or-self::*[@xml:lang]])
- when looking for ancestors with the @xml:lang attribute, we stop at the first such ancestor, i.e. the nearest one ([1])
- finally, we keep only those selected elements for which the value of that nearest @xml:lang attribute is 'en' ([@xml:lang='en'])
If your dictionary uses multiple language tags for one language (as in 'en', 'en-GB' and 'en-US') and you want to capture all language varieties with one XPath expression, you can use the XPath lang() function, as in: //orth[ancestor-or-self::*[@xml:lang][1][lang('en')]].
While the predicate [@xml:lang='en'] will match only those elements whose xml:lang is exactly equal to 'en', the predicate with the function [lang('en')] will match all the elements whose language is tagged as either English (i.e. 'en') or one of its 'sublanguages' such as 'en-GB'.
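To show where such an expression might live in practice, here is a hedged sketch of an XSLT 1.0 stylesheet that prints every English orthographic form, one per line. It assumes that the dictionary elements are in the TEI namespace, which XPath 1.0 can only address through a prefix. The tei prefix is a local choice of this sketch and is not part of the expression given above.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:tei="http://www.tei-c.org/ns/1.0">
  <xsl:output method="text" encoding="UTF-8"/>

  <xsl:template match="/">
    <!-- select each <orth> whose nearest xml:lang-bearing
         ancestor-or-self element is English or a variety of it -->
    <xsl:for-each
        select="//tei:orth[ancestor-or-self::*[@xml:lang][1][lang('en')]]">
      <xsl:value-of select="normalize-space(.)"/>
      <xsl:text>&#10;</xsl:text>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
```

If the source file has no namespace declaration, the tei: prefix can be dropped and the expression used exactly as written in the prose.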
If you are new to XPath, you can check out a DARIAH-Campus tutorial XPath for Dictionary Nerds.
11. Bibliography
- Almonjid. 2014. The Dictionary of [Arabic] Language and Proper Nouns. Dar el-Machreq: Beirut.
- Atkins, B. T. S. and Michael Rundell. 2008. The Oxford Guide to Practical Lexicography. Oxford University Press: Oxford; New York. ISBN: 9780199277711.
- Chambers. 2011. The Chambers Dictionary. 12th Edition. Chambers Harrap Publishers: London. ISBN: 9780550102379.
- Cruse, D. A. 1986. Lexical semantics. Cambridge University Press: Cambridge and New York. ISBN: 9780521276436.
- Cruse, D. A. 2011. Meaning in language: an introduction to semantics and pragmatics. 3rd ed. Oxford University Press: Oxford. ISBN: 9780199559466.
- DLPC. 2001. Dicionário da Língua Portuguesa Contemporânea. Editorial Verbo: Lisboa.
- Du Cange, Charles. 1688. Glossarium ad Scriptores Mediae et Infimae Graecitatis. Apud Amissonios: Lugduni.
- Duden. 2007. Das Synonymwörterbuch. Dudenverlag: Mannheim.
- Erjavec, Tomaž, Roger Evans, Nancy Ide and Adam Kilgarriff. 2000. "The CONCEDE Model for Lexical Databases." Proceedings of the Second Language Resources and Evaluation Conference (LREC), 355-62.
- Ermolaev, Natalia and Toma Tasovac. 2012. "Building a Lexicographic Infrastructure for Serbian Digital Libraries." Libraries in the Digital Age (LIDA) Proceedings.
- EtymWB-XML. 2009. Wörterbuch des Deutschen: Die XML-Edition. Berlin-Brandenburgische Akademie der Wissenschaften: Berlin.
- Ide, Nancy, Adam Kilgarriff and Laurent Romary. 2000. "A Formal Model of Dictionary Structure and Content." Proceedings of Euralex 2000, 113-126. arxiv: 0707.3270.
- LDOCE. 2003. Longman Dictionary of Contemporary English. 4th Edition. Longman: Harlow. ISBN: 0582776465.
- OALD. 1974. Oxford Advanced Learner's Dictionary of Current English. Oxford University Press: Oxford.
- Romary, Laurent. 2015. "TEI and LMF crosswalks." Journal for language technology and computational linguistics. HAL: hal-00762664.
- Romary, Laurent and Toma Tasovac. 2018. "TEI Lex-0: A Target Format for TEI-Encoded Dictionaries and Lexical Resources." TEI Conference.
- Salgado, Ana, Rute Costa, Toma Tasovac and Alberto Simões. 2019. "TEI Lex-0 In Action: Improving the Encoding of the Dictionary of the Academia das Ciências de Lisboa." eLex 2019, 417-433.
- Salgado, Ana, Rute Costa and Toma Tasovac. 2019. "Improving the Consistency of Usage Labelling in Dictionaries with TEI Lex-0." Lexicography 6: 133–156. DOI: 10.1007/s40607-019-00061-x.
- Silva, Antônio de Morais. 1789. Diccionario da lingua portugueza. Na Officina de Simão Thaddeo Ferreira: Lisboa.
- StčS. 1999-2011. Staročeský slovník. Ústav pro jazyk český AV ČR, v. v. i.: Praha.
- Svensén, Bo. 2009. A handbook of lexicography: the theory and practice of dictionary-making. Cambridge University Press: New York. ISBN: 9780521881807.
- Tasovac, Toma, Ana Salgado and Rute Costa. 2020. "Encoding Polylexical Units with TEI Lex-0: A Case Study." Slovenščina 2.0.
- VOLP. 1940. Vocabulário Ortográfico da Língua Portuguesa [em linha]. Academia das Ciências de Lisboa/Imprensa Nacional de Lisboa: Lisboa.
- Zgusta, Ladislav. 1971. Manual of Lexicography. Academia: Prague. ISBN: 9783111980461.
- Zillig, Brian L Pytlik. 2009. "TEI Analytics: converting documents into a TEI format for cross-collection text analysis." Literary and Linguistic Computing 24: 187–192. DOI: 10.1093/llc/fqp005.
- Zöfgen, Ekkehard. 1989. "Homonymie und Polysemie im allgemeinen einsprachigen Wörterbuch." Wörterbücher. Ein internationales Handbuch zur Lexikographie. I: 425-464.
- Златановић, Момчило. 2017. Речник говора јужне Србије: електронско издање. Институт за српски језик САНУ и Центар за дигиталне хуманистичке науке: Београд.
- Московљевић, Милош С. 1990. Речник савременог српскохрватског књижевног језика с књижевним саветником. Аполон: Београд.
12. Specification
12.1. Elements
12.1.1. <TEI>
<TEI> (TEI document) contains a single TEI-conformant document, combining a single TEI header with one or more members of the model.resource class. Multiple <TEI> elements may be combined within a <TEI> (or <teiCorpus>) element. [4. Default Text Structure; 15.1. Varieties of Composite Text]
Module | textstructure — Specification
Attributes | att.global (@xml:id, @n, @xml:lang, @xml:base) (att.global.rendition (@rend, @style, @rendition)) (att.global.linking (@corresp, @synch, @sameAs, @copyOf, @next, @prev, @exclude, @select)) (att.global.analytic (@ana)) (att.global.facs (@facs)) (att.global.change (@change)) (att.global.responsibility (@cert, @resp)) (att.global.source (@source)) att.typed (type, @subtype)
Contained by | textstructure: TEI
Note | This element is required. It is customary to specify the TEI namespace http://www.tei-c.org/ns/1.0 on it, for example: <TEI version="4.4.0" xml:lang="it" xmlns="http://www.tei-c.org/ns/1.0">.
12.1.2. <abbr>
<abbr> (abbreviation) contains an abbreviation of any sort. [3.6.5. Abbreviations and Their Expansions]
Module | core — Specification
Attributes | att.global (@xml:id, @n, @xml:lang, @xml:base) (att.global.rendition (@rend, @style, @rendition)) (att.global.linking (@corresp, @synch, @sameAs, @copyOf, @next, @prev, @exclude, @select)) (att.global.analytic (@ana)) (att.global.facs (@facs)) (att.global.change (@change)) (att.global.responsibility (@cert, @resp)) (att.global.source (@source)) att.typed (type, @subtype)
Note | If abbreviations are expanded silently, this practice should be documented in the <editorialDecl>, either with a <normalization> element or a <p>.
12.1.3. <affiliation>
<affiliation> (affiliation) contains an informal description of a person's present or past affiliation with some organization, for example an employer or sponsor. [15.2.2. The Participant Description]
Module | namesdates — Specification
Attributes | att.global (@xml:id, @n, @xml:lang, @xml:base) (att.global.rendition (@rend, @style, @rendition)) (att.global.linking (@corresp, @synch, @sameAs, @copyOf, @next, @prev, @exclude, @select)) (att.global.analytic (@ana)) (att.global.facs (@facs)) (att.global.change (@change)) (att.global.responsibility (@cert, @resp)) (att.global.source (@source)) att.editLike (@evidence, @instant) att.datable (@calendar, @period) (att.datable.w3c (@when, @notBefore, @notAfter, @from, @to)) (att.datable.iso (@when-iso, @notBefore-iso, @notAfter-iso, @from-iso, @to-iso)) (att.datable.custom (@when-custom, @notBefore-custom, @notAfter-custom, @from-custom, @to-custom, @datingPoint, @datingMethod)) att.naming (@role, @nymRef) (att.canonical (@key, @ref)) att.typed (type, @subtype)
Note | If included, the name of an organization may be tagged using either the <name> element as above, or the more specific <orgName> element.
Example | This example indicates that the person was affiliated with the Australian Journalists Association at some point between the dates listed.
Example | This example indicates that the person was affiliated with Mount Holyoke College throughout the entire span of the date range listed.
12.1.4. <analytic>
<analytic> (analytic level) contains bibliographic elements describing an item (e.g. an article or poem) published within a monograph or journal and not as an independent publication. [3.12.2.1. Analytic, Monographic, and Series Levels]
Module | core — Specification
Attributes | att.global (@xml:id, @n, @xml:lang, @xml:base) (att.global.rendition (@rend, @style, @rendition)) (att.global.linking (@corresp, @synch, @sameAs, @copyOf, @next, @prev, @exclude, @select)) (att.global.analytic (@ana)) (att.global.facs (@facs)) (att.global.change (@change)) (att.global.responsibility (@cert, @resp)) (att.global.source (@source))
Contained by | core: biblStruct
Note | May contain titles and statements of responsibility (author, editor, or other), in any order. The <analytic> element may only occur within a <biblStruct>, where its use is mandatory for the description of an analytic level bibliographic item.
12.1.5. <appInfo>
<appInfo> (application information) records information about an application which has edited the TEI file. [2.3.11. The Application Information Element]
Module | header — Specification
Attributes | att.global (@xml:id, @n, @xml:lang, @xml:base) (att.global.rendition (@rend, @style, @rendition)) (att.global.linking (@corresp, @synch, @sameAs, @copyOf, @next, @prev, @exclude, @select)) (att.global.analytic (@ana)) (att.global.facs (@facs)) (att.global.change (@change)) (att.global.responsibility (@cert, @resp)) (att.global.source (@source))
Contained by | header: encodingDesc
May contain | Empty element
12.1.6. <author>
<author> (author) in a bibliographic reference, contains the name(s) of an author, personal or corporate, of a work; for example in the same form as that provided by a recognized bibliographic name authority. [3.12.2.2. Titles, Authors, and Editors; 2.2.1. The Title Statement]
Module | core — Specification
Attributes | att.global (@xml:id, @n, @xml:lang, @xml:base) (att.global.rendition (@rend, @style, @rendition)) (att.global.linking (@corresp, @synch, @sameAs, @copyOf, @next, @prev, @exclude, @select)) (att.global.analytic (@ana)) (att.global.facs (@facs)) (att.global.change (@change)) (att.global.responsibility (@cert, @resp)) (att.global.source (@source)) att.naming (@role, @nymRef) (att.canonical (@key, @ref)) att.datable (@calendar, @period) (att.datable.w3c (@when, @notBefore, @notAfter, @from, @to)) (att.datable.iso (@when-iso, @notBefore-iso, @notAfter-iso, @from-iso, @to-iso)) (att.datable.custom (@when-custom, @notBefore-custom, @notAfter-custom, @from-custom, @to-custom, @datingPoint, @datingMethod))
Contained by | header: editionStmt titleStmt
Note | Particularly where cataloguing is likely to be based on the content of the header, it is advisable to use a generally recognized name authority file to supply the content for this element. The attributes key or ref may also be used to reference canonical information about the author(s) intended from any appropriate authority, such as a library catalogue or online resource. In the case of a broadcast, use this element for the name of the company or network responsible for making the broadcast. Where an author is unknown or unspecified, this element may contain text such as Unknown or Anonymous. When the appropriate TEI modules are in use, it may also contain detailed tagging of the names used for people, organizations or places, in particular where multiple names are given.
12.1.7. <authority>
<authority> (release authority) supplies the name of a person or other agency responsible for making a work available, other than a publisher or distributor. [2.2.4. Publication, Distribution, Licensing, etc.]
Module | header — Specification
Attributes | att.global (@xml:id, @n, @xml:lang, @xml:base) (att.global.rendition (@rend, @style, @rendition)) (att.global.linking (@corresp, @synch, @sameAs, @copyOf, @next, @prev, @exclude, @select)) (att.global.analytic (@ana)) (att.global.facs (@facs)) (att.global.change (@change)) (att.global.responsibility (@cert, @resp)) (att.global.source (@source)) att.canonical (@key, @ref)
Contained by | core: monogr header: publicationStmt
12.1.8. <availability>
<availability> (availability) supplies information about the availability of a text, for example any restrictions on its use or distribution, its copyright status, any licence applying to it, etc. [2.2.4. Publication, Distribution, Licensing, etc.]
Module | header — Specification
Attributes | att.global (@xml:id, @n, @xml:lang, @xml:base) (att.global.rendition (@rend, @style, @rendition)) (att.global.linking (@corresp, @synch, @sameAs, @copyOf, @next, @prev, @exclude, @select)) (att.global.analytic (@ana)) (att.global.facs (@facs)) (att.global.change (@change)) (att.global.responsibility (@cert, @resp)) (att.global.source (@source)) att.declarable (@default)
Contained by | header: publicationStmt
Note | A consistent format should be adopted.
12.1.9. <back>
<back> (back matter) contains any appendixes, etc. following the main part of a text. [4.7. Back Matter; 4. Default Text Structure]
Module | textstructure — Specification
Attributes | att.global (@xml:id, @n, @xml:lang, @xml:base) (att.global.rendition (@rend, @style, @rendition)) (att.global.linking (@corresp, @synch, @sameAs, @copyOf, @next, @prev, @exclude, @select)) (att.global.analytic (@ana)) (att.global.facs (@facs)) (att.global.change (@change)) (att.global.responsibility (@cert, @resp)) (att.global.source (@source))
Contained by | textstructure: text
Note | Because cultural conventions differ as to which elements are grouped as back matter and which as front matter, the content models for the <back> and <front> elements are identical.
12.1.10. <bibl>
<bibl> (bibliographic citation) contains a loosely-structured bibliographic citation of which the sub-components may or may not be explicitly tagged. [3.12.1. Methods of Encoding Bibliographic References and Lists of References; 2.2.7. The Source Description; 15.3.2. Declarable Elements]
Module | core — Specification
Attributes | att.global (@xml:id, @n, @xml:lang, @xml:base) (att.global.rendition (@rend, @style, @rendition)) (att.global.linking (@corresp, @synch, @sameAs, @copyOf, @next, @prev, @exclude, @select)) (att.global.analytic (@ana)) (att.global.facs (@facs)) (att.global.change (@change)) (att.global.responsibility (@cert, @resp)) (att.global.source (@source)) att.declarable (@default) att.typed (@type, @subtype) att.sortable (@sortKey) att.docStatus (@status)
Note | Contains phrase-level elements, together with any combination of elements from the model.biblPart class.
12.1.11. <biblScope>
<biblScope> (scope of bibliographic reference) defines the scope of a bibliographic reference, for example as a list of page numbers, or a named subdivision of a larger work. [3.12.2.5. Scopes and Ranges in Bibliographic Citations]
Module | core — Specification
Attributes | att.global (@xml:id, @n, @xml:lang, @xml:base) (att.global.rendition (@rend, @style, @rendition)) (att.global.linking (@corresp, @synch, @sameAs, @copyOf, @next, @prev, @exclude, @select)) (att.global.analytic (@ana)) (att.global.facs (@facs)) (att.global.change (@change)) (att.global.responsibility (@cert, @resp)) (att.global.source (@source)) att.citing (@unit, @from, @to)
Contained by | header: seriesStmt
Note | When a single page is being cited, use the from and to attributes with an identical value. When no clear endpoint is provided, the from attribute may be used without to; for example a citation such as ‘p. 3ff’ might be encoded <biblScope from="3">p. 3ff</biblScope>. It is now considered good practice to supply this element as a sibling (rather than a child) of <imprint>, since it supplies information which does not constitute part of the imprint.
12.1.12. <biblStruct>
<biblStruct> (structured bibliographic citation) contains a structured bibliographic citation, in which only bibliographic sub-elements appear and in a specified order. [3.12.1. Methods of Encoding Bibliographic References and Lists of References; 2.2.7. The Source Description; 15.3.2. Declarable Elements]
Module | core — Specification
Attributes | att.global (@xml:id, @n, @xml:lang, @xml:base) (att.global.rendition (@rend, @style, @rendition)) (att.global.linking (@corresp, @synch, @sameAs, @copyOf, @next, @prev, @exclude, @select)) (att.global.analytic (@ana)) (att.global.facs (@facs)) (att.global.change (@change)) (att.global.responsibility (@cert, @resp)) (att.global.source (@source)) att.declarable (@default) att.typed (@type, @subtype) att.sortable (@sortKey) att.docStatus (@status)
May contain | core: analytic citedRange monogr note ref
12.1.13. <body>
<body> (text body) contains the whole body of a single unitary text, excluding any front or back matter. [4. Default Text Structure]
Module | textstructure — Specification
Attributes | att.global (@xml:id, @n, @xml:lang, @xml:base) (att.global.rendition (@rend, @style, @rendition)) (att.global.linking (@corresp, @synch, @sameAs, @copyOf, @next, @prev, @exclude, @select)) (att.global.analytic (@ana)) (att.global.facs (@facs)) (att.global.change (@change)) (att.global.responsibility (@cert, @resp)) (att.global.source (@source))
Contained by | textstructure: text
12.1.14. <c>
<c> (character) represents a character. [17.1. Linguistic Segment Categories]
Module | analysis — Specification
Attributes | att.global (@xml:id, @n, @xml:lang, @xml:base) (att.global.rendition (@rend, @style, @rendition)) (att.global.linking (@corresp, @synch, @sameAs, @copyOf, @next, @prev, @exclude, @select)) (att.global.analytic (@ana)) (att.global.facs (@facs)) (att.global.change (@change)) (att.global.responsibility (@cert, @resp)) (att.global.source (@source)) att.segLike (@function) (att.datcat (@datcat, @valueDatcat, @targetDatcat)) (att.fragmentable (@part)) att.typed (@type, @subtype) att.notated (@notation)
May contain | gaiji: g character data
Note | Contains a single character, a <g> element, or a sequence of graphemes to be treated as a single character. The type attribute is used to indicate the function of this segmentation, taking values such as letter, punctuation, or digit etc.
12.1.15. <catDesc>
<catDesc> (category description) describes some category within a taxonomy or text typology, either in the form of a brief prose description or in terms of the situational parameters used by the TEI formal <textDesc>. [2.3.7. The Classification Declaration]
Module | header — Specification
Attributes | att.global (@xml:id, @n, @xml:lang, @xml:base) (att.global.rendition (@rend, @style, @rendition)) (att.global.linking (@corresp, @synch, @sameAs, @copyOf, @next, @prev, @exclude, @select)) (att.global.analytic (@ana)) (att.global.facs (@facs)) (att.global.change (@change)) (att.global.responsibility (@cert, @resp)) (att.global.source (@source)) att.canonical (@key, @ref)
Contained by | header: category
12.1.16. <category>
<category> (category) contains an individual descriptive category, possibly nested within a superordinate category, within a user-defined taxonomy. [2.3.7. The Classification Declaration]
Module | header — Specification
Attributes | att.datcat (@datcat, @valueDatcat, @targetDatcat) att.global (@xml:id, @n, @xml:lang, @xml:base) (att.global.rendition (@rend, @style, @rendition)) (att.global.linking (@corresp, @synch, @sameAs, @copyOf, @next, @prev, @exclude, @select)) (att.global.analytic (@ana)) (att.global.facs (@facs)) (att.global.change (@change)) (att.global.responsibility (@cert, @resp)) (att.global.source (@source))
12.1.17. <change>
<change> (change) documents a change or set of changes made during the production of a source document, or during the revision of an electronic file. [2.6. The Revision Description; 2.4.1. Creation; 11.7. Identifying Changes and Revisions]
Module | header — Specification
Attributes | att.ascribed (@who) att.datable (@calendar, @period) (att.datable.w3c (@when, @notBefore, @notAfter, @from, @to)) (att.datable.iso (@when-iso, @notBefore-iso, @notAfter-iso, @from-iso, @to-iso)) (att.datable.custom (@when-custom, @notBefore-custom, @notAfter-custom, @from-custom, @to-custom, @datingPoint, @datingMethod)) att.docStatus (@status) att.global (@xml:id, @n, @xml:lang, @xml:base) (att.global.rendition (@rend, @style, @rendition)) (att.global.linking (@corresp, @synch, @sameAs, @copyOf, @next, @prev, @exclude, @select)) (att.global.analytic (@ana)) (att.global.facs (@facs)) (att.global.change (@change)) (att.global.responsibility (@cert, @resp)) (att.global.source (@source)) att.typed (@type, @subtype)
Contained by | header: revisionDesc
Note | The who attribute may be used to point to any other element, but will typically specify a <respStmt> or <person> element elsewhere in the header, identifying the person responsible for the change and their role in making it. It is recommended that changes be recorded with the most recent first. The status attribute may be used to indicate the status of a document following the change documented.
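The recommendations in the note on <change> can be sketched as follows. This is an illustrative fragment only: the dates, the descriptions and the xml:id pointed to by who (a hypothetical <respStmt> with xml:id="ed.jd" elsewhere in the header) are invented, and the status values are taken from the values suggested by TEI P5 for this attribute.

```xml
<revisionDesc>
  <!-- most recent change first, as recommended -->
  <change when="2021-03-15" who="#ed.jd" status="published">
    Released the first public version.
  </change>
  <change when="2021-02-01" who="#ed.jd" status="draft">
    Converted the source entries to TEI Lex-0.
  </change>
</revisionDesc>
```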
12.1.18. <char>
<char> (character) provides descriptive information about a character. [5.2. Markup Constructs for Representation of Characters and Glyphs]
Module | gaiji — Specification
Attributes | att.global (@xml:id, @n, @xml:lang, @xml:base) (att.global.rendition (@rend, @style, @rendition)) (att.global.linking (@corresp, @synch, @sameAs, @copyOf, @next, @prev, @exclude, @select)) (att.global.analytic (@ana)) (att.global.facs (@facs)) (att.global.change (@change)) (att.global.responsibility (@cert, @resp)) (att.global.source (@source))
Contained by | gaiji: charDecl
12.1.19. <charDecl>
<charDecl> (character declarations) provides information about nonstandard characters and glyphs. [5.2. Markup Constructs for Representation of Characters and Glyphs]
Module | gaiji — Specification
Attributes | att.global (@xml:id, @n, @xml:lang, @xml:base) (att.global.rendition (@rend, @style, @rendition)) (att.global.linking (@corresp, @synch, @sameAs, @copyOf, @next, @prev, @exclude, @select)) (att.global.analytic (@ana)) (att.global.facs (@facs)) (att.global.change (@change)) (att.global.responsibility (@cert, @resp)) (att.global.source (@source))
Contained by | header: encodingDesc
12.1.20. <cit>
<cit> (cited quotation) contains a quotation from some other document, together with a bibliographic reference to its source. In a dictionary it may contain an example text with at least one occurrence of the word form, used in the sense being described, or a translation of the headword, or an example. [3.3.3. Quotation; 4.3.1. Grouped Texts; 9.3.5.1. Examples]
Module | core — Specification
Attributes | att.global (@xml:id, @n, @xml:lang, @xml:base) (att.global.rendition (@rend, @style, @rendition)) (att.global.linking (@corresp, @synch, @sameAs, @copyOf, @next, @prev, @exclude, @select)) (att.global.analytic (@ana)) (att.global.facs (@facs)) (att.global.change (@change)) (att.global.responsibility (@cert, @resp)) (att.global.source (@source)) att.typed (type, @subtype)
12.1.21. <citedRange>
<citedRange> (cited range) defines the range of cited content, often represented by pages or other units. [3.12.2.5. Scopes and Ranges in Bibliographic Citations]
Module | core — Specification
Attributes | att.global (@xml:id, @n, @xml:lang, @xml:base) (att.global.rendition (@rend, @style, @rendition)) (att.global.linking (@corresp, @synch, @sameAs, @copyOf, @next, @prev, @exclude, @select)) (att.global.analytic (@ana)) (att.global.facs (@facs)) (att.global.change (@change)) (att.global.responsibility (@cert, @resp)) (att.global.source (@source)) att.pointing (@targetLang, @target, @evaluate) att.citing (@unit, @from, @to)
Contained by | core: bibl biblStruct
Note | When a single page is being cited, use the from and to attributes with an identical value. When no clear endpoint is provided, the from attribute may be used without to; for example a citation such as ‘p. 3ff’ might be encoded <citedRange from="3">p. 3ff</citedRange>.
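The two cases described in the note on <citedRange> might look as follows inside a <bibl>. The cited work and its page numbers are placeholders invented for this sketch, not real references.

```xml
<!-- single page: from and to carry the same value -->
<bibl>
  <title>Hypothetical Handbook of Lexicography</title>,
  <citedRange unit="page" from="25" to="25">p. 25</citedRange>
</bibl>

<!-- open-ended range: from without to -->
<bibl>
  <title>Hypothetical Handbook of Lexicography</title>,
  <citedRange unit="page" from="3">p. 3ff</citedRange>
</bibl>
```

Keeping the printed form ("p. 3ff") as element content and the normalized values in attributes lets software sort or compare ranges without parsing the display text.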
12.1.22. <classDecl>
<classDecl> (classification declarations) contains one or more taxonomies defining any classificatory codes used elsewhere in the text. [2.3.7. The Classification Declaration; 2.3. The Encoding Description]
Module | header — Specification
Attributes | att.global (@xml:id, @n, @xml:lang, @xml:base) (att.global.rendition (@rend, @style, @rendition)) (att.global.linking (@corresp, @synch, @sameAs, @copyOf, @next, @prev, @exclude, @select)) (att.global.analytic (@ana)) (att.global.facs (@facs)) (att.global.change (@change)) (att.global.responsibility (@cert, @resp)) (att.global.source (@source))
Contained by | header: encodingDesc
May contain | header: taxonomy
12.1.23. <date>
<date> (date) contains a date in any format. [3.6.4. Dates and Times 2.2.4. Publication, Distribution, Licensing, etc. 2.6. The Revision Description 3.12.2.4. Imprint, Size of a Document, and Reprint Information 15.2.3. The Setting Description 13.4. Dates] | |
Module | core — Specification |
Attributes | att.global (@xml:id, @n, @xml:lang, @xml:base) (att.global.rendition (@rend, @style, @rendition)) (att.global.linking (@corresp, @synch, @sameAs, @copyOf, @next, @prev, @exclude, @select)) (att.global.analytic (@ana)) (att.global.facs (@facs)) (att.global.change (@change)) (att.global.responsibility (@cert, @resp)) (att.global.source (@source)) att.canonical (@key, @ref) att.datable (@calendar, @period) (att.datable.w3c (@when, @notBefore, @notAfter, @from, @to)) (att.datable.iso (@when-iso, @notBefore-iso, @notAfter-iso, @from-iso, @to-iso)) (att.datable.custom (@when-custom, @notBefore-custom, @notAfter-custom, @from-custom, @to-custom, @datingPoint, @datingMethod)) att.editLike (@evidence, @instant) att.dimensions (@unit, @quantity, @extent, @precision, @scope) (att.ranging (@atLeast, @atMost, @min, @max, @confidence)) att.typed (@type, @subtype) |
Member of | |
Contained by | |
May contain | |
Example |
|
Example |
|
Example |
|
Content model |
|
Schema Declaration |
|
12.1.24. <def>
<def> (definition) contains definition text in a dictionary entry. [9.3.3.1. Definitions] | |
Module | dictionaries — Specification |
Attributes | att.global (@xml:id, @n, @xml:lang, @xml:base) (att.global.rendition (@rend, @style, @rendition)) (att.global.linking (@corresp, @synch, @sameAs, @copyOf, @next, @prev, @exclude, @select)) (att.global.analytic (@ana)) (att.global.facs (@facs)) (att.global.change (@change)) (att.global.responsibility (@cert, @resp)) (att.global.source (@source)) att.lexicographic (@expand, @split, @value, @location, @mergedIn, @opt) (att.datcat (@datcat, @valueDatcat, @targetDatcat)) (att.lexicographic.normalized (@norm, @orig)) |
Member of | |
Contained by | |
May contain | |
Example |
|
Content model |
|
Schema Declaration |
|
12.1.25. <dictScrap>
<dictScrap> (dictionary scrap) encloses a part of a dictionary entry in which other phrase-level dictionary elements are freely combined. [9.1. Dictionary Body and Overall Structure 9.2. The Structure of Dictionary Entries] | |
Module | dictionaries — Specification |
Attributes | att.global (@xml:id, @n, @xml:lang, @xml:base) (att.global.rendition (@rend, @style, @rendition)) (att.global.linking (@corresp, @synch, @sameAs, @copyOf, @next, @prev, @exclude, @select)) (att.global.analytic (@ana)) (att.global.facs (@facs)) (att.global.change (@change)) (att.global.responsibility (@cert, @resp)) (att.global.source (@source)) |
Member of | |
Contained by | dictionaries: entry |
May contain | |
Note | May contain any dictionary elements in any combination. This element is used to mark part of a dictionary entry in which lower level dictionary elements appear, but which does not itself form an identifiable structural unit. |
Example |
|
Content model |
|
Schema Declaration |
|
12.1.26. <distributor>
<distributor> (distributor) supplies the name of a person or other agency responsible for the distribution of a text. [2.2.4. Publication, Distribution, Licensing, etc.] | |
Module | header — Specification |
Attributes | att.global (@xml:id, @n, @xml:lang, @xml:base) (att.global.rendition (@rend, @style, @rendition)) (att.global.linking (@corresp, @synch, @sameAs, @copyOf, @next, @prev, @exclude, @select)) (att.global.analytic (@ana)) (att.global.facs (@facs)) (att.global.change (@change)) (att.global.responsibility (@cert, @resp)) (att.global.source (@source)) att.canonical (@key, @ref) |
Member of | |
Contained by | header: publicationStmt |
May contain | |
Example |
|
Content model |
|
Schema Declaration |
|
12.1.27. <div>
<div> (text division) contains a subdivision of the front, body, or back of a text. [4.1. Divisions of the Body] | |
Module | textstructure — Specification |
Attributes | att.global (@xml:id, @n, @xml:lang, @xml:base) (att.global.rendition (@rend, @style, @rendition)) (att.global.linking (@corresp, @synch, @sameAs, @copyOf, @next, @prev, @exclude, @select)) (att.global.analytic (@ana)) (att.global.facs (@facs)) (att.global.change (@change)) (att.global.responsibility (@cert, @resp)) (att.global.source (@source)) att.typed (@type, @subtype) att.written (@hand) |
Member of | |
Contained by | |
May contain | |
Example |
|
Schematron |
|
Schematron |
|
Content model |
|
Schema Declaration |
|
12.1.28. <edition>
<edition> (edition) describes the particularities of one edition of a text. [2.2.2. The Edition Statement] | |
Module | header — Specification |
Attributes | att.global (@xml:id, @n, @xml:lang, @xml:base) (att.global.rendition (@rend, @style, @rendition)) (att.global.linking (@corresp, @synch, @sameAs, @copyOf, @next, @prev, @exclude, @select)) (att.global.analytic (@ana)) (att.global.facs (@facs)) (att.global.change (@change)) (att.global.responsibility (@cert, @resp)) (att.global.source (@source)) |
Member of | |
Contained by | header: editionStmt |
May contain | |
Example |
|
Content model |
|
Schema Declaration |
|
12.1.29. <editionStmt>
<editionStmt> (edition statement) groups information relating to one edition of a text. [2.2.2. The Edition Statement 2.2. The File Description] | |
Module | header — Specification |
Attributes | att.global (@xml:id, @n, @xml:lang, @xml:base) (att.global.rendition (@rend, @style, @rendition)) (att.global.linking (@corresp, @synch, @sameAs, @copyOf, @next, @prev, @exclude, @select)) (att.global.analytic (@ana)) (att.global.facs (@facs)) (att.global.change (@change)) (att.global.responsibility (@cert, @resp)) (att.global.source (@source)) |
Contained by | header: fileDesc |
May contain | |
Example |
|
Example |
|
Content model |
|
Schema Declaration |
|
12.1.30. <editor>
<editor> contains a secondary statement of responsibility for a bibliographic item, for example the name of an individual, institution or organization (or of several such) acting as editor, compiler, translator, etc. [3.12.2.2. Titles, Authors, and Editors] | |
Module | core — Specification |
Attributes | att.global (@xml:id, @n, @xml:lang, @xml:base) (att.global.rendition (@rend, @style, @rendition)) (att.global.linking (@corresp, @synch, @sameAs, @copyOf, @next, @prev, @exclude, @select)) (att.global.analytic (@ana)) (att.global.facs (@facs)) (att.global.change (@change)) (att.global.responsibility (@cert, @resp)) (att.global.source (@source)) att.naming (@role, @nymRef) (att.canonical (@key, @ref)) att.datable (@calendar, @period) (att.datable.w3c (@when, @notBefore, @notAfter, @from, @to)) (att.datable.iso (@when-iso, @notBefore-iso, @notAfter-iso, @from-iso, @to-iso)) (att.datable.custom (@when-custom, @notBefore-custom, @notAfter-custom, @from-custom, @to-custom, @datingPoint, @datingMethod)) |
Member of | |
Contained by | header: editionStmt seriesStmt titleStmt |
May contain | |
Note | A consistent format should be adopted. Particularly where cataloguing is likely to be based on the content of the header, it is advisable to use generally recognized authority lists for the exact form of personal names. |
Example |
|
Content model |
|
Schema Declaration |
|
12.1.31. <editorialDecl>
<editorialDecl> (editorial practice declaration) provides details of editorial principles and practices applied during the encoding of a text. [2.3.3. The Editorial Practices Declaration 2.3. The Encoding Description 15.3.2. Declarable Elements] | |
Module | header — Specification |
Attributes | att.global (@xml:id, @n, @xml:lang, @xml:base) (att.global.rendition (@rend, @style, @rendition)) (att.global.linking (@corresp, @synch, @sameAs, @copyOf, @next, @prev, @exclude, @select)) (att.global.analytic (@ana)) (att.global.facs (@facs)) (att.global.change (@change)) (att.global.responsibility (@cert, @resp)) (att.global.source (@source)) att.declarable (@default) |
Member of | |
Contained by | header: encodingDesc |
May contain | core: p |
Example |
|
Content model |
|
Schema Declaration |
|
12.1.32. <email>
<email> (electronic mail address) contains an email address identifying a location to which email messages can be delivered. [3.6.2. Addresses] | |
Module | core — Specification |
Attributes | att.global (@xml:id, @n, @xml:lang, @xml:base) (att.global.rendition (@rend, @style, @rendition)) (att.global.linking (@corresp, @synch, @sameAs, @copyOf, @next, @prev, @exclude, @select)) (att.global.analytic (@ana)) (att.global.facs (@facs)) (att.global.change (@change)) (att.global.responsibility (@cert, @resp)) (att.global.source (@source)) |
Member of | |
Contained by | |
May contain | |
Note | The format of a modern Internet email address is defined in RFC 2822. |
Example |
|
Content model |
|
Schema Declaration |
|
12.1.33. <encodingDesc>
<encodingDesc> (encoding description) documents the relationship between an electronic text and the source or sources from which it was derived. [2.3. The Encoding Description 2.1.1. The TEI Header and Its Components] | |
Module | header — Specification |
Attributes | att.global (@xml:id, @n, @xml:lang, @xml:base) (att.global.rendition (@rend, @style, @rendition)) (att.global.linking (@corresp, @synch, @sameAs, @copyOf, @next, @prev, @exclude, @select)) (att.global.analytic (@ana)) (att.global.facs (@facs)) (att.global.change (@change)) (att.global.responsibility (@cert, @resp)) (att.global.source (@source)) |
Contained by | header: teiHeader |
May contain | |
Example |
|
Content model |
|
Schema Declaration |
|
12.1.34. <entry>
<entry> (entry) contains a single structured entry in any kind of lexical resource, such as a dictionary or lexicon. [9.1. Dictionary Body and Overall Structure 9.2. The Structure of Dictionary Entries] | |
Module | dictionaries — Specification |
Attributes | att.sortable (@sortKey) att.global (xml:id, xml:lang, @n, @xml:base) att.global.rendition (@rend, @style, @rendition) att.global.linking (@corresp, @synch, @sameAs, @copyOf, @next, @prev, @exclude, @select) att.global.analytic (@ana) att.global.facs (@facs) att.global.change (@change) att.global.responsibility (@cert, @resp) att.global.source (@source) |
Member of | |
Contained by | |
May contain | |
Note | Like all elements, <entry> inherits an xml:id attribute from the class global. No restrictions are placed on the method used to construct xml:ids; one convenient method is to use the orthographic form of the headword, appending a disambiguating number where necessary. Identification codes are sometimes included on machine-readable tapes of dictionaries for in-house use. It is recommended to use the <sense> element even for an entry that has only one sense, so that all parts of the definition relating to that sense are grouped together; this leads to more consistent encoding across entries. |
Example |
|
Content model |
|
Schema Declaration |
|
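The two recommendations in the note above can be sketched as follows: an xml:id built from the headword plus a disambiguating number, and a <sense> wrapper even though the entry has only one sense. The headword, identifiers, definition text and the type="lemma" value are assumptions chosen for illustration, not prescribed values.

```xml
<!-- Hypothetical entry; all values are invented for illustration. -->
<entry xml:id="bank_1" xml:lang="en">
  <form type="lemma">
    <orth>bank</orth>
  </form>
  <!-- a single sense is still wrapped in <sense> -->
  <sense xml:id="bank_1_sense_1">
    <def>an institution that accepts deposits and lends money</def>
  </sense>
</entry>
```

A homograph would then receive the next number (for example bank_2), which keeps identifiers predictable and lets cross-references target one specific entry.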
12.1.35. <etym>
<etym> (etymology) encloses the etymological information in a dictionary entry. [9.3.4. Etymological Information] | |
Module | dictionaries — Specification |
Attributes | att.global (@xml:id, @n, @xml:lang, @xml:base) (att.global.rendition (@rend, @style, @rendition)) (att.global.linking (@corresp, @synch, @sameAs, @copyOf, @next, @prev, @exclude, @select)) (att.global.analytic (@ana)) (att.global.facs (@facs)) (att.global.change (@change)) (att.global.responsibility (@cert, @resp)) (att.global.source (@source)) att.lexicographic (@expand, @split, @value, @location, @mergedIn, @opt) (att.datcat (@datcat, @valueDatcat, @targetDatcat)) (att.lexicographic.normalized (@norm, @orig)) att.typed (type, @subtype) |
Member of | |
Contained by | |
May contain | |
Note | May contain character data mixed with any other elements defined in the dictionary tag set. There is no consensus on the internal structure of etymologies, or even on whether such a structure is appropriate. The <etym> element accordingly simply contains prose, within which names of languages, cited words, or parts of words, glosses, and examples will typically be prominent. The tagging of such internal objects is optional. |
Example |
|
Example |
|
Content model |
|
Schema Declaration |
|
12.1.36. <expan>
<expan> (expansion) contains the expansion of an abbreviation. [3.6.5. Abbreviations and Their Expansions] | |
Module | core — Specification |
Attributes | att.global (@xml:id, @n, @xml:lang, @xml:base) (att.global.rendition (@rend, @style, @rendition)) (att.global.linking (@corresp, @synch, @sameAs, @copyOf, @next, @prev, @exclude, @select)) (att.global.analytic (@ana)) (att.global.facs (@facs)) (att.global.change (@change)) (att.global.responsibility (@cert, @resp)) (att.global.source (@source)) att.editLike (@evidence, @instant) |
Member of | |
Contained by | |
May contain | |
Note | The content of this element should be the expanded abbreviation, usually (but not always) a complete word or phrase. The <ex> element provided by the transcr module may be used to mark up sequences of letters supplied within such an expansion. If abbreviations are expanded silently, this practice should be documented in the <editorialDecl>, either with a <normalization> element or a <p>. |
Example |
|
Example |
|
Content model |
|
Schema Declaration |
|
12.1.37. <extent>
<extent> (extent) describes the approximate size of a text stored on some carrier medium or of some other object, digital or non-digital, specified in any convenient units. [2.2.3. Type and Extent of File 2.2. The File Description 3.12.2.4. Imprint, Size of a Document, and Reprint Information 10.7.1. Object Description] | |
Module | header — Specification |
Attributes | att.global (@xml:id, @n, @xml:lang, @xml:base) (att.global.rendition (@rend, @style, @rendition)) (att.global.linking (@corresp, @synch, @sameAs, @copyOf, @next, @prev, @exclude, @select)) (att.global.analytic (@ana)) (att.global.facs (@facs)) (att.global.change (@change)) (att.global.responsibility (@cert, @resp)) (att.global.source (@source)) |
Member of | |
Contained by | |
May contain | |
Example |
|
Example | The <measure> element may be used to supply normalized or machine-tractable versions of the size or sizes concerned.
|
Content model |
|
Schema Declaration |
|
12.1.38. <figDesc>
<figDesc> (description of figure) contains a brief prose description of the appearance or content of a graphic figure, for use when documenting an image without displaying it. [14.4. Specific Elements for Graphic Images] | |
Module | figures — Specification |
Attributes | att.global (@xml:id, @n, @xml:lang, @xml:base) (att.global.rendition (@rend, @style, @rendition)) (att.global.linking (@corresp, @synch, @sameAs, @copyOf, @next, @prev, @exclude, @select)) (att.global.analytic (@ana)) (att.global.facs (@facs)) (att.global.change (@change)) (att.global.responsibility (@cert, @resp)) (att.global.source (@source)) |
Contained by | figures: figure |
May contain | |
Note | This element is intended for use as an alternative to the content of its parent <figure> element; for example, to display when the image is required but the equipment in use cannot display graphic images. It may also be used for indexing or documentary purposes. |
Example |
|
Content model |
|
Schema Declaration |
|
12.1.39. <figure>
<figure> (figure) groups elements representing or containing graphic information such as an illustration, formula, or figure. [14.4. Specific Elements for Graphic Images] | |
Module | figures — Specification |
Attributes | att.global (@xml:id, @n, @xml:lang, @xml:base) (att.global.rendition (@rend, @style, @rendition)) (att.global.linking (@corresp, @synch, @sameAs, @copyOf, @next, @prev, @exclude, @select)) (att.global.analytic (@ana)) (att.global.facs (@facs)) (att.global.change (@change)) (att.global.responsibility (@cert, @resp)) (att.global.source (@source)) att.placement (@place) att.typed (@type, @subtype) att.written (@hand) |
Member of | |
Contained by | core: abbr author bibl biblScope cit citedRange date editor email expan gloss head hi imprint item list name note p pubPlace publisher quote ref resp term title dictionaries: def dictScrap entry etym form gram gramGrp hyph lang lbl orth pron sense stress syll usg xr figures: figure linking: seg transcr: metamark |
May contain | |
Example |
|
Content model |
|
Schema Declaration |
|
12.1.40. <fileDesc>
<fileDesc> (file description) contains a full bibliographic description of an electronic file. [2.2. The File Description 2.1.1. The TEI Header and Its Components] | |
Module | header — Specification |
Attributes | att.global (@xml:id, @n, @xml:lang, @xml:base) (att.global.rendition (@rend, @style, @rendition)) (att.global.linking (@corresp, @synch, @sameAs, @copyOf, @next, @prev, @exclude, @select)) (att.global.analytic (@ana)) (att.global.facs (@facs)) (att.global.change (@change)) (att.global.responsibility (@cert, @resp)) (att.global.source (@source)) |
Contained by | header: teiHeader |
May contain | |
Note | This element is the major source of information for those seeking to create a catalogue entry or bibliographic citation for an electronic file. As such, it provides a title and statements of responsibility together with details of the publication or distribution of the file, of any series to which it belongs, and detailed bibliographic notes for matters not addressed elsewhere in the header. It also contains a full bibliographic description for the source or sources from which the electronic text was derived. |
Example |
|
Content model |
|
Schema Declaration |
|
12.1.41. <forename>
<forename> (forename) contains a forename, given or baptismal name. [13.2.1. Personal Names] | |
Module | namesdates — Specification |
Attributes | att.global (@xml:id, @n, @xml:lang, @xml:base) (att.global.rendition (@rend, @style, @rendition)) (att.global.linking (@corresp, @synch, @sameAs, @copyOf, @next, @prev, @exclude, @select)) (att.global.analytic (@ana)) (att.global.facs (@facs)) (att.global.change (@change)) (att.global.responsibility (@cert, @resp)) (att.global.source (@source)) att.personal (@full, @sort) (att.naming (@role, @nymRef) (att.canonical (@key, @ref)) ) att.typed (@type, @subtype) |
Member of | |
Contained by | |
May contain | |
Example |
|
Content model |
|
Schema Declaration |
|
12.1.42. <form>
<form> (form information group) groups all the information on the written and spoken forms of one headword. [9.3.1. Information on Written and Spoken Forms] | |
Module | dictionaries — Specification |
Attributes | att.global (@xml:id, @n, @xml:lang, @xml:base) (att.global.rendition (@rend, @style, @rendition)) (att.global.linking (@corresp, @synch, @sameAs, @copyOf, @next, @prev, @exclude, @select)) (att.global.analytic (@ana)) (att.global.facs (@facs)) (att.global.change (@change)) (att.global.responsibility (@cert, @resp)) (att.global.source (@source)) att.lexicographic (@expand, @split, @value, @location, @mergedIn, @opt) (att.datcat (@datcat, @valueDatcat, @targetDatcat)) (att.lexicographic.normalized (@norm, @orig)) att.typed (type, @subtype) |
Member of | |
Contained by | |
May contain | |
Example | (from TLFi) |
Content model |
|
Schema Declaration |
|
12.1.43. <front>
<front> (front matter) contains any prefatory matter (headers, abstracts, title page, prefaces, dedications, etc.) found at the start of a document, before the main body. [4.6. Title Pages 4. Default Text Structure] | |
Module | textstructure — Specification |
Attributes | att.global (@xml:id, @n, @xml:lang, @xml:base) (att.global.rendition (@rend, @style, @rendition)) (att.global.linking (@corresp, @synch, @sameAs, @copyOf, @next, @prev, @exclude, @select)) (att.global.analytic (@ana)) (att.global.facs (@facs)) (att.global.change (@change)) (att.global.responsibility (@cert, @resp)) (att.global.source (@source)) |
Contained by | textstructure: text |
May contain | |
Note | Because cultural conventions differ as to which elements are grouped as front matter and which as back matter, the content models for the <front> and <back> elements are identical. |
Example |
|
Example |
|
Example |
|
Content model |
|
Schema Declaration |
|
12.1.44. <g>
<g> (character or glyph) represents a glyph, or a non-standard character. [5. Characters, Glyphs, and Writing Modes] | |
Module | gaiji — Specification |
Attributes | att.global (@xml:id, @n, @xml:lang, @xml:base) (att.global.rendition (@rend, @style, @rendition)) (att.global.linking (@corresp, @synch, @sameAs, @copyOf, @next, @prev, @exclude, @select)) (att.global.analytic (@ana)) (att.global.facs (@facs)) (att.global.change (@change)) (att.global.responsibility (@cert, @resp)) (att.global.source (@source)) att.typed (@type, @subtype) |
Member of | |
Contained by | |
May contain | Character data only |
Note | The name g is short for gaiji, which is the Japanese term for a non-standardized character or glyph. |
Example | This example points to a <glyph> element with the identifier ctlig like the following:
|
Example | The medieval brevigraph per could similarly be considered as an individual glyph, defined in a <glyph> element with the identifier per-glyph as follows:
|
Content model |
|
Schema Declaration |
|
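A minimal sketch of how <g> and <glyph> relate, assuming the ref attribute that the TEI Guidelines define on <g> and using the ctlig identifier mentioned above. The glyph description itself is left as a placeholder comment, since its content depends on the project.

```xml
<!-- In the header: the glyph is declared once inside charDecl -->
<charDecl>
  <glyph xml:id="ctlig">
    <!-- a description of the particular ct ligature intended goes here -->
  </glyph>
</charDecl>

<!-- In the text: <g> points to the declaration by its identifier;
     the element content gives a plain-character fallback -->
<g ref="#ctlig">ct</g>
```

The same pattern would apply to the brevigraph per, with <g ref="#per-glyph">per</g> pointing to a <glyph> whose xml:id is per-glyph.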
12.1.45. <gloss>
<gloss> (gloss) identifies a phrase or word used to provide a gloss or definition for some other word or phrase. [3.4.1. Terms and Glosses 22.4.1. Description of Components] | |
Module | core — Specification |
Attributes | att.global (@xml:id, @n, @xml:lang, @xml:base) (att.global.rendition (@rend, @style, @rendition)) (att.global.linking (@corresp, @synch, @sameAs, @copyOf, @next, @prev, @exclude, @select)) (att.global.analytic (@ana)) (att.global.facs (@facs)) (att.global.change (@change)) (att.global.responsibility (@cert, @resp)) (att.global.source (@source)) att.typed (@type, @subtype) att.pointing (@targetLang, @target, @evaluate) att.cReferencing (@cRef) |
Member of | |
Contained by | core: abbr author bibl biblScope cit citedRange date editor email expan gloss head hi item name note p pubPlace publisher quote ref resp term title figures: figDesc header: authority catDesc category change distributor edition extent licence principal rendition tagUsage taxonomy linking: seg transcr: metamark |
May contain | |
Note | The target and cRef attributes are mutually exclusive. |
Example |
|
Content model |
|
Schema Declaration |
|
12.1.46. <glyph>
<glyph> (character glyph) provides descriptive information about a character glyph. [5.2. Markup Constructs for Representation of Characters and Glyphs] | |
Module | gaiji — Specification |
Attributes | att.global (@xml:id, @n, @xml:lang, @xml:base) (att.global.rendition (@rend, @style, @rendition)) (att.global.linking (@corresp, @synch, @sameAs, @copyOf, @next, @prev, @exclude, @select)) (att.global.analytic (@ana)) (att.global.facs (@facs)) (att.global.change (@change)) (att.global.responsibility (@cert, @resp)) (att.global.source (@source)) |
Contained by | gaiji: charDecl |
May contain | |
Example |
|
Content model |
|
Schema Declaration |
|
12.1.47. <gram>
<gram> (grammatical information) within an entry in a dictionary or a terminological data file, contains grammatical information relating to a term, word, or form. [9.3.2. Grammatical Information] | |
Module | dictionaries — Specification |
Attributes | att.global (@xml:id, @n, @xml:lang, @xml:base) (att.global.rendition (@rend, @style, @rendition)) (att.global.linking (@corresp, @synch, @sameAs, @copyOf, @next, @prev, @exclude, @select)) (att.global.analytic (@ana)) (att.global.facs (@facs)) (att.global.change (@change)) (att.global.responsibility (@cert, @resp)) (att.global.source (@source)) att.lexicographic (@expand, @split, @value, @location, @mergedIn, @opt) (att.datcat (@datcat, @valueDatcat, @targetDatcat)) (att.lexicographic.normalized (@norm, @orig)) att.typed (type, @subtype) |
Member of | |
Contained by | |
May contain | |
Example |
|
Content model |
|
Schema Declaration |
|
12.1.48. <gramGrp>
<gramGrp> (grammatical information group) groups morpho-syntactic information about a lexical item, e.g. <pos>, <gen>, <number>, <case>, or <iType> (inflectional class). [9.3.2. Grammatical Information] | |
Module | dictionaries — Specification |
Attributes | att.global (@xml:id, @n, @xml:lang, @xml:base) (att.global.rendition (@rend, @style, @rendition)) (att.global.linking (@corresp, @synch, @sameAs, @copyOf, @next, @prev, @exclude, @select)) (att.global.analytic (@ana)) (att.global.facs (@facs)) (att.global.change (@change)) (att.global.responsibility (@cert, @resp)) (att.global.source (@source)) att.lexicographic (@expand, @split, @value, @location, @mergedIn, @opt) (att.datcat (@datcat, @valueDatcat, @targetDatcat)) (att.lexicographic.normalized (@norm, @orig)) att.typed (@type, @subtype) |
Member of | |
Contained by | |
May contain | |
Example |
|
Content model |
|
Schema Declaration |
|
12.1.49. <graphic>
<graphic> (graphic) indicates the location of a graphic or illustration, either forming part of a text, or providing an image of it. [3.10. Graphics and Other Non-textual Components 11.1. Digital Facsimiles] | |
Module | core — Specification |
Attributes | att.global (@xml:id, @n, @xml:lang, @xml:base) (att.global.rendition (@rend, @style, @rendition)) (att.global.linking (@corresp, @synch, @sameAs, @copyOf, @next, @prev, @exclude, @select)) (att.global.analytic (@ana)) (att.global.facs (@facs)) (att.global.change (@change)) (att.global.responsibility (@cert, @resp)) (att.global.source (@source)) att.media (@width, @height, @scale) (att.internetMedia (@mimeType)) att.resourced (@url) att.typed (@type, @subtype) |
Member of | |
Contained by | |
May contain | Empty element |
Note | The mimeType attribute should be used to supply the MIME media type of the image specified by the url attribute. Within the body of a text, a <graphic> element indicates the presence of a graphic component in the source itself. Within the context of a <facsimile> or <sourceDoc> element, however, a <graphic> element provides an additional digital representation of some part of the source being encoded. |
Example |
|
Example |
|
Example |
|
Content model |
|
Schema Declaration |
|
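The note on mimeType can be illustrated with a short fragment in which a <graphic> inside a <figure> declares both the location and the media type of the image. The file path and the description are hypothetical.

```xml
<!-- File name and description are invented for illustration. -->
<figure>
  <!-- url locates the image; mimeType states its MIME media type -->
  <graphic url="images/plate-01.png" mimeType="image/png"/>
  <figDesc>Engraved plate accompanying the entry.</figDesc>
</figure>
```

Stating the media type explicitly spares a processor from guessing it from the file extension, and the <figDesc> gives a textual fallback when the image cannot be displayed.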
12.1.50. <head>
<head> (heading) contains any type of heading, for example the title of a section, or the heading of a list, glossary, manuscript description, etc. [4.2.1. Headings and Trailers] | |
Module | core — Specification |
Attributes | att.global (@xml:id, @n, @xml:lang, @xml:base) (att.global.rendition (@rend, @style, @rendition)) (att.global.linking (@corresp, @synch, @sameAs, @copyOf, @next, @prev, @exclude, @select)) (att.global.analytic (@ana)) (att.global.facs (@facs)) (att.global.change (@change)) (att.global.responsibility (@cert, @resp)) (att.global.source (@source)) att.typed (@type, @subtype) att.placement (@place) att.written (@hand) |
Member of | |
Contained by | |
May contain | |
Note | The <head> element is used for headings at all levels; software which treats (e.g.) chapter headings, section headings, and list titles differently must determine the proper processing of a <head> element based on its structural position. A <head> occurring as the first element of a list is the title of that list; one occurring as the first element of a <div1> is the title of that chapter or section. |
Example | The most common use for the <head> element is to mark the headings of sections. In older writings, the headings or incipits may be rather longer than usual in modern works. If a section has an explicit ending as well as a heading, it should be marked as a <trailer>, as in this example:
|
Example | When headings are not inline with the running text (see e.g. the heading "Secunda conclusio"), they may nevertheless be encoded as if they were. The actual placement in the source document can be captured with the place attribute.
|
Example | The <head> element is also used to mark headings of other units, such as lists:
|
Content model |
|
Schema Declaration |
|
12.1.51. <hi>
<hi> (highlighted) marks a word or phrase as graphically distinct from the surrounding text, for reasons concerning which no claim is made. [3.3.2.2. Emphatic Words and Phrases 3.3.2. Emphasis, Foreign Words, and Unusual Language] | |
Module | core — Specification |
Attributes | att.global (@xml:id, @n, @xml:lang, @xml:base) (att.global.rendition (@rend, @style, @rendition)) (att.global.linking (@corresp, @synch, @sameAs, @copyOf, @next, @prev, @exclude, @select)) (att.global.analytic (@ana)) (att.global.facs (@facs)) (att.global.change (@change)) (att.global.responsibility (@cert, @resp)) (att.global.source (@source)) att.written (@hand) |
Member of | |
Contained by | |
May contain | |
Example |
|
Content model |
|
Schema Declaration |
|
12.1.52. <hyph>
<hyph> (hyphenation) contains a hyphenated form of a dictionary headword, or hyphenation information in some other form. [9.3.1. Information on Written and Spoken Forms] | |
Module | dictionaries — Specification |
Attributes | att.global (@xml:id, @n, @xml:lang, @xml:base) (att.global.rendition (@rend, @style, @rendition)) (att.global.linking (@corresp, @synch, @sameAs, @copyOf, @next, @prev, @exclude, @select)) (att.global.analytic (@ana)) (att.global.facs (@facs)) (att.global.change (@change)) (att.global.responsibility (@cert, @resp)) (att.global.source (@source)) att.lexicographic (@expand, @split, @value, @location, @mergedIn, @opt) (att.datcat (@datcat, @valueDatcat, @targetDatcat)) (att.lexicographic.normalized (@norm, @orig)) att.notated (@notation) |
Member of | |
Contained by | |
May contain | |
Example |
|
Content model |
|
Schema Declaration |
|
12.1.53. <idno>
<idno> (identifier) supplies any form of identifier used to identify some object, such as a bibliographic item, a person, a title, an organization, etc. in a standardized way. [13.3.1. Basic Principles 2.2.4. Publication, Distribution, Licensing, etc. 2.2.5. The Series Statement 3.12.2.4. Imprint, Size of a Document, and Reprint Information] | |
Module | header — Specification |
Attributes | att.global (@xml:id, @n, @xml:lang, @xml:base) (att.global.rendition (@rend, @style, @rendition)) (att.global.linking (@corresp, @synch, @sameAs, @copyOf, @next, @prev, @exclude, @select)) (att.global.analytic (@ana)) (att.global.facs (@facs)) (att.global.change (@change)) (att.global.responsibility (@cert, @resp)) (att.global.source (@source)) att.sortable (@sortKey) att.datable (@calendar, @period) (att.datable.w3c (@when, @notBefore, @notAfter, @from, @to)) (att.datable.iso (@when-iso, @notBefore-iso, @notAfter-iso, @from-iso, @to-iso)) (att.datable.custom (@when-custom, @notBefore-custom, @notAfter-custom, @from-custom, @to-custom, @datingPoint, @datingMethod)) att.typed (type, @subtype) |
Member of | |
Contained by | core: abbr analytic author bibl biblScope citedRange date editor email expan gloss head hi item monogr name note p pubPlace publisher quote resp term figures: figDesc header: authority change distributor edition extent idno licence principal publicationStmt rendition seriesStmt tagUsage transcr: metamark |
May contain | |
Note | <idno> should be used for labels which identify an object or concept in a formal cataloguing system such as a database or an RDF store, or in a distributed system such as the World Wide Web. Some suggested values for type on <idno> are ISBN, ISSN, DOI, and URI. |
Example | In the last case, the identifier includes a non-Unicode character which is defined elsewhere by means of a <glyph> or <char> element referenced here as #sym. |
Content model |
|
Schema Declaration |
|
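As a rough illustration of the note on <idno> above, the following Python sketch reads a small fragment and groups identifiers by their type value. The fragment, the distributor name and all identifier values are invented placeholders, and the helper name identifiers_by_type is hypothetical; only the suggested type values (ISBN, DOI, URI) come from the note.

```python
import xml.etree.ElementTree as ET

TEI = {"tei": "http://www.tei-c.org/ns/1.0"}

# Hypothetical fragment: the distributor and every identifier are placeholders.
SAMPLE = """<publicationStmt xmlns="http://www.tei-c.org/ns/1.0">
  <distributor>Example Dictionary Project</distributor>
  <idno type="ISBN">978-0-00-000000-2</idno>
  <idno type="DOI">10.0000/example.dictionary</idno>
  <idno type="URI">https://example.org/dictionary</idno>
</publicationStmt>"""


def identifiers_by_type(xml_text):
    """Map each <idno> type value to the identifier it labels."""
    root = ET.fromstring(xml_text)
    return {
        idno.get("type"): (idno.text or "").strip()
        for idno in root.iterfind("tei:idno", TEI)
    }


if __name__ == "__main__":
    for kind, value in identifiers_by_type(SAMPLE).items():
        print(kind, value)
```

Keying on type is only sensible when each type occurs once per parent; a project that records several identifiers of the same type would need a list per key instead.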
12.1.54. <imprint>
<imprint> groups information relating to the publication or distribution of a bibliographic item. [3.12.2.4. Imprint, Size of a Document, and Reprint Information] | |
Module | core — Specification |
Attributes | att.global (@xml:id, @n, @xml:lang, @xml:base) (att.global.rendition (@rend, @style, @rendition)) (att.global.linking (@corresp, @synch, @sameAs, @copyOf, @next, @prev, @exclude, @select)) (att.global.analytic (@ana)) (att.global.facs (@facs)) (att.global.change (@change)) (att.global.responsibility (@cert, @resp)) (att.global.source (@source)) |
Contained by | core: monogr |
May contain | |
Example |
|
Content model |
|
Schema Declaration |
|
12.1.55. <item>
<item> (item) contains one component of a list. [3.8. Lists 2.6. The Revision Description] | |
Module | core — Specification |
Attributes | att.global (@xml:id, @n, @xml:lang, @xml:base) (att.global.rendition (@rend, @style, @rendition)) (att.global.linking (@corresp, @synch, @sameAs, @copyOf, @next, @prev, @exclude, @select)) (att.global.analytic (@ana)) (att.global.facs (@facs)) (att.global.change (@change)) (att.global.responsibility (@cert, @resp)) (att.global.source (@source)) att.sortable (@sortKey) |
Contained by | core: list |
May contain | |
Note | May contain simple prose or a sequence of chunks. Whatever string of characters is used to label a list item in the copy text may be used as the value of the global n attribute, but it is not required that numbering be recorded explicitly. In ordered lists, the n attribute on the <item> element is by definition synonymous with the use of the <label> element to record the enumerator of the list item. In glossary lists, however, the term being defined should be given with the <label> element, not n. |
Example |
|
Content model |
|
Schema Declaration |
|
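The distinction drawn in the note on <item> (enumerators recorded with n in ordered lists, defined terms recorded with <label> in glossary lists) can be sketched as follows. The fragment is invented for illustration, the list type values follow plain TEI usage and are an assumption here, and the function names are hypothetical.

```python
import xml.etree.ElementTree as ET

TEI = {"tei": "http://www.tei-c.org/ns/1.0"}

# Invented fragment: one ordered list using n, one glossary list using <label>.
SAMPLE = """<div xmlns="http://www.tei-c.org/ns/1.0">
  <list type="ordered">
    <item n="1">Collect the headwords.</item>
    <item n="2">Encode the senses.</item>
  </list>
  <list type="gloss">
    <label>lemma</label>
    <item>the canonical form of a lexical unit</item>
  </list>
</div>"""


def enumerators(root):
    """Return the n values recorded on the items of the ordered list."""
    ordered = root.find("tei:list[@type='ordered']", TEI)
    return [item.get("n") for item in ordered.findall("tei:item", TEI)]


def glossary_pairs(root):
    """Pair each <label> (the term) with the <item> that defines it."""
    gloss = root.find("tei:list[@type='gloss']", TEI)
    labels = gloss.findall("tei:label", TEI)
    items = gloss.findall("tei:item", TEI)
    return [(label.text, item.text) for label, item in zip(labels, items)]


if __name__ == "__main__":
    root = ET.fromstring(SAMPLE)
    print(enumerators(root))
    print(glossary_pairs(root))
```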
12.1.56. <lang>
<lang> (language name) contains the name of a language mentioned in etymological or other linguistic discussion. [9.3.4. Etymological Information] | |
Module | dictionaries — Specification |
Attributes | att.global (@xml:id, @n, @xml:lang, @xml:base) (att.global.rendition (@rend, @style, @rendition)) (att.global.linking (@corresp, @synch, @sameAs, @copyOf, @next, @prev, @exclude, @select)) (att.global.analytic (@ana)) (att.global.facs (@facs)) (att.global.change (@change)) (att.global.responsibility (@cert, @resp)) (att.global.source (@source)) att.lexicographic (@expand, @split, @value, @location, @mergedIn, @opt) (att.datcat (@datcat, @valueDatcat, @targetDatcat)) (att.lexicographic.normalized (@norm, @orig)) |
Member of | |
Contained by | |
May contain | |
Note | May contain character data mixed with phrase-level elements. |
Example |
|
Example |
|
Content model |
|
Schema Declaration |
|
12.1.57. <langUsage>
<langUsage> (language usage) describes the languages, sublanguages, registers, dialects, etc. represented within a text. [2.4.2. Language Usage 2.4. The Profile Description 15.3.2. Declarable Elements] | |
Module | header — Specification |
Member of | |
Contained by | header: profileDesc |
May contain | |
Example |
|
Content model |
|
Schema Declaration |
|
12.1.58. <language>
<language> (language) characterizes a single language or sublanguage used within a text. [2.4.2. Language Usage] | |||||||||||||||||||
Module | header — Specification | ||||||||||||||||||
Attributes |
| ||||||||||||||||||
Contained by | header: langUsage | ||||||||||||||||||
May contain | Character data only | ||||||||||||||||||
Note | In a monolingual dictionary, where the object language and the working language are the same, one should list each as a separate <language> element with a specific role attribute. A human-readable, informal prose characterization should be supplied as content for the element. When the human-readable name(s) of languages are provided in multiple languages, the attribute xml:lang should be used to indicate what language is used to name the given object or working language. A bilingual dictionary could be documented as having two object languages. In those cases, however, it is recommended, and more precise, to describe each object language as either a source language or a target language. | ||||||||||||||||||
Example |
| ||||||||||||||||||
Example |
| ||||||||||||||||||
Content model |
| ||||||||||||||||||
Schema Declaration |
|
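The recommendation in the note on <language> might look as follows in practice. This is a minimal sketch under stated assumptions: the role values objectLanguage, workingLanguage, sourceLanguage and targetLanguage are assumed spellings and should be checked against the attribute definition, the ident attribute is taken from plain TEI, and both fragments are invented.

```python
import xml.etree.ElementTree as ET

TEI = {"tei": "http://www.tei-c.org/ns/1.0"}

# Assumed role values; verify them against the schema before relying on them.
MONOLINGUAL = """<langUsage xmlns="http://www.tei-c.org/ns/1.0">
  <language ident="en" role="objectLanguage">English</language>
  <language ident="en" role="workingLanguage">English</language>
</langUsage>"""

BILINGUAL = """<langUsage xmlns="http://www.tei-c.org/ns/1.0">
  <language ident="de" role="sourceLanguage">German</language>
  <language ident="fr" role="targetLanguage">French</language>
</langUsage>"""


def languages_by_role(xml_text):
    """Group the language codes declared in <langUsage> by their role."""
    root = ET.fromstring(xml_text)
    roles = {}
    for language in root.findall("tei:language", TEI):
        roles.setdefault(language.get("role"), []).append(language.get("ident"))
    return roles


if __name__ == "__main__":
    print(languages_by_role(MONOLINGUAL))
    print(languages_by_role(BILINGUAL))
```

In the monolingual case the same code appears under two roles, which is what the note asks for: one element per role rather than one element per language.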
12.1.59. <lbl>
<lbl> (label) contains a label for a form, example, translation, or other piece of information, e.g. abbreviation for, contraction of, literally, approximately, synonyms:, etc. [9.3.1. Information on Written and Spoken Forms 9.3.3.2. Translation Equivalents 9.3.5.3. Cross-References to Other Entries] | |||||||||
Module | dictionaries — Specification | ||||||||
Attributes | att.global (@xml:id, @n, @xml:lang, @xml:base) (att.global.rendition (@rend, @style, @rendition)) (att.global.linking (@corresp, @synch, @sameAs, @copyOf, @next, @prev, @exclude, @select)) (att.global.analytic (@ana)) (att.global.facs (@facs)) (att.global.change (@change)) (att.global.responsibility (@cert, @resp)) (att.global.source (@source)) att.lexicographic (@expand, @split, @value, @location, @mergedIn, @opt) (att.datcat (@datcat, @valueDatcat, @targetDatcat)) (att.lexicographic.normalized (@norm, @orig)) att.typed (@type, @subtype)
| ||||||||
Member of | |||||||||
Contained by | |||||||||
May contain | |||||||||
Note | Labels specifically relating to usage should be tagged with the special-purpose <usg> element rather than with the generic <lbl> element. | ||||||||
Example |
| ||||||||
Content model |
| ||||||||
Schema Declaration |
|
12.1.60. <licence>
<licence> contains information about a licence or other legal agreement applicable to the text. [2.2.4. Publication, Distribution, Licensing, etc.] | |
Module | header — Specification |
Attributes | att.global (@xml:id, @n, @xml:lang, @xml:base) (att.global.rendition (@rend, @style, @rendition)) (att.global.linking (@corresp, @synch, @sameAs, @copyOf, @next, @prev, @exclude, @select)) (att.global.analytic (@ana)) (att.global.facs (@facs)) (att.global.change (@change)) (att.global.responsibility (@cert, @resp)) (att.global.source (@source)) att.pointing (@targetLang, @target, @evaluate) att.datable (@calendar, @period) (att.datable.w3c (@when, @notBefore, @notAfter, @from, @to)) (att.datable.iso (@when-iso, @notBefore-iso, @notAfter-iso, @from-iso, @to-iso)) (att.datable.custom (@when-custom, @notBefore-custom, @notAfter-custom, @from-custom, @to-custom, @datingPoint, @datingMethod)) |
Member of | |
Contained by | header: availability |
May contain | |
Note | A <licence> element should be supplied for each licence agreement applicable to the text in question. The target attribute may be used to reference a full version of the licence. The when, notBefore, notAfter, from or to attributes may be used in combination to indicate the date or dates of applicability of the licence. |
Example |
|
Example |
|
Content model |
|
Schema Declaration |
|
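The interplay of target and the dating attributes described in the note on <licence> can be sketched as below. The fragment is invented; the check assumes that dates are given in full YYYY-MM-DD form, although the attributes also accept coarser values such as a bare year, and the function names are hypothetical.

```python
import xml.etree.ElementTree as ET
from datetime import date

TEI = {"tei": "http://www.tei-c.org/ns/1.0"}

# Invented fragment: target points at the full licence text, notBefore dates it.
SAMPLE = """<availability xmlns="http://www.tei-c.org/ns/1.0">
  <licence target="https://creativecommons.org/licenses/by/4.0/" notBefore="2020-01-01">
    <p>Distributed under a Creative Commons Attribution 4.0 International licence.</p>
  </licence>
</availability>"""


def load_licence(xml_text):
    """Return the first <licence> element of an <availability> fragment."""
    return ET.fromstring(xml_text).find("tei:licence", TEI)


def applies_on(licence, day):
    """Check a day against from/notBefore and to/notAfter (full dates assumed)."""
    start = licence.get("from") or licence.get("notBefore")
    end = licence.get("to") or licence.get("notAfter")
    if start and day < date.fromisoformat(start):
        return False
    if end and day > date.fromisoformat(end):
        return False
    return True


if __name__ == "__main__":
    licence = load_licence(SAMPLE)
    print(licence.get("target"), applies_on(licence, date(2021, 5, 1)))
```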
12.1.61. <list>
<list> (list) contains any sequence of items organized as a list. [3.8. Lists] | |||||||||||||
Module | core — Specification | ||||||||||||
Attributes | att.global (@xml:id, @n, @xml:lang, @xml:base) (att.global.rendition (@rend, @style, @rendition)) (att.global.linking (@corresp, @synch, @sameAs, @copyOf, @next, @prev, @exclude, @select)) (att.global.analytic (@ana)) (att.global.facs (@facs)) (att.global.change (@change)) (att.global.responsibility (@cert, @resp)) (att.global.source (@source)) att.sortable (@sortKey) att.typed (@type, @subtype)
| ||||||||||||
Member of | |||||||||||||
Contained by | |||||||||||||
May contain | |||||||||||||
Note | May contain an optional heading followed by a series of items, or a series of label and item pairs, the latter being optionally preceded by one or two specialized headings. | ||||||||||||
Example |
| ||||||||||||
Example |
| ||||||||||||
Example |
| ||||||||||||
Example | The following example treats the short numbered clauses of Anglo-Saxon legal codes as lists of items. The text is from an ordinance of King Athelstan (924–939): Note that nested lists have been used so the tagging mirrors the structure indicated by the two-level numbering of the clauses. The clauses could have been treated as a one-level list with irregular numbering, if desired. | ||||||||||||
Example |
| ||||||||||||
Schematron |
| ||||||||||||
Content model |
| ||||||||||||
Schema Declaration |
|
12.1.62. <listBibl>
<listBibl> (citation list) contains a list of bibliographic citations of any kind. [3.12.1. Methods of Encoding Bibliographic References and Lists of References 2.2.7. The Source Description 15.3.2. Declarable Elements] | |||||||||||
Module | core — Specification | ||||||||||
Attributes | att.global (@xml:id, @n, @xml:lang, @xml:base) (att.global.rendition (@rend, @style, @rendition)) (att.global.linking (@corresp, @synch, @sameAs, @copyOf, @next, @prev, @exclude, @select)) (att.global.analytic (@ana)) (att.global.facs (@facs)) (att.global.change (@change)) (att.global.responsibility (@cert, @resp)) (att.global.source (@source)) att.sortable (@sortKey) att.declarable (@default) att.typed (@type, @subtype)
| ||||||||||
Member of | |||||||||||
Contained by | |||||||||||
May contain | core: bibl biblStruct head listBibl | ||||||||||
Example |
| ||||||||||
Content model |
| ||||||||||
Schema Declaration |
|
12.1.63. <localProp>
<localProp> (locally defined property) provides a locally defined character (or glyph) property. [5.2.1. Character Properties] | |
Module | gaiji — Specification |
Attributes | att.global (@xml:id, @n, @xml:lang, @xml:base) (att.global.rendition (@rend, @style, @rendition)) (att.global.linking (@corresp, @synch, @sameAs, @copyOf, @next, @prev, @exclude, @select)) (att.global.analytic (@ana)) (att.global.facs (@facs)) (att.global.change (@change)) (att.global.responsibility (@cert, @resp)) (att.global.source (@source)) att.gaijiProp (@name, @value, @version) |
Contained by | |
May contain | Empty element |
Note | No definitive list of local names is proposed. However, the name entity is recommended as a means of naming the property identifying the recommended character entity name for this character or glyph. |
Example |
|
Content model |
|
Schema Declaration |
|
12.1.64. <mapping>
<mapping> (character mapping) contains one or more characters which are related to the parent character or glyph in some respect, as specified by the type attribute. [5.2. Markup Constructs for Representation of Characters and Glyphs] | |
Module | gaiji — Specification |
Attributes | att.global (@xml:id, @n, @xml:lang, @xml:base) (att.global.rendition (@rend, @style, @rendition)) (att.global.linking (@corresp, @synch, @sameAs, @copyOf, @next, @prev, @exclude, @select)) (att.global.analytic (@ana)) (att.global.facs (@facs)) (att.global.change (@change)) (att.global.responsibility (@cert, @resp)) (att.global.source (@source)) att.typed (@type, @subtype) |
Contained by | |
May contain | gaiji: g character data |
Note | Suggested values for the type attribute include exact for exact equivalences, uppercase for uppercase equivalences, lowercase for lowercase equivalences, and simplified for simplified characters. The <g> elements contained by this element can point to either another <char> or <glyph> element or contain a character that is intended to be the target of this mapping. |
Example |
|
Content model |
|
Schema Declaration |
|
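To make the notes on <localProp> and <mapping> more concrete, the sketch below reads an invented <char> declaration and collects its local properties and typed mappings. The identifier sym, the entity name and the mapped characters are placeholders; the property name entity and the type values simplified and uppercase are the ones suggested in the notes.

```python
import xml.etree.ElementTree as ET

TEI = {"tei": "http://www.tei-c.org/ns/1.0"}

# Invented declaration of a non-standard character, identified as "sym".
SAMPLE = """<char xmlns="http://www.tei-c.org/ns/1.0" xml:id="sym">
  <localProp name="entity" value="sym"/>
  <mapping type="simplified">s</mapping>
  <mapping type="uppercase">S</mapping>
</char>"""


def describe_char(xml_text):
    """Return (local properties, mappings) for a <char> declaration."""
    char = ET.fromstring(xml_text)
    props = {
        prop.get("name"): prop.get("value")
        for prop in char.findall("tei:localProp", TEI)
    }
    mappings = {
        mapping.get("type"): mapping.text
        for mapping in char.findall("tei:mapping", TEI)
    }
    return props, mappings


if __name__ == "__main__":
    print(describe_char(SAMPLE))
```

A processor that cannot render the character could fall back on the simplified mapping, which is one plausible use of the type attribute.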
12.1.65. <metamark>
<metamark> contains or describes any kind of graphic or written signal within a document the function of which is to determine how it should be read rather than forming part of the actual content of the document. [11.3.4.2. Metamarks] | |||||||||||||
Module | transcr — Specification | ||||||||||||
Attributes | att.spanning (@spanTo) att.placement (@place) att.global (@xml:id, @n, @xml:lang, @xml:base) (att.global.rendition (@rend, @style, @rendition)) (att.global.linking (@corresp, @synch, @sameAs, @copyOf, @next, @prev, @exclude, @select)) (att.global.analytic (@ana)) (att.global.facs (@facs)) (att.global.change (@change)) (att.global.responsibility (@cert, @resp)) (att.global.source (@source))
| ||||||||||||
Member of | |||||||||||||
Contained by | core: abbr author bibl biblScope cit citedRange date editor email expan gloss head hi imprint item list name note p pubPlace publisher quote ref resp term title dictionaries: def dictScrap entry etym form gram gramGrp hyph lang lbl orth pron sense stress syll usg xr figures: figure linking: seg transcr: metamark | ||||||||||||
May contain | |||||||||||||
Example |
| ||||||||||||
Content model |
| ||||||||||||
Schema Declaration |
|
12.1.66. <monogr>
<monogr> (monographic level) contains bibliographic elements describing an item (e.g. a book or journal) published as an independent item (i.e. as a separate physical object). [3.12.2.1. Analytic, Monographic, and Series Levels] | |
Module | core — Specification |
Attributes | att.global (@xml:id, @n, @xml:lang, @xml:base) (att.global.rendition (@rend, @style, @rendition)) (att.global.linking (@corresp, @synch, @sameAs, @copyOf, @next, @prev, @exclude, @select)) (att.global.analytic (@ana)) (att.global.facs (@facs)) (att.global.change (@change)) (att.global.responsibility (@cert, @resp)) (att.global.source (@source)) |
Contained by | core: biblStruct |
May contain | |
Note | May contain specialized bibliographic elements, in a prescribed order. The <monogr> element may occur only within a <biblStruct>, where its use is mandatory for the description of a monographic-level bibliographic item. |
Example |
|
Example |
|
Content model |
|
Schema Declaration |
|
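The requirement stated in the note on <monogr> (it lives inside <biblStruct>, where it is mandatory) can be illustrated with a short sketch. The bibliographic record is invented, the children of <imprint> follow plain TEI usage and are an assumption for this customization, and monograph_summary is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

TEI = {"tei": "http://www.tei-c.org/ns/1.0"}

# Invented record: title first, imprint last, as in plain TEI practice.
SAMPLE = """<biblStruct xmlns="http://www.tei-c.org/ns/1.0">
  <monogr>
    <title>An Example Dictionary</title>
    <imprint>
      <pubPlace>Berlin</pubPlace>
      <publisher>Example Press</publisher>
      <date when="1900">1900</date>
    </imprint>
  </monogr>
</biblStruct>"""


def monograph_summary(xml_text):
    """Summarize the monographic level of a <biblStruct>; fail if it is missing."""
    bibl = ET.fromstring(xml_text)
    monogr = bibl.find("tei:monogr", TEI)
    if monogr is None:
        raise ValueError("<biblStruct> needs a <monogr> child")
    return {
        "title": monogr.findtext("tei:title", namespaces=TEI),
        "place": monogr.findtext("tei:imprint/tei:pubPlace", namespaces=TEI),
        "publisher": monogr.findtext("tei:imprint/tei:publisher", namespaces=TEI),
        "year": monogr.find("tei:imprint/tei:date", TEI).get("when"),
    }


if __name__ == "__main__":
    print(monograph_summary(SAMPLE))
```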
12.1.67. <name>
<name> (name, proper noun) contains a proper noun or noun phrase. [3.6.1. Referring Strings] | |
Module | core — Specification |
Attributes | att.global (@xml:id, @n, @xml:lang, @xml:base) (att.global.rendition (@rend, @style, @rendition)) (att.global.linking (@corresp, @synch, @sameAs, @copyOf, @next, @prev, @exclude, @select)) (att.global.analytic (@ana)) (att.global.facs (@facs)) (att.global.change (@change)) (att.global.responsibility (@cert, @resp)) (att.global.source (@source)) att.personal (@full, @sort) (att.naming (@role, @nymRef) (att.canonical (@key, @ref)) ) att.datable (@calendar, @period) (att.datable.w3c (@when, @notBefore, @notAfter, @from, @to)) (att.datable.iso (@when-iso, @notBefore-iso, @notAfter-iso, @from-iso, @to-iso)) (att.datable.custom (@when-custom, @notBefore-custom, @notAfter-custom, @from-custom, @to-custom, @datingPoint, @datingMethod)) att.editLike (@evidence, @instant) att.typed (@type, @subtype) |
Member of | |
Contained by | |
May contain | |
Note | Proper nouns referring to people, places, and organizations may be tagged instead with <persName>, <placeName>, or <orgName>, when the TEI module for names and dates is included. |
Example |
|
Content model |
|
Schema Declaration |
|
12.1.68. <namespace>
<namespace> (namespace) supplies the formal name of the namespace to which the elements documented by its children belong. [2.3.4. The Tagging Declaration] | |||||||
Module | header — Specification | ||||||
Attributes | att.global (@xml:id, @n, @xml:lang, @xml:base) (att.global.rendition (@rend, @style, @rendition)) (att.global.linking (@corresp, @synch, @sameAs, @copyOf, @next, @prev, @exclude, @select)) (att.global.analytic (@ana)) (att.global.facs (@facs)) (att.global.change (@change)) (att.global.responsibility (@cert, @resp)) (att.global.source (@source))
| ||||||
Contained by | header: tagsDecl | ||||||
May contain | header: tagUsage | ||||||
Example |
| ||||||
Content model |
| ||||||
Schema Declaration |
|
12.1.69. <note>
<note> (note) contains a note or annotation. [3.9.1. Notes and Simple Annotation 2.2.6. The Notes Statement 3.12.2.8. Notes and Statement of Language 9.3.5.4. Notes within Entries] | |
Module | core — Specification |
Attributes | att.global (@xml:id, @n, @xml:lang, @xml:base) (att.global.rendition (@rend, @style, @rendition)) (att.global.linking (@corresp, @synch, @sameAs, @copyOf, @next, @prev, @exclude, @select)) (att.global.analytic (@ana)) (att.global.facs (@facs)) (att.global.change (@change)) (att.global.responsibility (@cert, @resp)) (att.global.source (@source)) att.placement (@place) att.pointing (@targetLang, @target, @evaluate) att.typed (@type, @subtype) att.written (@hand) att.anchoring (@anchored, @targetEnd) |
Member of | |
Contained by | core: abbr author bibl biblScope biblStruct cit citedRange date editor email expan gloss head hi imprint item list monogr name note p pubPlace publisher quote ref resp respStmt term title dictionaries: def dictScrap entry etym form gram gramGrp hyph lang lbl orth pron sense stress syll usg xr figures: figure linking: seg transcr: metamark |
May contain | |
Example | In the following example, the translator has supplied a footnote containing an explanation of the term translated as "painterly": For this example to be valid, the code MDMH must be defined elsewhere, for example by means of a responsibility statement in the associated TEI header. |
Example | The global n attribute may be used to supply the symbol or number used to mark the note's point of attachment in the source text, as in the following example: However, if notes are numbered in sequence and their numbering can be reconstructed automatically by processing software, it may well be considered unnecessary to record the note numbers. |
Content model |
|
Schema Declaration |
|
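The second example note on <note> observes that sequential note numbers need not be recorded if software can reconstruct them. A minimal sketch of that idea follows; the paragraph is invented, and the fallback rule (use the note's position when n is absent) is one possible convention rather than a prescribed one.

```python
import xml.etree.ElementTree as ET

NOTE_TAG = "{http://www.tei-c.org/ns/1.0}note"

# Invented paragraph: the first note records its marker, the second does not.
SAMPLE = (
    '<p xmlns="http://www.tei-c.org/ns/1.0">First claim'
    '<note n="*" place="bottom">Marked with an asterisk in the source.</note>'
    " and second claim"
    '<note place="bottom">No marker recorded in the encoding.</note>.</p>'
)


def note_markers(xml_text):
    """Return the marker of each note: its n value, else its running number."""
    root = ET.fromstring(xml_text)
    markers = []
    for position, note in enumerate(root.iter(NOTE_TAG), start=1):
        markers.append(note.get("n", str(position)))
    return markers


if __name__ == "__main__":
    print(note_markers(SAMPLE))
```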
12.1.70. <notesStmt>
<notesStmt> (notes statement) collects together any notes providing information about a text additional to that recorded in other parts of the bibliographic description. [2.2.6. The Notes Statement 2.2. The File Description] | |
Module | header — Specification |
Attributes | att.global (@xml:id, @n, @xml:lang, @xml:base) (att.global.rendition (@rend, @style, @rendition)) (att.global.linking (@corresp, @synch, @sameAs, @copyOf, @next, @prev, @exclude, @select)) (att.global.analytic (@ana)) (att.global.facs (@facs)) (att.global.change (@change)) (att.global.responsibility (@cert, @resp)) (att.global.source (@source)) |
Contained by | header: fileDesc |
May contain | core: note |
Note | Information of different kinds should not be grouped together into the same note. |
Example |
|
Content model |
|
Schema Declaration |
|
12.1.71. <num>
<num> (number) contains a number, written in any form. [3.6.3. Numbers and Measures] | |||||||||||
Module | core — Specification | ||||||||||
Attributes |
| ||||||||||
Member of | |||||||||||
Contained by | |||||||||||
May contain | Character data only | ||||||||||
Note | Detailed analyses of quantities and units of measure in historical documents may also use the feature structure mechanism described in chapter 18. Feature Structures. The <num> element is intended for use in simple applications. | ||||||||||
Example |
| ||||||||||
Content model |
| ||||||||||
Schema Declaration |
|
12.1.72. <orgName>
<orgName> (organization name) contains an organizational name. [13.2.2. Organizational Names] | |
Module | namesdates — Specification |
Attributes | att.global (@xml:id, @n, @xml:lang, @xml:base) (att.global.rendition (@rend, @style, @rendition)) (att.global.linking (@corresp, @synch, @sameAs, @copyOf, @next, @prev, @exclude, @select)) (att.global.analytic (@ana)) (att.global.facs (@facs)) (att.global.change (@change)) (att.global.responsibility (@cert, @resp)) (att.global.source (@source)) att.datable (@calendar, @period) (att.datable.w3c (@when, @notBefore, @notAfter, @from, @to)) (att.datable.iso (@when-iso, @notBefore-iso, @notAfter-iso, @from-iso, @to-iso)) (att.datable.custom (@when-custom, @notBefore-custom, @notAfter-custom, @from-custom, @to-custom, @datingPoint, @datingMethod)) att.editLike (@evidence, @instant) att.personal (@full, @sort) (att.naming (@role, @nymRef) (att.canonical (@key, @ref)) ) att.typed (@type, @subtype) |
Member of | |
Contained by | |
May contain | |
Example |
|
Content model |
|
Schema Declaration |
|
12.1.73. <orth>
<orth> (orthographic form) gives the orthographic form of a dictionary headword. [9.3.1. Information on Written and Spoken Forms] | |||||||||
Module | dictionaries — Specification | ||||||||
Attributes | att.datable.w3c (@when, @notBefore, @notAfter, @from, @to) att.global (@xml:id, @n, @xml:lang, @xml:base) (att.global.rendition (@rend, @style, @rendition)) (att.global.linking (@corresp, @synch, @sameAs, @copyOf, @next, @prev, @exclude, @select)) (att.global.analytic (@ana)) (att.global.facs (@facs)) (att.global.change (@change)) (att.global.responsibility (@cert, @resp)) (att.global.source (@source)) att.lexicographic (@expand, @split, @value, @location, @mergedIn, @opt) (att.datcat (@datcat, @valueDatcat, @targetDatcat)) (att.lexicographic.normalized (@norm, @orig)) att.partials (@extent) att.notated (@notation) att.typed (@type, @subtype)
| ||||||||
Member of | |||||||||
Contained by | |||||||||
May contain | |||||||||
Example |
| ||||||||
Example |
| ||||||||
Content model |
| ||||||||
Schema Declaration |
|
12.1.74. <p>
<p> (paragraph) marks paragraphs in prose. [3.1. Paragraphs 7.2.5. Speech Contents] | |
Module | core — Specification |
Attributes | att.global (@xml:id, @n, @xml:lang, @xml:base) (att.global.rendition (@rend, @style, @rendition)) (att.global.linking (@corresp, @synch, @sameAs, @copyOf, @next, @prev, @exclude, @select)) (att.global.analytic (@ana)) (att.global.facs (@facs)) (att.global.change (@change)) (att.global.responsibility (@cert, @resp)) (att.global.source (@source)) att.fragmentable (@part) att.written (@hand) |
Member of | |
Contained by | figures: figure header: availability change editionStmt editorialDecl encodingDesc langUsage licence projectDesc seriesStmt transcr: metamark |
May contain | |
Example |
|
Schematron |
|
Schematron |
|
Content model |
|
Schema Declaration |
|
12.1.75. <pc>
<pc> (punctuation character) contains a character or string of characters regarded as constituting a single punctuation mark. [17.1.2. Below the Word Level 17.4.2. Lightweight Linguistic Annotation] | |||||||||||||||||||||
Module | analysis — Specification | ||||||||||||||||||||
Attributes | att.global (@xml:id, @n, @xml:lang, @xml:base) (att.global.rendition (@rend, @style, @rendition)) (att.global.linking (@corresp, @synch, @sameAs, @copyOf, @next, @prev, @exclude, @select)) (att.global.analytic (@ana)) (att.global.facs (@facs)) (att.global.change (@change)) (att.global.responsibility (@cert, @resp)) (att.global.source (@source)) att.segLike (@function) (att.datcat (@datcat, @valueDatcat, @targetDatcat)) (att.fragmentable (@part)) att.typed (@type, @subtype) att.linguistic (@lemma, @lemmaRef, @pos, @msd, @join) (att.lexicographic.normalized (@norm, @orig))
| ||||||||||||||||||||
Member of | |||||||||||||||||||||
Contained by | core: abbr author bibl biblScope cit citedRange date editor email expan gloss head hi item name note p pubPlace publisher quote ref term title dictionaries: def dictScrap entry etym form gram gramGrp hyph lang lbl orth pron sense stress syll usg xr header: change distributor edition extent licence linking: seg transcr: metamark | ||||||||||||||||||||
May contain | |||||||||||||||||||||
Example |
| ||||||||||||||||||||
Example | Example encoding of the German sentence Wir fahren in den Urlaub., encoded with attributes from att.linguistic discussed in section 17.4.2. Lightweight Linguistic Annotation.
| ||||||||||||||||||||
Content model |
| ||||||||||||||||||||
Schema Declaration |
|
12.1.76. <persName>
<persName> (personal name) contains a proper noun or proper-noun phrase referring to a person, possibly including one or more of the person's forenames, surnames, honorifics, added names, etc. [13.2.1. Personal Names] | |
Module | namesdates — Specification |
Attributes | att.global (@xml:id, @n, @xml:lang, @xml:base) (att.global.rendition (@rend, @style, @rendition)) (att.global.linking (@corresp, @synch, @sameAs, @copyOf, @next, @prev, @exclude, @select)) (att.global.analytic (@ana)) (att.global.facs (@facs)) (att.global.change (@change)) (att.global.responsibility (@cert, @resp)) (att.global.source (@source)) att.datable (@calendar, @period) (att.datable.w3c (@when, @notBefore, @notAfter, @from, @to)) (att.datable.iso (@when-iso, @notBefore-iso, @notAfter-iso, @from-iso, @to-iso)) (att.datable.custom (@when-custom, @notBefore-custom, @notAfter-custom, @from-custom, @to-custom, @datingPoint, @datingMethod)) att.editLike (@evidence, @instant) att.personal (@full, @sort) (att.naming (@role, @nymRef) (att.canonical (@key, @ref)) ) att.typed (@type, @subtype) |
Member of | |
Contained by | |
May contain | |
Example |
|
Content model |
|
Schema Declaration |
|
12.1.77. <placeName>
<placeName> (place name) contains an absolute or relative place name. [13.2.3. Place Names] | |
Module | namesdates — Specification |
Attributes | att.datable (@calendar, @period) (att.datable.w3c (@when, @notBefore, @notAfter, @from, @to)) (att.datable.iso (@when-iso, @notBefore-iso, @notAfter-iso, @from-iso, @to-iso)) (att.datable.custom (@when-custom, @notBefore-custom, @notAfter-custom, @from-custom, @to-custom, @datingPoint, @datingMethod)) att.editLike (@evidence, @instant) att.global (@xml:id, @n, @xml:lang, @xml:base) (att.global.rendition (@rend, @style, @rendition)) (att.global.linking (@corresp, @synch, @sameAs, @copyOf, @next, @prev, @exclude, @select)) (att.global.analytic (@ana)) (att.global.facs (@facs)) (att.global.change (@change)) (att.global.responsibility (@cert, @resp)) (att.global.source (@source)) att.personal (@full, @sort) (att.naming (@role, @nymRef) (att.canonical (@key, @ref)) ) att.typed (@type, @subtype) |
Member of | |
Contained by | |
May contain | |
Example |
|
Example |
|
Example |
|
Content model |
|
Schema Declaration |
|
12.1.78. <principal>
<principal> (principal researcher) supplies the name of the principal researcher responsible for the creation of an electronic text. [2.2.1. The Title Statement] | |
Module | header — Specification |
Attributes | att.global (@xml:id, @n, @xml:lang, @xml:base) (att.global.rendition (@rend, @style, @rendition)) (att.global.linking (@corresp, @synch, @sameAs, @copyOf, @next, @prev, @exclude, @select)) (att.global.analytic (@ana)) (att.global.facs (@facs)) (att.global.change (@change)) (att.global.responsibility (@cert, @resp)) (att.global.source (@source)) att.canonical (@key, @ref) att.datable (@calendar, @period) (att.datable.w3c (@when, @notBefore, @notAfter, @from, @to)) (att.datable.iso (@when-iso, @notBefore-iso, @notAfter-iso, @from-iso, @to-iso)) (att.datable.custom (@when-custom, @notBefore-custom, @notAfter-custom, @from-custom, @to-custom, @datingPoint, @datingMethod)) |
Member of | |
Contained by | core: bibl header: editionStmt titleStmt |
May contain | |
Example |
|
Content model |
|
Schema Declaration |
|
12.1.79. <profileDesc>
<profileDesc> (text-profile description) provides a detailed description of non-bibliographic aspects of a text, specifically the languages and sublanguages used, the situation in which it was produced, the participants and their setting. [2.4. The Profile Description 2.1.1. The TEI Header and Its Components] | |
Module | header — Specification |
Attributes | att.global (@xml:id, @n, @xml:lang, @xml:base) (att.global.rendition (@rend, @style, @rendition)) (att.global.linking (@corresp, @synch, @sameAs, @copyOf, @next, @prev, @exclude, @select)) (att.global.analytic (@ana)) (att.global.facs (@facs)) (att.global.change (@change)) (att.global.responsibility (@cert, @resp)) (att.global.source (@source)) |
Contained by | header: teiHeader |
May contain | header: langUsage |
Note | Although the content model permits it, it is rarely meaningful to supply multiple occurrences for any of the child elements of <profileDesc> unless these are documenting multiple texts. |
Example |
|
Content model |
|
Schema Declaration |
|
12.1.80. <projectDesc>
<projectDesc> (project description) describes in detail the aim or purpose for which an electronic file was encoded, together with any other relevant information concerning the process by which it was assembled or collected. [2.3.1. The Project Description 2.3. The Encoding Description 15.3.2. Declarable Elements] | |
Module | header — Specification |
Attributes | att.global (@xml:id, @n, @xml:lang, @xml:base) (att.global.rendition (@rend, @style, @rendition)) (att.global.linking (@corresp, @synch, @sameAs, @copyOf, @next, @prev, @exclude, @select)) (att.global.analytic (@ana)) (att.global.facs (@facs)) (att.global.change (@change)) (att.global.responsibility (@cert, @resp)) (att.global.source (@source)) att.declarable (@default) |
Member of | |
Contained by | header: encodingDesc |
May contain | core: p |
Example |
|
Content model |
|
Schema Declaration |
|
12.1.81. <pron>
<pron> (pronunciation) contains the pronunciation(s) of the word. [9.3.1. Information on Written and Spoken Forms] | |
Module | dictionaries — Specification |
Attributes | att.datable.w3c (@when, @notBefore, @notAfter, @from, @to) att.global (@xml:id, @n, @xml:lang, @xml:base) (att.global.rendition (@rend, @style, @rendition)) (att.global.linking (@corresp, @synch, @sameAs, @copyOf, @next, @prev, @exclude, @select)) (att.global.analytic (@ana)) (att.global.facs (@facs)) (att.global.change (@change)) (att.global.responsibility (@cert, @resp)) (att.global.source (@source)) att.lexicographic (@expand, @split, @value, @location, @mergedIn, @opt) (att.datcat (@datcat, @valueDatcat, @targetDatcat)) (att.lexicographic.normalized (@norm, @orig)) att.notated (@notation) att.partials (@extent) att.typed (@type, @subtype) |
Member of | |
Contained by | |
May contain | |
Note | The values used to specify the notation may be taken from any appropriate project-defined list of values. Typical values might be, for example, IPA or Murray. |
Example |
|
Example |
|
Content model |
|
Schema Declaration |
|
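As an illustration of the notation attribute discussed in the note on <pron>, the sketch below selects pronunciations by notation. The entry fragment is invented, the value IPA is taken from the note (a project may define a different list), and the transcription is written with a character reference so the listing stays ASCII-only.

```python
import xml.etree.ElementTree as ET

TEI = {"tei": "http://www.tei-c.org/ns/1.0"}

# Invented form; &#xE6; is the IPA vowel "ae ligature" (U+00E6).
SAMPLE = """<form xmlns="http://www.tei-c.org/ns/1.0" type="lemma">
  <orth>cat</orth>
  <pron notation="IPA">k&#xE6;t</pron>
</form>"""


def pronunciations(xml_text, notation):
    """Return the text of every <pron> that declares the given notation."""
    form = ET.fromstring(xml_text)
    return [
        pron.text
        for pron in form.findall("tei:pron", TEI)
        if pron.get("notation") == notation
    ]


if __name__ == "__main__":
    print(pronunciations(SAMPLE, "IPA"))
```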
12.1.82. <pubPlace>
<pubPlace> (publication place) contains the name of the place where a bibliographic item was published. [3.12.2.4. Imprint, Size of a Document, and Reprint Information] | |
Module | core — Specification |
Attributes | att.global (@xml:id, @n, @xml:lang, @xml:base) (att.global.rendition (@rend, @style, @rendition)) (att.global.linking (@corresp, @synch, @sameAs, @copyOf, @next, @prev, @exclude, @select)) (att.global.analytic (@ana)) (att.global.facs (@facs)) (att.global.change (@change)) (att.global.responsibility (@cert, @resp)) (att.global.source (@source)) att.naming (@role, @nymRef) (att.canonical (@key, @ref)) |
Member of | |
Contained by | header: publicationStmt |
May contain | |
Example |
|
Content model |
|
Schema Declaration |
|
12.1.83. <publicationStmt>
<publicationStmt> (publication statement) groups information concerning the publication or distribution of an electronic or other text. [2.2.4. Publication, Distribution, Licensing, etc. 2.2. The File Description] | |
Module | header — Specification |
Attributes | att.global (@xml:id, @n, @xml:lang, @xml:base) (att.global.rendition (@rend, @style, @rendition)) (att.global.linking (@corresp, @synch, @sameAs, @copyOf, @next, @prev, @exclude, @select)) (att.global.analytic (@ana)) (att.global.facs (@facs)) (att.global.change (@change)) (att.global.responsibility (@cert, @resp)) (att.global.source (@source)) |
Contained by | header: fileDesc |
May contain | header: authority availability distributor idno |
Note | Where a publication statement contains several members of the model.publicationStmtPart.agency or model.publicationStmtPart.detail classes rather than one or more paragraphs or anonymous blocks, care should be taken to ensure that the repeated elements are presented in a meaningful order. It is a conformance requirement that elements supplying information about publication place, address, identifier, availability, and date be given following the name of the publisher, distributor, or authority concerned, and preferably in that order. |
Example |
|
Example |
|
Example |
|
Content model |
|
Schema Declaration |
|
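The ordering requirement in the note on <publicationStmt> can be illustrated with a minimal sketch. It assumes that the customization leaves <publisher>, <pubPlace>, <idno>, <availability> and <date> available as children; the publisher name, place, identifier and date are invented placeholders. The only point is that the detail elements follow the agency they describe, in the preferred order.

```xml
<publicationStmt xmlns="http://www.tei-c.org/ns/1.0">
  <!-- The agency (publisher, distributor, or authority) comes first -->
  <publisher>Example Lexicography Centre</publisher>
  <!-- Details about that agency follow: place, identifier, availability, date -->
  <pubPlace>Exampletown</pubPlace>
  <idno type="URI">https://example.org/dictionaries/sample</idno>
  <availability>
    <licence target="https://creativecommons.org/licenses/by/4.0/">CC BY 4.0</licence>
  </availability>
  <date when="2020-01-01">2020</date>
</publicationStmt>
```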
12.1.84. <publisher>
<publisher> (publisher) provides the name of the organization responsible for the publication or distribution of a bibliographic item. [3.12.2.4. Imprint, Size of a Document, and Reprint Information 2.2.4. Publication, Distribution, Licensing, etc.] | |
Module | core — Specification |
Attributes | att.global (@xml:id, @n, @xml:lang, @xml:base) (att.global.rendition (@rend, @style, @rendition)) (att.global.linking (@corresp, @synch, @sameAs, @copyOf, @next, @prev, @exclude, @select)) (att.global.analytic (@ana)) (att.global.facs (@facs)) (att.global.change (@change)) (att.global.responsibility (@cert, @resp)) (att.global.source (@source)) att.canonical (@key, @ref) |
Member of | |
Contained by | header: publicationStmt |
May contain | |
Note | Use the full form of the name by which a company is usually referred to, rather than any abbreviation of it which may appear on a title page. |
Example |
|
Content model |
|
Schema Declaration |
|
12.1.85. <quote>
<quote> (quotation) contains a phrase or passage attributed by the narrator or author to some agency external to the text. [3.3.3. Quotation 4.3.1. Grouped Texts] | |
Module | core — Specification |
Attributes | att.global (@xml:id, @n, @xml:lang, @xml:base) (att.global.rendition (@rend, @style, @rendition)) (att.global.linking (@corresp, @synch, @sameAs, @copyOf, @next, @prev, @exclude, @select)) (att.global.analytic (@ana)) (att.global.facs (@facs)) (att.global.change (@change)) (att.global.responsibility (@cert, @resp)) (att.global.source (@source)) att.typed (@type, @subtype) att.notated (@notation) |
Member of | |
Contained by | |
May contain | |
Note | If a bibliographic citation is supplied for the source of a quotation, the two may be grouped using the <cit> element. |
Example |
|
Content model |
|
Schema Declaration |
|
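The note on <quote> mentions grouping a quotation with its bibliographic citation inside <cit>. A hedged sketch of that grouping follows; the quoted sentence, author and title are invented, and the type value "example" is an assumption about what the customization accepts on <cit>.

```xml
<cit xmlns="http://www.tei-c.org/ns/1.0" type="example">
  <!-- The quoted passage itself -->
  <quote>The quick brown fox jumps over the lazy dog.</quote>
  <!-- The source of the quotation, grouped with it in the same <cit> -->
  <bibl>
    <author>A. N. Author</author>,
    <title>An Invented Source</title>
  </bibl>
</cit>
```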
12.1.86. <ref>
<ref> (reference) defines a reference to another location, possibly modified by additional text or comment. [3.7. Simple Links and Cross-References 16.1. Links] | |||||||
Module | core — Specification | ||||||
Attributes | att.lexicographic (@expand, @split, @value, @location, @mergedIn, @opt) (att.datcat (@datcat, @valueDatcat, @targetDatcat)) (att.lexicographic.normalized (@norm, @orig)) att.notated (@notation) att.scoped (@scope) att.cReferencing (@cRef) att.global (@xml:id, @n, @xml:lang, @xml:base) (att.global.rendition (@rend, @style, @rendition)) (att.global.linking (@corresp, @synch, @sameAs, @copyOf, @next, @prev, @exclude, @select)) (att.global.analytic (@ana)) (att.global.facs (@facs)) (att.global.change (@change)) (att.global.responsibility (@cert, @resp)) (att.global.source (@source)) att.internetMedia (@mimeType) att.pointing (@targetLang, @target, @evaluate) att.typed (type, @subtype)
| ||||||
Member of | |||||||
Contained by | core: abbr analytic author bibl biblScope biblStruct cit citedRange date editor email expan gloss head hi item monogr name note p pubPlace publisher quote ref resp term title dictionaries: def dictScrap entry etym form gram gramGrp hyph lang lbl orth pron sense stress syll usg xr figures: figDesc header: authority change distributor edition extent licence principal publicationStmt rendition tagUsage linking: seg transcr: metamark | ||||||
May contain | |||||||
Note | The target and cRef attributes are mutually exclusive. | ||||||
Example |
| ||||||
Example |
| ||||||
Schematron |
| ||||||
Content model |
| ||||||
Schema Declaration |
|
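As an illustration of the note on <ref>, the following sketch shows a reference that uses target alone. The identifier entry-house and the type value "entry" are assumptions made for the example, not values confirmed by this section.

```xml
<p xmlns="http://www.tei-c.org/ns/1.0">
  <!-- A reference that uses @target; no @cRef is given on the same element -->
  See <ref type="entry" target="#entry-house">house</ref>.
  <!-- Supplying both @target and @cRef on one <ref> would conflict with the note -->
</p>
```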
12.1.87. <rendition>
<rendition> (rendition) supplies information about the rendition or appearance of one or more elements in the source text. [2.3.4. The Tagging Declaration] | |||||||||||||||||||||
Module | header — Specification | ||||||||||||||||||||
Attributes | att.global (@xml:id, @n, @xml:lang, @xml:base) (att.global.rendition (@rend, @style, @rendition)) (att.global.linking (@corresp, @synch, @sameAs, @copyOf, @next, @prev, @exclude, @select)) (att.global.analytic (@ana)) (att.global.facs (@facs)) (att.global.change (@change)) (att.global.responsibility (@cert, @resp)) (att.global.source (@source)) att.styleDef (@scheme, @schemeVersion)
| ||||||||||||||||||||
Contained by | header: tagsDecl | ||||||||||||||||||||
May contain | |||||||||||||||||||||
Example |
| ||||||||||||||||||||
Content model |
| ||||||||||||||||||||
Schema Declaration |
|
12.1.88. <resp>
<resp> (responsibility) contains a phrase describing the nature of a person's intellectual responsibility, or an organization's role in the production or distribution of a work. [3.12.2.2. Titles, Authors, and Editors 2.2.1. The Title Statement 2.2.2. The Edition Statement 2.2.5. The Series Statement] | |
Module | core — Specification |
Attributes | att.global (@xml:id, @n, @xml:lang, @xml:base) (att.global.rendition (@rend, @style, @rendition)) (att.global.linking (@corresp, @synch, @sameAs, @copyOf, @next, @prev, @exclude, @select)) (att.global.analytic (@ana)) (att.global.facs (@facs)) (att.global.change (@change)) (att.global.responsibility (@cert, @resp)) (att.global.source (@source)) att.canonical (@key, @ref) att.datable (@calendar, @period) (att.datable.w3c (@when, @notBefore, @notAfter, @from, @to)) (att.datable.iso (@when-iso, @notBefore-iso, @notAfter-iso, @from-iso, @to-iso)) (att.datable.custom (@when-custom, @notBefore-custom, @notAfter-custom, @from-custom, @to-custom, @datingPoint, @datingMethod)) |
Contained by | core: respStmt |
May contain | |
Note | The attribute ref, inherited from the class att.canonical, may be used to indicate the kind of responsibility in a normalized form by referring directly to a standardized list of responsibility types, such as that maintained by a naming authority, for example the list maintained at http://www.loc.gov/marc/relators/relacode.html for bibliographic usage. |
Example |
|
Content model |
|
Schema Declaration |
|
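The use of ref on <resp> described in the note might look like the sketch below. It assumes the Library of Congress linked-data URI for the MARC relator code edt (editor) as the normalized value, and uses an invented personal name.

```xml
<respStmt xmlns="http://www.tei-c.org/ns/1.0">
  <!-- @ref points at a relator term; "edt" is the MARC code for "editor" -->
  <resp ref="http://id.loc.gov/vocabulary/relators/edt">Editor</resp>
  <!-- Placeholder name of the person holding that responsibility -->
  <name>Jane Placeholder</name>
</respStmt>
```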
12.1.89. <respStmt>
<respStmt> (statement of responsibility) supplies a statement of responsibility for the intellectual content of a text, edition, recording, or series, where the specialized elements for authors, editors, etc. do not suffice or do not apply. May also be used to encode information about individuals or organizations which have played a role in the production or distribution of a bibliographic work. [3.12.2.2. Titles, Authors, and Editors 2.2.1. The Title Statement 2.2.2. The Edition Statement 2.2.5. The Series Statement] | |
Module | core — Specification |
Attributes | att.global (@xml:id, @n, @xml:lang, @xml:base) (att.global.rendition (@rend, @style, @rendition)) (att.global.linking (@corresp, @synch, @sameAs, @copyOf, @next, @prev, @exclude, @select)) (att.global.analytic (@ana)) (att.global.facs (@facs)) (att.global.change (@change)) (att.global.responsibility (@cert, @resp)) (att.global.source (@source)) att.canonical (@key, @ref) |
Member of | |
Contained by | header: editionStmt seriesStmt titleStmt |
May contain | |
Example |
|
Example |
|
Content model |
|
Schema Declaration |
|
12.1.90. <revisionDesc>
<revisionDesc> (revision description) summarizes the revision history for a file. [2.6. The Revision Description 2.1.1. The TEI Header and Its Components] | |
Module | header — Specification |
Attributes | att.global (@xml:id, @n, @xml:lang, @xml:base) (att.global.rendition (@rend, @style, @rendition)) (att.global.linking (@corresp, @synch, @sameAs, @copyOf, @next, @prev, @exclude, @select)) (att.global.analytic (@ana)) (att.global.facs (@facs)) (att.global.change (@change)) (att.global.responsibility (@cert, @resp)) (att.global.source (@source)) att.docStatus (@status) |
Contained by | header: teiHeader |
May contain | |
Note | If present on this element, the status attribute should indicate the current status of the document. The same attribute may appear on any <change> to record the status at the time of that change. Conventionally <change> elements should be given in reverse date order, with the most recent change at the start of the list. |
Example |
|
Content model |
|
Schema Declaration |
|
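The conventions in the note on <revisionDesc> (current status on the parent, status at the time of each change, most recent change first) can be sketched as follows. The dates and descriptions are invented for illustration.

```xml
<revisionDesc xmlns="http://www.tei-c.org/ns/1.0" status="published">
  <!-- Most recent change first; @status records the state at that point -->
  <change when="2021-03-15" status="published">Released after final proofreading.</change>
  <change when="2021-02-01" status="draft">Corrected cross-references.</change>
  <change when="2020-11-20" status="draft">Initial conversion from the print source.</change>
</revisionDesc>
```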
12.1.91. <seg>
<seg> (arbitrary segment) represents any segmentation of text below the ‘chunk’ level. [16.3. Blocks, Segments, and Anchors 6.2. Components of the Verse Line 7.2.5. Speech Contents] | |
Module | linking — Specification |
Attributes | att.global (@xml:id, @n, @xml:lang, @xml:base) (att.global.rendition (@rend, @style, @rendition)) (att.global.linking (@corresp, @synch, @sameAs, @copyOf, @next, @prev, @exclude, @select)) (att.global.analytic (@ana)) (att.global.facs (@facs)) (att.global.change (@change)) (att.global.responsibility (@cert, @resp)) (att.global.source (@source)) att.segLike (@function) (att.datcat (@datcat, @valueDatcat, @targetDatcat)) (att.fragmentable (@part)) att.typed (@type, @subtype) att.written (@hand) att.notated (@notation) |
Member of | |
Contained by | |
May contain | |
Note | The <seg> element may be used at the encoder's discretion to mark any segments of the text of interest for processing. One use of the element is to mark text features for which no appropriate markup is otherwise defined. Another use is to provide an identifier for some segment which is to be pointed at by some other element—i.e. to provide a target, or a part of a target, for a <ptr> or other similar element. |
Example |
|
Example |
|
Example |
|
Content model |
|
Schema Declaration |
|
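The second use of <seg> described in the note, providing a target for a pointer, could look like this sketch. The identifier is invented, and the availability of <ptr> in the customization is assumed from its mention in the note.

```xml
<p xmlns="http://www.tei-c.org/ns/1.0">
  <!-- The segment carries an identifier so that it can be pointed at -->
  This paragraph contains <seg xml:id="seg-sample-1">a stretch of text of interest</seg>.
  <!-- A pointing element refers to the segment by that identifier -->
  <ptr target="#seg-sample-1"/>
</p>
```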
12.1.92. <sense>
<sense> groups together all information relating to one word sense in a dictionary entry, for example definitions, examples, and translation equivalents. [9.2. The Structure of Dictionary Entries] | |||||||||
Module | dictionaries — Specification | ||||||||
Attributes | att.lexicographic (@expand, @split, @value, @location, @mergedIn, @opt) (att.datcat (@datcat, @valueDatcat, @targetDatcat)) (att.lexicographic.normalized (@norm, @orig)) att.global (xml:id, @n, @xml:lang, @xml:base) att.global.rendition (@rend, @style, @rendition) att.global.linking (@corresp, @synch, @sameAs, @copyOf, @next, @prev, @exclude, @select) att.global.analytic (@ana) att.global.facs (@facs) att.global.change (@change) att.global.responsibility (@cert, @resp) att.global.source (@source)
| ||||||||
Member of | |||||||||
Contained by | |||||||||
May contain | |||||||||
Note | May contain character data mixed with any other elements defined in the dictionary tag set. | ||||||||
Example |
| ||||||||
Content model |
| ||||||||
Schema Declaration |
|
12.1.93. <seriesStmt>
<seriesStmt> (series statement) groups information about the series, if any, to which a publication belongs. [2.2.5. The Series Statement 2.2. The File Description] | |
Module | header — Specification |
Attributes | att.global (@xml:id, @n, @xml:lang, @xml:base) (att.global.rendition (@rend, @style, @rendition)) (att.global.linking (@corresp, @synch, @sameAs, @copyOf, @next, @prev, @exclude, @select)) (att.global.analytic (@ana)) (att.global.facs (@facs)) (att.global.change (@change)) (att.global.responsibility (@cert, @resp)) (att.global.source (@source)) att.declarable (@default) |
Contained by | header: fileDesc |
May contain | |
Example |
|
Content model |
|
Schema Declaration |
|
12.1.94. <sourceDesc>
<sourceDesc> (source description) describes the source(s) from which an electronic text was derived or generated, typically a bibliographic description in the case of a digitized text, or a phrase such as "born digital" for a text which has no previous existence. [2.2.7. The Source Description] | |
Module | header — Specification |
Attributes | att.global (@xml:id, @n, @xml:lang, @xml:base) (att.global.rendition (@rend, @style, @rendition)) (att.global.linking (@corresp, @synch, @sameAs, @copyOf, @next, @prev, @exclude, @select)) (att.global.analytic (@ana)) (att.global.facs (@facs)) (att.global.change (@change)) (att.global.responsibility (@cert, @resp)) (att.global.source (@source)) att.declarable (@default) |
Contained by | header: fileDesc |
May contain | core: listBibl |
Example |
|
Content model |
|
Schema Declaration |
|
12.1.95. <stress>
<stress> (stress) contains the stress pattern for a dictionary headword, if given separately. [9.3.1. Information on Written and Spoken Forms] | |
Module | dictionaries — Specification |
Attributes | att.global (@xml:id, @n, @xml:lang, @xml:base) (att.global.rendition (@rend, @style, @rendition)) (att.global.linking (@corresp, @synch, @sameAs, @copyOf, @next, @prev, @exclude, @select)) (att.global.analytic (@ana)) (att.global.facs (@facs)) (att.global.change (@change)) (att.global.responsibility (@cert, @resp)) (att.global.source (@source)) att.notated (@notation) |
Member of | |
Contained by | dictionaries: form |
May contain | |
Note | Usually stress information is included within pronunciation information. |
Example |
|
Content model |
|
Schema Declaration |
|
12.1.96. <surname>
<surname> (surname) contains a family (inherited) name, as opposed to a given, baptismal, or nickname. [13.2.1. Personal Names] | |
Module | namesdates — Specification |
Attributes | att.global (@xml:id, @n, @xml:lang, @xml:base) (att.global.rendition (@rend, @style, @rendition)) (att.global.linking (@corresp, @synch, @sameAs, @copyOf, @next, @prev, @exclude, @select)) (att.global.analytic (@ana)) (att.global.facs (@facs)) (att.global.change (@change)) (att.global.responsibility (@cert, @resp)) (att.global.source (@source)) att.personal (@full, @sort) (att.naming (@role, @nymRef) (att.canonical (@key, @ref)) ) att.typed (@type, @subtype) |
Member of | |
Contained by | |
May contain | |
Example |
|
Content model |
|
Schema Declaration |
|
12.1.97. <syll>
<syll> (syllabification) contains the syllabification of the headword. [9.3.1. Information on Written and Spoken Forms] | |
Module | dictionaries — Specification |
Attributes | att.global (@xml:id, @n, @xml:lang, @xml:base) (att.global.rendition (@rend, @style, @rendition)) (att.global.linking (@corresp, @synch, @sameAs, @copyOf, @next, @prev, @exclude, @select)) (att.global.analytic (@ana)) (att.global.facs (@facs)) (att.global.change (@change)) (att.global.responsibility (@cert, @resp)) (att.global.source (@source)) att.lexicographic (@expand, @split, @value, @location, @mergedIn, @opt) (att.datcat (@datcat, @valueDatcat, @targetDatcat)) (att.lexicographic.normalized (@norm, @orig)) att.notated (@notation) |
Member of | |
Contained by | |
May contain | |
Example |
|
Content model |
|
Schema Declaration |
|
12.1.98. <tagUsage>
<tagUsage> (element usage) documents the usage of a specific element within a specified document. [2.3.4. The Tagging Declaration] | |||||||||||||||||||
Module | header — Specification | ||||||||||||||||||
Attributes | att.global (@xml:id, @n, @xml:lang, @xml:base) (att.global.rendition (@rend, @style, @rendition)) (att.global.linking (@corresp, @synch, @sameAs, @copyOf, @next, @prev, @exclude, @select)) (att.global.analytic (@ana)) (att.global.facs (@facs)) (att.global.change (@change)) (att.global.responsibility (@cert, @resp)) (att.global.source (@source)) att.datcat (@datcat, @valueDatcat, @targetDatcat)
| ||||||||||||||||||
Contained by | header: namespace | ||||||||||||||||||
May contain | |||||||||||||||||||
Example |
| ||||||||||||||||||
Content model |
| ||||||||||||||||||
Schema Declaration |
|
12.1.99. <tagsDecl>
<tagsDecl> (tagging declaration) provides detailed information about the tagging applied to a document. [2.3.4. The Tagging Declaration 2.3. The Encoding Description] | |||||||||
Module | header — Specification | ||||||||
Attributes | att.global (@xml:id, @n, @xml:lang, @xml:base) (att.global.rendition (@rend, @style, @rendition)) (att.global.linking (@corresp, @synch, @sameAs, @copyOf, @next, @prev, @exclude, @select)) (att.global.analytic (@ana)) (att.global.facs (@facs)) (att.global.change (@change)) (att.global.responsibility (@cert, @resp)) (att.global.source (@source))
| ||||||||
Member of | |||||||||
Contained by | header: encodingDesc | ||||||||
May contain | |||||||||
Example | If the partial attribute were not specified here, the implication would be that the document in question contains only <hi>, <title>, and <para> elements. | ||||||||
Content model |
| ||||||||
Schema Declaration |
|
12.1.100. <taxonomy>
<taxonomy> (taxonomy) defines a typology either implicitly, by means of a bibliographic citation, or explicitly by a structured taxonomy. [2.3.7. The Classification Declaration] | |
Module | header — Specification |
Attributes | att.global (@xml:id, @n, @xml:lang, @xml:base) (att.global.rendition (@rend, @style, @rendition)) (att.global.linking (@corresp, @synch, @sameAs, @copyOf, @next, @prev, @exclude, @select)) (att.global.analytic (@ana)) (att.global.facs (@facs)) (att.global.change (@change)) (att.global.responsibility (@cert, @resp)) (att.global.source (@source)) |
Contained by | |
May contain | |
Note | Nested taxonomies are common in many fields, so the <taxonomy> element can be nested. |
Example |
|
Example |
|
Content model |
|
Schema Declaration |
|
12.1.101. <teiHeader>
<teiHeader> (TEI header) supplies descriptive and declarative metadata associated with a digital resource or set of resources. [2.1.1. The TEI Header and Its Components 15.1. Varieties of Composite Text] | |
Module | header — Specification |
Attributes | att.global (@xml:id, @n, @xml:lang, @xml:base) (att.global.rendition (@rend, @style, @rendition)) (att.global.linking (@corresp, @synch, @sameAs, @copyOf, @next, @prev, @exclude, @select)) (att.global.analytic (@ana)) (att.global.facs (@facs)) (att.global.change (@change)) (att.global.responsibility (@cert, @resp)) (att.global.source (@source)) |
Contained by | textstructure: TEI |
May contain | |
Note | One of the few elements unconditionally required in any TEI document. |
Example |
|
Content model |
|
Schema Declaration |
|
12.1.102. <term>
<term> (term) contains a single-word, multi-word, or symbolic designation which is regarded as a technical term. [3.4.1. Terms and Glosses] | |
Module | core — Specification |
Attributes | att.global (@xml:id, @n, @xml:lang, @xml:base) (att.global.rendition (@rend, @style, @rendition)) (att.global.linking (@corresp, @synch, @sameAs, @copyOf, @next, @prev, @exclude, @select)) (att.global.analytic (@ana)) (att.global.facs (@facs)) (att.global.change (@change)) (att.global.responsibility (@cert, @resp)) (att.global.source (@source)) att.pointing (@targetLang, @target, @evaluate) att.typed (@type, @subtype) att.canonical (@key, @ref) att.sortable (@sortKey) att.cReferencing (@cRef) |
Member of | |
Contained by | |
May contain | |
Note | When this element appears within an <index> element, it is understood to supply the form under which an index entry is to be made for that location. Elsewhere, it is understood simply to indicate that its content is to be regarded as a technical or specialised term. It may be associated with a <gloss> element by means of its ref attribute; alternatively a <gloss> element may point to a <term> element by means of its target attribute. In formal terminological work, there is frequently discussion over whether terms must be atomic or may include multi-word lexical items, symbolic designations, or phraseological units. The <term> element may be used to mark any of these. No position is taken on the philosophical issue of what a term can be; the looser definition simply allows the <term> element to be used by practitioners of any persuasion. As with other members of the att.canonical class, instances of this element occurring in a text may be associated with a canonical definition, either by means of a URI (using the ref attribute), or by means of some system-specific code value (using the key attribute). Because the mutually exclusive target and cRef attributes overlap with the function of the ref attribute, they are deprecated and may be removed at a subsequent release. |
Example |
|
Example |
|
Example |
|
Example |
|
Content model |
|
Schema Declaration |
|
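The two ways of associating a <term> with a canonical definition, as described in the note, are sketched below. The glossary URI and the code GLOSS-0042 are hypothetical placeholders.

```xml
<p xmlns="http://www.tei-c.org/ns/1.0">
  <!-- @ref links the term to a canonical definition by URI (placeholder URI) -->
  A <term ref="https://example.org/glossary#lemma">lemma</term> is the
  <gloss>form under which a lexical item is listed in a dictionary</gloss>.
  <!-- @key uses a system-specific code instead of a URI (placeholder code) -->
  Each <term key="GLOSS-0042">headword</term> opens an entry.
</p>
```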
12.1.103. <text>
<text> (text) contains a single text of any kind, whether unitary or composite, for example a poem or drama, a collection of essays, a novel, a dictionary, or a corpus sample. [4. Default Text Structure 15.1. Varieties of Composite Text] | |
Module | textstructure — Specification |
Attributes | att.global (@xml:id, @n, @xml:lang, @xml:base) (att.global.rendition (@rend, @style, @rendition)) (att.global.linking (@corresp, @synch, @sameAs, @copyOf, @next, @prev, @exclude, @select)) (att.global.analytic (@ana)) (att.global.facs (@facs)) (att.global.change (@change)) (att.global.responsibility (@cert, @resp)) (att.global.source (@source)) att.typed (@type, @subtype) att.written (@hand) |
Member of | |
Contained by | textstructure: TEI |
May contain | |
Note | This element should not be used to represent a text which is inserted at an arbitrary point within the structure of another, for example as in an embedded or quoted narrative; the <floatingText> element is provided for this purpose. |
Example |
|
Example | The body of a text may be replaced by a group of nested texts, as in the following schematic:
|
Content model |
|
Schema Declaration |
|
12.1.104. <title>
<title> (title) contains a title for any kind of work. [3.12.2.2. Titles, Authors, and Editors 2.2.1. The Title Statement 2.2.5. The Series Statement] | |||||||||||||||||
Module | core — Specification | ||||||||||||||||
Attributes | att.global (@xml:id, @n, @xml:lang, @xml:base) (att.global.rendition (@rend, @style, @rendition)) (att.global.linking (@corresp, @synch, @sameAs, @copyOf, @next, @prev, @exclude, @select)) (att.global.analytic (@ana)) (att.global.facs (@facs)) (att.global.change (@change)) (att.global.responsibility (@cert, @resp)) (att.global.source (@source)) att.canonical (@key, @ref) att.datable (@calendar, @period) (att.datable.w3c (@when, @notBefore, @notAfter, @from, @to)) (att.datable.iso (@when-iso, @notBefore-iso, @notAfter-iso, @from-iso, @to-iso)) (att.datable.custom (@when-custom, @notBefore-custom, @notAfter-custom, @from-custom, @to-custom, @datingPoint, @datingMethod)) att.typed (type, @subtype)
| ||||||||||||||||
Member of | |||||||||||||||||
Contained by | core: abbr analytic author bibl biblScope citedRange date editor email expan gloss head hi item monogr name note p pubPlace publisher quote ref resp term title figures: figDesc header: authority change distributor edition extent licence principal rendition seriesStmt tagUsage titleStmt linking: seg transcr: metamark | ||||||||||||||||
May contain | |||||||||||||||||
Note | The attributes key and ref, inherited from the class att.canonical, may be used to indicate the canonical form for the title: the former by supplying (for example) the identifier of a record in some external library system, the latter by pointing to an XML element somewhere containing the canonical form of the title. | ||||||||||||||||
Example |
| ||||||||||||||||
Example |
| ||||||||||||||||
Example |
| ||||||||||||||||
Content model |
| ||||||||||||||||
Schema Declaration |
|
12.1.105. <titleStmt>
<titleStmt> (title statement) groups information about the title of a work and those responsible for its content. [2.2.1. The Title Statement 2.2. The File Description] | |
Module | header — Specification |
Attributes | att.global (@xml:id, @n, @xml:lang, @xml:base) (att.global.rendition (@rend, @style, @rendition)) (att.global.linking (@corresp, @synch, @sameAs, @copyOf, @next, @prev, @exclude, @select)) (att.global.analytic (@ana)) (att.global.facs (@facs)) (att.global.change (@change)) (att.global.responsibility (@cert, @resp)) (att.global.source (@source)) |
Contained by | header: fileDesc |
May contain | |
Example |
|
Content model |
|
Schema Declaration |
|
12.1.106. <unicodeProp>
<unicodeProp> (unicode property) provides a Unicode property for a character (or glyph). [5.2.1. Character Properties] | |||||||||||||||
Module | gaiji — Specification | ||||||||||||||
Attributes | att.global (@xml:id, @n, @xml:lang, @xml:base) (att.global.rendition (@rend, @style, @rendition)) (att.global.linking (@corresp, @synch, @sameAs, @copyOf, @next, @prev, @exclude, @select)) (att.global.analytic (@ana)) (att.global.facs (@facs)) (att.global.change (@change)) (att.global.responsibility (@cert, @resp)) (att.global.source (@source)) att.gaijiProp (name, value, @version)
| ||||||||||||||
Contained by | |||||||||||||||
May contain | Empty element | ||||||||||||||
Note | A definitive list of current Unicode property names is provided in The Unicode Standard. | ||||||||||||||
Example |
| ||||||||||||||
Content model |
| ||||||||||||||
Schema Declaration |
|
12.1.107. <unihanProp>
<unihanProp> (unihan property) holds the name and value of a normative or informative Unihan character (or glyph) property as part of its attributes. [5.2.1. Character Properties] | |||||||||||||||
Module | gaiji — Specification | ||||||||||||||
Attributes | att.global (@xml:id, @n, @xml:lang, @xml:base) (att.global.rendition (@rend, @style, @rendition)) (att.global.linking (@corresp, @synch, @sameAs, @copyOf, @next, @prev, @exclude, @select)) (att.global.analytic (@ana)) (att.global.facs (@facs)) (att.global.change (@change)) (att.global.responsibility (@cert, @resp)) (att.global.source (@source)) att.gaijiProp (name, value, @version)
| ||||||||||||||
Contained by | |||||||||||||||
May contain | Empty element | ||||||||||||||
Note | A definitive list of current Unihan property names is provided in the Unicode Han Database. | ||||||||||||||
Example |
| ||||||||||||||
Content model |
| ||||||||||||||
Schema Declaration |
|
12.1.108. <usg>
<usg> (usage) contains usage information in a dictionary entry. [9.3.5.2. Usage Information and Other Labels] | |||||||
Module | dictionaries — Specification | ||||||
Attributes | att.global (@xml:id, @n, @xml:lang, @xml:base) (att.global.rendition (@rend, @style, @rendition)) (att.global.linking (@corresp, @synch, @sameAs, @copyOf, @next, @prev, @exclude, @select)) (att.global.analytic (@ana)) (att.global.facs (@facs)) (att.global.change (@change)) (att.global.responsibility (@cert, @resp)) (att.global.source (@source)) att.lexicographic (@expand, @split, @value, @location, @mergedIn, @opt) (att.datcat (@datcat, @valueDatcat, @targetDatcat)) (att.lexicographic.normalized (@norm, @orig)) att.typed (type, @subtype)
| ||||||
Member of | |||||||
Contained by | |||||||
May contain | |||||||
Example |
| ||||||
Content model |
| ||||||
Schema Declaration |
|
12.1.109. <xenoData>
<xenoData> (non-TEI metadata) provides a container element into which metadata in non-TEI formats may be placed. [2.5. Non-TEI Metadata] | |
Module | header — Specification |
Attributes | att.global (@xml:id, @n, @xml:lang, @xml:base) (att.global.rendition (@rend, @style, @rendition)) (att.global.linking (@corresp, @synch, @sameAs, @copyOf, @next, @prev, @exclude, @select)) (att.global.analytic (@ana)) (att.global.facs (@facs)) (att.global.change (@change)) (att.global.responsibility (@cert, @resp)) (att.global.source (@source)) att.declarable (@default) att.typed (@type, @subtype) |
Contained by | header: teiHeader |
May contain | ANY |
Example | This example presumes that the prefix dc has been bound to the namespace http://purl.org/dc/elements/1.1/ and the prefix rdf is bound to the namespace http://www.w3.org/1999/02/22-rdf-syntax-ns#. Note: The about attribute on the <rdf:Description> in this example gives a URI indicating the resource to which the metadata contained therein refer. The <rdf:Description> in the second <xenoData> block has a blank about, meaning it is pointing at the current document, so the RDF is about the document within which it is contained, i.e. the TEI document containing the <xenoData> block. Similarly, any kind of relative URI may be used, including fragment identifiers. Do note, however, that if the contents of the <xenoData> block are to be extracted and used elsewhere, any relative URIs will have to be resolved accordingly.
|
Example | In this example, the prefix rdf is bound to the namespace http://www.w3.org/1999/02/22-rdf-syntax-ns#, the prefix dc is bound to the namespace http://purl.org/dc/elements/1.1/, and the prefix cc is bound to the namespace http://web.resource.org/cc/.
|
Example | In this example, the prefix dc is again bound to the namespace http://purl.org/dc/elements/1.1/, and the prefix oai_dc is bound to the namespace http://www.openarchives.org/OAI/2.0/oai_dc/.
|
Example | In this example, the prefix mods is bound to the namespace http://www.loc.gov/mods/v3.
|
Example | This example shows GeoJSON embedded in <xenoData>. Note that JSON does not permit newlines inside string values; these must be escaped as \n. To avoid the accidental insertion of newlines by software, the use of xml:space is recommended. Blocks of JSON should be wrapped in CDATA sections, as they may contain characters illegal in XML. Note: the example above has been trimmed for legibility. The original may be found linked from Arachosiorum Oppidum/Alexandria. The contributors, listed per the license terms, are R. Talbert, Jeffrey Becker, W. Röllig, Tom Elliott, H. Kopp, DARMC, Sean Gillies, B. Siewert-Mayer, Francis Deblauwe, and Eric Kansa. |
Content model |
|
Schema Declaration |
|
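The namespace bindings and the blank about attribute discussed in the <xenoData> examples can be sketched as follows. This is an illustrative fragment rather than one of the original examples; the title and creator values are invented.

```xml
<xenoData xmlns="http://www.tei-c.org/ns/1.0"
          xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
          xmlns:dc="http://purl.org/dc/elements/1.1/">
  <rdf:RDF>
    <!-- A blank rdf:about means the metadata describe the containing TEI document -->
    <rdf:Description rdf:about="">
      <dc:title>A Sample Dictionary</dc:title>
      <dc:creator>Jane Placeholder</dc:creator>
    </rdf:Description>
  </rdf:RDF>
</xenoData>
```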
12.1.110. <xr>
<xr> (cross-reference phrase) contains a phrase, sentence, or icon referring the reader to some other location in this or another text. [9.3.5.3. Cross-References to Other Entries] | |||||||||||||
Module | dictionaries — Specification | ||||||||||||
Attributes | att.global (@xml:id, @n, @xml:lang, @xml:base) (att.global.rendition (@rend, @style, @rendition)) (att.global.linking (@corresp, @synch, @sameAs, @copyOf, @next, @prev, @exclude, @select)) (att.global.analytic (@ana)) (att.global.facs (@facs)) (att.global.change (@change)) (att.global.responsibility (@cert, @resp)) (att.global.source (@source)) att.lexicographic (@expand, @split, @value, @location, @mergedIn, @opt) (att.datcat (@datcat, @valueDatcat, @targetDatcat)) (att.lexicographic.normalized (@norm, @orig))
| ||||||||||||
Member of | |||||||||||||
Contained by | |||||||||||||
May contain | |||||||||||||
Note | May contain character data and phrase-level elements; usually contains a <ref> or a <ptr> element. This element encloses both the actual indication of the location referred to, which may be tagged using the <ref> or <ptr> elements, and any accompanying material which gives more information about why the reader is being referred there. | ||||||||||||
Example |
| ||||||||||||
Example |
| ||||||||||||
Content model |
| ||||||||||||
Schema Declaration |
|
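The two components that the note on <xr> distinguishes, the accompanying material and the actual indication of the location, can be sketched as below. The type values "related" and "entry" and the identifier entry-dwelling are assumptions made for the example.

```xml
<xr xmlns="http://www.tei-c.org/ns/1.0" type="related">
  <!-- Accompanying material explaining why the reader is referred elsewhere -->
  <lbl>see also</lbl>
  <!-- The actual indication of the location referred to -->
  <ref type="entry" target="#entry-dwelling">dwelling</ref>
</xr>
```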
12.2. Model classes
12.2.1. model.addressLike
model.addressLike groups elements used to represent a postal or email address. [1. The TEI Infrastructure] | |
Module | tei — Specification |
Used by | |
Members | affiliation email |
12.2.2. model.attributable
model.attributable groups elements that contain a word or phrase that can be attributed to a source. [3.3.3. Quotation 4.3.2. Floating Texts] | |
Module | tei — Specification |
Used by | |
Members | model.quoteLike[cit quote xr] |
12.2.3. model.availabilityPart
model.availabilityPart groups elements such as licences and paragraphs of text which may appear as part of an availability statement. [2.2.4. Publication, Distribution, Licensing, etc.] | |
Module | tei — Specification |
Used by | |
Members | licence |
12.2.4. model.biblLike
model.biblLike groups elements containing a bibliographic description. [3.12. Bibliographic Citations and References] | |
Module | tei — Specification |
Used by | |
Members | bibl biblStruct listBibl |
12.2.5. model.biblPart
model.biblPart groups elements which represent components of a bibliographic description. [3.12. Bibliographic Citations and References] | |
Module | tei — Specification |
Used by | |
Members | model.imprintPart[biblScope distributor pubPlace publisher] model.respLike[author editor principal respStmt] availability bibl citedRange edition extent |
12.2.6. model.common
model.common groups common chunk- and inter-level elements. [1.3. The TEI Class System] | |
Module | tei — Specification |
Used by | |
Members | model.divPart[model.lLike model.pLike[p]] model.entryLike[entry] model.inter[model.attributable[model.quoteLike[cit quote xr]] model.biblLike[bibl biblStruct listBibl] model.egLike model.labelLike model.listLike[list] model.oddDecl model.stageLike] |
Note | This class defines the set of chunk- and inter-level elements; it is used in many content models, including those for textual divisions. |
12.2.7. model.dateLike
model.dateLike groups elements containing temporal expressions. [3.6.4. Dates and Times 13.4. Dates] | |
Module | tei — Specification |
Used by | |
Members | date |
12.2.8. model.divBottom
model.divBottom groups elements appearing at the end of a text division. [4.2. Elements Common to All Divisions] | |
Module | tei — Specification |
Used by | |
Members | model.divBottomPart model.divWrapper |
12.2.9. model.divLike
12.2.10. model.divPart
model.divPart groups paragraph-level elements appearing directly within divisions. [1.3. The TEI Class System] | |
Module | tei — Specification |
Used by | |
Members | model.lLike model.pLike[p] |
Note | Note that this element class does not include members of the model.inter class, which can appear either within or between paragraph-level items. |
12.2.11. model.divTop
model.divTop groups elements appearing at the beginning of a text division. [4.2. Elements Common to All Divisions] | |
Module | tei — Specification |
Used by | |
Members | model.divTopPart[model.headLike[head]] model.divWrapper |
12.2.12. model.divTopPart
model.divTopPart groups elements which can occur only at the beginning of a text division. [4.6. Title Pages] | |
Module | tei — Specification |
Used by | |
Members | model.headLike[head] |
12.2.13. model.emphLike
model.emphLike groups phrase-level elements which are typographically distinct and to which a specific function can be attributed. [3.3. Highlighting and Quotation] | |
Module | tei — Specification |
Used by | |
Members | gloss lbl term title |
12.2.14. model.encodingDescPart
model.encodingDescPart groups elements which may be used inside <encodingDesc> and appear multiple times. | |
Module | tei — Specification |
Used by | |
Members | appInfo charDecl classDecl editorialDecl projectDesc tagsDecl |
12.2.15. model.entryLike
model.entryLike groups elements structurally analogous to paragraphs within dictionaries. [9.1. Dictionary Body and Overall Structure 1.3. The TEI Class System] | |
Module | dictionaries — Specification |
Used by | |
Members | entry |
12.2.16. model.entryPart
12.2.17. model.entryPart.top
model.entryPart.top groups high level elements within a structured dictionary entry [9.2. The Structure of Dictionary Entries] | |
Module | tei — Specification |
Used by | |
Members | model.biblLike[bibl biblStruct listBibl] cit dictScrap entry etym form gramGrp lbl num usg xr |
Note | Members of this class typically contain related parts of a dictionary entry which form a coherent subdivision, for example a particular sense, homonym, etc. |
12.2.18. model.formPart
model.formPart groups elements allowed within a <form> element in a dictionary. [9.3.1. Information on Written and Spoken Forms] | |
Module | dictionaries — Specification |
Used by | |
Members | model.gramPart[model.lexicalRefinement[gramGrp lbl usg] model.morphLike[gram]] form hyph orth pron stress syll |
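The members listed above can be pictured with a small, hypothetical entry. This is a sketch only: the identifiers, the headword, the `|` convention for hyphenation points, and the attribute values are assumptions made for illustration.

```xml
<!-- Hypothetical entry showing some model.formPart members inside <form>. -->
<entry xml:id="example.entry.1" xml:lang="en">
  <form type="lemma">
    <orth>dictionary</orth>      <!-- written form -->
    <hyph>dic|tion|ary</hyph>    <!-- hyphenation points -->
    <pron>ˈdɪkʃənri</pron>       <!-- pronunciation -->
  </form>
  <gramGrp>
    <gram type="pos">noun</gram> <!-- member of model.morphLike -->
  </gramGrp>
  <sense xml:id="example.entry.1.sense.1">
    <def>a reference work listing words and their meanings</def>
  </sense>
</entry>
```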
12.2.19. model.frontPart
model.frontPart groups elements which appear at the level of divisions within front or back matter. [7.1. Front and Back Matter] | |
Module | tei — Specification |
Used by | |
Members | model.frontPart.drama listBibl |
12.2.20. model.gLike
model.gLike groups elements used to represent individual non-Unicode characters or glyphs. | |
Module | tei — Specification |
Used by | |
Members | g |
12.2.21. model.global
model.global groups elements which may appear at any point within a TEI text. [1.3. The TEI Class System] | |
Module | tei — Specification |
Used by | |
Members | model.global.edit model.global.meta model.milestoneLike model.noteLike[note] figure metamark |
12.2.22. model.gramPart
model.gramPart groups elements allowed within a <gramGrp> element in a dictionary. [9.3.2. Grammatical Information] | |
Module | dictionaries — Specification |
Used by | |
Members | model.lexicalRefinement[gramGrp lbl usg] model.morphLike[gram] |
12.2.23. model.graphicLike
model.graphicLike groups elements containing images, formulae, and similar objects. [3.10. Graphics and Other Non-textual Components] | |
Module | tei — Specification |
Used by | |
Members | graphic |
12.2.24. model.headLike
model.headLike groups elements used to provide a title or heading at the start of a text division. | |
Module | tei — Specification |
Used by | |
Members | head |
12.2.25. model.hiLike
model.hiLike groups phrase-level elements which are typographically distinct but to which no specific function can be attributed. [3.3. Highlighting and Quotation] | |
Module | tei — Specification |
Used by | |
Members | hi |
12.2.26. model.highlighted
model.highlighted groups phrase-level elements which are typographically distinct. [3.3. Highlighting and Quotation] | |
Module | tei — Specification |
Used by | |
Members | model.emphLike[gloss lbl term title] model.hiLike[hi] |
12.2.27. model.imprintPart
model.imprintPart groups the bibliographic elements which occur inside imprints. [3.12. Bibliographic Citations and References] | |
Module | tei — Specification |
Used by | |
Members | biblScope distributor pubPlace publisher |
12.2.28. model.inter
model.inter groups elements which can appear either within or between paragraph-like elements. [1.3. The TEI Class System] | |
Module | tei — Specification |
Used by | |
Members | model.attributable[model.quoteLike[cit quote xr]] model.biblLike[bibl biblStruct listBibl] model.egLike model.labelLike model.listLike[list] model.oddDecl model.stageLike |
12.2.29. model.lexicalInter
model.lexicalInter pared-down version of model.inter for use in dictionary elements | |
Module | derived-module-TEILex0 |
Used by | |
Members | model.attributable[model.quoteLike[cit quote xr]] model.biblLike[bibl biblStruct listBibl] |
12.2.30. model.lexicalPhrase
model.lexicalPhrase pared-down version of model.phrase for use in dictionary elements | |
Module | derived-module-TEILex0 |
Used by | |
Members | model.graphicLike[graphic] model.hiLike[hi] model.highlighted[model.emphLike[gloss lbl term title] model.hiLike[hi]] model.ptrLike[ref] model.segLike[c pc seg] lang |
12.2.31. model.lexicalRefinement
model.lexicalRefinement elements adding further precision to the lexico-grammatical information provided for a dictionary entry. | |
Module | dictionaries — Specification |
Used by | |
Members | gramGrp lbl usg |
12.2.32. model.limitedPhrase
model.limitedPhrase groups phrase-level elements excluding those elements primarily intended for transcription of existing sources. [1.3. The TEI Class System] | |
Module | tei — Specification |
Used by | |
Members | model.emphLike[gloss lbl term title] model.hiLike[hi] model.pPart.data[model.addressLike[affiliation email] model.dateLike[date] model.measureLike model.nameLike[model.nameLike.agent[name orgName persName] model.offsetLike model.persNamePart[forename surname] model.placeStateLike[model.placeNamePart[placeName]] idno lang]] model.pPart.editorial[abbr expan] model.pPart.msdesc model.phrase.xml model.ptrLike[ref] |
12.2.33. model.listLike
model.listLike groups list-like elements. [3.8. Lists] | |
Module | tei — Specification |
Used by | |
Members | list |
12.2.34. model.morphLike
model.morphLike groups elements which provide morphological information within a dictionary entry. [9.3. Top-level Constituents of Entries] | |
Module | dictionaries — Specification |
Used by | |
Members | gram |
12.2.35. model.nameLike
model.nameLike groups elements which name or refer to a person, place, or organization. | |
Module | tei — Specification |
Used by | |
Members | model.nameLike.agent[name orgName persName] model.offsetLike model.persNamePart[forename surname] model.placeStateLike[model.placeNamePart[placeName]] idno lang |
Note | A superset of the naming elements that may appear in datelines, addresses, statements of responsibility, etc. |
12.2.36. model.nameLike.agent
model.nameLike.agent groups elements which contain names of individuals or corporate bodies. [3.6. Names, Numbers, Dates, Abbreviations, and Addresses] | |
Module | tei — Specification |
Used by | |
Members | name orgName persName |
Note | This class is used in the content model of elements which reference names of people or organizations. |
12.2.37. model.noteLike
model.noteLike groups globally-available note-like elements. [3.9. Notes, Annotation, and Indexing] | |
Module | tei — Specification |
Used by | |
Members | note |
12.2.38. model.pLike
model.pLike groups paragraph-like elements. | |
Module | tei — Specification |
Used by | |
Members | p |
12.2.39. model.pLike.front
model.pLike.front groups paragraph-like elements which can occur as direct constituents of front matter. [4.6. Title Pages] | |
Module | tei — Specification |
Used by | |
Members | head |
12.2.40. model.pPart.data
model.pPart.data groups phrase-level elements containing names, dates, numbers, measures, and similar data. [3.6. Names, Numbers, Dates, Abbreviations, and Addresses] | |
Module | tei — Specification |
Used by | |
Members | model.addressLike[affiliation email] model.dateLike[date] model.measureLike model.nameLike[model.nameLike.agent[name orgName persName] model.offsetLike model.persNamePart[forename surname] model.placeStateLike[model.placeNamePart[placeName]] idno lang] |
12.2.41. model.pPart.edit
model.pPart.edit groups phrase-level elements for simple editorial correction and transcription. [3.5. Simple Editorial Changes] | |
Module | tei — Specification |
Used by | |
Members | model.pPart.editorial[abbr expan] model.pPart.transcriptional |
12.2.42. model.pPart.editorial
model.pPart.editorial groups phrase-level elements for simple editorial interventions that may be useful both in transcribing and in authoring. [3.5. Simple Editorial Changes] | |
Module | tei — Specification |
Used by | |
Members | abbr expan |
12.2.43. model.paraPart
12.2.44. model.persNamePart
model.persNamePart groups elements which form part of a personal name. [13.2.1. Personal Names] | |
Module | namesdates — Specification |
Used by | |
Members | forename surname |
12.2.45. model.phrase
model.phrase groups elements which can occur at the level of individual words or phrases. [1.3. The TEI Class System] | |
Module | tei — Specification |
Used by | |
Members | model.graphicLike[graphic] model.highlighted[model.emphLike[gloss lbl term title] model.hiLike[hi]] model.lPart model.pPart.data[model.addressLike[affiliation email] model.dateLike[date] model.measureLike model.nameLike[model.nameLike.agent[name orgName persName] model.offsetLike model.persNamePart[forename surname] model.placeStateLike[model.placeNamePart[placeName]] idno lang]] model.pPart.edit[model.pPart.editorial[abbr expan] model.pPart.transcriptional] model.pPart.msdesc model.phrase.xml model.ptrLike[ref] model.ptrLike.form model.segLike[c pc seg] model.specDescLike |
Note | This class of elements can occur within paragraphs, list items, lines of verse, etc. |
12.2.46. model.placeNamePart
model.placeNamePart groups elements which form part of a place name. [13.2.3. Place Names] | |
Module | tei — Specification |
Used by | |
Members | placeName |
12.2.47. model.placeStateLike
model.placeStateLike groups elements which describe changing states of a place. | |
Module | tei — Specification |
Used by | |
Members | model.placeNamePart[placeName] |
12.2.48. model.profileDescPart
model.profileDescPart groups elements which may be used inside <profileDesc> and appear multiple times. | |
Module | tei — Specification |
Used by | |
Members | langUsage |
12.2.49. model.ptrLike
model.ptrLike groups elements used for purposes of location and reference. [3.7. Simple Links and Cross-References] | |
Module | tei — Specification |
Used by | |
Members | ref |
12.2.50. model.publicationStmtPart.agency
model.publicationStmtPart.agency groups the child elements of a <publicationStmt> element of the TEI header that indicate an authorising agent. [2.2.4. Publication, Distribution, Licensing, etc.] | |
Module | tei — Specification |
Used by | |
Members | authority distributor publisher |
Note | The ‘agency’ child elements are not required in themselves, but one of them is required if any ‘detail’ child element is to be used. It is not valid to have a ‘detail’ child element without a preceding ‘agency’ child element. See also model.publicationStmtPart.detail. |
12.2.51. model.publicationStmtPart.detail
model.publicationStmtPart.detail groups the agency-specific child elements of the <publicationStmt> element of the TEI header. [2.2.4. Publication, Distribution, Licensing, etc.] | |
Module | tei — Specification |
Used by | |
Members | model.ptrLike[ref] availability date idno pubPlace |
Note | A ‘detail’ child element may not occur unless an ‘agency’ child element precedes it. See also model.publicationStmtPart.agency. |
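The ordering constraint between the ‘agency’ and ‘detail’ classes can be sketched as follows. The publisher name, place, and date are invented for illustration; only the element order is the point of the example.

```xml
<publicationStmt>
  <!-- 'agency' element (authority, distributor or publisher) comes first -->
  <publisher>Example Lexicography Centre</publisher>
  <!-- 'detail' elements may follow only after an agency element -->
  <pubPlace>Exampleville</pubPlace>
  <date when="2020">2020</date>
  <availability>
    <licence target="https://creativecommons.org/licenses/by/4.0/">CC BY 4.0</licence>
  </availability>
</publicationStmt>
```

Removing the `<publisher>` line from this sketch would leave the ‘detail’ elements without a preceding ‘agency’ element, which the notes above describe as invalid.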
12.2.52. model.quoteLike
model.quoteLike groups elements used to directly contain quotations. | |
Module | tei — Specification |
Used by | |
Members | cit quote xr |
12.2.53. model.resource
model.resource groups separate elements which constitute the content of a digital resource, as opposed to its metadata. [1.3. The TEI Class System] | |
Module | tei — Specification |
Used by | |
Members | text |
12.2.54. model.respLike
model.respLike groups elements which are used to indicate intellectual or other significant responsibility, for example within a bibliographic element. | |
Module | tei — Specification |
Used by | |
Members | author editor principal respStmt |
12.2.55. model.segLike
model.segLike groups elements used for arbitrary segmentation. [16.3. Blocks, Segments, and Anchors 17.1. Linguistic Segment Categories] | |
Module | tei — Specification |
Used by | |
Members | c pc seg |
Note | The principles on which segmentation is carried out, and any special codes or attribute values used, should be defined explicitly in the <segmentation> element of the <encodingDesc> within the associated TEI header. |
12.3. Attribute classes
12.3.1. att.anchoring
att.anchoring (anchoring) provides attributes for use on annotations (e.g. notes and groups of notes), describing the existence and position of an anchor for annotations. | |||||||||||||||||||
Module | tei — Specification | ||||||||||||||||||
Members | note | ||||||||||||||||||
Attributes |
| ||||||||||||||||||
Example |
|
12.3.2. att.ascribed
att.ascribed provides attributes for elements representing speech or action that can be ascribed to a specific individual. [3.3.3. Quotation 8.3. Elements Unique to Spoken Texts] | |||||||||||
Module | tei — Specification | ||||||||||
Members | change | ||||||||||
Attributes |
|
12.3.3. att.cReferencing
att.cReferencing provides attributes that may be used to supply a canonical reference as a means of identifying the target of a pointer. | |||||||||
Module | tei — Specification | ||||||||
Members | gloss ref term | ||||||||
Attributes |
|
12.3.4. att.canonical
att.canonical provides attributes that can be used to associate a representation such as a name or title with canonical information about the object being named or referenced. [13.1.1. Linking Names and Their Referents] | |||||||||||||||||||||||
Module | tei — Specification | ||||||||||||||||||||||
Members | att.naming[att.personal[forename name orgName persName placeName surname] affiliation author editor pubPlace] authority catDesc date distributor principal publisher resp respStmt term title | ||||||||||||||||||||||
Attributes |
|
12.3.5. att.citing
att.citing provides attributes for specifying the specific part of a bibliographic item being cited. [1.3.1. Attribute Classes] | |||||||||||||||||||||
Module | tei — Specification | ||||||||||||||||||||
Members | biblScope citedRange | ||||||||||||||||||||
Attributes |
|
12.3.6. att.datable
att.datable provides attributes for normalization of elements that contain dates, times, or datable events. [3.6.4. Dates and Times 13.4. Dates] | |||||||||||||||||||||
Module | tei — Specification | ||||||||||||||||||||
Members | affiliation author change date editor idno licence name orgName persName placeName principal resp title | ||||||||||||||||||||
Attributes | att.datable.w3c (@when, @notBefore, @notAfter, @from, @to) att.datable.iso (@when-iso, @notBefore-iso, @notAfter-iso, @from-iso, @to-iso) att.datable.custom (@when-custom, @notBefore-custom, @notAfter-custom, @from-custom, @to-custom, @datingPoint, @datingMethod)
| ||||||||||||||||||||
Note | This ‘superclass’ provides attributes that can be used to provide normalized values of temporal information. By default, the attributes from the att.datable.w3c class are provided. If the module for names & dates is loaded, this class also provides attributes from the att.datable.iso and att.datable.custom classes. In general, the possible values of attributes restricted to the W3C datatypes form a subset of those values available via the ISO 8601 standard. However, the greater expressiveness of the ISO datatypes may not be needed, and there exists much greater software support for the W3C datatypes. |
12.3.7. att.datable.custom
att.datable.custom provides attributes for normalization of elements that contain datable events to a custom dating system (i.e. other than the Gregorian used by W3C and ISO). [13.4. Dates] | |||||||||||||||||||||||||||||||||||||||||||||||||||||
Module | namesdates — Specification | ||||||||||||||||||||||||||||||||||||||||||||||||||||
Members | att.datable[affiliation author change date editor idno licence name orgName persName placeName principal resp title] | ||||||||||||||||||||||||||||||||||||||||||||||||||||
Attributes |
|
12.3.8. att.datable.iso
att.datable.iso provides attributes for normalization of elements that contain datable events using the ISO 8601:2004 standard. [3.6.4. Dates and Times 13.4. Dates] | |||||||||||||||||||||||||||||||||||
Module | namesdates — Specification | ||||||||||||||||||||||||||||||||||
Members | att.datable[affiliation author change date editor idno licence name orgName persName placeName principal resp title] | ||||||||||||||||||||||||||||||||||
Attributes |
| ||||||||||||||||||||||||||||||||||
Note | The value of these attributes should be a normalized representation of the date, time, or combined date & time intended, in any of the standard formats specified by ISO 8601:2004, using the Gregorian calendar. If both when-iso and dur-iso are specified, the values should be interpreted as indicating a span of time by its starting time (or date) and duration. That is, <date when-iso="2007-06-01" dur-iso="P8D"/> indicates the same time period as <date when-iso="2007-06-01/P8D"/>.
In providing a ‘regularized’ form, no claim is made that the form in the source text is incorrect; the regularized form is simply that chosen as the main form for purposes of unifying variant forms under a single heading. |
12.3.9. att.datable.w3c
att.datable.w3c provides attributes for normalization of elements that contain datable events conforming to the W3C XML Schema Part 2: Datatypes Second Edition. [3.6.4. Dates and Times 13.4. Dates] | |||||||||||||||||||||||||||||||||||||
Module | tei — Specification | ||||||||||||||||||||||||||||||||||||
Members | att.datable[affiliation author change date editor idno licence name orgName persName placeName principal resp title] orth pron | ||||||||||||||||||||||||||||||||||||
Attributes |
| ||||||||||||||||||||||||||||||||||||
Schematron |
| ||||||||||||||||||||||||||||||||||||
Schematron |
| ||||||||||||||||||||||||||||||||||||
Schematron |
| ||||||||||||||||||||||||||||||||||||
Example |
| ||||||||||||||||||||||||||||||||||||
Note | The value of these attributes should be a normalized representation of the date, time, or combined date & time intended, in any of the standard formats specified by XML Schema Part 2: Datatypes Second Edition, using the Gregorian calendar. The most commonly-encountered format for the date portion of a temporal attribute is yyyy-mm-dd, but yyyy, --mm, ---dd, yyyy-mm, or --mm-dd may also be used. For the time part, the form hh:mm:ss is used. Note that this format does not currently permit use of the value 0000 to represent the year 1 BCE; instead the value -0001 should be used. |
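The W3C-style attributes can be illustrated with a few hypothetical `<date>` elements. The attribute names are those listed for att.datable.w3c under att.datable; the dates and their textual content are invented for the sketch.

```xml
<!-- A fragment, not a complete document: each date stands on its own. -->
<date when="1755-04-15">15 April 1755</date>        <!-- a full date -->
<date when="1828">1828</date>                       <!-- year only -->
<date from="1884" to="1928">1884 to 1928</date>     <!-- a known period -->
<date notBefore="1650" notAfter="1700">second half of the 17th century</date>
<!-- 1 BCE is written -0001, since 0000 is not permitted -->
<date when="-0001">1 BCE</date>
```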
12.3.10. att.datcat
att.datcat provides attributes that are used to align XML elements or attributes with the appropriate Data Categories (DCs) defined by an external taxonomy, in this way establishing the identity of information containers and values, and providing means of interpreting them. [9.5.2. Lexical View 18.3. Other Atomic Feature Values] | |||||||||||||||||||
Module | tei — Specification | ||||||||||||||||||
Members | att.lexicographic[def etym form gram gramGrp hyph lang lbl orth pron ref sense syll usg xr] att.segLike[c pc seg] category tagUsage | ||||||||||||||||||
Attributes |
| ||||||||||||||||||
Example | The example below presents the TEI encoding of the name-value pair <part of speech, common noun>, where the name (key) ‘part of speech’ is abbreviated as ‘POS’, and the value, ‘common noun’, is symbolized by ‘NN’. The entire name-value pair is encoded by means of the element <f>. In TEI XML, that element acts as the container, labeled with the name attribute. Its contents may be complex or simple. In the case at hand, the content is the symbol ‘NN’. The datcat attribute relates the feature name (i.e., the key) to the data category ‘part of speech’, while the attribute valueDatcat relates the feature value to the data category common noun. Both these data categories should be defined in an external and preferably open reference taxonomy or ontology. ‘NN’ is the symbol for common noun used e.g. in the CLAWS-7 tagset defined by the University Centre for Computer Corpus Research on Language at the University of Lancaster. In the BNC Basic (C5) tagset, used for tagging an early version of the British National Corpus, the very same data category is represented by the symbol ‘NN0’ (rather than ‘NN’). Making these values semantically interoperable would be extremely difficult without a human expert if they were not anchored in a single point of an established reference taxonomy of morphosyntactic data categories. In the case at hand, the string ‘http://hdl.handle.net/11459/CCR_C-1256_7ec6083c-23d4-224d-6f94-eecbe6861545’ is both a persistent identifier of the data category in question and a pointer to a shared definition of common noun. While the symbols ‘NN’, ‘NN0’, and many others (often coming from languages other than English) are implicitly members of the container category ‘part of speech’, it is sometimes useful not to rely on such an implicit relationship but rather use an explicit identifier for that data category, to distinguish it from other morphosyntactic data categories, such as gender, tense, etc. 
For that purpose, the above example uses the datcat attribute to reference a definition of part of speech. The reference taxonomy in this example is the CLARIN Concept Registry. If the feature structure markup exemplified above is to be repeated many times in a single document, it is much more efficient to gather the persistent identifiers in a single place and to only reference them, implicitly or directly, from feature structure markup. The following example is much more concise than the one above and relies on the concepts of feature structure declaration and feature value library, discussed in chapter 18. Feature Structures. The assumption here is that the relevant feature values are collected in a place that the annotation document in question has access to, preferably a single document per linguistic resource, for example an <fsdDecl> that is XIncluded as a sibling of <text> or a child of <encodingDesc>; a <taxonomy> available resource-wide (e.g., in a shared header) is also an option. The example below presents an <fvLib> element that collects the relevant feature values (most of them omitted). At the same time, this example shows one way of encoding a tagset, i.e., an established inventory of values of (in the case at hand) morphosyntactic categories. Note that these Guidelines do not prescribe a specific choice between datcat and valueDatcat in such cases. The former is the generic way of referencing a data category, whereas the latter is more specific, in that it references a data category that represents a value. The choice between them comes into play where a single element (or a tight element complex, such as the <f>/<symbol> complex illustrated above) makes it necessary or useful to distinguish between the container data category and its value. | ||||||||||||||||||
Example | In the context of dictionaries designed with semantic interoperability in mind, the following example ensures that the <pos> element is interpreted as the same information container as in the case of the example of <f name="POS"> above. Efficiency of this type of interoperable markup demands that the references to the particular data categories should best be provided in a single place within the dictionary (or a single place within the project), rather than being repeated inside every entry. For the container elements, this can be achieved at the level of <tagUsage>, although here, the valueDatcat attribute should be used, because it is not the <tagUsage> element that is associated with the relevant data category, but rather the element <pos> (or <case>, etc.) that is described by <tagUsage>: Another possibility is to shorten the URIs by means of the <prefixDef> mechanism, as illustrated below: This mechanism creates implications that are not always wanted, among others, in the case at hand, suggesting that the identifiers ‘pos’ and ‘adj’ belong to a namespace associated with the CLARIN Concept Repository (CCR), whereas that is solely a shorthand mechanism whose scope is the current resource. Documenting this clearly in the header of the dictionary is therefore advised. Yet another possibility is to associate the information about the relationship between a TEI markup element and the data category that it is intended to model already at the level of modeling the dictionary resource, that is, at the level of the ODD, in an <equiv> element that is a child of <elementSpec> or <attDef>. | ||||||||||||||||||
Example | The targetDatcat attribute is designed to be used in, e.g., feature structure declarations, and is analogous to the targetLang attribute of the att.pointing class, in that it describes the object that is being referenced, rather than the referencing object. Above, the <fDecl> uses targetDatcat, because if it were to use datcat, it would be asserting that it is an instance of the container data category part of speech, whereas it is not — it models a container (<f>) that encodes a part of speech. Note also that it is the <f> that is modeled above, not its values, which are used as direct references to data categories; hence the use of datcat in the <symbol> element. | ||||||||||||||||||
Note | The TEI Abstract Model can be expressed as a hierarchy of attribute-value matrices (AVMs) of various types and of various levels of complexity, nested or grouped in various ways. At the most abstract level, an AVM consists of an information container and the value (contents) of that container. A simple example of an XML serialization of such structures is, on the one hand, the opening and closing tags that delimit and name the container, and, on the other, the content enclosed by the two tags that constitutes the value. An analogous example is an attribute name and the value of that attribute. In a TEI XML example of two equivalent serializations expressing the name-value pair The att.datcat class provides means of addressing the containers and their values, while at the same time providing a way to interpret them in the context of external taxonomies or ontologies. Aligning e.g. both the <pos> element and the pos attribute with the same value of an external reference point (i.e., an entry in an agreed taxonomy) affirms the identity of the concept serialised by both the element container and the attribute container, and optionally provides a definition of that concept (in the case at hand, the concept part of speech). The value of the att.datcat attributes should be a PID (persistent identifier) that points to a specific (and, ideally, shared) taxonomy or ontology. Among the resources that can, to a lesser or greater extent, be used as inventories of (more or less) standardized linguistic categories are the GOLD ontology, CLARIN CCR, OLiA, or TermWeb's DatCatInfo, and also the Universal Dependencies inventory, on the assumption that its URIs are going to persist. It is imaginable that a project may choose to address a local taxonomy store instead, but this risks losing the advantage of interchangeability with other projects. 
Historically, datcat and valueDatcat originate from the (now obsolete) ISO 12620:2009 standard, describing the data model and procedures for a Data Category Registry (DCR). The current version of that standard, ISO 12620-1, does not standardize the serialization of pointers, merely mentioning the TEI att.datcat as an example. Note that no constraint prevents the occurrence of a combination of att.datcat attributes: the <fDecl> element, which is a natural bearer of the targetDatcat attribute, is an instance of a specific modeling element, and, in principle, could be semantically fixed by an appropriate reference taxonomy of modeling devices. |
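Within this customization, att.datcat is available on dictionary elements such as <gram> (through att.lexicographic), so the container/value distinction discussed above can be sketched on grammatical information as well. In the sketch below, the valueDatcat handle is the one quoted in the text above for common noun, while the datcat URI is a placeholder assumption that stands in for a real persistent identifier of part of speech.

```xml
<gramGrp>
  <!-- datcat: the container category (part of speech); placeholder URI -->
  <!-- valueDatcat: the value category (common noun); handle quoted in the text above -->
  <gram type="pos"
        datcat="https://example.org/datcat/partOfSpeech"
        valueDatcat="http://hdl.handle.net/11459/CCR_C-1256_7ec6083c-23d4-224d-6f94-eecbe6861545">noun</gram>
</gramGrp>
```

Repeating such URIs in every entry is verbose, which is why the examples above recommend declaring them once per resource where possible.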
12.3.11. att.declarable
att.declarable provides attributes for those elements in the TEI header which may be independently selected by means of the special purpose decls attribute. [15.3. Associating Contextual Information with a Text] | |||||||||
Module | tei — Specification | ||||||||
Members | availability bibl biblStruct editorialDecl listBibl projectDesc seriesStmt sourceDesc xenoData | ||||||||
Attributes |
| ||||||||
Note | The rules governing the association of declarable elements with individual parts of a TEI text are fully defined in chapter 15.3. Associating Contextual Information with a Text. Only one element of a particular type may have a default attribute with a value of true. |
12.3.12. att.dimensions
att.dimensions provides attributes for describing the size of physical objects. | |||||||||||||||||||||||||||||||||||||||
Module | tei — Specification | ||||||||||||||||||||||||||||||||||||||
Members | date | ||||||||||||||||||||||||||||||||||||||
Attributes | att.ranging (@atLeast, @atMost, @min, @max, @confidence)
|
12.3.13. att.docStatus
att.docStatus provides attributes for use on metadata elements describing the status of a document. | |||||||||
Module | tei — Specification | ||||||||
Members | bibl biblStruct change revisionDesc | ||||||||
Attributes |
| ||||||||
Example |
|
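A revision history using document status might be sketched as follows. The status attribute and its values (published, frozen, draft) follow the TEI Guidelines; the dates, the prose, and the `#ed1` pointer are invented for illustration. The who attribute comes from att.ascribed, of which <change> is also a member.

```xml
<revisionDesc status="published">
  <!-- most recent change first; each change can carry its own status -->
  <change when="2020-10-21" status="published">Public release.</change>
  <change when="2020-08-02" status="frozen">Content frozen for review.</change>
  <change when="2020-03-01" status="draft" who="#ed1">First draft.</change>
</revisionDesc>
```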
12.3.14. att.editLike
att.editLike provides attributes describing the nature of an encoded scholarly intervention or interpretation of any kind. [3.5. Simple Editorial Changes 10.3.1. Origination 13.3.2. The Person Element 11.3.1.1. Core Elements for Transcriptional Work] | |||||||||||||||||
Module | tei — Specification | ||||||||||||||||
Members | affiliation date expan name orgName persName placeName | ||||||||||||||||
Attributes |
| ||||||||||||||||
Note | The members of this attribute class are typically used to represent any kind of editorial intervention in a text, for example a correction or interpretation, or to date or localize manuscripts etc. Each pointer on the source (if present) corresponding to a witness or witness group should reference a bibliographic citation such as a <witness>, <msDesc>, or <bibl> element, or another external bibliographic citation, documenting the source concerned. |
12.3.15. att.fragmentable
att.fragmentable provides attributes for representing fragmentation of a structural element, typically as a consequence of some overlapping hierarchy. | |||||||||||
Module | tei — Specification | ||||||||||
Members | att.segLike[c pc seg] p | ||||||||||
Attributes |
| ||||||||||
Example |
|
12.3.16. att.gaijiProp
att.gaijiProp provides attributes for defining the properties of non-standard characters or glyphs. [5. Characters, Glyphs, and Writing Modes] | |||||||||||||||||||||
Module | gaiji — Specification | ||||||||||||||||||||
Members | localProp unicodeProp unihanProp | ||||||||||||||||||||
Attributes |
| ||||||||||||||||||||
Example | In this example a definition for the Unicode property Decomposition Mapping is provided.
| ||||||||||||||||||||
Note | All name-only attributes need an xs:boolean attribute value inside value. |
12.3.17. att.global
att.global provides attributes common to all elements in the TEI encoding scheme. [1.3.1.1. Global Attributes] | |||||||||||||||||||||||||||||||||||
Module | tei — Specification | ||||||||||||||||||||||||||||||||||
Members | TEI abbr affiliation analytic appInfo author authority availability back bibl biblScope biblStruct body c catDesc category change char charDecl cit citedRange classDecl date def dictScrap distributor div edition editionStmt editor editorialDecl email encodingDesc entry etym expan extent figDesc figure fileDesc forename form front g gloss glyph gram gramGrp graphic head hi hyph idno imprint item lang lbl licence list listBibl localProp mapping metamark monogr name namespace note notesStmt orgName orth p pc persName placeName principal profileDesc projectDesc pron pubPlace publicationStmt publisher quote ref rendition resp respStmt revisionDesc seg sense seriesStmt sourceDesc stress surname syll tagUsage tagsDecl taxonomy teiHeader term text title titleStmt unicodeProp unihanProp usg xenoData xr | ||||||||||||||||||||||||||||||||||
Attributes | att.global.rendition (@rend, @style, @rendition) att.global.linking (@corresp, @synch, @sameAs, @copyOf, @next, @prev, @exclude, @select) att.global.analytic (@ana) att.global.facs (@facs) att.global.change (@change) att.global.responsibility (@cert, @resp) att.global.source (@source)
|
12.3.18. att.global.analytic
12.3.19. att.global.change
att.global.change provides attributes allowing its member elements to specify one or more states or revision campaigns with which they are associated. | |||||||
Module | transcr — Specification | ||||||
Members | att.global[TEI abbr affiliation analytic appInfo author authority availability back bibl biblScope biblStruct body c catDesc category change char charDecl cit citedRange classDecl date def dictScrap distributor div edition editionStmt editor editorialDecl email encodingDesc entry etym expan extent figDesc figure fileDesc forename form front g gloss glyph gram gramGrp graphic head hi hyph idno imprint item lang lbl licence list listBibl localProp mapping metamark monogr name namespace note notesStmt orgName orth p pc persName placeName principal profileDesc projectDesc pron pubPlace publicationStmt publisher quote ref rendition resp respStmt revisionDesc seg sense seriesStmt sourceDesc stress surname syll tagUsage tagsDecl taxonomy teiHeader term text title titleStmt unicodeProp unihanProp usg xenoData xr] | ||||||
Attributes |
|
12.3.20. att.global.facs
att.global.facs provides attributes used to express correspondence between an element and all or part of a facsimile image or surface. [11.1. Digital Facsimiles] | |||||||
Module | transcr — Specification | ||||||
Members | att.global[TEI abbr affiliation analytic appInfo author authority availability back bibl biblScope biblStruct body c catDesc category change char charDecl cit citedRange classDecl date def dictScrap distributor div edition editionStmt editor editorialDecl email encodingDesc entry etym expan extent figDesc figure fileDesc forename form front g gloss glyph gram gramGrp graphic head hi hyph idno imprint item lang lbl licence list listBibl localProp mapping metamark monogr name namespace note notesStmt orgName orth p pc persName placeName principal profileDesc projectDesc pron pubPlace publicationStmt publisher quote ref rendition resp respStmt revisionDesc seg sense seriesStmt sourceDesc stress surname syll tagUsage tagsDecl taxonomy teiHeader term text title titleStmt unicodeProp unihanProp usg xenoData xr] | ||||||
Attributes |
|
12.3.21. att.global.linking
att.global.linking provides a set of attributes for hypertextual linking. [16. Linking, Segmentation, and Alignment] | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Module | linking — Specification | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Members | att.global[TEI abbr affiliation analytic appInfo author authority availability back bibl biblScope biblStruct body c catDesc category change char charDecl cit citedRange classDecl date def dictScrap distributor div edition editionStmt editor editorialDecl email encodingDesc entry etym expan extent figDesc figure fileDesc forename form front g gloss glyph gram gramGrp graphic head hi hyph idno imprint item lang lbl licence list listBibl localProp mapping metamark monogr name namespace note notesStmt orgName orth p pc persName placeName principal profileDesc projectDesc pron pubPlace publicationStmt publisher quote ref rendition resp respStmt revisionDesc seg sense seriesStmt sourceDesc stress surname syll tagUsage tagsDecl taxonomy teiHeader term text title titleStmt unicodeProp unihanProp usg xenoData xr] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Attributes |
|
12.3.22. att.global.rendition
att.global.rendition provides rendering attributes common to all elements in the TEI encoding scheme. [1.3.1.1.3. Rendition Indicators] | |||||||||||||||||||||||||||||||
Module | tei — Specification | ||||||||||||||||||||||||||||||
Members | att.global[TEI abbr affiliation analytic appInfo author authority availability back bibl biblScope biblStruct body c catDesc category change char charDecl cit citedRange classDecl date def dictScrap distributor div edition editionStmt editor editorialDecl email encodingDesc entry etym expan extent figDesc figure fileDesc forename form front g gloss glyph gram gramGrp graphic head hi hyph idno imprint item lang lbl licence list listBibl localProp mapping metamark monogr name namespace note notesStmt orgName orth p pc persName placeName principal profileDesc projectDesc pron pubPlace publicationStmt publisher quote ref rendition resp respStmt revisionDesc seg sense seriesStmt sourceDesc stress surname syll tagUsage tagsDecl taxonomy teiHeader term text title titleStmt unicodeProp unihanProp usg xenoData xr] | ||||||||||||||||||||||||||||||
Attributes |
|
12.3.23. att.global.responsibility
att.global.responsibility provides attributes indicating the agent responsible for some aspect of the text, the markup or something asserted by the markup, and the degree of certainty associated with it. [1.3.1.1.4. Sources, certainty, and responsibility 3.5. Simple Editorial Changes 11.3.2.2. Hand, Responsibility, and Certainty Attributes 17.3. Spans and Interpretations 13.1.1. Linking Names and Their Referents] | |||||||||||||||
Module | tei — Specification | ||||||||||||||
Members | att.global[TEI abbr affiliation analytic appInfo author authority availability back bibl biblScope biblStruct body c catDesc category change char charDecl cit citedRange classDecl date def dictScrap distributor div edition editionStmt editor editorialDecl email encodingDesc entry etym expan extent figDesc figure fileDesc forename form front g gloss glyph gram gramGrp graphic head hi hyph idno imprint item lang lbl licence list listBibl localProp mapping metamark monogr name namespace note notesStmt orgName orth p pc persName placeName principal profileDesc projectDesc pron pubPlace publicationStmt publisher quote ref rendition resp respStmt revisionDesc seg sense seriesStmt sourceDesc stress surname syll tagUsage tagsDecl taxonomy teiHeader term text title titleStmt unicodeProp unihanProp usg xenoData xr] | ||||||||||||||
Attributes |
| ||||||||||||||
Example |
| ||||||||||||||
Example |
|
12.3.24. att.global.source
att.global.source provides attributes used by elements to point to an external source. [1.3.1.1.4. Sources, certainty, and responsibility 3.3.3. Quotation 8.3.4. Writing] | |||||||||||
Module | tei — Specification | ||||||||||
Members | att.global[TEI abbr affiliation analytic appInfo author authority availability back bibl biblScope biblStruct body c catDesc category change char charDecl cit citedRange classDecl date def dictScrap distributor div edition editionStmt editor editorialDecl email encodingDesc entry etym expan extent figDesc figure fileDesc forename form front g gloss glyph gram gramGrp graphic head hi hyph idno imprint item lang lbl licence list listBibl localProp mapping metamark monogr name namespace note notesStmt orgName orth p pc persName placeName principal profileDesc projectDesc pron pubPlace publicationStmt publisher quote ref rendition resp respStmt revisionDesc seg sense seriesStmt sourceDesc stress surname syll tagUsage tagsDecl taxonomy teiHeader term text title titleStmt unicodeProp unihanProp usg xenoData xr] | ||||||||||
Attributes |
| ||||||||||
Example |
| ||||||||||
Example |
| ||||||||||
Example | Include in the schema an element named <p> available from the TEI P5 2.0.1 release. | ||||||||||
Example | Create a schema using components taken from the file mycompiledODD.xml. |
12.3.25. att.internetMedia
att.internetMedia provides attributes for specifying the type of a computer resource using a standard taxonomy. | |||||||
Module | tei — Specification | ||||||
Members | att.media[graphic] ref | ||||||
Attributes |
| ||||||
Example | In this example mimeType is used to indicate that the URL points to a TEI XML file encoded in UTF-8.
| ||||||
Note | This attribute class provides an attribute for describing a computer resource, typically available over the internet, using a value taken from a standard taxonomy. At present only a single taxonomy is supported, the Multipurpose Internet Mail Extensions (MIME) Media Type system. This typology of media types is defined by the Internet Engineering Task Force in RFC 2046. The list of types is maintained by the Internet Assigned Numbers Authority (IANA). The mimeType attribute must have a value taken from this list. |
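As an illustration of the note above, the following minimal Python sketch splits a mimeType value into its media type and optional charset parameter. The helper name split_mime_type is hypothetical, the value "application/tei+xml; charset=UTF-8" is assumed to be a typical value for a UTF-8 encoded TEI XML file, and the sketch does not check the media type against the IANA list.

```python
from email.message import Message


def split_mime_type(value):
    """Return (media type, charset or None) for a mimeType attribute value.

    Hypothetical helper: it relies on the standard library's Content-Type
    parsing and does not consult the IANA registry.
    """
    msg = Message()
    msg["Content-Type"] = value
    # get_content_charset() lowercases the charset and returns None if absent.
    return msg.get_content_type(), msg.get_content_charset()


if __name__ == "__main__":
    print(split_mime_type("application/tei+xml; charset=UTF-8"))
    print(split_mime_type("image/png"))
```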
12.3.26. att.lexicographic
att.lexicographic provides a set of attributes for specifying standard and normalized values, grammatical functions, alternate or equivalent forms, and information about composite parts. [9.2. The Structure of Dictionary Entries] | |||||||||||||||||||||||||||||||||||||||||
Module | dictionaries — Specification | ||||||||||||||||||||||||||||||||||||||||
Members | def etym form gram gramGrp hyph lang lbl orth pron ref sense syll usg xr | ||||||||||||||||||||||||||||||||||||||||
Attributes | att.datcat (@datcat, @valueDatcat, @targetDatcat) att.lexicographic.normalized (@norm, @orig)
|
12.3.27. att.lexicographic.normalized
att.lexicographic.normalized provides attributes for usage within word-level elements in the analysis module and within lexicographic microstructure in the dictionaries module. | |||||||||||||||||||||||||||||||
Module | analysis — Specification | ||||||||||||||||||||||||||||||
Members | att.lexicographic[def etym form gram gramGrp hyph lang lbl orth pron ref sense syll usg xr] att.linguistic[pc] | ||||||||||||||||||||||||||||||
Attributes |
| ||||||||||||||||||||||||||||||
Note | It needs to be stressed that the two attributes in this class are meant for strictly lexicographic and linguistic uses, and not for editorial interventions. For the latter, the mechanism based on <choice>, <orig>, and <reg> needs to be employed. |
12.3.28. att.linguistic
att.linguistic provides a set of attributes concerning linguistic features of tokens, for usage within token-level elements, specifically <w> and <pc> in the analysis module. [17.4.2. Lightweight Linguistic Annotation] | |||||||||||||||||||||||||||||||||||||||||||||||||||||
Module | analysis — Specification | ||||||||||||||||||||||||||||||||||||||||||||||||||||
Members | pc | ||||||||||||||||||||||||||||||||||||||||||||||||||||
Attributes | att.lexicographic.normalized (@norm, @orig)
| ||||||||||||||||||||||||||||||||||||||||||||||||||||
Note | These attributes make it possible to encode simple language corpora and to add a layer of linguistic information to any tokenized resource. See section 17.4.2. Lightweight Linguistic Annotation for discussion. |
12.3.29. att.media
att.media provides attributes for specifying display and related properties of external media. | |||||||||||||||||||
Module | tei — Specification | ||||||||||||||||||
Members | graphic | ||||||||||||||||||
Attributes | att.internetMedia (@mimeType)
|
12.3.30. att.naming
att.naming provides attributes common to elements which refer to named persons, places, organizations etc. [3.6.1. Referring Strings 13.3.6. Names and Nyms] | |||||||||||||||
Module | tei — Specification | ||||||||||||||
Members | att.personal[forename name orgName persName placeName surname] affiliation author editor pubPlace | ||||||||||||||
Attributes | att.canonical (@key, @ref)
|
12.3.31. att.notated
12.3.32. att.partials
att.partials provides attributes for describing the extent of lexical references for a dictionary term. | |||||||||||
Module | tei — Specification | ||||||||||
Members | orth pron | ||||||||||
Attributes |
|
12.3.33. att.personal
att.personal (attributes for components of names usually, but not necessarily, personal names) common attributes for those elements which form part of a name usually, but not necessarily, a personal name. [13.2.1. Personal Names] | |||||||||||||||
Module | tei — Specification | ||||||||||||||
Members | forename name orgName persName placeName surname | ||||||||||||||
Attributes | att.naming (@role, @nymRef) (att.canonical (@key, @ref))
|
12.3.34. att.placement
att.placement provides attributes for describing where on the source page or object a textual element appears. [3.5.3. Additions, Deletions, and Omissions 11.3.1.4. Additions and Deletions] | |||||||||||||
Module | tei — Specification | ||||||||||||
Members | figure head metamark note | ||||||||||||
Attributes |
|
12.3.35. att.pointing
att.pointing provides a set of attributes used by all elements which point to other elements by means of one or more URI references. [1.3.1.1.2. Language Indicators 3.7. Simple Links and Cross-References] | |||||||||||||||||||||||||||||||
Module | tei — Specification | ||||||||||||||||||||||||||||||
Members | citedRange gloss licence note ref term | ||||||||||||||||||||||||||||||
Attributes |
|
12.3.36. att.ranging
att.ranging provides attributes for describing numerical ranges. | |||||||||||||||||||||||||||||||
Module | tei — Specification | ||||||||||||||||||||||||||||||
Members | att.dimensions[date] | ||||||||||||||||||||||||||||||
Attributes |
| ||||||||||||||||||||||||||||||
Example |
| ||||||||||||||||||||||||||||||
Example |
|
12.3.37. att.resourced
att.resourced provides attributes by which a resource (such as an externally held media file) may be located. | |||||||
Module | tei — Specification | ||||||
Members | graphic | ||||||
Attributes |
|
12.3.38. att.scoped
att.scoped | |||||||
Module | derived-module-TEILex0 | ||||||
Members | ref | ||||||
Attributes |
|
12.3.39. att.segLike
att.segLike provides attributes for elements used for arbitrary segmentation. [16.3. Blocks, Segments, and Anchors 17.1. Linguistic Segment Categories] | |||||||||
Module | tei — Specification | ||||||||
Members | c pc seg | ||||||||
Attributes | att.datcat (@datcat, @valueDatcat, @targetDatcat) att.fragmentable (@part)
|
12.3.40. att.sortable
att.sortable provides attributes for elements in lists or groups that are sortable, but whose sorting key cannot be derived mechanically from the element content. [9.1. Dictionary Body and Overall Structure] | |||||||||||
Module | tei — Specification | ||||||||||
Members | bibl biblStruct entry idno item list listBibl term | ||||||||||
Attributes |
|
12.3.41. att.spanning
att.spanning provides attributes for elements which delimit a span of text by pointing mechanisms rather than by enclosing it. [11.3.1.4. Additions and Deletions 1.3.1. Attribute Classes] | |||||||||
Module | tei — Specification | ||||||||
Members | metamark | ||||||||
Attributes |
| ||||||||
Note | The span is defined as running in document order from the start of the content of the pointing element to the end of the content of the element pointed to by the spanTo attribute (if any). If no value is supplied for the attribute, the assumption is that the span is coextensive with the pointing element. If no content is present, the assumption is that the starting point of the span is immediately following the element itself. |
12.3.42. att.styleDef
att.styleDef provides attributes to specify the name of a formal definition language used to provide formatting or rendition information. | |||||||||||||||||||||
Module | tei — Specification | ||||||||||||||||||||
Members | rendition | ||||||||||||||||||||
Attributes |
|
12.3.43. att.typed
att.typed provides attributes that can be used to classify or subclassify elements in any way. [1.3.1. Attribute Classes 17.1.1. Words and Above 3.6.1. Referring Strings 3.7. Simple Links and Cross-References 3.6.5. Abbreviations and Their Expansions 3.13.1. Core Tags for Verse 7.2.5. Speech Contents 4.1.1. Un-numbered Divisions 4.1.2. Numbered Divisions 4.2.1. Headings and Trailers 4.4. Virtual Divisions 13.3.2.3. Personal Relationships 11.3.1.1. Core Elements for Transcriptional Work 16.1.1. Pointers and Links 16.3. Blocks, Segments, and Anchors 12.2. Linking the Apparatus to the Text 22.5.1.2. Defining Content Models: RELAX NG 8.3. Elements Unique to Spoken Texts 23.3.1.3. Modification of Attribute and Attribute Value Lists] | |||||||||||||||||||
Module | tei — Specification | ||||||||||||||||||
Members | TEI abbr affiliation bibl biblStruct c change cit date div etym figure forename form g gloss gram gramGrp graphic head idno lbl list listBibl mapping name note orgName orth pc persName placeName pron quote ref seg surname term text title usg xenoData xr | ||||||||||||||||||
Attributes |
| ||||||||||||||||||
Schematron |
| ||||||||||||||||||
Note | When appropriate, values from an established typology should be used. Alternatively a typology may be defined in the associated TEI header. If values are to be taken from a project-specific list, this should be defined using the <valList> element in the project-specific schema description, as described in 23.3.1.3. Modification of Attribute and Attribute Value Lists . |
12.3.44. att.written
att.written provides attributes to indicate the hand in which the content of an element was written in the source being transcribed. [1.3.1. Attribute Classes] | |||||||
Module | tei — Specification | ||||||
Members | div figure head hi note p seg text | ||||||
Attributes |
|
12.4. Macros
12.4.1. macro.lexicalParaContent
macro.lexicalParaContent | |
Module | derived-module-TEILex0 |
Used by | |
Content model |
|
Declaration |
|
12.4.2. macro.limitedContent
macro.limitedContent (paragraph content) defines the content of prose elements that are not used for transcription of extant materials. [1.3. The TEI Class System] | |
Module | tei — Specification |
Used by | |
Content model |
|
Declaration |
|
12.4.3. macro.paraContent
macro.paraContent (paragraph content) defines the content of paragraphs and similar elements. [1.3. The TEI Class System] | |
Module | tei — Specification |
Used by | |
Content model |
|
Declaration |
|
12.4.4. macro.phraseSeq
macro.phraseSeq (phrase sequence) defines a sequence of character data and phrase-level elements. [1.4.1. Standard Content Models] | |
Module | tei — Specification |
Used by | |
Content model |
|
Declaration |
|
12.4.5. macro.phraseSeq.limited
macro.phraseSeq.limited (limited phrase sequence) defines a sequence of character data and those phrase-level elements that are not typically used for transcribing extant documents. [1.4.1. Standard Content Models] | |
Module | tei — Specification |
Used by | |
Content model |
|
Declaration |
|
12.4.6. macro.specialPara
macro.specialPara ('special' paragraph content) defines the content model of elements such as notes or list items, which either contain a series of component-level elements or else have the same structure as a paragraph, containing a series of phrase-level and inter-level elements. [1.3. The TEI Class System] | |
Module | tei — Specification |
Used by | |
Content model |
|
Declaration |
|
12.4.7. macro.xtext
macro.xtext (extended text) defines a sequence of character data and gaiji elements. | |
Module | tei — Specification |
Used by | |
Content model |
|
Declaration |
|
12.5. Datatypes
12.5.1. teidata.certainty
teidata.certainty defines the range of attribute values expressing a degree of certainty. | |
Module | tei — Specification |
Used by | |
Content model |
|
Declaration |
|
Note | Certainty may be expressed by one of the predefined symbolic values high, medium, or low. The value unknown should be used in cases where the encoder does not wish to assert an opinion about the matter. |
12.5.2. teidata.count
teidata.count defines the range of attribute values used for a non-negative integer value used as a count. | |
Module | tei — Specification |
Used by | |
Content model |
|
Declaration |
|
Note | Any positive integer value or zero is permitted. |
12.5.3. teidata.duration.iso
teidata.duration.iso defines the range of attribute values available for representation of a duration in time using ISO 8601 standard formats. | |
Module | tei — Specification |
Used by | |
Content model |
|
Declaration |
|
Example |
|
Example |
|
Example |
|
Example |
|
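Note | A duration is expressed as a sequence of number-letter pairs, preceded by the letter P; the letter gives the unit and may be Y (year), M (month), D (day), H (hour), M (minute), or S (second), in that order. The numbers are all unsigned integers, except for the last, which may have a decimal component (using either . or , as the decimal point; the latter is preferred). If any number is 0, then that number-letter pair may be omitted. If any of the H (hour), M (minute), or S (second) number-letter pairs are present, then the separator T must precede the first ‘time’ number-letter pair. For complete details, see ISO 8601 Data elements and interchange formats — Information interchange — Representation of dates and times. |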
12.5.4. teidata.duration.w3c
teidata.duration.w3c defines the range of attribute values available for representation of a duration in time using W3C datatypes. | |
Module | tei — Specification |
Used by | |
Content model |
|
Declaration |
|
Example |
|
Example |
|
Example |
|
Example |
|
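Note | A duration is expressed as a sequence of number-letter pairs, preceded by the letter P; the letter gives the unit and may be Y (year), M (month), D (day), H (hour), M (minute), or S (second), in that order. The numbers are all unsigned integers, except for the S number, which may have a decimal component (using . as the decimal point). If any number is 0, then that number-letter pair may be omitted. If any of the H (hour), M (minute), or S (second) number-letter pairs are present, then the separator T must precede the first ‘time’ number-letter pair. For complete details, see the W3C specification. |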
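The number-letter notation for durations can be checked mechanically. The sketch below is a simplified Python parser, not the normative definition: the name parse_duration and the regular expression are assumptions made for illustration, the separator T is expected before any hour, minute, or second component (as in ISO 8601 and the W3C datatype), and only the seconds value may carry a decimal part written with a full stop.

```python
import re

# Simplified pattern (an assumption, not the official schema pattern):
# P, optional date components, then an optional T followed by time components.
DURATION = re.compile(
    r"P(?:(?P<years>\d+)Y)?(?:(?P<months>\d+)M)?(?:(?P<days>\d+)D)?"
    r"(?:T(?:(?P<hours>\d+)H)?(?:(?P<minutes>\d+)M)?"
    r"(?:(?P<seconds>\d+(?:\.\d+)?)S)?)?"
)


def parse_duration(value):
    """Return a dict of the components present, or None if the value is malformed."""
    match = DURATION.fullmatch(value)
    if match is None:
        return None
    parts = {k: v for k, v in match.groupdict().items() if v is not None}
    # Reject "P" or "PT" alone, and a trailing T with no time component.
    if not parts or value.endswith("T"):
        return None
    return {k: float(v) if k == "seconds" else int(v) for k, v in parts.items()}


if __name__ == "__main__":
    for sample in ("P1Y2M", "PT4H5M6.5S", "P1M", "PT1M", "P", "P1DT", "1Y"):
        print(sample, parse_duration(sample))
```

Note how the position relative to T disambiguates the letter M: P1M is one month, whereas PT1M is one minute.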
12.5.5. teidata.enumerated
teidata.enumerated defines the range of attribute values expressed as a single XML name taken from a list of documented possibilities. | |
Module | tei — Specification |
Used by | |
Content model |
|
Declaration |
|
Note | Attributes using this datatype must contain a single ‘word’ which contains only letters, digits, punctuation characters, or symbols: thus it cannot include whitespace. Typically, the list of documented possibilities will be provided (or exemplified) by a value list in the associated attribute specification, expressed with a <valList> element. |
12.5.6. teidata.language
teidata.language defines the range of attribute values used to identify a particular combination of human language and writing system. [6.1. Language Identification] | |
Module | tei — Specification |
Used by | |
Content model |
|
Declaration |
|
Note | The values for this attribute are language ‘tags’ as defined in BCP 47. Currently BCP 47 comprises RFC 5646 and RFC 4647; over time, other IETF documents may succeed these as the best current practice. A ‘language tag’, per BCP 47, is assembled from a sequence of components or subtags separated by the hyphen character (-, U+002D). The tag is made of the following subtags, in the following order. Every subtag except the first is optional. If present, each occurs only once, except the fourth and fifth components (variant and extension), which are repeatable.
There are two exceptions to the above format. First, there are language tags in the IANA registry that do not match the above syntax, but are present because they have been ‘grandfathered’ from previous specifications. Second, an entire language tag can consist of only a private use subtag. These tags start with x-.
The W3C Internationalization Activity has published a useful introduction to BCP 47, Language tags in HTML and XML. |
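To make the subtag order more concrete, here is a simplified Python sketch that splits a tag into the BCP 47 components language, script, region, variant, and private use. It is an approximation under stated assumptions: the function name split_language_tag is hypothetical, extension subtags, extended language subtags, and grandfathered tags are deliberately not handled, and no subtag is checked against the IANA registry.

```python
import re

# Simplified shape of a language tag (assumption: no extlang, no extensions).
TAG = re.compile(
    r"(?P<language>[A-Za-z]{2,3})"
    r"(?:-(?P<script>[A-Za-z]{4}))?"
    r"(?:-(?P<region>[A-Za-z]{2}|[0-9]{3}))?"
    r"(?P<variants>(?:-(?:[A-Za-z0-9]{5,8}|[0-9][A-Za-z0-9]{3}))*)"
    r"(?:-x(?P<private>(?:-[A-Za-z0-9]{1,8})+))?"
)


def split_language_tag(tag):
    """Return a dict of the subtags found, or None if the tag does not fit."""
    if tag.lower().startswith("x-"):
        # An entire tag may consist of a private use subtag only.
        return {"private": tag[2:].split("-")}
    match = TAG.fullmatch(tag)
    if match is None:
        return None
    result = {"language": match.group("language")}
    for name in ("script", "region"):
        if match.group(name):
            result[name] = match.group(name)
    for name in ("variants", "private"):
        if match.group(name):
            result[name] = match.group(name).strip("-").split("-")
    return result


if __name__ == "__main__":
    for sample in ("sr-Latn-RS", "es-419", "de-1996", "x-klingon", "en_US"):
        print(sample, split_language_tag(sample))
```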
12.5.7. teidata.name
teidata.name defines the range of attribute values expressed as an XML Name. | |
Module | tei — Specification |
Used by | Element:
|
Content model |
|
Declaration |
|
Note | Attributes using this datatype must contain a single word which follows the rules defining a legal XML name (see https://www.w3.org/TR/REC-xml/#dt-name): for example they cannot include whitespace or begin with digits. |
12.5.8. teidata.namespace
teidata.namespace defines the range of attribute values used to indicate XML namespaces as defined by the W3C Namespaces in XML Technical Recommendation. | |
Module | tei — Specification |
Used by | Element:
|
Content model |
|
Declaration |
|
Note | The range of syntactically valid values is defined by RFC 3986 Uniform Resource Identifier (URI): Generic Syntax |
12.5.9. teidata.numeric
teidata.numeric defines the range of attribute values used for numeric values. | |
Module | tei — Specification |
Used by | Element:
|
Content model |
|
Declaration |
|
Note | Any numeric value, represented as a decimal number, in floating point format, or as a ratio. To represent a floating point number, expressed in scientific notation, ‘E notation’, a variant of ‘exponential notation’, may be used. In this format, the value is expressed as two numbers separated by the letter E. The first number, the significand (sometimes called the mantissa) is given in decimal format, while the second is an integer. The value is obtained by multiplying the mantissa by 10 the number of times indicated by the integer. Thus the value represented in decimal notation as 1000.0 might be represented in scientific notation as 10E2. A value expressed as a ratio is represented by two integer values separated by a solidus (/) character. Thus, the value represented in decimal notation as 0.5 might be represented as a ratio by the string 1/2. |
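The three notations can be reduced to a single number with a few lines of Python. This is a minimal sketch: the function name parse_numeric is hypothetical, and the code accepts whatever Python's float() accepts, which is broader than the datatype itself.

```python
from fractions import Fraction


def parse_numeric(value):
    """Convert a decimal, E-notation, or ratio string to a float."""
    if "/" in value:
        # A ratio is two integers separated by a solidus.
        numerator, denominator = value.split("/")
        return float(Fraction(int(numerator), int(denominator)))
    # float() understands both plain decimals and E notation.
    return float(value)


if __name__ == "__main__":
    for sample in ("1000.0", "10E2", "1E3", "1/2"):
        print(sample, parse_numeric(sample))
```

Here 10E2 and 1E3 both evaluate to 1000.0, since the significand is multiplied by ten as many times as the exponent indicates.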
12.5.10. teidata.outputMeasurement
teidata.outputMeasurement defines a range of values for use in specifying the size of an object that is intended for display. | |
Module | tei — Specification |
Used by | |
Content model |
|
Declaration |
|
Example |
|
Note | These values map directly onto the values used by XSL-FO and CSS. For definitions of the units see those specifications; at the time of this writing the most complete list is in the CSS3 working draft. |
12.5.11. teidata.pattern
teidata.pattern defines attribute values which are expressed as a regular expression. | |
Module | tei — Specification |
Used by | |
Content model |
|
Declaration |
|
Note | A regular expression, often called a pattern, is an expression that describes a set of strings. They are usually used to give a concise description of a set, without having to list all elements. For example, the set containing the three strings Handel, Händel, and Haendel can be described by the pattern H(ä|ae?)ndel (or alternatively, it is said that the pattern H(ä|ae?)ndel matches each of the three strings) (Wikipedia). This TEI datatype is mapped to the XSD token datatype, and may therefore contain any string of characters. However, it is recommended that the value used conform to the particular flavour of regular expression syntax supported by XSD Schema. |
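The Handel example can be tried out directly. The sketch below uses Python's re module, whose syntax differs in places from the XSD flavour recommended above; for this particular pattern the two behave the same. The helper name matches_pattern is hypothetical.

```python
import re

PATTERN = "H(ä|ae?)ndel"


def matches_pattern(value, pattern=PATTERN):
    """Return True if the whole string matches the pattern."""
    # fullmatch mirrors XSD behaviour, where a pattern must cover the entire value.
    return re.fullmatch(pattern, value) is not None


if __name__ == "__main__":
    for name in ("Handel", "Händel", "Haendel", "Hendel"):
        print(name, matches_pattern(name))
```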
12.5.12. teidata.point
teidata.point defines the data type used to express a point in cartesian space. | |
Module | tei — Specification |
Used by | |
Content model |
|
Declaration |
|
Example |
|
Note | A point is defined by two numeric values, which should be expressed as decimal numbers. Neither number can end in a decimal point. E.g., both 0.0,84.2 and 0,84 are allowed, but 0.,84. is not. |
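The rule about decimal points can be expressed as a short check. This Python sketch is an illustration only: the name is_point and the regular expression are assumptions modelled on the note above, not a copy of the schema declaration.

```python
import re

# Two numbers separated by a comma; a decimal point must be followed by digits.
POINT = re.compile(r"-?\d+(\.\d+)?,-?\d+(\.\d+)?")


def is_point(value):
    """Return True if the value looks like 'x,y' with well-formed numbers."""
    return POINT.fullmatch(value) is not None


if __name__ == "__main__":
    for sample in ("0.0,84.2", "0,84", "0.,84."):
        print(sample, is_point(sample))
```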
12.5.13. teidata.pointer
teidata.pointer defines the range of attribute values used to provide a single URI, absolute or relative, pointing to some other resource, either within the current document or elsewhere. | |
Module | tei — Specification |
Used by | |
Content model |
|
Declaration |
|
Note | The range of syntactically valid values is defined by RFC 3986 Uniform Resource Identifier (URI): Generic Syntax. Note that the values themselves are encoded using RFC 3987 Internationalized Resource Identifiers (IRIs) mapping to URIs. |
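The effect of mapping a value to a URI can be sketched with percent-encoding from Python's standard library. This is a simplification under stated assumptions: the example.org addresses are hypothetical, the helper name encode_pointer is invented for illustration, and the code only percent-encodes the characters of a simple scheme-host-path value; it does not handle query strings, fragments, or internationalized host names.

```python
from urllib.parse import quote


def encode_pointer(value):
    """Percent-encode characters that may not appear literally in a URI path."""
    # Keep ':' and '/' so that the scheme and path separators survive.
    return quote(value, safe=":/")


if __name__ == "__main__":
    print(encode_pointer("https://example.org/wiki/%"))
    print(encode_pointer("https://example.org/wiki/Händel"))
```

A literal percent sign becomes %25, and non-ASCII characters are encoded as the percent-escaped bytes of their UTF-8 form.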
12.5.14. teidata.probCert
teidata.probCert defines a range of attribute values which can be expressed either as a numeric probability or as a coded certainty value. | |
Module | tei — Specification |
Used by | |
Content model |
|
Declaration |
|
12.5.15. teidata.probability
teidata.probability defines the range of attribute values expressing a probability. | |
Module | tei — Specification |
Used by | |
Content model |
|
Declaration |
|
Note | Probability is expressed as a real number between 0 and 1; 0 representing certainly false and 1 representing certainly true. |
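Since teidata.probCert admits either a numeric probability or a coded certainty value, a consumer of the data has to tell the two apart. The following Python sketch shows one possible approach; the function name classify_prob_cert is hypothetical, and the set of symbolic values is taken from the note on teidata.certainty (high, medium, low, unknown).

```python
CERTAINTY_VALUES = {"high", "medium", "low", "unknown"}


def classify_prob_cert(value):
    """Return 'certainty', 'probability', or None for an invalid value."""
    if value in CERTAINTY_VALUES:
        return "certainty"
    try:
        number = float(value)
    except ValueError:
        return None
    # A probability is a real number between 0 and 1 inclusive.
    return "probability" if 0.0 <= number <= 1.0 else None


if __name__ == "__main__":
    for sample in ("high", "0.75", "1", "1.5", "maybe"):
        print(sample, classify_prob_cert(sample))
```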
12.5.16. teidata.replacement
teidata.replacement defines attribute values which contain a replacement template. | |
Module | tei — Specification |
Used by | |
Content model |
|
Declaration |
|
12.5.17. teidata.temporal.iso
teidata.temporal.iso defines the range of attribute values expressing a temporal expression such as a date, a time, or a combination of them, that conform to the international standard Data elements and interchange formats – Information interchange – Representation of dates and times. | |
Module | tei — Specification |
Used by | |
Content model |
|
Declaration |
|
Note | If it is likely that the value used is to be compared with another, then a time zone indicator should always be included, and only the dateTime representation should be used. For all representations for which ISO 8601:2004 describes both a basic and an extended format, these Guidelines recommend use of the extended format. |
12.5.18. teidata.temporal.w3c
teidata.temporal.w3c defines the range of attribute values expressing a temporal expression such as a date, a time, or a combination of them, that conform to the W3C XML Schema Part 2: Datatypes Second Edition specification. | |
Module | tei — Specification |
Used by | |
Content model |
|
Declaration |
|
Note | If it is likely that the value used is to be compared with another, then a time zone indicator should always be included, and only the dateTime representation should be used. |
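The advice about time zone indicators can be illustrated with Python's datetime module. In this sketch the helper name parse_aware and the sample timestamps are hypothetical; the helper rejects values without a time zone indicator so that every parsed value can be compared safely.

```python
from datetime import datetime


def parse_aware(value):
    """Parse a dateTime value and insist on a time zone indicator."""
    moment = datetime.fromisoformat(value)
    if moment.tzinfo is None:
        raise ValueError(f"no time zone indicator in {value!r}")
    return moment


if __name__ == "__main__":
    paris = parse_aware("2024-05-01T12:00:00+02:00")
    utc = parse_aware("2024-05-01T10:00:00+00:00")
    # Both values denote the same instant although the strings differ.
    print(paris == utc)
```

Without the offsets, the two strings would compare as two hours apart, which is why a bare local time is unsuitable for comparison.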
12.5.19. teidata.text
teidata.text defines the range of attribute values used to express some kind of identifying string as a single sequence of Unicode characters possibly including whitespace. | |
Module | tei — Specification |
Used by | Element:
|
Content model |
|
Declaration |
|
Note | Attributes using this datatype must contain a single ‘token’ in which whitespace and other punctuation characters are permitted. |
12.5.20. teidata.truthValue
teidata.truthValue defines the range of attribute values used to express a truth value. | |
Module | tei — Specification |
Used by | |
Content model |
|
Declaration |
|
Note | The possible values of this datatype are 1 or true, or 0 or false. This datatype applies only for cases where uncertainty is inappropriate; if the attribute concerned may have a value other than true or false, e.g. unknown, or inapplicable, it should have the extended version of this datatype: teidata.xTruthValue. |
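A processor reading such attributes has to accept both spellings of each value. The Python sketch below maps the four permitted strings to a Boolean; the function name parse_truth_value is hypothetical, and anything else (including unknown or inapplicable, which belong to the extended datatype) is rejected.

```python
TRUE_VALUES = {"1", "true"}
FALSE_VALUES = {"0", "false"}


def parse_truth_value(value):
    """Map '1'/'true' to True and '0'/'false' to False; reject everything else."""
    if value in TRUE_VALUES:
        return True
    if value in FALSE_VALUES:
        return False
    raise ValueError(f"not a truth value: {value!r}")


if __name__ == "__main__":
    for sample in ("1", "true", "0", "false"):
        print(sample, parse_truth_value(sample))
```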
12.5.21. teidata.version
teidata.version defines the range of attribute values which may be used to specify a TEI or Unicode version number. | |
Module | tei — Specification |
Used by | Element:
|
Content model |
|
Declaration |
|
Note | The value of this attribute follows the pattern specified by the Unicode consortium for its version number (http://unicode.org/versions/). A version number contains digits and full stop characters only. The first number supplied identifies the major version number. A second and third number, for minor and sub-minor version numbers, may also be supplied. |
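The shape of a version number described above (one to three groups of digits separated by full stops) can be checked as follows. This Python sketch is illustrative: the function name is_version and the regular expression are assumptions derived from the note, not a copy of the schema declaration.

```python
import re

# Major number, optionally followed by minor and sub-minor numbers.
VERSION = re.compile(r"\d+(\.\d+){0,2}")


def is_version(value):
    """Return True for values such as '4', '5.2' or '5.2.0'."""
    return VERSION.fullmatch(value) is not None


if __name__ == "__main__":
    for sample in ("4", "5.2", "5.2.0", "1.2.3.4", "v1"):
        print(sample, is_version(sample))
```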
12.5.22. teidata.versionNumber
teidata.versionNumber defines the range of attribute values used for version numbers. | |
Module | tei — Specification |
Used by | |
Content model |
|
Declaration |
|
12.5.23. teidata.word
teidata.word defines the range of attribute values expressed as a single word or token. | |
Module | tei — Specification |
Used by | |
Content model |
|
Declaration |
|
Note | Attributes using this datatype must contain a single ‘word’ which contains only letters, digits, punctuation characters, or symbols: thus it cannot include whitespace. |
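One possible way to check the constraint in the note is sketched below in Python. It assumes that "letters, digits, punctuation characters, or symbols" corresponds to the Unicode general categories L, N, P and S. That mapping is an assumption of this sketch, and the actual schema pattern may differ (for example, in its treatment of combining marks). The name is_word is hypothetical.

```python
import unicodedata

def is_word(value: str) -> bool:
    """True if value is non-empty and contains only letters, numbers,
    punctuation or symbols (assumed Unicode categories L, N, P, S)."""
    allowed = {"L", "N", "P", "S"}
    return bool(value) and all(
        unicodedata.category(ch)[0] in allowed for ch in value
    )
```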
12.5.24. teidata.xTruthValue
teidata.xTruthValue (extended truth value) defines the range of attribute values used to express a truth value which may be unknown. | |
Module | tei — Specification |
Used by | |
Content model |
|
Declaration |
|
Note | In cases where uncertainty is inappropriate, use the datatype teidata.truthValue. |
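A minimal Python sketch of how an application might read an extended truth value follows. The additional values unknown and inapplicable are taken from the note on teidata.truthValue. Mapping both of them to None is an assumption made for this illustration, and the name parse_x_truth_value is hypothetical.

```python
from typing import Optional

def parse_x_truth_value(value: str) -> Optional[bool]:
    """Return True or False for a definite value, None when it is not known."""
    token = value.strip()
    if token in ("1", "true"):
        return True
    if token in ("0", "false"):
        return False
    if token in ("unknown", "inapplicable"):
        # Assumption of this sketch: both are represented as None.
        return None
    raise ValueError(f"not a teidata.xTruthValue: {value!r}")
```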
12.5.25. teidata.xmlName
teidata.xmlName defines attribute values which contain an XML name. | |
Module | tei — Specification |
Used by | Element:
|
Content model |
|
Declaration |
|
Note | The rules defining an XML name form a part of the XML Specification. |
12.5.26. teidata.xpath
teidata.xpath defines attribute values which contain an XPath expression. | |
Module | tei — Specification |
Used by | |
Content model |
|
Declaration |
|
Note | Any XPath expression using the syntax defined in 6.2. When writing programs that evaluate XPath expressions, programmers should be mindful of the possibility of malicious code injection attacks. For further information about XPath injection attacks, see the article at OWASP. |
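One common precaution against XPath injection is never to paste untrusted text directly between quotes in an expression. The Python sketch below builds a safe XPath 1.0 string literal instead. XPath 1.0 has no escape character inside string literals, so text containing both kinds of quotes is assembled with concat(). The function name xpath_literal and the sample query are hypothetical. This is a sketch of one technique, not a complete defence.

```python
def xpath_literal(text: str) -> str:
    """Return an XPath 1.0 string literal that evaluates to text."""
    if "'" not in text:
        return f"'{text}'"
    if '"' not in text:
        return f'"{text}"'
    # Both quote kinds occur: split on single quotes and rejoin the pieces
    # with a double-quoted single quote inside concat().
    parts = text.split("'")
    return "concat(" + ", \"'\", ".join(f"'{p}'" for p in parts) + ")"

# Hypothetical query built from untrusted input.
user_input = "x' or '1'='1"
query = f"//orth[. = {xpath_literal(user_input)}]"
```

Here the injected quotes stay inside a single string literal, so the predicate compares against the literal text rather than evaluating the attacker's condition.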
13. Frequently Asked Questions
13.1. How do I start using TEI Lex-0 in my project?
To start using TEI Lex-0 in your own dictionary project, set up your XML editor to validate your dictionary against the TEI Lex-0 schema. You can do this:
- in oXygen XML Editor, by associating an existing TEI document with the TEI Lex-0 schema's URL:
https://raw.githubusercontent.com/DARIAH-ERIC/lexicalresources/master/Schemas/TEILex0/out/TEILex0.rng
using either the menu bar action (Document > Schema > Associate Schema) or the red-pin icon in the oXygen menu bar. Both of these methods will display the Associate Schema dialog box;
- manually, by including the following xml-model processing instructions at the top of your TEI file:
<?xml version="1.0" encoding="UTF-8"?>
<?xml-model href="https://raw.githubusercontent.com/DARIAH-ERIC/lexicalresources/master/Schemas/TEILex0/out/TEILex0.rng" type="application/xml" schematypens="http://relaxng.org/ns/structure/1.0"?>
<?xml-model href="http://www.tei-c.org/release/xml/tei/custom/schema/relaxng/tei_all.rng" type="application/xml" schematypens="http://purl.oclc.org/dsdl/schematron"?>
<TEI xmlns="http://www.tei-c.org/ns/1.0">
<!--etc.-->
</TEI>
- by downloading the schema file from
https://raw.githubusercontent.com/DARIAH-ERIC/lexicalresources/master/Schemas/TEILex0/out/TEILex0.rng
and associating your XML file with it, using either of the above-mentioned methods (see Figure 1).
Once you associate your dictionary file with the TEI Lex-0 schema, you can use your XML editor to validate it.
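If you need to associate many files with the schema, the manual method can be scripted. The following Python sketch inserts the RELAX NG xml-model processing instruction after the XML declaration. The function name associate_schema is hypothetical, and the sketch handles only the RELAX NG association, working on plain strings rather than parsed XML.

```python
SCHEMA_URL = (
    "https://raw.githubusercontent.com/DARIAH-ERIC/lexicalresources/"
    "master/Schemas/TEILex0/out/TEILex0.rng"
)

def associate_schema(xml_text: str, href: str = SCHEMA_URL) -> str:
    """Insert an xml-model processing instruction pointing at href."""
    pi = (
        f'<?xml-model href="{href}" type="application/xml" '
        'schematypens="http://relaxng.org/ns/structure/1.0"?>'
    )
    if "<?xml-model" in xml_text:
        # Already associated with some schema: leave the file untouched.
        return xml_text
    if xml_text.startswith("<?xml "):
        # Keep the XML declaration first; it must open the document.
        end = xml_text.index("?>") + 2
        return xml_text[:end] + "\n" + pi + xml_text[end:]
    return pi + "\n" + xml_text
```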
13.2. What should I do if I don't know how to encode something in TEI Lex-0?
TEI Lex-0 is a community-based project. If you have a question or need help encoding lexicographic data using TEI Lex-0, get in touch using our issue tracker here on GitHub.
13.3. How can I contribute to the development of TEI Lex-0?
More advanced users can propose solutions by submitting pull requests. Make sure you understand the internal nitty-gritty as well as our GitHub workflow.
13.3.1. The internal nitty-gritty
- TEILex0.odd is an index file: it uses a bunch of <xi:include> pointers to individual "chapters" which live in TEILex0.parts
- examples of dictionary entries encoded in TEI Lex-0 live in a file called examples.xml inside the folder TEILex0.examples
- examples.xml validates against the TEI Lex-0 schema compiled in out/TEILex0.rng
- stylesheets/tei-stripper.xsl is used to strip the TEI examples file of the TEI namespace, replacing it with "http://www.tei-c.org/ns/Examples" so that the examples can be used directly inside <egXML> in our ODD file. For more info about why this is necessary, see https://github.com/BCDH/tei-strip-and-include.
- to include validated examples, you can either point to the id of the element you want to include using the xpointer() scheme like this:
<egXML xmlns="http://www.tei-c.org/ns/Examples">
<xi:include href="../TEILex0.examples/examples.stripped.xml" corresp="../TEILex0.examples/examples.xml" xpointer="pflaume"/>
</egXML>
or, using the element() scheme, you can also include segments:
<egXML xmlns="http://www.tei-c.org/ns/Examples">
<xi:include href="../TEILex0.examples/examples.stripped.xml" corresp="../TEILex0.examples/examples.xml" xpointer="element(MZ.RGJS.сејче/4/1)"/>
</egXML>
- If you are using oXygen XML Editor, clicking on the link in Author Mode will take you directly to the element or fragment in examples.xml for editing.
- After making any changes to examples.xml, use tei-stripper.xsl (or the included TEI Stripper transformation scenario in oXygen) to produce examples.stripped.xml. Without this step, the examples in your ODD file will not validate.
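To make the namespace replacement more concrete, here is a small Python sketch of the same idea. It is not the project's stylesheet: tei-stripper.xsl is XSLT and may do more than this. The sketch only shows how elements move from the TEI namespace to the Examples namespace. The function name strip_tei_namespace is hypothetical.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"
EG_NS = "http://www.tei-c.org/ns/Examples"

def strip_tei_namespace(xml_text: str) -> str:
    """Move every TEI-namespaced element into the Examples namespace."""
    root = ET.fromstring(xml_text)
    prefix = "{" + TEI_NS + "}"
    for el in root.iter():
        if isinstance(el.tag, str) and el.tag.startswith(prefix):
            el.tag = "{" + EG_NS + "}" + el.tag[len(prefix):]
    # Serialize the Examples namespace as the default namespace.
    ET.register_namespace("", EG_NS)
    return ET.tostring(root, encoding="unicode")
```

Attributes such as xml:id are left as they are, so shorthand pointers like xpointer="pflaume" would still resolve against the stripped output.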
13.3.2. GitHub Workflow
Before submitting your proposal to change something in the TEI Lex-0 specification or the narrative guidelines, make sure:
- you have received some feedback from the community using our GitHub issues
- you understand the internal nitty-gritty of how the TEI Lex-0 source files are organized and how the guidelines and the RNG schema are generated from ODD
To implement changes, make sure to follow our GitHub workflow:
- if you're starting for the first time, fork the lexical-resources repository; then clone your fork on your machine; the cloned fork is your so-called working copy; the original repository from which you made your fork is called "upstream"
- if you've forked and cloned the lexical-resources repository before, make sure the master branch in your working copy is up-to-date by fetching the latest changes and merging them into your working master branch from the upstream master
- create a new branch off your master branch; name it appropriately (e.g. fix-attr-values-on-sense)
- do the work (changing the specification, adding examples, or changing the narrative sections) in the specific branch you created for this particular issue
- commit and push your changes
- once you've finished implementing all the changes needed, create a pull-request
- if editors ask you to make additional changes, keep working in the same branch (i.e. fix-attr-values-on-sense); commit and push; your changes will be automatically added to your pull request
- once the editors accept your pull request, you can safely delete the branch from which you created your pull request (i.e. fix-attr-values-on-sense)
- once your pull request has been merged into the upstream master branch by the editors, you can bring the master branch in your working copy up to date by fetching and merging changes from upstream master; then pushing them to your remote repo
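The workflow above can be summarized as a sequence of git commands. The Python sketch below only builds and prints those commands (a dry run) and does not execute anything. It assumes that your fork is the remote named origin and that the original repository has been added as a remote named upstream (for example with git remote add upstream followed by its URL). The function names and the commit message are hypothetical.

```python
import shlex
from typing import List

def sync_master(upstream: str = "upstream") -> List[List[str]]:
    """Bring the local master up to date and push it to your fork."""
    return [
        ["git", "checkout", "master"],
        ["git", "fetch", upstream],
        ["git", "merge", f"{upstream}/master"],
        ["git", "push", "origin", "master"],
    ]

def start_branch(name: str) -> List[List[str]]:
    """Create a topic branch off master."""
    return [["git", "checkout", "-b", name, "master"]]

def publish_branch(name: str, message: str) -> List[List[str]]:
    """Commit tracked changes and push the branch to your fork."""
    return [
        ["git", "commit", "-a", "-m", message],
        ["git", "push", "-u", "origin", name],
    ]

def cleanup_branch(name: str) -> List[List[str]]:
    """Delete the local topic branch once the pull request is merged."""
    return [["git", "checkout", "master"], ["git", "branch", "-d", name]]

def render(commands: List[List[str]]) -> str:
    """Format commands as shell lines for review before running them."""
    return "\n".join(shlex.join(cmd) for cmd in commands)

branch = "fix-attr-values-on-sense"
plan = (
    sync_master()
    + start_branch(branch)
    + publish_branch(branch, "Fix attribute values on sense")
)
print(render(plan))
```

The pull request itself is created on GitHub after the push. Once it has been merged, running the commands from sync_master() and then cleanup_branch(branch) covers the last two steps of the list.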
13.4. How can I convert dictionaries from TEI Lex-0 to Ontolex-Lemon?
Funny you should ask, because we have exactly what you're looking for. Check out the tei2ontolex stylesheet.