Module:data consistency check/documentation
This is the documentation page for Module:data consistency check.
This module checks the validity and internal consistency of the language, language family, and script data used on Wiktionary: the modules in Category:Language data modules as well as Module:scripts/data.
Output
Discrepancies detected:
Module:etymology languages/code to canonical name
- goh-lng, the code for the canonical name Lombardic, is wrong; it should be lng.
Module:etymology languages/data
- The data key alias_codes for ??? (lng) is invalid.
- ภาษามคธ (pra-mag) has a canonical name that is not unique; it is also used by the code mag.
- The data key preprocess_links for ??? (th-new) is invalid.
Module:families/canonical names
- The code ira-mid and the canonical name อิเรเนียนกลาง should be removed; they are not found in Module:families/data.
- The code ira-old and the canonical name อิเรเนียนเก่า should be removed; they are not found in Module:families/data.
Module:families/code to canonical name
- The code ira-mid and the canonical name อิเรเนียนกลาง should be removed; they are not found in Module:families/data.
- The code ira-old and the canonical name อิเรเนียนเก่า should be removed; they are not found in Module:families/data.
Module:families/data
- กลุ่มภาษาอินโด-อารยันเก่า (inc-old) has no child families or languages.
- กลุ่มภาษาปรากริติก (pra) has no child families or languages.
Module:languages/data/2
- ภาษาอินุกติตุต (iu) has override_translit set, but no transliteration module
- ภาษานอร์เวย์แบบบุ๊กมอล (nb) has ภาษาเดนมาร์ก (da) set as an ancestor, but is not in the กลุ่มภาษาสแกนดิเนเวียนตะวันออก (gmq-eas).
- ภาษานอร์เวย์แบบบุ๊กมอล (nb) has ภาษานอร์เวย์กลาง (gmq-mno) set as an ancestor, but is not in the กลุ่มภาษาสแกนดิเนเวียนตะวันตก (gmq-wes).
Module:languages/data/3/h
- ภาษาฮินดูสตานีแบบแคริบเบียน (hns) has ภาษาโภชปุระ (bho) set as an ancestor, but is not in the กลุ่มภาษาอินโด-อารยันตะวันออก (inc-eas).
- ภาษาฮินดูสตานีแบบแคริบเบียน (hns) has ภาษาอวัธ (awa) set as an ancestor, but is not in the กลุ่มภาษาฮินดีตะวันออก (inc-hie).
Module:languages/data/3/s
- ภาษาสันถาลี (sat) has override_translit set, but no transliteration module
Module:scripts/by name
- Chisoi (Chis) is missing
- ดิเวส อกุรุ (Diak) is missing
- Dhives Akuru, the canonical name for the code Diak, is wrong; it should be ดิเวส อกุรุ.
- Garay (Gara) is missing
- Khema (Gukh) is missing
- Pahawh Hmong (Hmng) is missing
- ม้ง, the canonical name for the code Hmng, is wrong; it should be Pahawh Hmong.
- IPAchar, the code for the canonical name สัทอักษรสากล, is wrong; it should be Ipach.
- กวิ (Kawi) is missing
- Kawi, the canonical name for the code Kawi, is wrong; it should be กวิ.
- Kirat Rai (Krai) is missing
- มโร (Mroo) is missing
- Mro, the canonical name for the code Mroo, is wrong; it should be มโร.
- Ol Onal (Onao) is missing
- อุสมาน (Osma) is missing
- Osmanya, the canonical name for the code Osma, is wrong; it should be อุสมาน.
- Sidetic (Sidt) is missing
- Khudawadi, the canonical name for the code Sind, is wrong; it should be คุดาบาด.
- คุดาบาด (Sind) is missing
- Sunuwar (Sunu) is missing
- Lai Tay (Tayo) is missing
- Todhri (Todr) is missing
- Tolong Siki (Tols) is missing
- Tigalari (Tutg) is missing
- วรังจิติ (Wara) is missing
- Varang Kshiti, the canonical name for the code Wara, is wrong; it should be วรังจิติ.
- Tamyig, the canonical name for the code sit-tam-Tibt, is wrong; it should be ตัมยิก.
- The code xzh-Tibt and the canonical name Zhang-Zhung should be removed; they are not found in a submodule of Module:scripts.
Module:scripts/code to canonical name
- Chis (Chisoi) is missing
- Dhives Akuru, the canonical name for the code Diak, is wrong; it should be ดิเวส อกุรุ.
- Gara (Garay) is missing
- Gukh (Khema) is missing
- ม้ง, the canonical name for the code Hmng, is wrong; it should be Pahawh Hmong.
- IPAchar, the code for the canonical name สัทอักษรสากล, is wrong; it should be Ipach.
- Kawi, the canonical name for the code Kawi, is wrong; it should be กวิ.
- Krai (Kirat Rai) is missing
- Latnx, the code for the canonical name ละติน, is wrong; it should be Latn.
- Mro, the canonical name for the code Mroo, is wrong; it should be มโร.
- Onao (Ol Onal) is missing
- Osmanya, the canonical name for the code Osma, is wrong; it should be อุสมาน.
- Sidt (Sidetic) is missing
- Khudawadi, the canonical name for the code Sind, is wrong; it should be คุดาบาด.
- Sunu (Sunuwar) is missing
- Tayo (Lai Tay) is missing
- Todr (Todhri) is missing
- Tols (Tolong Siki) is missing
- Tutg (Tigalari) is missing
- Varang Kshiti, the canonical name for the code Wara, is wrong; it should be วรังจิติ.
- Tamyig, the canonical name for the code sit-tam-Tibt, is wrong; it should be ตัมยิก.
- xka-Arab, the code for the canonical name อาหรับ, is wrong; it should be fa-Arab.
- The code xzh-Tibt and the canonical name Zhang-Zhung should be removed; they are not found in a submodule of Module:scripts.
Module:scripts/data
- อักษรBlissymbols (Blis) is not used by any language and has no characters listed for auto-detection.
- อักษรCypro-Minoan (Cpmn) is not used by any language.
- อักษรฮิรางานะ (Hira) is not used by any language.
- อักษรคานะ (Hrkt) is not used by any language.
- อักษรImage-rendered (Imag) is not used by any language and has no characters listed for auto-detection.
- อักษรสัทอักษรสากล (Ipach) is not used by any language and has no characters listed for auto-detection.
- อักษรMoon (Moon) is not used by any language and has no characters listed for auto-detection.
- รหัสมอร์ส (Morse) is not used by any language and has no characters listed for auto-detection.
- อักษรสัญกรณ์ดนตรี (Music) is not used by any language.
- อักษรไม่ระบุ (None) is not used by any language and has no characters listed for auto-detection.
- อักษรOl Onal (Onao) is not used by any language and has no characters listed for auto-detection.
- อักษรRongorongo (Roro) is not used by any language and has no characters listed for auto-detection.
- อักษรRumi numerals (Rumin) is not used by any language.
- สัญญาณธง (Semap) is not used by any language and has no characters listed for auto-detection.
- อักษรVisible Speech (Visp) is not used by any language and has no characters listed for auto-detection.
- อักษรmathematical notation (Zmth) is not used by any language.
- อักษรสัญลักษณ์ (Zsym) is not used by any language.
- อักษรยังไม่กำหนด (Zyyy) is not used by any language and has no characters listed for auto-detection.
- อักษรยังไม่มีรหัส (Zzzz) is not used by any language and has no characters listed for auto-detection.
- The codes fa-Arab, ug-Arab, ks-Arab, ps-Arab, ur-Arab, tt-Arab, ota-Arab, mzn-Arab, sd-Arab and ku-Arab are currently alias codes. Only one code should be used in the data.
- The codes ms-Arab and kk-Arab are currently alias codes. Only one code should be used in the data.
- The data key sort_by_scraping for อักษรญี่ปุ่น (Jpan) is invalid.
Checks performed
For multiple data modules:
- Codes for languages, families and etymology-only languages must be unique and cannot clash with one another.
- Canonical names for languages, families, and etymology-only languages must not be found in the list of other names.
- Each name in the list of other names must appear only once.
- otherNames, if present, must be an array.
- Wikidata item IDs must be a positive integer or a string starting with Q and ending with decimal digits.
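The Wikidata item ID rule can be illustrated with a short Python sketch. This is not the module's Lua code: the function name is hypothetical, and reading "starting with Q and ending with decimal digits" as "Q followed only by digits" is an assumption.

```python
import re


def is_valid_wikidata_id(value):
    """Return True for a positive integer or a string such as "Q1860".

    Hypothetical helper; assumes the string form is "Q" plus digits only.
    """
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(value, bool):
        return False
    if isinstance(value, int):
        return value > 0
    if isinstance(value, str):
        # "Q" followed by one or more ASCII decimal digits, nothing else.
        return re.fullmatch(r"Q[0-9]+", value) is not None
    return False
```

Under these assumptions, both 1860 and "Q1860" pass, while 0, "1860" and "Q12a" are rejected.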
The following must be true of the data used by Module:languages:
- Each code must be defined in the correct submodule according to whether it is two-letter, three-letter or exceptional.
- The canonical name (field 1) must be present and must not be the same as the canonical name of another language.
- If field 2 is not nil, it must be a valid Wikidata item ID.
- If field 3 or family is given and not nil, it must be a valid family code.
- If field 4 or scripts is given and not nil, it must be an array, and each string in the array must be a valid script code.
- If ancestors is given, it must be an array, and each string in the array must be a valid language or etymology language code.
- If family is given, it must be a valid family code.
- If type is given, it must be one of the recognised values (regular, reconstructed, appendix-constructed).
- If entry_name is given, it must be a table that contains either two arrays (from and to) or a string (remove_diacritics) or both.
- If sort_key is given, it may be either a string or a table that in turn contains either two arrays (from and to) or a string (remove_diacritics).
- If entry_name or sort_key is given, the from array must be longer than or equal in length to the to array.
- If standardChars is given, it must form a valid Lua string pattern when placed between square brackets with ^ before it ("[^...]"). (It should match all characters regularly used in the language, but that cannot be tested.)
- If override_translit is set, translit must also be set, because there must be a transliteration module that can override manual transliteration.
- If link_tr is present, it must be true.
- Have no data keys besides these: 1, 2, 3, "entry_name", "sort_key", "display", "otherNames", "aliases", "varieties", "type", "scripts", "ancestors", "wikimedia_codes", "wikipedia_article", "standardChars", "translit", "override_translit", "link_tr".
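A few of the per-language rules above can be sketched in Python as follows. This is only an illustration, not the module's Lua implementation: the function name and the message strings are hypothetical, and a language's data table is modelled as a plain dict.

```python
ALLOWED_TYPES = {"regular", "reconstructed", "appendix-constructed"}


def check_language_entry(data):
    """Return a list of problems found in one language data table (a dict here)."""
    problems = []
    if "type" in data and data["type"] not in ALLOWED_TYPES:
        problems.append("type is not a recognised value")
    for key in ("entry_name", "sort_key"):
        table = data.get(key)
        # sort_key may also be a string (a module name); only tables are checked.
        if isinstance(table, dict) and "from" in table:
            # The from array may be longer than the to array, never shorter.
            if len(table["from"]) < len(table.get("to", [])):
                problems.append(f"{key}: from is shorter than to")
    if data.get("override_translit") and not data.get("translit"):
        problems.append("override_translit is set, but no transliteration module")
    if "link_tr" in data and data["link_tr"] is not True:
        problems.append("link_tr must be true")
    return problems
```

An entry that sets override_translit without translit, like the iu and sat cases reported in the output, yields a single problem in this sketch.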
Checks not performed:
- If translit is present, it should be the name of a module, and this module should contain a tr function that takes a pagename (and optionally a language code and script code) as arguments.
- If sort_key is a string, it should be the name of a module, and this module should contain a makeSortKey function that takes a pagename (and optionally a language code and script code) as arguments.
- If entry_name or sort_key is a table and contains a field remove_diacritics, the value of the field should be a string that forms a valid Lua pattern when it is placed inside negated set notation ([^...]).
These are not checked here, because module errors will quickly crop up in entries if these conditions are not met, assuming that Module:utilities attempts to generate a sortkey for a category pertaining to the language in question, or full_link attempts to use the transliteration module.
Module:languages/code to canonical name and Module:languages/canonical names must contain all the codes and canonical names found in the data submodules of Module:languages, and no more.
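The "all the codes and canonical names, and no more" requirement amounts to a two-way comparison between the data and the lookup table. A minimal Python sketch, assuming both sides are available as code-to-name dicts; the function name is hypothetical and the message wording only imitates the report format:

```python
def compare_lookup(data, lookup):
    """Compare a lookup table against the authoritative data.

    data:   code -> canonical name, as collected from the data submodules
    lookup: code -> canonical name, as stored in the lookup module
    """
    problems = []
    # Everything in the data must appear in the lookup, with the same name.
    for code, name in sorted(data.items()):
        if code not in lookup:
            problems.append(f"{code} ({name}) is missing")
        elif lookup[code] != name:
            problems.append(
                f"{lookup[code]}, the canonical name for the code {code}, "
                f"is wrong; it should be {name}."
            )
    # Nothing may appear in the lookup that is absent from the data.
    for code, name in sorted(lookup.items()):
        if code not in data:
            problems.append(
                f"The code {code} and the canonical name {name} should be removed"
            )
    return problems
```

The same comparison shape would apply to the family and script lookup modules, which is why their reports in the Output section read alike.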
The following must be true of the data used by Module:etymology languages:
- canonicalName must be given.
- parent must be given and must be a valid language, family or etymology-only language code.
- If ancestors is given, it must be an array, and each string in the array must be a valid language or etymology language code. The etymology language should also be listed as the ancestor of a regular language.
- Have no data keys besides these: "canonicalName", "otherNames", "parent", "ancestors", "wikipedia_article", "wikidata_item".
Codes in Module:families data must:
- Have canonicalName, which must not be the same as the canonical name of another family.
- If family is given, it must be a valid family code.
- Have at least one language or subfamily belonging to it.
- Have no data keys besides these: "canonicalName", "otherNames", "family", "protoLanguage", "wikidata_item".
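The requirement that every family has at least one language or subfamily can be sketched as a set difference. This Python version is illustrative only: the function name is hypothetical, and it assumes each family and language is a dict whose parent family sits under a "family" key (the language data also allows it in field 3).

```python
def childless_families(families, languages):
    """Return the family codes that no language or subfamily points to."""
    parents = set()
    for code, data in families.items():
        parent = data.get("family")
        # Ignore self-references so a family cannot count as its own child.
        if parent is not None and parent != code:
            parents.add(parent)
    for data in languages.values():
        parent = data.get("family")
        if parent is not None:
            parents.add(parent)
    return sorted(code for code in families if code not in parents)
```

With a family table containing iir, inc (child of iir) and inc-old (child of inc), plus one language in inc, only inc-old is returned, matching the kind of "has no child families or languages" report shown in the Output section.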
Codes in Module:scripts data must:
- Have canonicalName.
- Have at least one language that lists it as one of its scripts.
- Have a characters pattern for script autodetection, and this must form a valid Lua string pattern when placed between square brackets ("[...]"). (It should match all characters in the script, but that cannot be tested.)
- Have no data keys besides these: "canonicalName", "otherNames", "parent", "systems", "wikipedia_article", "characters", "direction".
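The "no data keys besides these" rule is a whitelist check. A minimal Python sketch using the script key list given above; the function name is hypothetical, and the script's data table is modelled as a dict:

```python
SCRIPT_KEYS = {
    "canonicalName", "otherNames", "parent", "systems",
    "wikipedia_article", "characters", "direction",
}


def invalid_keys(data, allowed=SCRIPT_KEYS):
    """Return the keys of a data table that are not in the allowed set."""
    # key=str keeps the sort stable even if integer keys (1, 2, 3) are mixed in.
    return sorted((key for key in data if key not in allowed), key=str)
```

A table carrying sort_by_scraping would be flagged here, which corresponds to the "The data key sort_by_scraping ... is invalid" line in the Output section. The other modules would pass their own key list as the allowed argument.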