std/unicode
Globals
const MaxRune
Maximum valid Unicode code point.
const ReplacementChar
Represents invalid code points.
const MaxAscii
Maximum ASCII value.
const MaxLatin1
Maximum Latin-1 value.
const UpperCase
const LowerCase
const TitleCase
const MaxCase
static Categories: map[str]&RangeTable
The set of Unicode category tables.
static CC: &RangeTable
The set of Unicode characters in category CC (Other, control).
static CF: &RangeTable
The set of Unicode characters in category CF (Other, format).
static CO: &RangeTable
The set of Unicode characters in category CO (Other, private use).
static CS: &RangeTable
The set of Unicode characters in category CS (Other, surrogate).
static Digit: &RangeTable
The set of Unicode characters with the "decimal digit" property.
static ND: &RangeTable
The set of Unicode characters in category ND (Number, decimal digit).
static Letter: &RangeTable
The set of Unicode letters, category L.
static L: &RangeTable
The set of Unicode letters, category L.
static LM: &RangeTable
The set of Unicode characters in category LM (Letter, modifier).
static LO: &RangeTable
The set of Unicode characters in category LO (Letter, other).
static Lower: &RangeTable
The set of Unicode lowercase letters.
static LL: &RangeTable
The set of Unicode characters in category LL (Letter, lowercase).
static Mark: &RangeTable
The set of Unicode mark characters, category M.
static M: &RangeTable
The set of Unicode mark characters, category M.
static MC: &RangeTable
The set of Unicode characters in category MC (Mark, spacing combining).
static ME: &RangeTable
The set of Unicode characters in category ME (Mark, enclosing).
static MN: &RangeTable
The set of Unicode characters in category MN (Mark, nonspacing).
static NL: &RangeTable
The set of Unicode characters in category NL (Number, letter).
static NO: &RangeTable
The set of Unicode characters in category NO (Number, other).
static Number: &RangeTable
The set of Unicode number characters, category N.
static N: &RangeTable
The set of Unicode number characters, category N.
static Other: &RangeTable
The set of Unicode control and special characters, category C.
static C: &RangeTable
The set of Unicode control and special characters, category C.
static PC: &RangeTable
The set of Unicode characters in category PC (Punctuation, connector).
static PD: &RangeTable
The set of Unicode characters in category PD (Punctuation, dash).
static PE: &RangeTable
The set of Unicode characters in category PE (Punctuation, close).
static PF: &RangeTable
The set of Unicode characters in category PF (Punctuation, final quote).
static PI: &RangeTable
The set of Unicode characters in category PI (Punctuation, initial quote).
static PO: &RangeTable
The set of Unicode characters in category PO (Punctuation, other).
static PS: &RangeTable
The set of Unicode characters in category PS (Punctuation, open).
static Punct: &RangeTable
The set of Unicode punctuation characters, category P.
static P: &RangeTable
The set of Unicode punctuation characters, category P.
static SC: &RangeTable
The set of Unicode characters in category SC (Symbol, currency).
static SK: &RangeTable
The set of Unicode characters in category SK (Symbol, modifier).
static SM: &RangeTable
The set of Unicode characters in category SM (Symbol, math).
static SO: &RangeTable
The set of Unicode characters in category SO (Symbol, other).
static Space: &RangeTable
The set of Unicode space characters, category Z.
static Z: &RangeTable
The set of Unicode space characters, category Z.
static Symbol: &RangeTable
The set of Unicode symbol characters, category S.
static S: &RangeTable
The set of Unicode symbol characters, category S.
static Title: &RangeTable
The set of Unicode titlecase letters.
static LT: &RangeTable
The set of Unicode characters in category LT (Letter, titlecase).
static Upper: &RangeTable
The set of Unicode uppercase letters.
static LU: &RangeTable
The set of Unicode characters in category LU (Letter, uppercase).
static ZL: &RangeTable
The set of Unicode characters in category ZL (Separator, line).
static ZP: &RangeTable
The set of Unicode characters in category ZP (Separator, paragraph).
static ZS: &RangeTable
The set of Unicode characters in category ZS (Separator, space).
static Scripts: map[str]&RangeTable
The set of Unicode script tables.
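A name-keyed map of tables makes it possible to look up which script a code point belongs to. The sketch below is written in Go, on the assumption that this API carries the same semantics as Go's `unicode` package, which it closely resembles. The helper `scriptOf` is hypothetical, and the key spellings ("Latin", "Greek", and so on) are those of Go's `unicode.Scripts`; the keys used by this module may differ.

```go
package main

import (
	"fmt"
	"unicode"
)

// scriptOf is a hypothetical helper. It returns the name of the script table
// that contains r, or "" when no table in unicode.Scripts contains it.
func scriptOf(r rune) string {
	for name, table := range unicode.Scripts {
		if unicode.Is(table, r) {
			return name
		}
	}
	return ""
}

func main() {
	for _, r := range "aλж1" {
		fmt.Printf("%q: %s\n", r, scriptOf(r))
	}
}
```

Scanning every table is linear in the number of scripts, which is acceptable for occasional lookups; a hot path would more likely test against one known table directly.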
static Adlam: &RangeTable
The set of Unicode characters in script Adlam.
static Ahom: &RangeTable
The set of Unicode characters in script Ahom.
static AnatolianHieroglyphs: &RangeTable
The set of Unicode characters in script AnatolianHieroglyphs.
static Arabic: &RangeTable
The set of Unicode characters in script Arabic.
static Armenian: &RangeTable
The set of Unicode characters in script Armenian.
static Avestan: &RangeTable
The set of Unicode characters in script Avestan.
static Balinese: &RangeTable
The set of Unicode characters in script Balinese.
static Bamum: &RangeTable
The set of Unicode characters in script Bamum.
static BassaVah: &RangeTable
The set of Unicode characters in script BassaVah.
static Batak: &RangeTable
The set of Unicode characters in script Batak.
static Bengali: &RangeTable
The set of Unicode characters in script Bengali.
static Bhaiksuki: &RangeTable
The set of Unicode characters in script Bhaiksuki.
static Bopomofo: &RangeTable
The set of Unicode characters in script Bopomofo.
static Brahmi: &RangeTable
The set of Unicode characters in script Brahmi.
static Braille: &RangeTable
The set of Unicode characters in script Braille.
static Buginese: &RangeTable
The set of Unicode characters in script Buginese.
static Buhid: &RangeTable
The set of Unicode characters in script Buhid.
static CanadianAboriginal: &RangeTable
The set of Unicode characters in script CanadianAboriginal.
static Carian: &RangeTable
The set of Unicode characters in script Carian.
static CaucasianAlbanian: &RangeTable
The set of Unicode characters in script CaucasianAlbanian.
static Chakma: &RangeTable
The set of Unicode characters in script Chakma.
static Cham: &RangeTable
The set of Unicode characters in script Cham.
static Cherokee: &RangeTable
The set of Unicode characters in script Cherokee.
static Chorasmian: &RangeTable
The set of Unicode characters in script Chorasmian.
static Common: &RangeTable
The set of Unicode characters in script Common.
static Coptic: &RangeTable
The set of Unicode characters in script Coptic.
static Cuneiform: &RangeTable
The set of Unicode characters in script Cuneiform.
static Cypriot: &RangeTable
The set of Unicode characters in script Cypriot.
static CyproMinoan: &RangeTable
The set of Unicode characters in script CyproMinoan.
static Cyrillic: &RangeTable
The set of Unicode characters in script Cyrillic.
static Deseret: &RangeTable
The set of Unicode characters in script Deseret.
static Devanagari: &RangeTable
The set of Unicode characters in script Devanagari.
static DivesAkuru: &RangeTable
The set of Unicode characters in script DivesAkuru.
static Dogra: &RangeTable
The set of Unicode characters in script Dogra.
static Duployan: &RangeTable
The set of Unicode characters in script Duployan.
static EgyptianHieroglyphs: &RangeTable
The set of Unicode characters in script EgyptianHieroglyphs.
static Elbasan: &RangeTable
The set of Unicode characters in script Elbasan.
static Elymaic: &RangeTable
The set of Unicode characters in script Elymaic.
static Ethiopic: &RangeTable
The set of Unicode characters in script Ethiopic.
static Georgian: &RangeTable
The set of Unicode characters in script Georgian.
static Glagolitic: &RangeTable
The set of Unicode characters in script Glagolitic.
static Gothic: &RangeTable
The set of Unicode characters in script Gothic.
static Grantha: &RangeTable
The set of Unicode characters in script Grantha.
static Greek: &RangeTable
The set of Unicode characters in script Greek.
static Gujarati: &RangeTable
The set of Unicode characters in script Gujarati.
static GunjalaGondi: &RangeTable
The set of Unicode characters in script GunjalaGondi.
static Gurmukhi: &RangeTable
The set of Unicode characters in script Gurmukhi.
static Han: &RangeTable
The set of Unicode characters in script Han.
static Hangul: &RangeTable
The set of Unicode characters in script Hangul.
static HanifiRohingya: &RangeTable
The set of Unicode characters in script HanifiRohingya.
static Hanunoo: &RangeTable
The set of Unicode characters in script Hanunoo.
static Hatran: &RangeTable
The set of Unicode characters in script Hatran.
static Hebrew: &RangeTable
The set of Unicode characters in script Hebrew.
static Hiragana: &RangeTable
The set of Unicode characters in script Hiragana.
static ImperialAramaic: &RangeTable
The set of Unicode characters in script ImperialAramaic.
static Inherited: &RangeTable
The set of Unicode characters in script Inherited.
static InscriptionalPahlavi: &RangeTable
The set of Unicode characters in script InscriptionalPahlavi.
static InscriptionalParthian: &RangeTable
The set of Unicode characters in script InscriptionalParthian.
static Javanese: &RangeTable
The set of Unicode characters in script Javanese.
static Kaithi: &RangeTable
The set of Unicode characters in script Kaithi.
static Kannada: &RangeTable
The set of Unicode characters in script Kannada.
static Katakana: &RangeTable
The set of Unicode characters in script Katakana.
static Kawi: &RangeTable
The set of Unicode characters in script Kawi.
static KayahLi: &RangeTable
The set of Unicode characters in script KayahLi.
static Kharoshthi: &RangeTable
The set of Unicode characters in script Kharoshthi.
static KhitanSmallScript: &RangeTable
The set of Unicode characters in script KhitanSmallScript.
static Khmer: &RangeTable
The set of Unicode characters in script Khmer.
static Khojki: &RangeTable
The set of Unicode characters in script Khojki.
static Khudawadi: &RangeTable
The set of Unicode characters in script Khudawadi.
static Lao: &RangeTable
The set of Unicode characters in script Lao.
static Latin: &RangeTable
The set of Unicode characters in script Latin.
static Lepcha: &RangeTable
The set of Unicode characters in script Lepcha.
static Limbu: &RangeTable
The set of Unicode characters in script Limbu.
static LinearA: &RangeTable
The set of Unicode characters in script LinearA.
static LinearB: &RangeTable
The set of Unicode characters in script LinearB.
static Lisu: &RangeTable
The set of Unicode characters in script Lisu.
static Lycian: &RangeTable
The set of Unicode characters in script Lycian.
static Lydian: &RangeTable
The set of Unicode characters in script Lydian.
static Mahajani: &RangeTable
The set of Unicode characters in script Mahajani.
static Makasar: &RangeTable
The set of Unicode characters in script Makasar.
static Malayalam: &RangeTable
The set of Unicode characters in script Malayalam.
static Mandaic: &RangeTable
The set of Unicode characters in script Mandaic.
static Manichaean: &RangeTable
The set of Unicode characters in script Manichaean.
static Marchen: &RangeTable
The set of Unicode characters in script Marchen.
static MasaramGondi: &RangeTable
The set of Unicode characters in script MasaramGondi.
static Medefaidrin: &RangeTable
The set of Unicode characters in script Medefaidrin.
static MeeteiMayek: &RangeTable
The set of Unicode characters in script MeeteiMayek.
static MendeKikakui: &RangeTable
The set of Unicode characters in script MendeKikakui.
static MeroiticCursive: &RangeTable
The set of Unicode characters in script MeroiticCursive.
static MeroiticHieroglyphs: &RangeTable
The set of Unicode characters in script MeroiticHieroglyphs.
static Miao: &RangeTable
The set of Unicode characters in script Miao.
static Modi: &RangeTable
The set of Unicode characters in script Modi.
static Mongolian: &RangeTable
The set of Unicode characters in script Mongolian.
static Mro: &RangeTable
The set of Unicode characters in script Mro.
static Multani: &RangeTable
The set of Unicode characters in script Multani.
static Myanmar: &RangeTable
The set of Unicode characters in script Myanmar.
static Nabataean: &RangeTable
The set of Unicode characters in script Nabataean.
static NagMundari: &RangeTable
The set of Unicode characters in script NagMundari.
static Nandinagari: &RangeTable
The set of Unicode characters in script Nandinagari.
static NewTaiLue: &RangeTable
The set of Unicode characters in script NewTaiLue.
static Newa: &RangeTable
The set of Unicode characters in script Newa.
static Nko: &RangeTable
The set of Unicode characters in script Nko.
static Nushu: &RangeTable
The set of Unicode characters in script Nushu.
static NyiakengPuachueHmong: &RangeTable
The set of Unicode characters in script NyiakengPuachueHmong.
static Ogham: &RangeTable
The set of Unicode characters in script Ogham.
static OlChiki: &RangeTable
The set of Unicode characters in script OlChiki.
static OldHungarian: &RangeTable
The set of Unicode characters in script OldHungarian.
static OldItalic: &RangeTable
The set of Unicode characters in script OldItalic.
static OldNorthArabian: &RangeTable
The set of Unicode characters in script OldNorthArabian.
static OldPermic: &RangeTable
The set of Unicode characters in script OldPermic.
static OldPersian: &RangeTable
The set of Unicode characters in script OldPersian.
static OldSogdian: &RangeTable
The set of Unicode characters in script OldSogdian.
static OldSouthArabian: &RangeTable
The set of Unicode characters in script OldSouthArabian.
static OldTurkic: &RangeTable
The set of Unicode characters in script OldTurkic.
static OldUyghur: &RangeTable
The set of Unicode characters in script OldUyghur.
static Oriya: &RangeTable
The set of Unicode characters in script Oriya.
static Osage: &RangeTable
The set of Unicode characters in script Osage.
static Osmanya: &RangeTable
The set of Unicode characters in script Osmanya.
static PahawhHmong: &RangeTable
The set of Unicode characters in script PahawhHmong.
static Palmyrene: &RangeTable
The set of Unicode characters in script Palmyrene.
static PauCinHau: &RangeTable
The set of Unicode characters in script PauCinHau.
static PhagsPa: &RangeTable
The set of Unicode characters in script PhagsPa.
static Phoenician: &RangeTable
The set of Unicode characters in script Phoenician.
static PsalterPahlavi: &RangeTable
The set of Unicode characters in script PsalterPahlavi.
static Rejang: &RangeTable
The set of Unicode characters in script Rejang.
static Runic: &RangeTable
The set of Unicode characters in script Runic.
static Samaritan: &RangeTable
The set of Unicode characters in script Samaritan.
static Saurashtra: &RangeTable
The set of Unicode characters in script Saurashtra.
static Sharada: &RangeTable
The set of Unicode characters in script Sharada.
static Shavian: &RangeTable
The set of Unicode characters in script Shavian.
static Siddham: &RangeTable
The set of Unicode characters in script Siddham.
static SignWriting: &RangeTable
The set of Unicode characters in script SignWriting.
static Sinhala: &RangeTable
The set of Unicode characters in script Sinhala.
static Sogdian: &RangeTable
The set of Unicode characters in script Sogdian.
static SoraSompeng: &RangeTable
The set of Unicode characters in script SoraSompeng.
static Soyombo: &RangeTable
The set of Unicode characters in script Soyombo.
static Sundanese: &RangeTable
The set of Unicode characters in script Sundanese.
static SylotiNagri: &RangeTable
The set of Unicode characters in script SylotiNagri.
static Syriac: &RangeTable
The set of Unicode characters in script Syriac.
static Tagalog: &RangeTable
The set of Unicode characters in script Tagalog.
static Tagbanwa: &RangeTable
The set of Unicode characters in script Tagbanwa.
static TaiLe: &RangeTable
The set of Unicode characters in script TaiLe.
static TaiTham: &RangeTable
The set of Unicode characters in script TaiTham.
static TaiViet: &RangeTable
The set of Unicode characters in script TaiViet.
static Takri: &RangeTable
The set of Unicode characters in script Takri.
static Tamil: &RangeTable
The set of Unicode characters in script Tamil.
static Tangsa: &RangeTable
The set of Unicode characters in script Tangsa.
static Tangut: &RangeTable
The set of Unicode characters in script Tangut.
static Telugu: &RangeTable
The set of Unicode characters in script Telugu.
static Thaana: &RangeTable
The set of Unicode characters in script Thaana.
static Thai: &RangeTable
The set of Unicode characters in script Thai.
static Tibetan: &RangeTable
The set of Unicode characters in script Tibetan.
static Tifinagh: &RangeTable
The set of Unicode characters in script Tifinagh.
static Tirhuta: &RangeTable
The set of Unicode characters in script Tirhuta.
static Toto: &RangeTable
The set of Unicode characters in script Toto.
static Ugaritic: &RangeTable
The set of Unicode characters in script Ugaritic.
static Vai: &RangeTable
The set of Unicode characters in script Vai.
static Vithkuqi: &RangeTable
The set of Unicode characters in script Vithkuqi.
static Wancho: &RangeTable
The set of Unicode characters in script Wancho.
static WarangCiti: &RangeTable
The set of Unicode characters in script WarangCiti.
static Yezidi: &RangeTable
The set of Unicode characters in script Yezidi.
static Yi: &RangeTable
The set of Unicode characters in script Yi.
static ZanabazarSquare: &RangeTable
The set of Unicode characters in script ZanabazarSquare.
static Properties: map[str]&RangeTable
The set of Unicode property tables.
static AsciiHexDigit: &RangeTable
static BidiControl: &RangeTable
static Dash: &RangeTable
static Deprecated: &RangeTable
static Diacritic: &RangeTable
static Extender: &RangeTable
static HexDigit: &RangeTable
static Hyphen: &RangeTable
static IdsBinaryOperator: &RangeTable
static IdsTrinaryOperator: &RangeTable
static Ideographic: &RangeTable
static JoinControl: &RangeTable
static LogicalOrderException: &RangeTable
static NoncharacterCodePoint: &RangeTable
static OtherAlphabetic: &RangeTable
static OtherDefaultIgnorableCodePoint: &RangeTable
static OtherGraphemeExtend: &RangeTable
static OtherIdContinue: &RangeTable
static OtherIdStart: &RangeTable
static OtherLowercase: &RangeTable
static OtherMath: &RangeTable
static OtherUppercase: &RangeTable
static PatternSyntax: &RangeTable
static PatternWhiteSpace: &RangeTable
static PrependedConcatenationMark: &RangeTable
static QuotationMark: &RangeTable
static Radical: &RangeTable
static RegionalIndicator: &RangeTable
static SentenceTerminal: &RangeTable
static SoftDotted: &RangeTable
static TerminalPunctuation: &RangeTable
static UnifiedIdeograph: &RangeTable
static VariationSelector: &RangeTable
static WhiteSpace: &RangeTable
static CaseRanges: []CaseRange
The table describing case mappings for all letters with non-self mappings.
static FoldCategory: map[str]&RangeTable
Maps a category name to a table of code points outside the category that are equivalent under simple case folding to code points inside the category. If there is no entry for a category name, there are no such points.
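One use for a fold table is a case-insensitive category test: a rune matches if it is in the category itself or in the category's fold table. The following Go sketch assumes the semantics match Go's `unicode.Categories` and `unicode.FoldCategory`; the helper name `inCategoryFold` is hypothetical, and the key "Lu" is Go's spelling, which may differ from the keys used here.

```go
package main

import (
	"fmt"
	"unicode"
)

// inCategoryFold is a hypothetical helper. It reports whether r belongs to the
// named category, or is equivalent under simple case folding to a member of it.
func inCategoryFold(name string, r rune) bool {
	// A missing map entry yields a nil table, so check before calling Is.
	if t := unicode.Categories[name]; t != nil && unicode.Is(t, r) {
		return true
	}
	if f := unicode.FoldCategory[name]; f != nil && unicode.Is(f, r) {
		return true
	}
	return false
}

func main() {
	fmt.Println(inCategoryFold("Lu", 'A')) // member of the category itself
	fmt.Println(inCategoryFold("Lu", 'a')) // outside Lu, but folds to 'A'
	fmt.Println(inCategoryFold("Lu", '1')) // no case equivalent in Lu
}
```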
static FoldScript: map[str]&RangeTable
Maps a script name to a table of code points outside the script that are equivalent under simple case folding to code points inside the script. If there is no entry for a script name, there are no such points.
static GraphicRanges: []&RangeTable
Defines the set of graphic characters according to Unicode.
Functions
fn To(case: int, mut r: rune): rune
Maps the rune to the specified case: UpperCase, LowerCase, or TitleCase.
fn ToUpper(mut r: rune): rune
Maps the rune to upper case.
fn ToLower(mut r: rune): rune
Maps the rune to lower case.
fn Is(range_tab: &RangeTable, r: rune): bool
Reports whether the rune is in the specified table of ranges.
fn IsUpper(r: rune): bool
Reports whether the rune is an upper case letter.
fn IsLower(r: rune): bool
Reports whether the rune is a lower case letter.
fn IsDigit(r: rune): bool
Reports whether the rune is a decimal digit.
fn IsLetter(r: rune): bool
Reports whether the rune is a letter (category L).
fn IsNumber(r: rune): bool
Reports whether the rune is a number (category N).
fn IsPunct(r: rune): bool
Reports whether the rune is a Unicode punctuation character (category P).
fn IsSpace(r: rune): bool
Reports whether the rune is a space character as defined by Unicode's White Space property; in the Latin-1 space this is '\t', '\n', '\v', '\f', '\r', ' ', U+0085 (NEL), U+00A0 (NBSP).
Other definitions of spacing characters are set by category Z and property PatternWhiteSpace.
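The White Space property is wider than ASCII whitespace, which matters when splitting text. The Go sketch below assumes `IsSpace` behaves like Go's `unicode.IsSpace`; `splitOnSpace` is a hypothetical helper built on Go's `strings.FieldsFunc`.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// splitOnSpace is a hypothetical helper that splits s around runs of
// runes for which unicode.IsSpace reports true.
func splitOnSpace(s string) []string {
	return strings.FieldsFunc(s, unicode.IsSpace)
}

func main() {
	// NBSP (U+00A0) and NEL (U+0085) separate fields just like '\t'.
	fmt.Printf("%q\n", splitOnSpace("a\u00A0b\tc\u0085d"))

	// ZERO WIDTH SPACE (U+200B) is a format character, not White Space.
	fmt.Printf("%q\n", splitOnSpace("a\u200Bb"))
}
```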
fn IsGraphic(r: rune): bool
Reports whether the rune is defined as a graphic character by Unicode. Such characters include letters, marks, numbers, punctuation, symbols, and spaces, from categories L, M, N, P, S, ZS.
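A typical use is dropping control and format characters before displaying untrusted text. This Go sketch assumes the same behavior as Go's `unicode.IsGraphic`; `stripNonGraphic` is a hypothetical helper.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// stripNonGraphic is a hypothetical helper that removes every rune
// for which unicode.IsGraphic reports false.
func stripNonGraphic(s string) string {
	return strings.Map(func(r rune) rune {
		if unicode.IsGraphic(r) {
			return r
		}
		return -1 // a negative result tells strings.Map to drop the rune
	}, s)
}

func main() {
	// NUL and tab are control characters (category C); the space is kept.
	fmt.Printf("%q\n", stripNonGraphic("a\x00b\tc d"))
}
```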
fn IsIn(r: rune, ranges: ...&RangeTable): bool
Reports whether the rune is a member of one of the ranges.
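Testing against several tables at once is handy for composite classes such as "word characters". The closest Go counterpart is `unicode.In(r, ranges...)`, which the sketch below uses on the assumption that `IsIn` behaves the same way. The helper `isWordRune` and its choice of tables are illustrative only.

```go
package main

import (
	"fmt"
	"unicode"
)

// isWordRune is a hypothetical helper. It reports whether r is a letter,
// a decimal digit, or a nonspacing mark.
func isWordRune(r rune) bool {
	return unicode.In(r, unicode.Letter, unicode.Digit, unicode.Mn)
}

func main() {
	for _, r := range []rune{'é', '\u0663', '\u0301', '-', '_'} {
		fmt.Printf("%U %v\n", r, isWordRune(r))
	}
}
```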
fn SimpleFold(r: rune): rune
Iterates over Unicode code points equivalent under the Unicode-defined simple case folding. Among the code points equivalent to r (including r itself), SimpleFold returns the smallest rune > r if one exists, or else the smallest rune >= 0. If r is not a valid Unicode code point, SimpleFold(r) returns r.
For example:
SimpleFold('A') = 'a'
SimpleFold('a') = 'A'
SimpleFold('K') = 'k'
SimpleFold('k') = '\u212A' (Kelvin symbol, K)
SimpleFold('\u212A') = 'K'
SimpleFold('1') = '1'
SimpleFold(-2) = -2
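Because each call steps to the next equivalent code point and eventually wraps around, repeated calls enumerate the whole equivalence class. The Go sketch below assumes the behavior of Go's `unicode.SimpleFold`, which matches the examples above; `foldOrbit` and `equalFoldRune` are hypothetical helpers.

```go
package main

import (
	"fmt"
	"unicode"
)

// foldOrbit is a hypothetical helper that collects every code point equivalent
// to r under simple case folding, starting with r itself.
func foldOrbit(r rune) []rune {
	orbit := []rune{r}
	// Keep stepping until SimpleFold wraps back around to the starting rune.
	for c := unicode.SimpleFold(r); c != r; c = unicode.SimpleFold(c) {
		orbit = append(orbit, c)
	}
	return orbit
}

// equalFoldRune is a hypothetical helper that reports whether a and b are
// equal under simple case folding.
func equalFoldRune(a, b rune) bool {
	for _, c := range foldOrbit(a) {
		if c == b {
			return true
		}
	}
	return false
}

func main() {
	fmt.Printf("%U\n", foldOrbit('k'))
	fmt.Println(equalFoldRune('k', '\u212A'))
}
```

The loop terminates for invalid input as well, since SimpleFold returns its argument unchanged in that case.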
Structs
struct Range16 {
lo: u16
hi: u16
stride: u16
}
Represents a range of 16-bit Unicode code points. The range runs from lo to hi inclusive and has the specified stride.
struct Range32 {
lo: u32
hi: u32
stride: u32
}
Represents a range of Unicode code points and is used when one or more of the values will not fit in 16 bits. The range runs from lo to hi inclusive and has the specified stride. lo and hi must always be >= 1<<16.
struct RangeTable {
r16: []Range16
r32: []Range32
LatinOffset: int
}
Defines a set of Unicode code points by listing the ranges of code points within the set. The ranges are listed in two slices to save space: a slice of 16-bit ranges and a slice of 32-bit ranges. The two slices must be in sorted order and non-overlapping. Also, r32 should contain only values >= 0x10000 (1<<16).
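To make the layout concrete, here is a self-contained Go sketch of a membership test over such a table. It redefines simplified versions of the three structs rather than using any library, scans linearly instead of whatever search the real implementation uses, ignores LatinOffset, and assumes every stride is at least 1. The function name `contains` and the sample table are illustrative only.

```go
package main

import "fmt"

// Simplified stand-ins for the structs described above.
type Range16 struct{ Lo, Hi, Stride uint16 }
type Range32 struct{ Lo, Hi, Stride uint32 }

type RangeTable struct {
	R16 []Range16 // ranges whose values fit in 16 bits
	R32 []Range32 // ranges whose values are >= 1<<16
}

// contains reports whether r is in t. It relies on both slices being sorted
// and non-overlapping, and on Stride being >= 1.
func contains(t *RangeTable, r rune) bool {
	if r < 0 {
		return false
	}
	if r < 1<<16 {
		c := uint16(r)
		for _, x := range t.R16 {
			if c < x.Lo {
				return false // sorted: no later range can contain c
			}
			if c <= x.Hi {
				// Inside [Lo, Hi]: a member only if it lies on the stride.
				return (c-x.Lo)%x.Stride == 0
			}
		}
		return false
	}
	c := uint32(r)
	for _, x := range t.R32 {
		if c < x.Lo {
			return false
		}
		if c <= x.Hi {
			return (c-x.Lo)%x.Stride == 0
		}
	}
	return false
}

func main() {
	// Even ASCII digits (stride 2) plus the block U+1F600..U+1F64F (stride 1).
	t := &RangeTable{
		R16: []Range16{{Lo: '0', Hi: '8', Stride: 2}},
		R32: []Range32{{Lo: 0x1F600, Hi: 0x1F64F, Stride: 1}},
	}
	for _, r := range []rune{'4', '5', 0x1F600, 0x1F650} {
		fmt.Printf("%U %v\n", r, contains(t, r))
	}
}
```

The stride is what keeps tables small for alternating patterns: a single range with stride 2 can describe every other code point in a block.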