UTF-8
The utf8.hpp header of the API provides functionality for UTF-8 encoding and decoding.
Variables
constexpr signed int UTF8_RUNE_ERROR;
The "error" rune, also known as the Unicode replacement character (U+FFFD).
constexpr signed int UTF8_LOCB;
The default lowest continuation byte (0x80, binary 10000000).
constexpr signed int UTF8_HICB;
The default highest continuation byte (0xBF, binary 10111111).
constexpr signed int UTF8_MAX_RUNE;
Maximum valid Unicode code point (U+10FFFF).
constexpr signed int UTF8_SURROGATE_MAX;
constexpr signed int UTF8_SURROGATE_MIN;
Upper and lower bounds of the surrogate range (U+D800 through U+DFFF). Code points in this range are not valid in UTF-8.
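To make the constants concrete, here is a minimal sketch assuming they hold the standard Unicode and UTF-8 values; the source does not list the numbers. The SKETCH_ names and the helpers is_valid_rune and is_continuation_byte are hypothetical and not part of utf8.hpp, and std::int32_t stands in for jule::I32.

```cpp
#include <cassert>  // used by the usage checks
#include <cstdint>

// Hypothetical stand-ins for the utf8.hpp constants, assuming the
// standard Unicode/UTF-8 values (not taken from the header itself).
constexpr std::int32_t SKETCH_RUNE_ERROR = 0xFFFD;     // U+FFFD replacement character
constexpr std::int32_t SKETCH_LOCB = 0x80;             // 0b10000000
constexpr std::int32_t SKETCH_HICB = 0xBF;             // 0b10111111
constexpr std::int32_t SKETCH_MAX_RUNE = 0x10FFFF;     // highest code point
constexpr std::int32_t SKETCH_SURROGATE_MIN = 0xD800;
constexpr std::int32_t SKETCH_SURROGATE_MAX = 0xDFFF;

// Hypothetical helper: true if r can be encoded as UTF-8, i.e. it lies
// in [0, MAX_RUNE] and outside the surrogate range.
constexpr bool is_valid_rune(std::int32_t r) noexcept {
    if (r < 0 || r > SKETCH_MAX_RUNE) return false;
    return r < SKETCH_SURROGATE_MIN || r > SKETCH_SURROGATE_MAX;
}

// Hypothetical helper: a continuation byte has the form 10xxxxxx,
// which is exactly the range [LOCB, HICB].
constexpr bool is_continuation_byte(unsigned char b) noexcept {
    return b >= SKETCH_LOCB && b <= SKETCH_HICB;
}
```

For example, is_valid_rune(0xD800) is false because U+D800 is a surrogate, while is_valid_rune(0x41) ('A') is true.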
Functions
std::string runes_to_utf8(const std::vector<jule::I32> &s) noexcept;
Returns a string containing the UTF-8 encoding of the runes in s.
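The following is a minimal sketch of what such a conversion does, not the header's implementation. The name runes_to_utf8_sketch is hypothetical, std::int32_t replaces jule::I32, and replacing invalid runes (out of range or surrogates) with U+FFFD is an assumption borrowed from the behavior described for utf8_push_rune_bytes.

```cpp
#include <cassert>  // used by the usage checks
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical re-implementation of runes_to_utf8, for illustration only.
// Assumption: invalid runes are encoded as U+FFFD.
std::string runes_to_utf8_sketch(const std::vector<std::int32_t> &runes) {
    std::string out;
    for (std::int32_t r : runes) {
        if (r < 0 || r > 0x10FFFF || (r >= 0xD800 && r <= 0xDFFF)) {
            r = 0xFFFD;  // replacement character
        }
        const auto u = static_cast<std::uint32_t>(r);
        if (u < 0x80) {               // 1 byte:  0xxxxxxx
            out.push_back(static_cast<char>(u));
        } else if (u < 0x800) {       // 2 bytes: 110xxxxx 10xxxxxx
            out.push_back(static_cast<char>(0xC0 | (u >> 6)));
            out.push_back(static_cast<char>(0x80 | (u & 0x3F)));
        } else if (u < 0x10000) {     // 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
            out.push_back(static_cast<char>(0xE0 | (u >> 12)));
            out.push_back(static_cast<char>(0x80 | ((u >> 6) & 0x3F)));
            out.push_back(static_cast<char>(0x80 | (u & 0x3F)));
        } else {                      // 4 bytes: 11110xxx + three 10xxxxxx
            out.push_back(static_cast<char>(0xF0 | (u >> 18)));
            out.push_back(static_cast<char>(0x80 | ((u >> 12) & 0x3F)));
            out.push_back(static_cast<char>(0x80 | ((u >> 6) & 0x3F)));
            out.push_back(static_cast<char>(0x80 | (u & 0x3F)));
        }
    }
    return out;
}
```

For instance, the runes {0x48, 0x20AC} ("H" and the euro sign) produce the four bytes 48 E2 82 AC.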
std::tuple<jule::I32, jule::Int>
utf8_decode_rune_str(const char *s, const jule::Int &len);
Unpacks the first UTF-8 encoding in s, which is len bytes long, and returns the rune and its width in bytes. If s is empty, it returns (UTF8_RUNE_ERROR, 0). Otherwise, if the encoding is invalid, it returns (UTF8_RUNE_ERROR, 1). Both are impossible results for correct, non-empty UTF-8.
An encoding is invalid if it is incorrect UTF-8, encodes a rune that is out of range, or is not the shortest possible UTF-8 encoding for the value. No other validation is performed.
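The decoding rules above can be sketched as follows, assuming C++17. This is a hypothetical stand-in named decode_rune_sketch, not the header's code; std::int32_t and std::int64_t replace jule::I32 and jule::Int. It checks the lead byte, the continuation bytes, and then rejects overlong encodings, values above U+10FFFF, and surrogates.

```cpp
#include <cassert>  // used by the usage checks
#include <cstdint>
#include <tuple>

using DecodeResult = std::tuple<std::int32_t, std::int64_t>;  // (rune, width)

// Hypothetical stand-in for utf8_decode_rune_str.
DecodeResult decode_rune_sketch(const char *s, std::int64_t len) {
    constexpr std::int32_t kError = 0xFFFD;
    if (len < 1) return DecodeResult{kError, 0};          // empty input
    const auto b0 = static_cast<unsigned char>(s[0]);
    if (b0 < 0x80) return DecodeResult{b0, 1};            // ASCII

    std::int64_t width;
    std::int32_t r;
    std::int32_t min;  // smallest code point allowed for this width
    if (b0 >= 0xC2 && b0 <= 0xDF)      { width = 2; r = b0 & 0x1F; min = 0x80; }
    else if (b0 >= 0xE0 && b0 <= 0xEF) { width = 3; r = b0 & 0x0F; min = 0x800; }
    else if (b0 >= 0xF0 && b0 <= 0xF4) { width = 4; r = b0 & 0x07; min = 0x10000; }
    else return DecodeResult{kError, 1};                  // stray or invalid lead byte

    if (len < width) return DecodeResult{kError, 1};      // truncated sequence
    for (std::int64_t i = 1; i < width; ++i) {
        const auto b = static_cast<unsigned char>(s[i]);
        if ((b & 0xC0) != 0x80) return DecodeResult{kError, 1};  // not 10xxxxxx
        r = (r << 6) | (b & 0x3F);
    }
    if (r < min) return DecodeResult{kError, 1};          // not the shortest form
    if (r > 0x10FFFF) return DecodeResult{kError, 1};     // out of range
    if (r >= 0xD800 && r <= 0xDFFF) return DecodeResult{kError, 1};  // surrogate
    return DecodeResult{r, width};
}
```

For example, the overlong pair C0 80 (an over-long encoding of U+0000) yields (0xFFFD, 1), while E2 82 AC yields (0x20AC, 3).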
template<typename Dest>
void utf8_push_rune_bytes(const jule::I32 &r, Dest &dest);
Appends the UTF-8 encoding bytes of rune r to dest. If r is out of range, the encoding of UTF8_RUNE_ERROR is written instead. The function itself returns nothing; the encoded bytes are delivered only through dest, which must provide a push_back method.
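A minimal sketch of the push_back-based design, under stated assumptions: push_rune_bytes_sketch is a hypothetical name, std::int32_t replaces jule::I32, Dest is assumed to expose a value_type (as standard containers do), and treating surrogates like out-of-range runes is an assumption. Writing through push_back lets the same template fill a std::string or a std::vector<unsigned char> without an intermediate buffer.

```cpp
#include <cassert>  // used by the usage checks
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical sketch of utf8_push_rune_bytes.
template <typename Dest>
void push_rune_bytes_sketch(const std::int32_t &r, Dest &dest) {
    using Byte = typename Dest::value_type;
    auto u = static_cast<std::uint32_t>(r);
    // Assumption: out-of-range runes and surrogates become U+FFFD.
    if (r < 0 || r > 0x10FFFF || (r >= 0xD800 && r <= 0xDFFF)) u = 0xFFFD;

    int n;          // total number of bytes in the encoding
    unsigned lead;  // marker bits of the first byte
    if (u < 0x80)         { n = 1; lead = 0x00; }
    else if (u < 0x800)   { n = 2; lead = 0xC0; }
    else if (u < 0x10000) { n = 3; lead = 0xE0; }
    else                  { n = 4; lead = 0xF0; }

    // First byte: marker bits plus the highest payload bits.
    dest.push_back(static_cast<Byte>(lead | (u >> (6 * (n - 1)))));
    // Continuation bytes: 10xxxxxx, six payload bits each, high to low.
    for (int i = n - 2; i >= 0; --i) {
        dest.push_back(static_cast<Byte>(0x80 | ((u >> (6 * i)) & 0x3F)));
    }
}
```

As a usage example, pushing U+1F600 into an empty std::vector<unsigned char> leaves the bytes F0 9F 98 80 in it.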