
std/bufio

Type Aliases

jule
type SplitFunc: fn(mut data: []byte, atEOF: bool)!: (advance: int, token: []byte)

The signature of the split function used to tokenize the input. The arguments are an initial substring of the remaining unprocessed data and a flag, atEOF, that reports whether the [Reader] has no more data to give. The return values are the number of bytes to advance the input and the next token to return to the user, if any. If an error occurs, it is thrown as an exceptional.

If the function throws an exceptional, scanning stops and data may be lost. A successful call must return normally; any exceptional means the call failed.

Otherwise, the [Scanner] advances the input. If the token is not nil, the [Scanner] returns it to the user. If the token is nil, the [Scanner] reads more data and continues scanning; if there is no more data--if atEOF was true--the [Scanner] returns. If the data does not yet hold a complete token, for instance if it has no newline while scanning lines, a [SplitFunc] can return (0, nil) to signal the [Scanner] to read more data into the slice and try again with a longer slice starting at the same point in the input.

The function is never called with an empty data slice unless atEOF is true. If atEOF is true, however, data may be non-empty and, as always, holds unprocessed text.

The data is a mutable slice into the relevant range of the Scanner's internal buffer. It is mutable so that this function can legally return a mutable slice of that data. However, mutating the data is not recommended. A safe [SplitFunc] should handle data without mutating it.
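The contract described above can be sketched with a small custom split function. This is a minimal illustration, not part of the package: `splitCommas` is a hypothetical name, and the sketch assumes that `!` on an exceptional call leaves the exceptional unhandled.

```jule
// splitCommas is a hypothetical SplitFunc that yields comma-separated fields.
// It never mutates data; it only returns sub-slices of it.
fn splitCommas(mut data: []byte, atEOF: bool)!: (advance: int, token: []byte) {
	for i, b in data {
		if b == ',' {
			// Consume the field and its comma; the token excludes the comma.
			ret i + 1, data[:i]
		}
	}
	if atEOF && len(data) > 0 {
		// Last field: there is no trailing comma and no more input.
		ret len(data), data
	}
	// No complete field yet: ask the Scanner to read more data.
	ret 0, nil
}

fn main() {
	mut data := []byte("a,bc")
	// Assumption: `!` leaves any exceptional unhandled for this call.
	advance, token := splitCommas(data, false)!
	println(advance)
	println(str(token))
}
```

For the input `a,bc` with atEOF false, the first call consumes `a,` and returns the token `a`. A later call on the remaining `bc` returns (0, nil) until atEOF is true, at which point `bc` is returned as the final token.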

Globals

jule
const MaxTokenSize: untyped integer

The default maximum token size of the Scanner.

Functions

jule
fn SplitLines(mut data: []byte, atEOF: bool)!: (advance: int, token: []byte)

The split function for a [Scanner] that returns each line of text, stripped of any trailing end-of-line marker. The returned line may be empty. The end-of-line marker is one optional carriage return followed by one mandatory newline. In regular expression notation, it is \r?\n. The last non-empty line of input will be returned even if it has no newline.


jule
fn ScanBytes(mut data: []byte, atEOF: bool)!: (advance: int, token: []byte)

The split function for a [Scanner] that returns each byte as a token.


jule
fn ScanWords(mut data: []byte, atEOF: bool)!: (advance: int, token: []byte)

The split function for a [Scanner] that returns each space-separated word of text, with surrounding spaces deleted. It will never return an empty string. The definition of space is set by unicode::IsSpace.


jule
fn ScanRunes(mut data: []byte, atEOF: bool)!: (advance: int, token: []byte)

ScanRunes is a split function for a [Scanner] that returns each UTF-8-encoded rune as a token. The sequence of runes returned is equivalent to that from a range loop over the input as a string, which means that erroneous UTF-8 encodings translate to U+FFFD = "\xef\xbf\xbd". Because of the Scan interface, this makes it impossible for the client to distinguish correctly encoded replacement runes from encoding errors.

Structures

jule
struct ScannerCfg {
	Split: SplitFunc // The splitter function.
	Max:   int       // The maximum token size.
}

Scanner configuration data. Any field left at its zero value is assigned the Scanner's default value for that field.
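As a rough sketch of how the configuration might be used: the hypothetical helper below sets only the Split field and leaves Max at its zero value, so the default maximum token size applies. The helper name `newWordScanner` and the import paths are assumptions; the reader is taken as a parameter because constructing one is outside the scope of this page, and a complete program would still need a `main` function.

```jule
use "std/bufio"
use "std/io"

// newWordScanner is a hypothetical helper that builds a Scanner splitting
// r into space-separated words. Max is left at zero, so the Scanner is
// expected to fall back to its default maximum token size.
fn newWordScanner(mut r: io::Reader): &bufio::Scanner {
	cfg := bufio::ScannerCfg{Split: bufio::ScanWords}
	ret bufio::Scanner.NewCfg(r, cfg)
}
```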


jule
struct Scanner

Provides a convenient interface for reading data such as a file of newline-delimited lines of text. Successive calls to the [Scanner.Scan] method will step through the 'tokens' of a file, skipping the bytes between the tokens. The specification of a token is defined by a split function of type [SplitFunc]; the default split function breaks the input into lines with line termination stripped. This package defines split functions for scanning a file into lines, bytes, UTF-8-encoded runes, and space-delimited words. The client may instead provide a custom split function.

Scanning stops unrecoverably at EOF, the first I/O error, or a token too large to fit in the buffer. When a scan stops, the reader may have advanced arbitrarily far past the last token.

Methods:

static fn New(mut r: io::Reader): &Scanner
Returns a new Scanner for r with the default configuration.

static fn NewCfg(mut r: io::Reader, cfg: ScannerCfg): &Scanner
Returns a new Scanner for r using the custom configuration cfg.

fn Token(mut self): []byte
Returns the most recent token generated by a call to [Scanner.Scan]. The underlying array may point to data that will be overwritten by a subsequent call to Scan. It does no allocation.

fn Text(self): str
Returns the most recent token generated by a call to [Scanner.Scan] as a newly allocated string holding its bytes.

fn Scan(mut self)!: bool
Advances the [Scanner] to the next token, which will then be available through the [Scanner.Token] or [Scanner.Text] method. It returns false when there are no more tokens. If Scan returns false without an exceptional, the end of the input (EOF) was reached. Any exceptional is forwarded to the caller. Scan panics if the split function returns too many empty tokens without advancing the input. This is a common error mode for scanners.
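A typical scan loop built from the methods above might look like the following sketch. `printLines` is a hypothetical helper, the import paths are assumptions, and the reader is passed in rather than constructed here. The `else` block handles an exceptional from Scan by reporting it and stopping the loop; a complete program would also need a `main` function.

```jule
use "std/bufio"
use "std/io"

// printLines is a hypothetical helper that prints each line read from r.
fn printLines(mut r: io::Reader) {
	mut sc := bufio::Scanner.New(r)
	for {
		// Handle an exceptional from Scan: report it and stop scanning.
		more := sc.Scan() else {
			println("scan failed")
			use false
		}
		if !more {
			// No more tokens: EOF, or a failure reported above.
			break
		}
		// Text allocates a new string; Token would avoid the allocation,
		// but its bytes may be overwritten by the next call to Scan.
		println(sc.Text())
	}
}
```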

Enums

jule
enum ScanError

Error codes for the Scanner.

Fields:

  • TooLong: Token size exceeds the maximum token size limit.
  • BadReadCount: The reader returned an invalid read count.
  • NegativeAdvance: Advance value of the splitter function was negative.
  • AdvanceTooFar: Advance value of the splitter function was too far.
  • NoProgress: Scanning made no progress.