Parser: Difference between revisions

← Older edit

Parser (view source)

Revision as of 18:36, 4 January 2024

114 bytes added , 18:36, 4 January 2024

m

→‎Grammar formats: a

Laqme

Bureaucrats, Interface administrators, Suppressors, Administrators

725

edits

Revision as of 18:29, 4 January 2024 (view source) Laqme (talk \| contribs) m (ramble about greedy PEG parsers) ← Older edit		Latest revision as of 18:36, 4 January 2024 (view source) Laqme (talk \| contribs) m (→‎Grammar formats: a)
Line 13:		Line 13:
	One problem with PEG grammars is that they resolve ambiguities "greedily": if there would be multiple parses for a sentence, a PEG parser returns only the one it finds first. In this way, a language described by a PEG grammar is automatically [[monoparsing]], but in an obscure and unsatisfying way. Alternative parses are ruled out not through linguistic rules, but by implementation details of the PEG software, such as which branches it tries first or the order in which rules are specified in the .peg file.		One problem with PEG grammars is that they resolve ambiguities "greedily": if there would be multiple parses for a sentence, a PEG parser returns only the one it finds first. In this way, a language described by a PEG grammar is automatically [[monoparsing]], but in an obscure and unsatisfying way. Alternative parses are ruled out not through linguistic rules, but by implementation details of the PEG software, such as which branches it tries first or the order in which rules are specified in the .peg file.

	[[Kuna]] uses an [https://en.wikipedia.org/wiki/Earley_parser Earley parser] to describe a context-free approximation of surface-level Toaq. The multiple parses are then fixed up or rejected using TypeScript code. Once the implementation is completely correct, this will always result in either one parse or no parse.		[[Kuna]] uses an [https://en.wikipedia.org/wiki/Earley_parser Earley parser] to describe a context-free approximation of surface-level Toaq. The multiple parses are then fixed up or rejected using TypeScript code. Once the implementation is completely correct, this will always result in either one parse or no parse. If there are multiple parses, it tells us that there is an ambiguity in our description of Toaq or a bug in Kuna.