[elm-discuss] Re: elm-tools/parser: how to access intermediate parsed objects downstream in parsing pipeline?

Yosuke Torii

2017-08-02 17:49:47 UTC

You cannot do that in the middle of a pipeline, but you can use `andThen`
<http://package.elm-lang.org/packages/elm-tools/parser/2.0.1/Parser#andThen>
to make a new parser based on the value you parsed previously.

For example, you can pass the `states` value to the parser for
`acceptStates` like this (simplified a lot, not tested):

    dfaParser : Parser.Parser DFA
    dfaParser =
        statesParser
            |> andThen
                (\states ->
                    succeed (DFA states)
                        |= alphabetParser
                        |= acceptStateParser states
                        |= ...
                )

You can also make a recursive parser and pass the intermediate state to the
later parser.

    deltaListParser : Context -> Parser.Parser (List Delta)
    deltaListParser context =
        oneOf
            [ deltaParser
                |> andThen
                    (\delta ->
                        -- checkDuplication is assumed to return True when
                        -- delta does not clash with anything in context
                        if checkDuplication delta context then
                            deltaListParser (updateContext delta context)
                                |> map (\rest -> delta :: rest)
                        else
                            fail "found duplicated values"
                    )
            , succeed []
            ]

    -- parses a single transition line; definition omitted
    deltaParser : Parser.Parser Delta

That said, I don't think validation is necessary during the parsing
process. You can check it after everything is parsed. That is much simpler.
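
To illustrate the "check it afterwards" approach, here is a minimal sketch
(not tested). It assumes a simplified, hypothetical `ParsedDFA` record that
leaves out the alphabet and delta, and a hypothetical `validate` function;
neither is part of elm-tools/parser.

```elm
module DfaCheck exposing (ParsedDFA, validate)

-- Hypothetical, simplified stand-in for the real DFA record.


type alias State =
    String


type alias ParsedDFA =
    { states : List State
    , startState : State
    , acceptStates : List State
    }


{-| Check the parsed value after parsing has finished.
Returns the value unchanged when it is consistent,
or an error message describing the first problem found.
-}
validate : ParsedDFA -> Result String ParsedDFA
validate dfa =
    if not (List.member dfa.startState dfa.states) then
        Err ("start state " ++ dfa.startState ++ " is not in states")
    else
        -- collect every accept state that was never declared
        case List.filter (\s -> not (List.member s dfa.states)) dfa.acceptStates of
            [] ->
                Ok dfa

            unknown ->
                Err ("unknown accept states: " ++ String.join ", " unknown)
```

You would feed the `Ok` value from `Parser.run` to `validate`. If you
prefer the message to show up as a parse error, the same function can
probably be wrapped with `andThen`, turning `Ok` into `succeed` and `Err`
into `fail`.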

The elm-tools/parser documentation recommends using parsing pipelines such
as

    type alias Point =
        { x : Float, y : Float }

    point : Parser Point
    point =
        succeed Point
            |. symbol "("
            |. spaces
            |= float
            |. spaces
            |. symbol ","
            |. spaces
            |= float
            |. spaces
            |. symbol ")"

    spaces : Parser ()
    spaces =
        ignore zeroOrMore (\c -> c == ' ')

I am parsing text in this way, but it is much longer than just two floats.
The high-level parser parses text with five major parts in order
(describing portions of a finite automaton) and it looks like this (and

    type alias State = String

    type alias DFATransition = State -> Char -> Result String State

    type alias DFA =
        { states : List State
        , inputAlphabet : List Char
        , startState : State
        , acceptStates : List State
        , delta : DFATransition
        }

    dfaParser : Parser.Parser DFA
    dfaParser =
        Parser.succeed DFA
            |. spaces
            |. Parser.keyword "states:"
            |. spaces
            |= statesParser
            |. spaces
            |. Parser.keyword "input_alphabet:"
            |. spaces
            |= alphabetParser
            |. spaces
            |. Parser.keyword "start_state:"
            |. spaces
            |= startStateParser
            |. spaces
            |. Parser.keyword "accept_states:"
            |. spaces
            |= statesParser
            |. spaces
            |. Parser.keyword "delta:"
            |. spaces
            |= deltaParser
            |. spaces

to parse text such as, for instance,
"""
states: {q,q0,q00,q000}
input_alphabet: {0,1}
start_state: q
accept_states: {q,q0,q00}
q,1 -> q
q0,1 -> q
q00,1 -> q
q000,1 -> q
q,0 -> q0
q0,0 -> q00
q00,0 -> q000
q000,0 -> q000
"""
Here's what I want to do: insert code in the middle of the pipeline that
can reference the data that has been parsed so far.

    dfaParser =
        Parser.succeed DFA
            |. spaces
            |. Parser.keyword "states:"
            |. spaces
            |= statesParser
            |. spaces
            |. Parser.keyword "input_alphabet:"
            |. spaces
            |= alphabetParser
            ...

At this point the data for the states and the alphabet have been
successfully parsed into two Lists. I would like to access those lists by
name later down the pipeline.
One reason is that I would like to pass those lists as input to subsequent
parsers (startStateParser, acceptStatesParser, and deltaParser), to help
them do error-checking.
For example, the next thing parsed is a String parsed by startStateParser,
and I want to ensure that the parsed String is an element of the List
parsed by statesParser. But at the time I put the line |= startStateParser
in the pipeline, the parsed result of statesParser doesn't have a name
that I can refer to.
Another reason is that I want to do error-checking in the middle of a
pipeline. For example, my implementation of deltaParser reads the lines
such as "q,0 -> q0" and "q0,1 -> q" one at a time, and I would like to
access data parsed by previous lines when looking for errors on the current
line. (For example, it is an error to have duplicates on the left side of
-> such as the line "q,1 -> q" followed later by "q,1 -> q0", but to
indicate this error and reference the correct line number, I need access to
the lines parsed so far as I am processing the line with the error.)
I get the feeling that perhaps I'm structuring this incorrectly, so I
welcome advice on a better way to structure the parser.
