[elm-discuss] The way forward for Elm Arrays

Discussion:

Robin Heggelund Hansen

2017-06-11 23:15:59 UTC

I've been doing some thinking on how to improve Arrays in Elm. The result
of that thinking have been written down in this document:

https://docs.google.com/document/d/1Z8IC5qk98ISQLP_xKXNOHScCfzkQ8jCVSiYd1SxObB4/edit?usp=sharing

The document is quite large, but most of it are benchmark results comparing
Arrays and Lists.

There are some questions at the bottom of the document that I would love to
get feedback on.

--
You received this message because you are subscribed to the Google Groups "Elm Discuss" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elm-discuss+***@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Evan

2017-06-12 05:08:50 UTC

Permalink

Very interesting, thank you for sharing!

I wanted to add a couple notes and ideas that I've had separately.

Syntax

I was formerly somewhat keen on having special syntax for other
collections. For example, OCaml allows you to say [| 1, 2, 3 |] to create
an array. Hassan made a really nice proposal
<https://github.com/elm-lang/elm-plans/issues/12> to have it be #[ 1, 2, 3
] back in 2015. Since then I realized you can do the following:

a = Array.fromList
s = Set.fromList
d = Dict.fromList

array =
(a[ 1, 2, 3 ])

set =
(s[ 1, 2, 3 ])

dict =
(d[ 1 => "Tom", 2 => "Sue", 3 => "Jane" ])

(=>) = (,)

This is 1 or 2 characters off all the proposals out there, and it is
visually much nicer in my opinion.

With that knowledge, the case for special syntax seems significantly
weaker. I can also imagine detecting when a list is converted to something
else and doing something more clever in the compiler. That path seems
better overall to me.

List performance of map, foldl, and foldr

I shared this blog post <https://blogs.janestreet.com/optimizing-list-map/> about
optimizing List.map in OCaml a while ago, and Fred did some excellent work
on this <https://github.com/elm-lang/core/pull/707>! I have not had a
chance to fully assess the impact on generated code size, so I have not
merged it in yet. That trick can likely be very helpful for foldl and
foldr, but no one has looked into it.

I suspect there are many creative things we can do if some focused energy
was applied to lists. I have not taken a comprehensive look at this after
Elm got TCO, and I suspect we can do better!

Alternate Implementations of List

Another thing to consider before switching to some other data structure is
that we can change the performance profile of List using various techniques
behind the scenes. For example:

- Skip Lists <https://en.wikipedia.org/wiki/Skip_list>
- Finger Trees <https://en.wikipedia.org/wiki/Finger_tree> that maybe do
some sort of "compression" on older leaves
- I believe there's one that immutable.js does for their lists? Richard
told me about something like this. Where "old" nodes grow exponentially in
size so the nodes are 1, 2, 4, 8, etc. so it is faster to hop around and
you get better locality. Do you recall what this was called Richard?

I also think that manually recursing through lists with a case is
relatively common, at least in my experience, so I think it'd be good to
explore concrete bottlenecks in Elm code and see if there are not creative
things we can do to improve the performance profile of List without
changing it completely.

Anyway, exciting to see your work as always! And hopefully these ideas are
helpful!
Evan

Post by Robin Heggelund Hansen
I've been doing some thinking on how to improve Arrays in Elm. The result
https://docs.google.com/document/d/1Z8IC5qk98ISQLP_xKXNOHScCfzkQ8jCVSiYd1SxObB4/edit?usp=sharing
The document is quite large, but most of it are benchmark results
comparing Arrays and Lists.
There are some questions at the bottom of the document that I would love
to get feedback on.

Evan

2017-06-12 05:24:10 UTC

Permalink

Oh, and a few more notes. I recently did two pull requests on a framework
comparison benchmark to improve the Elm implementation:

- Start using lazy
<https://github.com/krausest/js-framework-benchmark/pull/192> - This
made the Elm version significantly faster than before.
- Get rid of List => Array => List conversion
<https://github.com/krausest/js-framework-benchmark/pull/194> - This
actually had *no significant impact* on the results. With random noise,
it actually got slower in the run they chose. Touching the DOM is so
expensive that it tends to overshadow any other concern, and I had the same
experience converting TodoMVC to use a Dict rather than List.

In looking for these, I also found this PR
<https://github.com/krausest/js-framework-benchmark/pull/195> that does
another optimization which should help a bit. The fact that you can now
remove without changing event handlers in all subsequent entries should
provide a decent speed boost, but we'll see when the results are out!

This is all to say: it is important to contextualize these ideas in the
performance characteristics of real programs, and those characteristics can
be quite surprising given how the DOM works.

Post by Evan
Very interesting, thank you for sharing!
I wanted to add a couple notes and ideas that I've had separately.
Syntax
I was formerly somewhat keen on having special syntax for other
collections. For example, OCaml allows you to say [| 1, 2, 3 |] to create
an array. Hassan made a really nice proposal
<https://github.com/elm-lang/elm-plans/issues/12> to have it be #[ 1, 2,
a = Array.fromList
s = Set.fromList
d = Dict.fromList
array =
(a[ 1, 2, 3 ])
set =
(s[ 1, 2, 3 ])
dict =
(d[ 1 => "Tom", 2 => "Sue", 3 => "Jane" ])
(=>) = (,)
This is 1 or 2 characters off all the proposals out there, and it is
visually much nicer in my opinion.
With that knowledge, the case for special syntax seems significantly
weaker. I can also imagine detecting when a list is converted to something
else and doing something more clever in the compiler. That path seems
better overall to me.
List performance of map, foldl, and foldr
I shared this blog post
<https://blogs.janestreet.com/optimizing-list-map/> about optimizing
List.map in OCaml a while ago, and Fred did some excellent work on this
<https://github.com/elm-lang/core/pull/707>! I have not had a chance to
fully assess the impact on generated code size, so I have not merged it in
yet. That trick can likely be very helpful for foldl and foldr, but no one
has looked into it.
I suspect there are many creative things we can do if some focused energy
was applied to lists. I have not taken a comprehensive look at this after
Elm got TCO, and I suspect we can do better!
Alternate Implementations of List
Another thing to consider before switching to some other data structure is
that we can change the performance profile of List using various techniques
- Skip Lists <https://en.wikipedia.org/wiki/Skip_list>
- Finger Trees <https://en.wikipedia.org/wiki/Finger_tree> that maybe
do some sort of "compression" on older leaves
- I believe there's one that immutable.js does for their lists?
Richard told me about something like this. Where "old" nodes grow
exponentially in size so the nodes are 1, 2, 4, 8, etc. so it is faster to
hop around and you get better locality. Do you recall what this was called
Richard?
I also think that manually recursing through lists with a case is
relatively common, at least in my experience, so I think it'd be good to
explore concrete bottlenecks in Elm code and see if there are not creative
things we can do to improve the performance profile of List without
changing it completely.
Anyway, exciting to see your work as always! And hopefully these ideas are
helpful!
Evan

Robin Heggelund Hansen

2017-06-12 06:33:57 UTC

Permalink

Excellent points. I'll take a look at the links you posted. Maybe there's
something in there I can't help but pursue :)

Post by Evan
Oh, and a few more notes. I recently did two pull requests on a framework
- Start using lazy
<https://github.com/krausest/js-framework-benchmark/pull/192> - This
made the Elm version significantly faster than before.
- Get rid of List => Array => List conversion
<https://github.com/krausest/js-framework-benchmark/pull/194> - This
actually had *no significant impact* on the results. With random
noise, it actually got slower in the run they chose. Touching the DOM is so
expensive that it tends to overshadow any other concern, and I had the same
experience converting TodoMVC to use a Dict rather than List.
In looking for these, I also found this PR
<https://github.com/krausest/js-framework-benchmark/pull/195> that does
another optimization which should help a bit. The fact that you can now
remove without changing event handlers in all subsequent entries should
provide a decent speed boost, but we'll see when the results are out!
This is all to say: it is important to contextualize these ideas in the
performance characteristics of real programs, and those characteristics can
be quite surprising given how the DOM works.

Post by Evan
Very interesting, thank you for sharing!
I wanted to add a couple notes and ideas that I've had separately.
Syntax
I was formerly somewhat keen on having special syntax for other
collections. For example, OCaml allows you to say [| 1, 2, 3 |] to create
an array. Hassan made a really nice proposal
<https://github.com/elm-lang/elm-plans/issues/12> to have it be #[ 1, 2,
a = Array.fromList
s = Set.fromList
d = Dict.fromList
array =
(a[ 1, 2, 3 ])
set =
(s[ 1, 2, 3 ])
dict =
(d[ 1 => "Tom", 2 => "Sue", 3 => "Jane" ])
(=>) = (,)
This is 1 or 2 characters off all the proposals out there, and it is
visually much nicer in my opinion.
With that knowledge, the case for special syntax seems significantly
weaker. I can also imagine detecting when a list is converted to something
else and doing something more clever in the compiler. That path seems
better overall to me.
List performance of map, foldl, and foldr
I shared this blog post
<https://blogs.janestreet.com/optimizing-list-map/> about optimizing
List.map in OCaml a while ago, and Fred did some excellent work on this
<https://github.com/elm-lang/core/pull/707>! I have not had a chance to
fully assess the impact on generated code size, so I have not merged it in
yet. That trick can likely be very helpful for foldl and foldr, but no one
has looked into it.
I suspect there are many creative things we can do if some focused energy
was applied to lists. I have not taken a comprehensive look at this after
Elm got TCO, and I suspect we can do better!
Alternate Implementations of List
Another thing to consider before switching to some other data structure
is that we can change the performance profile of List using various
- Skip Lists <https://en.wikipedia.org/wiki/Skip_list>
- Finger Trees <https://en.wikipedia.org/wiki/Finger_tree> that maybe
do some sort of "compression" on older leaves
- I believe there's one that immutable.js does for their lists?
Richard told me about something like this. Where "old" nodes grow
exponentially in size so the nodes are 1, 2, 4, 8, etc. so it is faster to
hop around and you get better locality. Do you recall what this was called
Richard?
I also think that manually recursing through lists with a case is
relatively common, at least in my experience, so I think it'd be good to
explore concrete bottlenecks in Elm code and see if there are not creative
things we can do to improve the performance profile of List without
changing it completely.
Anyway, exciting to see your work as always! And hopefully these ideas
are helpful!
Evan

Post by Robin Heggelund Hansen
I've been doing some thinking on how to improve Arrays in Elm. The
https://docs.google.com/document/d/1Z8IC5qk98ISQLP_xKXNOHScCfzkQ8jCVSiYd1SxObB4/edit?usp=sharing
The document is quite large, but most of it are benchmark results
comparing Arrays and Lists.
There are some questions at the bottom of the document that I would love
to get feedback on.

Peter Damoc

2017-06-12 06:54:44 UTC

Permalink

Post by Evan
Syntax
I was formerly somewhat keen on having special syntax for other
collections. For example, OCaml allows you to say [| 1, 2, 3 |] to create
an array. Hassan made a really nice proposal
<https://github.com/elm-lang/elm-plans/issues/12> to have it be #[ 1, 2,
a = Array.fromList
s = Set.fromList
d = Dict.fromList
array =
(a[ 1, 2, 3 ])
set =
(s[ 1, 2, 3 ])
dict =
(d[ 1 => "Tom", 2 => "Sue", 3 => "Jane" ])
(=>) = (,)
This is 1 or 2 characters off all the proposals out there, and it is
visually much nicer in my opinion.

This might sound silly but wouldn't it be possible to have the same syntax
and let the programmer make the type concrete with usage?

array : Array
array = [1, 2, 3]

list : List
list = [1, 2, 3]

I'm asking because we already have a similar situation with `number` where
one ca declare both Int and Float the same way and the compiler is able to
figure out the actual type.

Maybe something similar can be done for dictionaries/records too:

record : Record
record = { field = "value"}

dict : Dict
dict = { key : "value"}

set : Set
set = {1,2}

The main advantage of a dictionary syntax like above is that it is familiar
for people coming from python/javascript.

--
There is NO FATE, we are the creators.
blog: http://damoc.ro/
--
You received this message because you are subscribed to the Google Groups "Elm Discuss" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elm-discuss+***@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Robert Woodhead

2017-06-12 21:52:25 UTC

Permalink

Maybe I'm missing something, but wouldn't this be a situation where a
little syntactic sugar in the compiler would make the medicine go down?

For example, let's say that := meant "assign with conversion" in Elm.

In the expression:

a := b

the compiler knows the types of a and b, and it knows if there is a
function x of type b -> a, so it could just make the substitution (or throw
an error if a suitable function can't be found).

So if a is a Dict Int String and b is a List (Int String) then the compiler
would find Dict.fromList

Robin Heggelund Hansen

2017-06-12 21:56:25 UTC

Permalink

Folks, please stay on topic. If you want to have a discussion regarding
special syntax for the different collection types, please start a new
thread.

Post by Robert Woodhead
Maybe I'm missing something, but wouldn't this be a situation where a
little syntactic sugar in the compiler would make the medicine go down?
For example, let's say that := meant "assign with conversion" in Elm.
a := b
the compiler knows the types of a and b, and it knows if there is a
function x of type b -> a, so it could just make the substitution (or throw
an error if a suitable function can't be found).
So if a is a Dict Int String and b is a List (Int String) then the
compiler would find Dict.fromList

Matthieu Pizenberg

2017-06-13 04:13:35 UTC

Permalink

Post by Robin Heggelund Hansen
I've been doing some thinking on how to improve Arrays in Elm.

Thanks alot for this work on an alternative array implementation! It's
awesome! It's one of some challenges to use elm for my needs (scientific
programming), with blocking UI on long computation, and handling natively
some binary data types (eg. for image manipulations).

A few words regarding your questions:

1) Can the benchmarks be improved?
Could there be one example at a very different scale? For example, I'm
manipulating scientific data for my work which tend to hold many thousands
values (like images). Would be nice to see how the different
implementations scale.
From the results point of view, it's very important to have an idea of the
standard deviation of statistical results (or box plots, or similar). If a
value oscilates between 10 and 1000000, there is no point giving only it's
mean or median value.

2) Are people getting similar results as I have?
Not tested yet but starred your repo to come back later. I've been pausing
my elm coding lately.

3) Has anyone encountered performance issues with lists, and if so, do you
have an example?
No, not for typical small size lists used here and there.

4) Have you been confused by Lists and Arrays? What could be improved to
reduce that confusion?
No, but I've been needing to do some gymnastic between lists, arrays and
dicts for different features specific to each.