[elm-discuss] Immutable data design problem

Discussion:

Lyle Kopnicky

2017-07-24 02:16:01 UTC

I have a series of datatypes that have already been modeled in a relational
database in a product. I'm trying to construct a lighter-weight in-memory
representation in Elm for purposes of simulating operations and visualizing
the results. Ultimately I will probably want to do some export/import
operations that will allow someone to view data from the real database, or
create records in a test database. But, I don't think the in-memory
representations need to correspond exactly to the database ones in order to
do this. I'd like to focus on as simple a representation as possible,
and I'm leaving out a fair number of fields.

We start with a provided series of AssessmentEvents. It's just a limited
amount of data for each AssessmentEvent. Some of the fields in the database
can be calculated from the others, so those don't need to be provided. From
this data, we can calculate more information about the AssessmentEvents,
including deltas between them. We can also derive a series of Accounts in a
completely deterministic fashion. Each AssessmentEvent will have up to two
years associated with it, and for each year there will be at least one
Account. From this we can also calculate one or two Bills to go with each
Account.

It's a fairly complex calculation. Certainly I can do it in Elm. But what
I'm waffling about is how to store the data. These calculations can be
cached - they do not need to be repeated if the user just changes their
view of the data. They only need to be revised if the user wants to
insert/edit/update AssessmentEvents. So to do all these calculations every
time the user shifts the view would be wasteful.

It becomes tricky with immutable data. In an object-oriented program, I
would probably just have, say, extra empty fields on the AssessmentEvent
object, that I would fill in as I updated the object. E.g., it could have a
list of accounts, which initially would be a null value until I filled it
in.

At first I thought I might do something similar in the Elm data structure.
An AssessmentEvent can contain a List of Accounts (I'm oversimplifying as
it really needs to list the accounts per year). The list of Accounts can be
initially empty. Then as I calculate the accounts, I can create a new list
of AssessmentEvents that have Accounts in the list. But wait - since the
list of AssessmentEvents is immutable, I can't change it. I can only create
a new one, and then, where in the model do I put it?

When a user initializes the model, then, what should they pass in? Perhaps
they can pass in a list of AssessmentEvents that each have an empty list of
Accounts, and then that gets stored in a variable. Then the Accounts are
calculated, and we generate a new list of AssessmentEvents with Accounts
attached, and that is what gets stored in the model.
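A minimal sketch of one way to answer "where in the model do I put it": keep the input events and the derived data side by side in the model, and recompute the derived part only in the branch of `update` that changes the events. It uses Elm 0.19 syntax. All names here (`CachedModel`, `deriveAccounts`, `zoom`, the record fields) are assumptions for illustration, and `deriveAccounts` is only a stand-in for the real calculation.

```elm
module CachedModel exposing (Account, AssessmentEvent, Model, Msg(..), init, update)

-- Hypothetical, heavily simplified types.


type alias AssessmentEvent =
    { assessedValue : Int }


type alias Account =
    { amount : Int }


type alias Model =
    { events : List AssessmentEvent -- the provided input
    , accounts : List Account -- derived from `events` and cached here
    , zoom : Int -- stands in for any view-only state
    }


type Msg
    = EventsChanged (List AssessmentEvent)
    | ZoomChanged Int


-- Placeholder for the real, more expensive derivation.
deriveAccounts : List AssessmentEvent -> List Account
deriveAccounts events =
    List.map (\event -> { amount = event.assessedValue }) events


init : List AssessmentEvent -> Model
init events =
    { events = events, accounts = deriveAccounts events, zoom = 1 }


update : Msg -> Model -> Model
update msg model =
    case msg of
        -- The only branch that recomputes the cached accounts.
        EventsChanged events ->
            { model | events = events, accounts = deriveAccounts events }

        -- View-only changes leave the cached data untouched.
        ZoomChanged zoom ->
            { model | zoom = zoom }
```

With this shape the caller of `init` passes in only the events, and never sees an empty placeholder field.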

But this has some shortcomings. The user must now create something that has
this extra unused field on it (and there will be more). I guess if they are
using a function to create it, they needn't know that there are these extra
fields. But what if the field isn't a list - it's an Int? Then do we need
to make it a Maybe Int? Then all the code that later operates on that Int
will have to handle the case that the Maybe Int might be a Nothing, even
though at that point I know it will always be Just something.
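To make the `Maybe Int` concern concrete, here is a small sketch in Elm 0.19 syntax. The `delta` field and the `describeDelta` function are hypothetical names, not part of the real data model.

```elm
module MaybeField exposing (AssessmentEvent, describeDelta)

-- Hypothetical record: `delta` is Nothing until a later pass fills it in.


type alias AssessmentEvent =
    { assessedValue : Int
    , delta : Maybe Int
    }


-- Every consumer has to handle Nothing, even when it only ever runs after
-- the pass that fills the field in.
describeDelta : AssessmentEvent -> String
describeDelta event =
    case event.delta of
        Just delta ->
            "delta: " ++ String.fromInt delta

        Nothing ->
            "delta not calculated"
```

The `Nothing` branch is required by the compiler whether or not it can ever be reached, which is the overhead described above.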

Maybe there should be a data structure that contains an AssessmentEvent,
also containing the extra fields? But what if I have a series of functions,
each of which adds some new field to the AssessmentEvent? Then I need a new
data type for each step that just adds one more field?

Perhaps if I use untagged records, then all the functions can just operate
on the fields they care about, ignoring extra fields. I sort of liked the
extra type safety that came with the tagged record, but it may just get in
the way.

Perhaps instead of attaching this extra data to AssessmentEvents, it could
be kept in separate data structures? But then how do I know how they are
connected? Unless I carefully manage the data in parallel arrays, I will
need to add IDs to the AssessmentEvents, so they can be stored in a Dict.

These are just some of my thoughts. Does anyone have any suggested patterns
to follow?

Thanks,
Lyle

--
You received this message because you are subscribed to the Google Groups "Elm Discuss" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elm-discuss+***@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Aaron VonderHaar

2017-07-24 03:16:42 UTC

I'm not sure I understand all the details of your domain model, but it
seems like the notable point is that accounts are created implicitly as
assessment events occur, and you'd like to be able to, given an assessment
event, get the related accounts?

I'd probably start with making a module (maybe called "AssessmentStore")
that has functions that describe what you need. I'm thinking something
like:

allEvents : AssessmentStore -> List AssessmentEvent

and hmm... now that I write that out, it seems like that's all you want,
except that you ideally want AssessmentEvent to have a list of Accounts in
it.

I think the approach I would prefer is similar to what you mention in your
last paragraph about keeping the data in separate structures, but you
question the safety of managing parallel structures. If you create a
separate module to encapsulate the data, you can limit the need for
careful handling to that single module. I might try something like this in
`AssessmentStore`:

type AssessmentStore
    = AssessmentStore
        { assessmentEventInfo : Dict EventId { name : String, ... }
            -- This is not the full AssessmentEvent; just the things that
            -- don't relate to accounts.
        , accountsByEvent : Dict EventId (List AccountId)
        , accountInfo : Dict AccountId Account
        , allEvents : List EventId
            -- (or maybe you want them indexed differently, by time, etc.)
        }

Then have a function to create the assessment store, and then the
`allEvents` function suggested above (or any other function to get
AssessmentEvents) can take the data in that private data structure and
merge it together to give the data that you actually want to return to the
caller. In fact, you never need to expose the AccountIds/EventIds outside
of this module.
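A compilable sketch of such a module, in Elm 0.19 syntax, might look like the following. It is only an illustration under stated assumptions: `fromList`, the `Int` aliases for `EventId` and `AccountId`, the `year` field, and the rule "one Account per year" are all made up here, and the field names are shortened from the type above.

```elm
module AssessmentStore exposing (Account, AssessmentEvent, AssessmentStore, allEvents, fromList)

import Dict exposing (Dict)


-- Assumed ID representation; these aliases are not exposed.
type alias EventId =
    Int


type alias AccountId =
    Int


type alias Account =
    { year : Int }


-- What callers see: an event with its accounts already attached.
type alias AssessmentEvent =
    { name : String
    , accounts : List Account
    }


-- The constructor is not exposed, so the Dicts and IDs stay private.
type AssessmentStore
    = AssessmentStore
        { eventInfo : Dict EventId { name : String }
        , accountsByEvent : Dict EventId (List AccountId)
        , accountInfo : Dict AccountId Account
        , eventOrder : List EventId
        }


-- Build a store from (name, years) pairs, creating one Account per year.
fromList : List ( String, List Int ) -> AssessmentStore
fromList inputs =
    let
        addEvent ( name, years ) ( nextAccountId, store ) =
            let
                eventId =
                    List.length store.eventOrder

                accountIds =
                    List.range nextAccountId (nextAccountId + List.length years - 1)

                newAccounts =
                    List.map2 (\id year -> ( id, { year = year } )) accountIds years
            in
            ( nextAccountId + List.length years
            , { eventInfo = Dict.insert eventId { name = name } store.eventInfo
              , accountsByEvent = Dict.insert eventId accountIds store.accountsByEvent
              , accountInfo = Dict.union (Dict.fromList newAccounts) store.accountInfo
              , eventOrder = store.eventOrder ++ [ eventId ]
              }
            )

        empty =
            { eventInfo = Dict.empty
            , accountsByEvent = Dict.empty
            , accountInfo = Dict.empty
            , eventOrder = []
            }
    in
    AssessmentStore (Tuple.second (List.foldl addEvent ( 0, empty ) inputs))


-- Merge the private tables back into the shape callers want.
allEvents : AssessmentStore -> List AssessmentEvent
allEvents (AssessmentStore store) =
    let
        accountsFor eventId =
            Dict.get eventId store.accountsByEvent
                |> Maybe.withDefault []
                |> List.filterMap (\accountId -> Dict.get accountId store.accountInfo)

        toEvent eventId =
            Dict.get eventId store.eventInfo
                |> Maybe.map (\info -> { name = info.name, accounts = accountsFor eventId })
    in
    List.filterMap toEvent store.eventOrder
```

The `Maybe` handling for missing IDs is confined to `allEvents`; code outside the module only ever receives complete `AssessmentEvent` records.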

If you are still worried about safety, you can add more unit tests to this
module, or try to define fuzz test properties to help you ensure that you
handle the computations correctly within the module.

I've found this sort of approach to work well because it lets you represent
the data in whatever data structure is most performant and/or appropriate
for your needs (it is often also simpler to implement because the data
structures tend to be much flatter), but also hides the internal
representation behind a module interface so that you can still access the
data in whatever ways are most convenient for the calling code.

h***@gmail.com

2017-07-24 13:50:18 UTC

Hi Lyle,

A fairly common pattern is to add a type parameter to the AssessmentEvent,
e.g.:

type AssessmentEvent a
    = AssessmentEvent
        { otherField : Int
        , accounts : a
        }

You would model the case where the accounts are "an empty list" by adding
the type parameter () - the unit type
<https://www.elm-tutorial.org/en/01-foundations/07-unit-type.html>. This is
equivalent to making the value null in an object-oriented setting.

You could then define a function to calculate the accounts:

calculateAccounts : AssessmentEvent () -> AssessmentEvent (List Account)

For convenience, you can also define type aliases:

type alias InitialAssessmentEvent = AssessmentEvent ()
type alias CalculatedAssessmentEvent = AssessmentEvent (List Account)

calculateAccounts : InitialAssessmentEvent -> CalculatedAssessmentEvent

(You can probably come up with better names ;) )
You could then define a "getter" like this, which only operates on
AssessmentEvents that have had accounts calculated:

accounts : CalculatedAssessmentEvent -> List Account
accounts (AssessmentEvent event) =
    event.accounts

Notice that if you do this, it's impossible to accidentally pass in
AssessmentEvents that haven't had their accounts calculated, i.e. in the
case you mention ("at that point I know it will always be Just something").
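Assembled into one module, the pattern above could be sketched as follows (Elm 0.19 syntax). The `initial` constructor function, the `year` field, and the rule inside `calculateAccounts` are placeholders assumed for illustration; the real derivation is more involved.

```elm
module TypedEvent exposing (Account, AssessmentEvent, accounts, calculateAccounts, initial)


type alias Account =
    { year : Int }


-- The type parameter records what is stored in the `accounts` field.
type AssessmentEvent a
    = AssessmentEvent
        { otherField : Int
        , accounts : a
        }


-- Callers create events without knowing about the `accounts` field.
initial : Int -> AssessmentEvent ()
initial otherField =
    AssessmentEvent { otherField = otherField, accounts = () }


-- Placeholder rule (an assumption): one Account whose year is `otherField`.
-- A fresh record is built because the type of `accounts` changes.
calculateAccounts : AssessmentEvent () -> AssessmentEvent (List Account)
calculateAccounts (AssessmentEvent event) =
    AssessmentEvent
        { otherField = event.otherField
        , accounts = [ { year = event.otherField } ]
        }


-- Only accepts events whose accounts have been calculated.
accounts : AssessmentEvent (List Account) -> List Account
accounts (AssessmentEvent event) =
    event.accounts
```

With these definitions, `accounts (calculateAccounts (initial 2017))` type-checks, while `accounts (initial 2017)` is rejected by the compiler.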

Hope that helps!
-harry

Lyle Kopnicky

2017-07-24 23:16:49 UTC

Hi Harry,

That's an interesting pattern I hadn't thought of. I had been thinking more
along the lines of:

type alias AssessmentEvent a =
    { a | otherField : Int }

Then something like:

type alias InitialAssessmentEvent = AssessmentEvent {}

type alias ComputedAssessmentEvent = AssessmentEvent { accounts : List Account }

or even:

type alias WithAccounts a = { a | accounts : List Account }

type alias ComputedAssessmentEvent = AssessmentEvent (WithAccounts {})

...but those use raw records, no tagging. The tags allow for representation
hiding and stronger typing guarantees. (That is, nobody can just make a
record which happens to substitute for an AssessmentEvent just because they
pick the right field names and types.)
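A short sketch of how the raw-record version behaves, in Elm 0.19 syntax. The `Account` shape and the functions `describe` and `accountCount` are hypothetical names added for illustration.

```elm
module RawRecords exposing (Account, AssessmentEvent, ComputedAssessmentEvent, InitialAssessmentEvent, accountCount, describe)


type alias Account =
    { year : Int }


type alias AssessmentEvent a =
    { a | otherField : Int }


type alias InitialAssessmentEvent =
    AssessmentEvent {}


type alias ComputedAssessmentEvent =
    AssessmentEvent { accounts : List Account }


-- Accepts any record that has an `otherField : Int`, initial or computed.
describe : AssessmentEvent a -> String
describe event =
    "otherField = " ++ String.fromInt event.otherField


-- Requires exactly the computed shape, so `accounts` is known to exist.
accountCount : ComputedAssessmentEvent -> Int
accountCount event =
    List.length event.accounts
```

The flip side is visible in `describe`: a call such as `describe { otherField = 3, unrelated = "x" }` also compiles, because nothing ties the record to the AssessmentEvent concept beyond its field names and types.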

Your proposal is an interesting way to do it but retain the type tags. And
it keeps the data within the data structure that the caller would have a
handle on, so they do not need to call some function on a parent object and
an ID in order to indirectly retrieve the data.

That's becoming an important distinction for me. In my view code it's nice
if when it has an AssessmentEvent, it can render that, then extract from
that a list of Accounts, and render those. It's a different interface if it
has to have a Parcel and an EventID and call a function to get a list of
Accounts.

Thanks,
Lyle

Lyle Kopnicky

2017-07-24 23:02:48 UTC

Hi Aaron,

Thanks for your thoughtful reply.

The domain model is pretty complex, so it's hard to distill down to a few
issues. There's a higher-level structure called a Parcel. That already
contains, among other things, the list of AssessmentEvents. I have a
function called createParcel that takes a record with a parcel number,
initial owner, and list of AssessmentEvents. Those AssessmentEvents must in
turn have been created by calling createAssessmentEvent, which takes the
independent fields of an AssessmentEvent and creates the full record with
the derived fields. However, there really are yet more fields that can't be
derived by looking at a single AssessmentEvent in isolation. Some
calculation has to be done by determining chains of them and computing
deltas along the chain.

Currently I have createParcel computing a Dict of assessmentEventsById (so,
it's assuming some ID already exists on the AssessmentEvents, which is a
separate issue). It also computes a list of roll years that are relevant to
the assessment events, which involves some date math. It computes an
ownership chain - that is, a list of date ranges and who owned the property
during that time range. And finally it computes the list of assessment
events that are effective for each roll year. Each assessment event might
appear in the list for as many as two consecutive years, depending on its
dates.
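The last step, listing the events effective for each roll year, can be sketched as a `Dict` keyed by year. This is only an illustration: the `firstRollYear` and `lastRollYear` fields are assumptions that stand in for the date math, which is not spelled out here.

```elm
module RollYears exposing (Event, eventsByRollYear)

import Dict exposing (Dict)


-- Hypothetical shape: the range of roll years is assumed to have been
-- worked out already from the event's dates.
type alias Event =
    { id : Int
    , firstRollYear : Int
    , lastRollYear : Int
    }


-- Map each roll year to the IDs of the events effective in that year,
-- keeping the events in their input order.
eventsByRollYear : List Event -> Dict Int (List Int)
eventsByRollYear events =
    let
        addYear eventId year index =
            Dict.update year
                (\existing -> Just (Maybe.withDefault [] existing ++ [ eventId ]))
                index

        addEvent event index =
            List.foldl (addYear event.id)
                index
                (List.range event.firstRollYear event.lastRollYear)
    in
    List.foldl addEvent Dict.empty events
```

For example, an event with ID 1 spanning 2016 to 2017 and an event with ID 2 covering only 2017 produce the entries 2016 -> [1] and 2017 -> [1, 2].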

Then there will have to be deltas calculated between the assessment events
for a given roll year. The accounts will be created from those. And
finally, one or two bills will be created from each account, depending on
the type of assessment event. All of this will be completely deterministic,
based on the initial seed data of assessment events. But I need these
accounts and bills calculated in order to properly view the data.

If I am using IDs, then I can make a data structure that just contains the
deltas by ID, rather than creating another AssessmentEvent structure that
has room for the delta values. But that would mean that when outside code
needed to get the delta value, it couldn't just have an AssessmentEvent. It
would have to have an AssessmentStore (or Parcel) and an EventID and call a
function which could use that to retrieve the delta value from the Dict.
So, it's a pretty different model for the caller.
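The ID-based variant could be sketched like this. The `assessedValue` field and the definition of a delta as the difference from the previous event in the chain are assumptions for illustration only.

```elm
module Deltas exposing (Event, deltaFor, deltasById)

import Dict exposing (Dict)


-- Hypothetical fields; what a delta is really computed from is not shown here.
type alias Event =
    { id : Int
    , assessedValue : Int
    }


-- Delta of each event relative to the previous one in the chain.
-- The first event has no predecessor, so it gets no entry.
deltasById : List Event -> Dict Int Int
deltasById chain =
    List.map2
        (\previous current -> ( current.id, current.assessedValue - previous.assessedValue ))
        chain
        (List.drop 1 chain)
        |> Dict.fromList


-- Callers need both the Dict and an ID, and must handle a missing entry.
deltaFor : Dict Int Int -> Int -> Maybe Int
deltaFor deltas eventId =
    Dict.get eventId deltas
```

The `Maybe Int` result of `deltaFor` is the cost for the caller: holding an event is no longer enough, and a lookup that cannot fail in practice still has to be unwrapped.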

So far I have been putting all this logic in one module, called Property.
(The view logic is in a separate module.) I've been using datatypes with a
single constructor, so the view code can pattern match against them. But
now I'm starting to wonder whether it'd be safer to hide the
representations here in the Property module.

At some point in the future I will want to allow adding/removing/updating
assessment events in real time. Then I will have to decide whether I want
to just recalculate the entire set of data or try to figure out which bits
need to change. Recalculating the whole thing will probably be performant
enough. But I guess there could be an issue with IDs - if some data gets
loaded from the database and needs to preserve existing IDs, I can't just
generate new IDs for the whole set. I'll figure out that problem when I
come to it.
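
The "recalculate everything" option can be sketched as an update function that recomputes the derived data only when the events change and leaves it alone on view changes. Everything here is a simplified assumption: `derive` stands in for the real accounts and bills calculation, and the message names are hypothetical.

```elm
module Model exposing (Event, Model, Msg(..), update)


type alias Event =
    { id : Int
    , value : Int
    }


-- `total` is a stand-in for the cached, derived accounts and bills.
type alias Model =
    { events : List Event
    , total : Int
    , selectedYear : Int
    }


type Msg
    = AddEvent Event
    | SelectYear Int


-- Placeholder for the full deterministic calculation.
derive : List Event -> Int
derive events =
    List.sum (List.map .value events)


update : Msg -> Model -> Model
update msg model =
    case msg of
        AddEvent event ->
            let
                events =
                    event :: model.events
            in
            -- The seed data changed, so rebuild the cache from scratch.
            { model | events = events, total = derive events }

        SelectYear year ->
            -- A view change only: the cached value is reused as is.
            { model | selectedYear = year }
```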

Regards,
Lyle


Aaron VonderHaar

2017-07-25 06:01:50 UTC

Ah yes, a nice complicated system :) If you're still looking for more
abstract suggestions, here's how I would approach the problem:

It sounds like you're trying to construct intermediate data structures that
directly map to certain domain concepts, and then you later have to
transform that data into a structure that you ultimately need. I suspect
you may be able to clean it up a bit by focusing on what you need, and not
on what you think you ought to have.

At the boundaries of the system, you have structured input data (perhaps
being decoded from JSON?), possibly structured output data (perhaps being
sent out as JSON?), and the UI view. In my opinion, those are the most
important types in your system. Rather than trying to devise a data
structure that's somewhere in the middle of the input and output, I'd focus
on modeling the input and the output data structures in isolation, and then
try to figure out the shortest/most modular route for transforming the data
from one to the other. If you work in this way, I think you'll tend to end
up with more modular functions (which will also benefit you later, as you
mentioned anticipating having to handle realtime updates to the data).

> AssessmentEvents must in turn have been created by calling
> createAssessmentEvent, which takes the independent fields of an
> AssessmentEvent and creates the full record with the derived fields

This sounds like it might be premature optimization. If you didn't already
try this, I'd suggest passing around the original raw fields instead and
exposing the functions that can compute the derived fields. Furthermore,
try having those functions take only the things that are needed for that
exact calculation, rather than taking the entire AssessmentEvent record as
an input parameter. Doing this will help expose the actual dependencies in
your data and avoid unnecessary coupling with the "AssessmentEvent" concept.
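
For instance, a derived value could be exposed as a plain function of the raw inputs it depends on. The `netValue` calculation below is invented purely for illustration and is not part of the actual domain model:

```elm
module Derived exposing (netValue)

-- Hypothetical derived field. It takes only the two raw numbers it needs,
-- not a whole AssessmentEvent record, so its dependencies are visible in
-- the type signature.


netValue : Int -> Int -> Int
netValue assessedValue exemption =
    max 0 (assessedValue - exemption)
```

A caller holding a record would write something like `netValue event.assessedValue event.exemption`, and the function stays usable even if the record's shape changes.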

> when outside code needed to get the delta value, it couldn't just have an
> AssessmentEvent. It would have to have an AssessmentStore (or Parcel) and
> an EventID

I would work backwards here and start with what data structure makes sense
in the view, and then write the code to generate that from the raw data,
and then see if there are logical groupings that make sense to refactor out
as data types/modules (as opposed to starting with the domain concepts you
think you are supposed to have and trying to write your view to work with
those).

Does your view just want a list of things to iterate through and display?
If so, it sounds like you want to give it a list of records that have all
the necessary data assembled so you can just iterate through it. Or do you
have some kind of master-detail view where one view is showing the details
of a thing that is chosen in another view? In that case you might want to
have the selector view produce an Id that's stored in the model and used to
later request a specific item to be calculated for the detail view. Or
maybe different parts of the view show different pieces of information,
each of which is hard to compute?

In either case, I'd try writing your view the way you want it, then write
the function to transform the data how you need it, and then decide whether
that function (or parts of it) make the most sense in the view module or in
one of the data structure modules.
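
As a sketch of that workflow, suppose the view only needs a label and a delta per row. The `Row` type and the shape of the inputs are assumptions made for the example:

```elm
module EventTable exposing (Row, toRows)

import Dict exposing (Dict)


-- Exactly what the view wants to iterate over, and nothing more.
type alias Row =
    { label : String
    , delta : Int
    }


-- Assemble rows from the raw events plus a Dict of deltas keyed by event ID.
-- An event with no stored delta is shown with a delta of 0.
toRows : Dict Int Int -> List { id : Int, label : String } -> List Row
toRows deltasById events =
    List.map
        (\event ->
            { label = event.label
            , delta = Maybe.withDefault 0 (Dict.get event.id deltasById)
            }
        )
        events
```

Whether `toRows` belongs in the view module or next to the data is the decision that can be left until after it works.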

Overall, my suspicion is that you might be trying to model specific domain
concepts that you are expecting to have but are possibly unnecessary for
what you need to do. So it might be useful to try to cleanly model the
input data and the data you want to display, then write the functions to
map between them, and only then figure out how those functions map to your
business domain as you refactor. Based on what you described, it sounds
like Parcel, AssessmentEvent, and Property are all getting quite
interconnected. If you instead try to focus on transforming from input to
output as directly as possible, I think you'll end up with a system that's
easier to modify or reconfigure later. (The downside is that it may seem
unnatural to people who are used to thinking in terms of the standard
business domain concepts.)

I'll note that the suggestions here match the way I personally like to
approach problems, which is to focus on iteratively discovering the
interfaces. If that style doesn't match the way your team works, you
should disregard :)

Also, I'd be happy to look at some type annotations if you want to talk
more concretely.


