Defining a data model
For a module to do something useful, we need to configure its data model. This expresses, in a somewhat Django-centric way, a mapping onto relational database tables where the data is actually stored.
For this example project, there are two data types used; this is very similar
to the way you might define multiple tables in an RDBMS (and in fact maps to
exactly that “under the hood”). We have stations and observations.
The definitions of these kinds of data are contained in the files:
$ZDIR/lib/default/modules/noaa-stations/data/station.yml
$ZDIR/lib/default/modules/noaa-stations/data/observations.yml
This choice follows a natural pattern, but is not required. We could put the
definitions in any files we wanted, as long as they live in the module
directory hierarchy and have the extension .yml. The structure of these
two files is very similar, although somewhat more is defined within
station.yml, since some mixins and bases (more on that soon) are defined
there and hence do not need to be duplicated in observations.yml.
Within a data model, we typically define one top-level key data_base and
another top-level key data. As this module is organized, station.yml and
observations.yml each have their own top-level keys, but we could
perfectly well put all of this in the same file if we preferred. For example,
as actually organized, we have:
# in station.yml
data:
    station:
        # ... more info ...

# in observations.yml
data:
    observation:
        # ... more info ...
This is a decision of the module developer; a different module might choose instead, for example, to have:
# in data-model.yml (not a file in this module)
data:
    station:
        # ... more info ...
    observation:
        # ... more info ...
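One way to picture why the file layout does not matter: if every .yml file in the module is parsed and the results are merged key by key, both layouts produce the same specification. The sketch below illustrates that idea with plain dictionaries standing in for the parsed files. The helper merge_specs is hypothetical and is not Zimagi's actual loader.

```python
from copy import deepcopy


def merge_specs(*specs):
    """Recursively merge dictionaries; later values win on conflicts."""
    merged = {}
    for spec in specs:
        for key, value in spec.items():
            if isinstance(value, dict) and isinstance(merged.get(key), dict):
                merged[key] = merge_specs(merged[key], value)
            else:
                merged[key] = deepcopy(value)
    return merged


# Stand-ins for the parsed contents of the two files (assumed, abbreviated)
station_yml = {"data": {"station": {"class": "Station"}}}
observations_yml = {"data": {"observation": {"class": "Observation"}}}

combined = merge_specs(station_yml, observations_yml)
print(sorted(combined["data"]))  # ['observation', 'station']
```

Under this assumption, the combined result has both objects under a single data key, exactly as if they had been written in one file.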
Defining data_base objects
In this module, the “abstract” base object station is used by concrete data
objects (including one called station). Let us look at that definition,
here contained in station.yml (but again, it could live elsewhere if you
prefer):
data_base:
    station:
        # Every model (usually) based on resource
        class: StationBase
        base: resource
        mixins: [station]
        id_fields: [number]
        meta:
            # Number alone is probably unique, demonstrate compound key
            unique_together: [number, name]
            # Updates must define station
            scope: station
This has several notable elements. The field named number is specific to
the data we are working with. The NOAA data defines a CSV column called
STATION, which is a special number weather services use for identification,
and also a column called NAME that is a verbose description of the weather
station. We have used names that are more mnemonic for us in calling them
number and name in the module, but we are free to use any names
whatsoever.
We are declaring in the data_base that the combination of number and
name will define a unique identifier, but only number is used as the ID
for queries. In this particular dataset, number alone will probably be
unique, and the more verbose description name might actually change over
multiple years. The unique_together key is given a list containing both
fields mostly to illustrate the possibility.
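In Django, unique_together corresponds to a composite uniqueness constraint on the underlying table. The following sketch shows what such a constraint means, using an in-memory SQLite table. The table layout and the station values are made up for illustration; this is not the table Zimagi generates.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical, simplified table with a composite unique constraint
conn.execute(
    "CREATE TABLE station (number TEXT NOT NULL, name TEXT, UNIQUE (number, name))"
)
conn.execute("INSERT INTO station VALUES ('S001', 'Old airport name')")
# Same number with a different name is a distinct pair, so it is accepted
conn.execute("INSERT INTO station VALUES ('S001', 'New airport name')")

try:
    # Repeating an existing (number, name) pair violates the constraint
    conn.execute("INSERT INTO station VALUES ('S001', 'New airport name')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

row_count = conn.execute("SELECT COUNT(*) FROM station").fetchone()[0]
print(duplicate_rejected, row_count)  # True 2
```

This also shows why a compound key is weaker than uniqueness on number alone: a station whose description changes would appear as a second row.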
Defining data objects
With the scaffolding in place, we can define an actual data object. Let us
quickly notice something about the observation object before presenting the
full station object:
# Inside observations.yml
data:
    observation:
        class: Observation
        # Observation extends Station base data model
        base: station
Because an observation represents a “child table”, it is based on the parent
data_base object station, inheriting station’s attributes. Let us
look at (almost) the entire definition for the station object:
data:
    # Actual data models turned into tables
    # Fields 'name', 'id', 'updated', 'created' implicitly
    # created by base resource (id/updated/created internal)
    station:
        class: Station
        # Resource is the base model in Zimagi core
        base: resource
        # Primary key (not necessarily externally facing)
        id_fields: [number, name]
        # Unique identifier within the scope
        key: number
        roles:
            # Redundant to specify 'admin'
            edit: [noaa-admin, admin]
            # Editors are automatically viewers
            # Public does not require authentication
            # (viewer will authenticate if public were not listed)
            view: [viewer, public]
        fields:
            number:
                type: "@django.CharField"
                options:
                    "null": false
                    max_length: 255
                    # editable is default (not specified)
            lat:
                # In degrees
                type: "@django.FloatField"
                options:
                    "null": true
            # 'lon' and 'elevation' defined in same manner as 'lat'
        meta:
            unique_together: [number, name]
            # Display ordered by elevation and number
            ordering: [elevation, number]
A number of things are happening in this definition. We create an actual
station object, with a corresponding RDBMS table. This definition does not
yet give the table a way to be populated, but it determines the schema,
and Zimagi will create the empty table based on it.
We can define a primary key as id_fields and an access identifier as
key. These may often be the same, but need not be, as the example
illustrates.
This is also where we define access permissions to this data object. These
roles correspond to those we created earlier. The special roles admin and
public are always available, but any other strings may be used to define
various permissions (assuming they are defined as roles). The role admin
will always have all permissions, but we list it here to illustrate its
existence.
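The rules stated in the comments of the definition (editors are automatically viewers, public requires no authentication, admin always has access) can be pictured with a small sketch. The functions can_edit and can_view are hypothetical and only restate those rules; they are not how Zimagi implements permission checks.

```python
# Role lists copied from the station definition
ROLES = {
    "edit": ["noaa-admin", "admin"],
    "view": ["viewer", "public"],
}


def can_edit(user_roles):
    """Admin can always edit; otherwise the user needs a listed edit role."""
    return "admin" in user_roles or bool(set(user_roles) & set(ROLES["edit"]))


def can_view(user_roles):
    """Public data needs no role; editors are automatically viewers."""
    if "public" in ROLES["view"]:
        return True
    return can_edit(user_roles) or bool(set(user_roles) & set(ROLES["view"]))


print(can_view([]))              # True
print(can_edit(["viewer"]))      # False
print(can_edit(["noaa-admin"]))  # True
```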
The most important part of defining a data object is the fields it will
contain and use. The key fields lets us list these, along with data types and
properties. Fields can have whatever names are convenient for us; we will see
later how they are translated from whatever names are used in the underlying
data sources (quite likely, those underlying data sources use a variety of
different names, and Zimagi will present a more unified interface to the data).
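As a rough picture of what such a translation amounts to, the sketch below renames the NOAA CSV columns STATION and NAME to the module's field names number and name. The mapping dictionary and the sample row are made up for illustration. Zimagi's own translation mechanism is covered later and is not what is shown here.

```python
import csv
import io

# Hypothetical mapping from source column names to module field names
COLUMN_MAP = {"STATION": "number", "NAME": "name"}

# Made-up sample row, only for illustration
raw = io.StringIO('STATION,NAME\n"X0001","Example Station, XX"\n')

records = [
    {COLUMN_MAP[column]: value for column, value in row.items() if column in COLUMN_MAP}
    for row in csv.DictReader(raw)
]
print(records)  # [{'number': 'X0001', 'name': 'Example Station, XX'}]
```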
Data types are provided using Django data definition types, quoted. For example,
latitude (named lat by us) is a @django.FloatField type. Within each
field, we may define a few constraints, such as its NULL-ability and, for a
string, its maximum length.
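To make the meaning of these options concrete, here is a minimal sketch that checks a record against the null and max_length options of the number and lat fields. In practice such constraints are enforced by Django and the database; the violations function is a hypothetical stand-in used only to show what the options express.

```python
# Options copied from the 'number' and 'lat' field definitions
FIELD_OPTIONS = {
    "number": {"null": False, "max_length": 255},
    "lat": {"null": True},
}


def violations(record):
    """Return a list of human-readable constraint violations for a record."""
    problems = []
    for field, options in FIELD_OPTIONS.items():
        value = record.get(field)
        if value is None:
            if not options.get("null", False):
                problems.append(f"{field} may not be null")
            continue
        max_length = options.get("max_length")
        if max_length is not None and len(str(value)) > max_length:
            problems.append(f"{field} exceeds {max_length} characters")
    return problems


print(violations({"number": "X0001", "lat": None}))  # []
print(violations({"number": None, "lat": 12.5}))     # ['number may not be null']
```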
We may also define a few special attributes of the data object. For example, by default, queries of this data will be sorted by elevation, then by (station) number. This ordering is chosen for illustration rather than for any specific business need within this module; in other cases, a particular order may be relevant. Search fields allow substring search within Zimagi queries.
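The effect of ordering: [elevation, number] can be sketched as a sort on a compound key, where ties on elevation are broken by number. The station records below are made up, and the sketch assumes every elevation is present (a null elevation would need separate handling).

```python
# Made-up records, only for illustration
stations = [
    {"number": "B2", "elevation": 120.0},
    {"number": "A9", "elevation": 35.5},
    {"number": "A1", "elevation": 120.0},
]

ordering = ["elevation", "number"]
# Build a tuple key so earlier fields take precedence over later ones
ordered = sorted(stations, key=lambda s: tuple(s[field] for field in ordering))
print([s["number"] for s in ordered])  # ['A9', 'A1', 'B2']
```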