Comparison of noaa-stations with module-noaa-stations

This document describes the differences between two implementations of similar functionality: a stand-alone command-line tool written in Python, and a Zimagi module.

Code required

This snapshot compares NOAA-Stations commits b910e2e and 76a0dae with commit 58d87e6 of module-noaa-stations. Lines of code and supported features may change in future revisions.

Zimagi module [b] [c]:

module-noaa-stations % cloc .  # 58d87e6
      17 text files.
      17 unique files.
       9 files ignored.

github.com/AlDanial/cloc v 1.86  T=0.02 s (760.2 files/s, 36315.6 lines/s)
-------------------------------------------------------------------------------
Language                     files          blank        comment           code
-------------------------------------------------------------------------------
YAML                             8             69             66            272
Python                           4             26             11            110 (33)
Markdown                         1             23              0             44
-------------------------------------------------------------------------------
SUM:                            13            118             77            426
-------------------------------------------------------------------------------

Command-line tool [a] [b]:

NOAA-Stations % cloc .  # b910e2e1
       7 text files.
       7 unique files.
       3 files ignored.

github.com/AlDanial/cloc v 1.86  T=0.01 s (789.9 files/s, 30489.7 lines/s)
-------------------------------------------------------------------------------
Language                     files          blank        comment           code
-------------------------------------------------------------------------------
Python                           2             12             34             85
Markdown                         2             21              0             31
YAML                             1              0              0             10
-------------------------------------------------------------------------------
SUM:                             5             33             34            126
-------------------------------------------------------------------------------

Command-line tool (normalized tables) [a] [b]:

NOAA-Stations % cloc .  # 76a0daee
       7 text files.
       7 unique files.
       3 files ignored.

github.com/AlDanial/cloc v 1.86  T=0.01 s (680.8 files/s, 43300.6 lines/s)
-------------------------------------------------------------------------------
Language                     files          blank        comment           code
-------------------------------------------------------------------------------
Markdown                         2             46              0            107
Python                           2             16             44             95
YAML                             1              0              0             10
-------------------------------------------------------------------------------
SUM:                             5             62             44            212
-------------------------------------------------------------------------------

[a]

The small YAML file in NOAA-Stations is a conda environment configuration file. It is only indirectly related to the tool itself, but it is reasonable to count the provision of the necessary dependencies as part of the tool's requirements. A pip requirements.txt would be of similar length.

[b]

The Markdown files in both repositories are purely documentation and are not directly related to the functionality of either. For the most part, that documentation is this file itself.

[c]

Three of the four Python files in module-noaa-stations are auto-generated. The only file written by hand contains 33 code lines (and some comments).

Feature comparisons

Feature description               Zimagi module   Command-line   CL Normalized
-------------------------------------------------------------------------------
Exposes all source data columns   No [f]          Yes            Yes
Download by year range            Partial [d]     Yes            Yes
Download by station list          Partial [d]     Yes            Yes
Download of all stations          No [e]          Yes            Yes
Flexible querying of local DB     Yes             Yes            Yes
RESTful API to access local DB    Yes             No             Yes
Missing data cleaned              Partial         Yes            Yes
Performs good normalization       Yes             No [g]         Yes
Provisions for cloud deployment   Yes             No [h]         No [h]
Supports “pretty” output          Yes             Yes            Yes
Supports CSV export               Yes             Yes            Yes
Supports TSV export               No              Yes            Yes
Supports JSON export              Yes             Yes            Yes
“Code” lines [i]                  305             95             105
-------------------------------------------------------------------------------

[d]

Currently only a test import subcommand is defined, but the data model supports parameters for minimum year, maximum year, and station list.
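To illustrate how the minimum-year, maximum-year, and station-list parameters could combine into a concrete set of downloads, here is a minimal Python sketch. It assumes, hypothetically, that source data is fetched as one file per (year, station) pair; the function name download_targets and the station identifiers are invented for illustration and are not part of either codebase.

```python
from typing import Iterable, List, Tuple


def download_targets(
    min_year: int, max_year: int, stations: Iterable[str]
) -> List[Tuple[int, str]]:
    """Hypothetical helper: expand a year range and a station list into
    (year, station) pairs, assuming one source file per pair."""
    if min_year > max_year:
        raise ValueError("min_year must not exceed max_year")
    station_list = list(stations)  # materialize so it can be reused for every year
    return [
        (year, station)
        for year in range(min_year, max_year + 1)  # upper bound is inclusive
        for station in station_list
    ]


# Example with made-up station identifiers:
print(download_targets(2019, 2020, ["STATION_A", "STATION_B"]))
# [(2019, 'STATION_A'), (2019, 'STATION_B'), (2020, 'STATION_A'), (2020, 'STATION_B')]
```

Materializing the station list first lets the same parameters be passed as any iterable (for example, a generator reading IDs from a file) without it being exhausted after the first year.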

[e]

The logic for obtaining the list of stations available for a given year is currently stubbed out, but it should follow the same logic used in the command-line tool.

[f]

Data definitions could be created for the columns not currently used. I estimate this would require about 150 additional lines of YAML and perhaps 20 lines of Python.

[g]

The initial command-line tool simply used the same table structure as the source CSV files. Adding a child table with a foreign key would require about 6 extra lines of Python and 8 extra lines of SQL (currently defined as a Python string rather than in a separate file).
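As a rough illustration of this kind of normalization, the sketch below splits flat, CSV-style rows into a parent station table and a child observation table linked by a foreign key, keeping the SQL as a Python string. It assumes SQLite through Python's standard sqlite3 module; the table names, column names, and helper functions are hypothetical, since the tool's actual schema is not described here.

```python
import sqlite3

# Hypothetical normalized schema (names are illustrative, not the tool's):
# station metadata is stored once in a parent table, and each observation
# row references it through a foreign key.
SCHEMA = """
CREATE TABLE station (
    station_id TEXT PRIMARY KEY,
    name       TEXT,
    latitude   REAL,
    longitude  REAL
);
CREATE TABLE observation (
    station_id TEXT NOT NULL REFERENCES station(station_id),
    obs_date   TEXT NOT NULL,
    mean_temp  REAL,
    PRIMARY KEY (station_id, obs_date)
);
"""


def create_db(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
    conn.executescript(SCHEMA)
    return conn


def insert_row(conn: sqlite3.Connection, row: dict) -> None:
    """Split one flat, CSV-style row into a parent and a child record."""
    # The station is inserted once; later rows for the same station are ignored.
    conn.execute(
        "INSERT OR IGNORE INTO station VALUES (?, ?, ?, ?)",
        (row["station_id"], row["name"], row["latitude"], row["longitude"]),
    )
    conn.execute(
        "INSERT INTO observation VALUES (?, ?, ?)",
        (row["station_id"], row["obs_date"], row["mean_temp"]),
    )


conn = create_db()
for day, temp in [("2020-01-01", 41.2), ("2020-01-02", 39.8)]:
    insert_row(conn, {"station_id": "STATION_A", "name": "Example",
                      "latitude": 0.0, "longitude": 0.0,
                      "obs_date": day, "mean_temp": temp})
print(conn.execute("SELECT COUNT(*) FROM station").fetchone()[0])      # 1
print(conn.execute("SELECT COUNT(*) FROM observation").fetchone()[0])  # 2
```

The repeated station columns from the source rows collapse into a single parent record, which is the redundancy the flat CSV-shaped table keeps.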

[h]

None of the current code is aware of any cloud platform, but the code that would need to be deployed to one is very small.

[i]

Counts YAML or Python code that is functionally required for the system to operate. Documentation, in Markdown or other formats, is highly desirable but does not change functionality. Auto-generated Python code is excluded.
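The “Code” lines row of the feature table follows from the cloc output by applying this rule. The small Python check below uses the code-column counts reported by cloc, substituting the 33 hand-written Python lines from [c] for the module's 110, and reproduces the three totals. The dictionary layout and function name are only for illustration.

```python
# Code-column counts from the cloc runs in "Code required".
# For the Zimagi module, Python is the hand-written file only (33, see [c]),
# not cloc's 110, because auto-generated code is excluded.
CLOC_CODE = {
    "Zimagi module": {"YAML": 272, "Python": 33, "Markdown": 44},
    "Command-line": {"Python": 85, "Markdown": 31, "YAML": 10},
    "CL Normalized": {"Markdown": 107, "Python": 95, "YAML": 10},
}

# Only functionally required languages count; Markdown is documentation.
FUNCTIONAL = {"YAML", "Python"}


def functional_lines(code_by_language: dict) -> int:
    return sum(n for lang, n in code_by_language.items() if lang in FUNCTIONAL)


for tool, counts in CLOC_CODE.items():
    print(f"{tool}: {functional_lines(counts)}")
# Zimagi module: 305
# Command-line: 95
# CL Normalized: 105
```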