September 3, 2015
Hacking Common Form
Overview of existing Common Form software for programmers
This post is part of a series, Common Form.
A few folks have expressed interest in hacking on Common Form. They’ve also expressed some concern about a perceived lack of comments and documents.
Guilty as charged. Here’s an overview.
There are two things you should know about Node and npm before jumping into Common Form code:
The code takes advantage of Node.js’ recursive dependency resolution. In short, any call to require('x') to load code from elsewhere kicks off a search for a folder called “x” within a folder called “node_modules” in the same directory as the source code containing the require call. Calls to require in code you require will start looking for dependencies in their own directories, not at the top level of your project. npm installs packages together with their dependencies in recursive fashion, so that each package gets the versions of its dependencies that it specifies.
Browserify bundles npm modules together for running in web browsers. It statically analyzes code for require calls, and defines require in the browser bundle to return the code it has found and packaged. Essentially, Browserify allows Common Form to use npm and Node’s built-in module system to build web browser code.
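To make the lookup described above concrete, here is a minimal sketch that lists the “node_modules” folders Node would consider for a require call made from a given directory. It relies on the fact that Node also walks up through parent directories when the nearest folder has no match. The helper name candidateFolders is made up for illustration; it is not part of Node or npm.

```javascript
const path = require('path')

// Hypothetical helper: lists the node_modules folders searched for a
// require() call made from a file in `fromDirectory`, nearest first.
function candidateFolders (fromDirectory) {
  const folders = []
  let current = path.resolve(fromDirectory)
  while (true) {
    // Node never searches node_modules/node_modules.
    if (path.basename(current) !== 'node_modules') {
      folders.push(path.join(current, 'node_modules'))
    }
    const parent = path.dirname(current)
    if (parent === current) break // reached the filesystem root
    current = parent
  }
  return folders
}

console.log(candidateFolders(process.cwd()))
```

Because the search starts next to the file doing the requiring, a package nested inside another package’s “node_modules” folder finds its own dependencies before anything installed at the top level of the project.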
The packages I’ve written can be broken up into a few categories. Ignoring for the time being the packages used to run a Common Form API server:
Object Schema Validation
Common Form is, at its core, a very simple schema for bits of contract text. Those simple parts can be composed nicely into full-featured documents, but the schema for each individual part is rigorously enforced.
commonform-validate exports validation predicate functions for the core Common Form objects: forms, text, definitions, uses, blanks, references, and children. They enforce the schema for Common Forms.
commonform-predicate exports shorthand functions used to distinguish different kinds of content objects within forms. Those functions don’t perform validation, which can be expensive, so they’re a faster choice when code can assume a form object is valid.
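The difference between a validator and a predicate can be sketched as follows. The object shape used here (a “use” content object with a single use property) and both function names are assumptions for illustration only; check commonform-validate and commonform-predicate for the real schema and exports.

```javascript
// Cheap predicate: only looks for the distinguishing key.
// Safe only when the surrounding form is already known to be valid.
function isUse (content) {
  return typeof content === 'object' && content !== null && 'use' in content
}

// Stricter validator: checks the whole (hypothetical) shape.
function isValidUse (content) {
  return (
    typeof content === 'object' &&
    content !== null &&
    !Array.isArray(content) &&
    Object.keys(content).length === 1 &&
    typeof content.use === 'string' &&
    content.use.length > 0
  )
}

console.log(isUse({ use: 'Agreement' }), isValidUse({ use: 'Agreement' }))
```

The predicate accepts some objects the validator would reject, such as a use with an empty term, which is the trade-off for doing less work per call.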
Content-Addressable Hashing of Form Objects
In order to ensure that hashing two equivalent forms produces the same digest, Common Form has to lock down how form objects are serialized and hashed.
commonform-serialize is a custom JSON.stringify that accepts only arguments made entirely out of String. All valid Common Forms meet those criteria. The algorithm prints Object properties sorted by key, so equivalent Objects are always stringified in the same way. The output is a subset of JSON.
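A minimal sketch of key-sorted serialization, to show why it yields one canonical string per value. This is not the code of commonform-serialize. The function name serializeSketch is hypothetical, and the set of accepted types (objects, arrays, and strings) is an assumption.

```javascript
// Hypothetical sketch of deterministic, JSON-compatible serialization.
function serializeSketch (value) {
  if (typeof value === 'string') return JSON.stringify(value)
  if (Array.isArray(value)) {
    return '[' + value.map(serializeSketch).join(',') + ']'
  }
  if (typeof value === 'object' && value !== null) {
    // Sorting keys makes the output independent of insertion order.
    const keys = Object.keys(value).sort()
    return '{' + keys.map(function (key) {
      return JSON.stringify(key) + ':' + serializeSketch(value[key])
    }).join(',') + '}'
  }
  // Numbers, booleans, null, and undefined are rejected in this sketch.
  throw new TypeError('unsupported value: ' + String(value))
}

const first = serializeSketch({ content: ['Some text'], conspicuous: 'yes' })
const second = serializeSketch({ conspicuous: 'yes', content: ['Some text'] })
console.log(first === second)
```

Since the output is still valid JSON, any JSON parser can read it back.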
commonform-hash exports the hash function used to create content digests. Currently SHA256. Works on Node.js and in the browser.
commonform-normalize (as in database normalization) converts nested form objects into a Merkle tree, returning a map from content digest to normalized form object and the Merkle root. This is the core function for hashing forms, whether on a server, at the command line, or in the browser.
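The normalization step can be sketched with Node’s built-in crypto module. This is an illustration under assumptions, not the package’s implementation. It assumes child forms appear in content as objects with a form property, and the names canonical, sha256, merkleize, and normalizeSketch are all hypothetical.

```javascript
const crypto = require('crypto')

// Deterministic JSON-like output: object keys are sorted before printing.
function canonical (value) {
  if (typeof value === 'string') return JSON.stringify(value)
  if (Array.isArray(value)) return '[' + value.map(canonical).join(',') + ']'
  const keys = Object.keys(value).sort()
  return '{' + keys.map(function (key) {
    return JSON.stringify(key) + ':' + canonical(value[key])
  }).join(',') + '}'
}

function sha256 (text) {
  return crypto.createHash('sha256').update(text, 'utf8').digest('hex')
}

// Stores the normalized form in `map` and returns its digest.
function merkleize (form, map) {
  const content = form.content.map(function (element) {
    if (typeof element === 'string' || !('form' in element)) return element
    // Assumed child shape: { heading: ..., form: { content: [...] } }.
    // Swap the nested form for the digest of its normalized version.
    return Object.assign({}, element, { form: merkleize(element.form, map) })
  })
  const normalized = Object.assign({}, form, { content: content })
  const digest = sha256(canonical(normalized))
  map[digest] = normalized
  return digest
}

function normalizeSketch (form) {
  const map = {}
  const root = merkleize(form, map)
  return { root: root, map: map }
}

const example = {
  content: [
    'Intro ',
    { heading: 'First', form: { content: ['Same text'] } },
    { heading: 'Second', form: { content: ['Same text'] } }
  ]
}
const result = normalizeSketch(example)
console.log(result.root, Object.keys(result.map).length)
```

Identical child forms hash to the same digest, so they are stored once in the map no matter how many parents refer to them.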
Rendering Forms Into Other Formats
Renderers take a form, a map with fill-in-the-blank values, and optional rendering options. They return a rendering of the form in another format.
commonform-latex outputs LaTeX.
commonform-tex outputs plain TeX.
commonform-docx outputs Microsoft Word compatible Office Open XML (.docx).
commonform-markdown outputs really clunky Markdown.
commonform-markup generates (and parses) Common Form’s own plain text markup format. I’ll rewrite it with a parsing expression grammar someday.
commonform-html generates HTML4 and HTML5.
commonform-terminal outputs plain text with ANSI terminal color codes.
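The renderers above share one calling convention: form, blank values, options. Here is a toy plain-text renderer in that shape. The content object keys (use, definition, reference, blank, heading, form), the blanks map keyed by blank name, and the indent and placeholder options are all assumptions made for this sketch, not the documented interface of any package listed here.

```javascript
// Hypothetical renderer: (form, blanks, options) -> string.
function renderPlainText (form, blanks, options) {
  const indent = (options && options.indent) || '  '
  const placeholder = (options && options.placeholder) || '________'

  function renderForm (current, depth) {
    return current.content.map(function (element) {
      if (typeof element === 'string') return element
      if ('use' in element) return element.use
      if ('definition' in element) return '"' + element.definition + '"'
      if ('reference' in element) return element.reference
      if ('blank' in element) {
        // Fill the blank if a value was supplied, else show a placeholder.
        return Object.prototype.hasOwnProperty.call(blanks, element.blank)
          ? blanks[element.blank]
          : placeholder
      }
      if ('form' in element) {
        const heading = element.heading ? element.heading + '. ' : ''
        return '\n' + indent.repeat(depth + 1) + heading +
          renderForm(element.form, depth + 1)
      }
      throw new TypeError('unknown content element')
    }).join('')
  }

  return renderForm(form, 0)
}

console.log(renderPlainText(
  { content: ['Pay ', { blank: 'Amount' }, ' to ', { use: 'Seller' }, '.'] },
  { Amount: '$10' }
))
```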
Rendering Helper Functions
As I built the renderers, I occasionally pulled common code out into separate packages.
commonform-resolve adds properties to forms and their content objects to indicate their numbering, whether blanks have been filled in, etc. Used directly or indirectly by nearly all renderers.
commonform-number adds properties to forms indicating their numbering using data that the numbering styles can convert to plain-text numbers.
commonform-flatten transforms a form into an Array of paragraph-like objects with depth and numbering properties. Used by renderers, like commonform-docx and commonform-terminal, that target formats that model content as a stream or list of elements.
commonform-group-series transforms forms’ content arrays into lists of objects that are either paragraphs containing text, uses, definitions, references, and blanks, or series containing contiguous child forms. Used by commonform-number, commonform-resolve, and commonform-html (which doesn’t number forms).
commonform-analyze recurses a form and produces a useful report about what headings, terms, and blanks are used, defined, and referenced, with relevant numberings showing where these things happen. commonform-lint uses that report to do its structural integrity checks.
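As an illustration of the grouping that commonform-group-series is described as doing, this sketch splits a content array into alternating paragraph and series groups. The output shape ({ type, content }) and the test for child forms (an object with a form property) are assumptions, and groupSeriesSketch is a hypothetical name.

```javascript
// Hypothetical sketch: group contiguous child forms into "series" and
// everything else into "paragraph" groups, preserving order.
function groupSeriesSketch (form) {
  const groups = []
  form.content.forEach(function (element) {
    const isChild = typeof element === 'object' &&
      element !== null &&
      'form' in element
    const type = isChild ? 'series' : 'paragraph'
    const last = groups[groups.length - 1]
    if (last && last.type === type) {
      last.content.push(element) // extend the current run
    } else {
      groups.push({ type: type, content: [element] }) // start a new run
    }
  })
  return groups
}

const grouped = groupSeriesSketch({
  content: [
    'The ',
    { use: 'Seller' },
    ' agrees:',
    { form: { content: ['First item'] } },
    { form: { content: ['Second item'] } },
    'Closing text.'
  ]
})
console.log(grouped.map(function (group) { return group.type }))
```

A renderer that targets HTML could then map each paragraph group to a paragraph element and each series to a list, without rescanning the content array.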
Numbering styles are functions that take an abstract numbering argument and a flag for whether to return a full reference (like “Section 1(a)”) or just a short number for a particular form (like “(a)”). They return strings.
outline-numbering returns “Section 1(a)(i)” style strings.
decimal-numbering returns “Section 1.1.1” style strings.
agreement-schedules-exhibits-numbering extends outline-numbering for situations where a form contains an agreement, list of schedules, and exhibits in exactly that order.
resolutions-schedules-exhibits-numbering extends outline-numbering for situations where a form contains formal resolutions, list of schedules, and exhibits in exactly that order.
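A numbering style in the sense described above can be sketched as a single function. The abstract numbering format here (an array of 1-based positions, outermost first) and the Boolean fullReference flag are simplifying assumptions; the real packages may use a richer structure. All function names are hypothetical.

```javascript
// 1 -> a, 26 -> z, 27 -> aa
function lowerAlpha (n) {
  let text = ''
  while (n > 0) {
    n -= 1
    text = String.fromCharCode(97 + (n % 26)) + text
    n = Math.floor(n / 26)
  }
  return text
}

// 1 -> i, 4 -> iv, 9 -> ix
function lowerRoman (n) {
  const table = [
    [1000, 'm'], [900, 'cm'], [500, 'd'], [400, 'cd'], [100, 'c'],
    [90, 'xc'], [50, 'l'], [40, 'xl'], [10, 'x'], [9, 'ix'],
    [5, 'v'], [4, 'iv'], [1, 'i']
  ]
  let text = ''
  table.forEach(function (pair) {
    while (n >= pair[0]) {
      text += pair[1]
      n -= pair[0]
    }
  })
  return text
}

// Hypothetical outline style: top level is a plain number, deeper
// levels alternate between (a) and (i) styles.
function outlineSketch (numbering, fullReference) {
  const parts = numbering.map(function (position, depth) {
    if (depth === 0) return String(position)
    const style = depth % 2 === 1 ? lowerAlpha : lowerRoman
    return '(' + style(position) + ')'
  })
  return fullReference
    ? 'Section ' + parts.join('')
    : parts[parts.length - 1]
}

console.log(outlineSketch([1, 1, 1], true))
console.log(outlineSketch([1, 1], false))
```

Keeping the style a pure function from numbering data to strings is what lets renderers swap outline numbering for decimal numbering without touching the numbering data itself.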
Annotating Forms with Useful Messages
“Annotators” take a form and return an Array of annotations:
commonform-critique — Suggests style improvements. Uses wordy-words, a list of wordy English words and phrases and their shorter equivalents.
commonform-archaic — Points out archaisms in form text. Used by commonform-critique. Based on american-legal-archaisms, a list of common archaisms.
commonform-lint — Points out structural errors, like broken cross-references and extra defined terms.
commonform-phrase-annotator — Helper function for building annotators based on string searches. Used by commonform-archaic.
commonform-regexp-annotator — Helper function for building annotators based on regular-expression searches.
commonform-treeify-annotations — Converts an Array of annotations, each of which has a path property indicating what part of a nested form it applies to, into a tree of the same shape as the form, with lists of annotations embedded within. Used in the web interface at commonform.org.
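To give a feel for what an annotator looks like, here is a minimal sketch of a phrase-based one. The annotation shape (message and path), the path format (alternating property names and array indexes), and the name phraseAnnotatorSketch are assumptions for illustration; commonform-phrase-annotator defines the real interface.

```javascript
// Hypothetical factory: returns an annotator that flags the given phrases.
function phraseAnnotatorSketch (phrases) {
  return function annotate (form, path) {
    path = path || []
    const annotations = []
    form.content.forEach(function (element, index) {
      const elementPath = path.concat('content', index)
      if (typeof element === 'string') {
        const lower = element.toLowerCase()
        phrases.forEach(function (phrase) {
          if (lower.indexOf(phrase.toLowerCase()) !== -1) {
            annotations.push({
              message: 'Consider replacing "' + phrase + '"',
              path: elementPath
            })
          }
        })
      } else if (element !== null && typeof element === 'object' && 'form' in element) {
        // Recurse into child forms, extending the path as we go.
        annotate(element.form, elementPath.concat('form'))
          .forEach(function (annotation) { annotations.push(annotation) })
      }
    })
    return annotations
  }
}

const annotator = phraseAnnotatorSketch(['witnesseth'])
console.log(annotator({
  content: [
    'Plain text.',
    { heading: 'Recitals', form: { content: ['WITNESSETH: the parties agree.'] } }
  ]
}))
```

Because each annotation carries a path, a later step can place it next to the exact piece of content it refers to.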
commonform-cli installs the commonform program with commands for hashing, rendering, and annotating forms.
Getting Your Feet Wet
Write an annotator that highlights pet peeve phrases or style choices. I’ll be more than happy to help you add it to the command-line interface.
Tweak a renderer to better suit your style preferences.
Build a program that takes a form as input and outputs a useful summary or reference guide, like a table of contents, table of defined terms, or a graphic showing relationships between provisions.
Have a look at the GitHub issues I’ve tagged “help wanted”. I try to create issues for feature ideas or code improvements as they come to me.
Your thoughts and feedback are always welcome by e-mail.