Plums-Microlib Data-Model

The Plums data-model describes a common descriptor format for data used in ML pipelines.

Introduction

It consists of a set of container classes and type classes, used respectively to describe structures and components.

The latter expose common interfaces to ease sharing with external libraries, in addition to sharing within the Plums environment.

Note

The data-model implementation is non-defensive: it does little to no type-checking, but rather enforces a set of standard interfaces on which microlibs may rely.

Description

The Plums Data-Model

At the top level sits the DataPoint container class, which encapsulates a TileCollection and an Annotation.
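As a rough illustration of this top-level pairing, the sketch below models a DataPoint-like container as a Python dataclass. `SimpleDataPoint` and `EmptyAnnotation` are hypothetical names, not Plums classes; the only things carried over from this page are the `tile`, `annotation` and `properties` attributes and the expectation that the annotation be GeoInterfaced.

```python
from dataclasses import dataclass, field
from typing import Any, Dict


@dataclass
class SimpleDataPoint:
    """Hypothetical DataPoint stand-in pairing a tile collection with an annotation."""

    tile: Any        # expected: an OrderedDict-like collection of tiles
    annotation: Any  # expected: any object exposing __geo_interface__
    properties: Dict[str, Any] = field(default_factory=dict)  # free-form extras


class EmptyAnnotation:
    """Minimal GeoInterfaced annotation holding no records (illustrative only)."""

    @property
    def __geo_interface__(self):
        return {"type": "FeatureCollection", "features": []}
```

Because the data-model is non-defensive, nothing here checks the types of `tile` or `annotation`; the dataclass only fixes which attributes exist.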

A TileCollection is an OrderedDict-like container which stores Tile objects mapped to optional names.
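A minimal sketch of such an ordered, optionally named container is shown below. `SimpleTileCollection` is a hypothetical class, and the generated `"tile_<index>"` keys for unnamed tiles are an assumption made for this example, not the Plums naming scheme. It wraps an `OrderedDict` behind the standard `MutableMapping` interface.

```python
from collections import OrderedDict
from collections.abc import MutableMapping


class SimpleTileCollection(MutableMapping):
    """Hypothetical ordered tile container; tiles may be named or left unnamed.

    Unnamed tiles receive generated "tile_<index>" keys, a naming scheme
    chosen for this sketch only.
    """

    def __init__(self, *tiles, **named_tiles):
        self._tiles = OrderedDict()
        for index, tile in enumerate(tiles):
            self._tiles["tile_{}".format(index)] = tile
        self._tiles.update(named_tiles)  # keyword order is preserved (Python 3.7+)

    def __getitem__(self, name):
        return self._tiles[name]

    def __setitem__(self, name, tile):
        self._tiles[name] = tile

    def __delitem__(self, name):
        del self._tiles[name]

    def __iter__(self):
        return iter(self._tiles)

    def __len__(self):
        return len(self._tiles)
```

For example, `SimpleTileCollection(rgb_tile, nir_tile, dem=dem_tile)` yields the keys `tile_0`, `tile_1` and `dem`, in that order.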

A valid Tile is expected to be either a pillow Image or a subclass of Tile (such as the convenient TileWrapper for non-pillow images), whereas an Annotation is expected to be a GeoInterfaced class.
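The actual TileWrapper signature is not given on this page, so the sketch below uses a hypothetical `ArrayTile` class to show what the duck-typed Tile contract involves: `filename`, `size`, `width`, `height`, `info`, and NumPy's array interface protocol (`__array_interface__`, version 3). It assumes an interleaved 8-bit RGB buffer and relies only on the standard-library `array` module, so it has no NumPy dependency.

```python
from array import array


class ArrayTile:
    """Hypothetical Tile-like wrapper around a raw interleaved 8-bit RGB buffer."""

    CHANNELS = 3  # assumption for this sketch: RGB, one byte per channel

    def __init__(self, pixels, width, height, filename=None, info=None):
        if len(pixels) != width * height * self.CHANNELS:
            raise ValueError("pixel buffer does not match width * height * channels")
        self._buffer = array("B", pixels)  # owns the memory exposed below
        self.width = width
        self.height = height
        self.filename = filename
        self.info = dict(info or {})  # free-form metadata, like PIL's Image.info

    @property
    def size(self):
        # (width, height), the same order as PIL.Image.Image.size
        return (self.width, self.height)

    @property
    def __array_interface__(self):
        address, _ = self._buffer.buffer_info()
        return {
            "shape": (self.height, self.width, self.CHANNELS),  # rows first, like numpy.asarray(pil_image)
            "typestr": "|u1",          # unsigned 8-bit integers, byte order irrelevant
            "data": (address, False),  # (pointer to the first byte, read-only flag)
            "version": 3,
        }
```

With NumPy installed, `numpy.asarray(tile)` should then produce a `(height, width, 3)` uint8 array that shares the tile's memory rather than copying it.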

The provided Annotation class is a container aggregating two other container classes: a MaskCollection and a RecordCollection.
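The aggregation can be sketched as follows. `SimpleRecordCollection`, `SimpleMaskCollection` and `SimpleAnnotation` are hypothetical names. Exposing the annotation's `__geo_interface__` as the FeatureCollection of its records is an assumption made for this example, not documented Plums behaviour.

```python
class SimpleRecordCollection:
    """Hypothetical record container exposing a GeoJSON FeatureCollection."""

    def __init__(self, *records):
        self.records = list(records)

    @property
    def __geo_interface__(self):
        return {"type": "FeatureCollection",
                "features": [record.__geo_interface__ for record in self.records]}


class SimpleMaskCollection:
    """Hypothetical mask container; masks are kept in insertion order."""

    def __init__(self, *masks):
        self.masks = list(masks)


class SimpleAnnotation:
    """Hypothetical Annotation aggregating records, masks and free-form properties."""

    def __init__(self, record_collection, mask_collection=None, **properties):
        self.record_collection = record_collection
        self.mask_collection = (mask_collection if mask_collection is not None
                                else SimpleMaskCollection())
        self.properties = properties

    @property
    def __geo_interface__(self):
        # Assumption: the annotation is geo-interfaced through its records only
        return self.record_collection.__geo_interface__
```

Any object exposing `__geo_interface__` can serve as a record here, which is the kind of duck typing the data-model relies on.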

A Mask is an entity which encodes non-target-related data along with a name and optional properties. The stored data defines either a RasterMask, which encodes raster data, or a VectorMask, which encodes vector data (typically as a GeoJSON Polygon).
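The two mask flavours can be sketched as below. `SimpleVectorMask` and `SimpleRasterMask` are hypothetical classes that only follow the attribute names listed for VectorMask and RasterMask on this page. Having `to_geojson()` return a dict rather than a JSON string is an assumption, and the raster sketch assumes one unsigned byte per pixel.

```python
from array import array


class SimpleVectorMask:
    """Hypothetical vector mask: a name, a GeoJSON Polygon and free-form properties."""

    def __init__(self, name, coordinates, **properties):
        self.name = name
        self.coordinates = coordinates  # Polygon rings, e.g. [[[x0, y0], [x1, y1], ..., [x0, y0]]]
        self.properties = properties

    @property
    def __geo_interface__(self):
        return {"type": "Feature",
                "geometry": {"type": "Polygon", "coordinates": self.coordinates},
                "properties": dict(self.properties, name=self.name)}

    def to_geojson(self):
        # Assumption: returns a GeoJSON-like dict rather than a serialized string
        return self.__geo_interface__


class SimpleRasterMask:
    """Hypothetical raster mask: one unsigned byte per pixel, stored row-major."""

    def __init__(self, name, values, width, height, **properties):
        self.name = name
        self._buffer = array("B", values)  # owns the memory exposed below
        self.width = width
        self.height = height
        self.properties = properties

    @property
    def size(self):
        return (self.width, self.height)

    @property
    def __array_interface__(self):
        address, _ = self._buffer.buffer_info()
        return {"shape": (self.height, self.width), "typestr": "|u1",
                "data": (address, False), "version": 3}
```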

A Record is a GeoInterfaced entity which stores a geometry (either a Point or a Polygon) along with the following properties:

  • A list of Label objects

  • An optional confidence score

  • Any additional property
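A GeoInterfaced record along these lines might look like the sketch below. `SimpleRecord` is hypothetical. It uses plain strings instead of Label objects, infers `type` from the nesting of the coordinates, and stores labels and confidence under the `labels` and `confidence` keys of the GeoJSON Feature properties. All three are assumptions made for this example.

```python
class SimpleRecord:
    """Hypothetical GeoInterfaced record: geometry, labels, confidence and extras."""

    def __init__(self, coordinates, labels, confidence=None, record_id=None, **properties):
        self.coordinates = coordinates  # [x, y] for a Point, [[[x, y], ...]] for a Polygon
        self.labels = list(labels)      # plain strings here instead of Label objects
        self.confidence = confidence    # optional score, None when unknown
        self.id = record_id
        self.properties = properties    # any additional property

    @property
    def type(self):
        # Assumption: a flat coordinate pair is a Point, nested rings are a Polygon
        return "Polygon" if isinstance(self.coordinates[0], (list, tuple)) else "Point"

    @property
    def __geo_interface__(self):
        properties = dict(self.properties, labels=list(self.labels),
                          confidence=self.confidence)
        feature = {"type": "Feature",
                   "geometry": {"type": self.type, "coordinates": self.coordinates},
                   "properties": properties}
        if self.id is not None:
            feature["id"] = self.id  # GeoJSON ids are optional
        return feature
```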

Hint

A RecordCollection is a context-aware container to which one may attach a Taxonomy. A Taxonomy is a means of describing known Labels and the way they interact with one another (i.e. their kinship relationships). For more information on taxonomies and how they are used in Plums, see the Taxonomy API documentation.
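To make the idea of kinship concrete, the sketch below stores a taxonomy as a child-to-parent mapping and attaches it to a record container. `SimpleTaxonomy` and `TaxonomyAwareRecords` are hypothetical and are not the Plums Taxonomy API. Records are plain dicts with a `labels` key, the taxonomy is assumed to be a tree without cycles, and rejecting unknown labels on `append` is an illustrative choice.

```python
class SimpleTaxonomy:
    """Hypothetical taxonomy stored as a child -> parent mapping (a tree, no cycles)."""

    def __init__(self, parents):
        self.parents = dict(parents)  # root labels map to None

    def __contains__(self, label):
        return label in self.parents

    def ancestors(self, label):
        """Return the chain of parents of `label`, closest first."""
        chain = []
        parent = self.parents.get(label)
        while parent is not None:
            chain.append(parent)
            parent = self.parents.get(parent)
        return chain

    def is_a(self, label, other):
        """True if `label` is `other` or one of its descendants."""
        return label == other or other in self.ancestors(label)


class TaxonomyAwareRecords:
    """Hypothetical context-aware record container with an optional taxonomy."""

    def __init__(self, taxonomy=None):
        self.taxonomy = taxonomy
        self.records = []

    def append(self, record):
        # Illustrative choice: reject labels the attached taxonomy does not know
        if self.taxonomy is not None:
            unknown = [label for label in record["labels"] if label not in self.taxonomy]
            if unknown:
                raise KeyError("labels not in taxonomy: {}".format(unknown))
        self.records.append(record)

    def filter_by(self, label):
        """Records carrying `label` or, given a taxonomy, any descendant of it."""
        matches = self.taxonomy.is_a if self.taxonomy is not None else (lambda a, b: a == b)
        return [record for record in self.records
                if any(matches(own, label) for own in record["labels"])]
```

Querying for a parent label such as `"vehicle"` then also returns records labelled with its children, which is the kind of relationship a taxonomy makes available to a context-aware collection.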

Note that all classes with a properties badge are PropertyContainer instances, able to store arbitrary properties and retrieve them easily.
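The behaviour of such a container can be approximated by routing unknown attribute lookups and all attribute assignments to a backing `properties` dict, as sketched below. `SimplePropertyContainer` is hypothetical and is not the Plums PropertyContainer. Attribute-style access is an assumption about what "easy retrieval" means.

```python
class SimplePropertyContainer:
    """Hypothetical property store with attribute-style access (not the Plums PropertyContainer)."""

    def __init__(self, **properties):
        # Write through object.__setattr__ so `properties` is a real instance attribute
        object.__setattr__(self, "properties", dict(properties))

    def __getattr__(self, name):
        # Called only when normal lookup fails: fall back to the property store
        if name == "properties":  # avoids infinite recursion before __init__ has run
            raise AttributeError(name)
        try:
            return self.properties[name]
        except KeyError:
            raise AttributeError(name) from None

    def __setattr__(self, name, value):
        # Every assignment is routed into the property store
        self.properties[name] = value
```

Raising `AttributeError` for missing keys keeps `hasattr` and `getattr(obj, name, default)` working as usual.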

Expected interfaces and attributes

The following table lists the interfaces and attributes one should expect from each type or container class in the data-model. It serves both as a summary of the data-model and as an extension guide, indicating which elements MUST be implemented by a custom duck-typed user class to interface seamlessly with the rest of the library.

Note

This is a minimum specification of the expected API. Specific implementations may extend it, but external components should not rely on those extensions.

DataPoint
  • Interfaces: ∅
  • Attributes: tile, annotation, properties

TileCollection

Tile
  • Interfaces: __array_interface__
  • Attributes: filename, size, width, height, info

Annotation
  • Interfaces: __geo_interface__
  • Attributes: record_collection, mask_collection, properties

RecordCollection
  • Interfaces: __geo_interface__, __getitem__(), __setitem__(), __len__(), append(), to_geojson()
  • Attributes: id, records

Record
  • Interfaces: __geo_interface__, to_geojson()
  • Attributes: id, labels, confidence, coordinates, type, properties

MaskCollection
  • Interfaces: __getitem__()
  • Attributes: masks

VectorMask
  • Interfaces: __geo_interface__, to_geojson()
  • Attributes: name, coordinates, properties

RasterMask
  • Interfaces: __array_interface__
  • Attributes: name, size, width, height, properties
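Since the data-model itself does little type-checking, a user writing a custom duck-typed class may want a quick self-check against this minimum specification. The sketch below transcribes the table above into a dict and reports missing members. `EXPECTED` and `missing_members` are hypothetical helpers, not part of Plums. Entries written with `()` are required to be callable, and TileCollection is left out because no entries are listed for it.

```python
# Minimum expected members per class, transcribed from the table above.
# Entries ending in "()" must be callable; TileCollection has no listed entries.
EXPECTED = {
    "DataPoint": ("tile", "annotation", "properties"),
    "Tile": ("__array_interface__", "filename", "size", "width", "height", "info"),
    "Annotation": ("__geo_interface__", "record_collection", "mask_collection", "properties"),
    "RecordCollection": ("__geo_interface__", "__getitem__()", "__setitem__()", "__len__()",
                         "append()", "to_geojson()", "id", "records"),
    "Record": ("__geo_interface__", "to_geojson()", "id", "labels", "confidence",
               "coordinates", "type", "properties"),
    "MaskCollection": ("__getitem__()", "masks"),
    "VectorMask": ("__geo_interface__", "to_geojson()", "name", "coordinates", "properties"),
    "RasterMask": ("__array_interface__", "name", "size", "width", "height", "properties"),
}


def missing_members(obj, kind):
    """Return the expected entries of `kind` that `obj` does not provide."""
    missing = []
    for entry in EXPECTED[kind]:
        is_method = entry.endswith("()")
        name = entry[:-2] if is_method else entry
        if not hasattr(obj, name) or (is_method and not callable(getattr(obj, name))):
            missing.append(entry)
    return missing
```

An empty list means the object exposes at least the minimum API for that class. It says nothing about whether the values themselves are well-formed.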