The Challenge  

Datacubes provide spatio-temporal flexibility and scalability, while Machine Learning (ML) contributes its own kind of insight. Today, however, the two form separate silos, and experts have to switch back and forth between them. Offering both in a seamlessly integrated way is much better, that is, easier and faster to use.

For example, data can be extracted from a datacube and preprocessed as necessary; the ML model is then applied to it, and the output can be processed further with all available image processing and statistics functionality. In addition, model invocation is fully integrated with the optimization, distributed processing, etc. that the datacube engine already provides.


  Capability Demonstration  

In the AI-Cube project, the rasdaman datacube engine has been enhanced so that ML models can be invoked from within datacube queries.

Technically, the OGC-standardized WCPS geo datacube query language is extended via User-Defined Functions (UDFs) that invoke PyTorch in the server for model application. Any region and any model can be passed to the server. From the perspective of the user (i.e., the query writer), this external code appears like a regular query function.

The following example illustrates how pretrained ML models, stored in the database, can be invoked (here via nn.predict) as part of a general analytics query:

for $c in (Sentinel_2a),
    $m in (CropModel)
return encode( nn.predict( $c[...], $m ), "tiff" )
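From a client, such a query could be submitted like any other WCPS query. The following Python sketch only builds the request URL and does not contact a server. It assumes a rasdaman service that accepts WCPS through the WCS ProcessCoverages key-value request; the endpoint address and the helper name build_wcps_request are hypothetical, and the subset "[...]" is left elided as in the query above.

```python
from urllib.parse import urlencode

# Hypothetical endpoint (assumption); replace with the address of an actual service.
ENDPOINT = "https://example.org/rasdaman/ows"

# The query from the text; the subset "[...]" is intentionally left elided.
QUERY = '''for $c in (Sentinel_2a),
    $m in (CropModel)
return encode( nn.predict( $c[...], $m ), "tiff" )'''


def build_wcps_request(endpoint: str, query: str) -> str:
    """Return a GET URL that carries the WCPS query as a ProcessCoverages request.

    Assumes the service exposes the WCS key-value interface with the
    Processing Extension; urlencode takes care of escaping the query text.
    """
    params = {
        "service": "WCS",
        "version": "2.0.1",
        "request": "ProcessCoverages",
        "query": query,
    }
    return endpoint + "?" + urlencode(params)


url = build_wcps_request(ENDPOINT, QUERY)
print(url)
```

Since the model is applied inside the server, the client only receives the encoded result (here a TIFF); neither the input data nor the model has to be downloaded.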

A particular twist of the RSVQA technique contributed by TU Berlin is its integration with natural language processing: a question is submitted along with Sentinel-1 and Sentinel-2 patches and the model, and the output is again natural language. The WCPS query has the following structure:

for $S1 in (S1_GRDH_IW),
    $S2 in (S2_L2A),
    $m in (MyModel)
let $patch := [ {space-time selection of 256x256 patch} ],
        $S1[subs2], $S2[subs2],
        "Are there some airports?"

Next steps include further use case demos, in particular involving fusion, and building libraries of useful, high-accuracy models.



  User Benefit  

  • more powerful: seamless integration of ML into queries, including data fusion
  • all server-side optimizations get automatically applied
  • not tied to any particular type of model; model libraries can be built ("Hugging Face with datacubes")
  • unlimited use of trained models, but no download: full IP protection