ds format

Description

Below is a high-level description of the Python interface provided by ds for reading and writing data files. The general structure of the ds format is:

# Dataset definition:
d = {
    # Variable 1 (NumPy array or list):
    "<var1>": [...],
    # Variable 2 (NumPy array or list):
    "<var2>": [...],
    ...,
    # Metadata:
    ".": {
        # Variable 1 metadata:
        "<var1>": { 
            # Dimension names:
            ".dims": ["<dim1>", "<dim2>", ...],
            # Arbitrary attributes:
            "<attr1>": ...,
            "<attr2>": ...,
            ...
        },
        # Variable 2 metadata:
        "<var2>": {
            # Dimension names:
            ".dims": ["<dim1>", "<dim2>", ...],
            # Arbitrary attributes:
            "<attr1>": ...,
            "<attr2>": ...,
            ...
        },
        ...,
        # Dataset metadata:
        ".": {
            # Arbitrary attributes:
            "<attr1>": ...,
            "<attr2>": ...,
            ...
        }
    }
}

where each d['<var<n>>'] is a variable holding a multi-dimensional NumPy array or Python list, and d['.'] stores the metadata. Each d['.']['<var<n>>'] contains the metadata of one variable: the dimension names .dims and any number of arbitrary variable-level attributes. d['.']['.'] contains any number of arbitrary dataset-level attributes. Groups and nesting of variables, as implemented in HDF5, are currently not supported.
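As a minimal sketch, the layout above can be navigated with plain dictionary access. No ds functions are used here, and the dataset contents are made up for illustration:

```python
# A small dataset following the layout above (values are illustrative).
d = {
    'time': [1, 2, 3],
    'temperature': [16., 18., 21.],
    '.': {
        'time': {'.dims': ['time']},
        'temperature': {'.dims': ['time'], 'units': 'degree_celsius'},
        '.': {'title': 'Temperature data'},
    },
}

meta = d['.']                    # all metadata
var_meta = meta['temperature']   # metadata of one variable
dataset_meta = meta['.']         # dataset-level attributes

print(var_meta['.dims'])         # dimension names of "temperature"
print(var_meta['units'])         # a variable-level attribute
print(dataset_meta['title'])     # a dataset-level attribute
```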

Elements

Variables

Variables are multi-dimensional arrays with arbitrary names, except that names beginning with . have a special meaning. The dimensions of a variable are listed in .dims in its metadata.
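Because keys beginning with . are reserved, the variables of a dataset can be told apart from the metadata by name alone. The helper below is a hypothetical illustration written for this page, not a function of the ds package, which may offer its own way of listing variables:

```python
def variable_names(d):
    """Return the variable names in dataset d.

    Hypothetical helper: keys beginning with "." (such as the
    metadata key ".") are skipped, everything else is a variable.
    """
    return [name for name in d if not name.startswith('.')]


d = {
    'time': [1, 2, 3],
    'temperature': [16., 18., 21.],
    '.': {
        'time': {'.dims': ['time']},
        'temperature': {'.dims': ['time']},
    },
}
print(variable_names(d))
```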

Dimensions

Dimensions are names assigned to the axes of variables. A dimension can have the same name as a variable, which some programs, such as Panoply, then interpret as the axis of that dimension, as is common in NetCDF datasets.
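One way to see how dimension names relate to array shapes is to derive the size of each dimension from the variables that use it. The sketch below assumes that every variable has a .dims entry and that variables sharing a dimension name must agree on its size. The function dimension_sizes and the variable field are hypothetical and not part of ds:

```python
import numpy as np


def dimension_sizes(d):
    """Map each dimension name to its size (hypothetical helper)."""
    sizes = {}
    for name, data in d.items():
        if name.startswith('.'):
            continue  # skip the metadata key
        dims = d['.'][name]['.dims']
        shape = np.shape(data)  # works for lists and NumPy arrays
        if len(dims) != len(shape):
            raise ValueError('%s: .dims does not match array rank' % name)
        for dim, n in zip(dims, shape):
            # Record the size, or check it against an earlier variable.
            if sizes.setdefault(dim, n) != n:
                raise ValueError('inconsistent size of dimension %s' % dim)
    return sizes


d = {
    'time': [1, 2, 3],
    'field': np.zeros((2, 3)),
    '.': {
        'time': {'.dims': ['time']},
        'field': {'.dims': ['level', 'time']},
    },
}
print(dimension_sizes(d))
```

Here the variable time shares its name with the dimension time, so a viewer can treat its values as the coordinates along that axis.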

Attributes

Attributes are arbitrary key-value pairs that define variable or dataset metadata.
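Since attributes share a dictionary with the special .dims entry, reading the attributes of a variable means leaving out keys that begin with a dot. The following sketch uses a hypothetical helper named attributes, which is not a ds function:

```python
def attributes(d, var=None):
    """Return the attributes of variable var, or of the dataset if
    var is None (hypothetical helper). Special keys beginning with
    "." (such as ".dims") are left out."""
    meta = d['.']['.'] if var is None else d['.'][var]
    return {k: v for k, v in meta.items() if not k.startswith('.')}


d = {
    'temperature': [16., 18., 21.],
    '.': {
        'temperature': {'.dims': ['time'], 'units': 'degree_celsius'},
        '.': {'title': 'Temperature data'},
    },
}
print(attributes(d, 'temperature'))  # variable-level attributes
print(attributes(d))                 # dataset-level attributes
```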

Example

Using ds interface

This is an example of two variables time and temperature stored in a dataset along with their metadata.

Using the command line interface:

ds write dataset.nc \
    { time time { 1 2 3 } } \
    { temperature time { 16. 18. 21. } units: degree_celsius } \
    title: "Temperature data"

Using the Python interface:

import numpy as np
import ds_format as ds
d = {
    # Variable "time":
    'time': [1, 2, 3],
    # Variable "temperature":
    'temperature': [16., 18., 21.],
    '.': {
        '.': { 'title': 'Temperature data' },
        # Metadata of variable "time":
        'time': {
            # Single dimension named "time":
            '.dims': ['time'],
        },
        # Metadata of variable "temperature":
        'temperature': {
            # Single dimension named "time":
            '.dims': ['time'],
            # Arbitrary attributes:
            'units': 'degree_celsius',
        },
    }
}
# Save the dataset as NetCDF:
ds.write('dataset.nc', d)

Using netCDF4 interface

The following code produces an equivalent data file using the Python library netCDF4:

import numpy as np
from netCDF4 import Dataset
d = Dataset('dataset.nc', 'w')
d.title = 'Temperature data'
d.createDimension('time', 3)
time = d.createVariable('time', 'i8', ('time',))
temperature = d.createVariable('temperature', 'f8', ('time',))
temperature.units = 'degree_celsius'
time[:] = np.array([1, 2, 3])
temperature[:] = np.array([16., 18., 21.])
d.close()

The result can be viewed with ncdump dataset.nc:

netcdf dataset {
dimensions:
    time = 3 ;
variables:
    int64 time(time) ;
    double temperature(time) ;
        temperature:units = "degree_celsius" ;

// global attributes:
        :title = "Temperature data" ;
data:

 time = 1, 2, 3 ;

 temperature = 16, 18, 21 ;
}