API Reference#

This page provides an auto-generated summary of xpersist’s API. For more details and examples, refer to the relevant chapters in the main part of the documentation.

CacheStore#

xpersist.cache.CacheStore([path, readonly, ...])

Implements caching functionality using fsspec backends (local, s3fs, gcsfs, etc.).

xpersist.cache.Artifact

A pydantic model for representing an artifact in the cache.

class xpersist.cache.CacheStore(path='/tmp', readonly=False, on_duplicate_key='skip', storage_options=None)[source]#

Implements caching functionality using fsspec backends (local, s3fs, gcsfs, etc.).

Some backends require additional dependencies. For example, working with an S3 cache store requires s3fs.

Parameters
  • path (str) – The path to the cache store. This can be a local directory or a cloud storage bucket. Defaults to '/tmp', a local temporary directory.

  • storage_options (dict) – fsspec parameters passed to the backend file system, such as Google Cloud Storage or Amazon Web Services S3.

  • readonly (bool) – If True, the cache store is read-only. If False, the cache store is writable.

  • on_duplicate_key (DuplicateKeyEnum) – The behavior when a key is duplicated in the cache store. Valid options are:

    • ‘skip’ (default): do nothing

    • ‘overwrite’: overwrite the existing artifact

    • ‘raise_error’: raise an error if the key is already in the cache store
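
The three on_duplicate_key options above can be illustrated with a minimal sketch that uses a plain dict in place of a real cache store. This is not xpersist's implementation: DuplicateKeyPolicy, put_with_policy, the exception type, and the value returned on 'skip' are all assumptions made for illustration.

```python
from enum import Enum


class DuplicateKeyPolicy(str, Enum):
    """Hypothetical stand-in for the documented on_duplicate_key options."""

    skip = "skip"
    overwrite = "overwrite"
    raise_error = "raise_error"


def put_with_policy(store: dict, key: str, value, on_duplicate_key: str = "skip"):
    """Write value into a plain dict, applying the chosen duplicate-key policy."""
    policy = DuplicateKeyPolicy(on_duplicate_key)  # ValueError for unknown options
    if key in store:
        if policy is DuplicateKeyPolicy.skip:
            return store[key]  # assumption: keep and return the existing entry
        if policy is DuplicateKeyPolicy.raise_error:
            # assumption: the real error type is not stated in this reference
            raise ValueError(f"key {key!r} is already in the store")
    store[key] = value  # new key, or the 'overwrite' policy
    return value


toy_store = {"foo": [1, 2, 3]}
put_with_policy(toy_store, "foo", [9], on_duplicate_key="skip")
skipped = list(toy_store["foo"])        # existing value is kept
put_with_policy(toy_store, "foo", [9], on_duplicate_key="overwrite")
overwritten = list(toy_store["foo"])    # existing value is replaced
```

With 'skip' the original value survives a second write to the same key, while 'overwrite' replaces it and 'raise_error' refuses the write.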

delete(key, dry_run=True)[source]#

Deletes the key and corresponding artifact from the cache store.

Parameters
  • key (str) – Key to delete from the cache store.

  • dry_run (bool) – If True, the key is not deleted from the cache store. This is useful for debugging.
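
Because dry_run defaults to True, a call that omits it does not remove anything. The sketch below mimics that pattern on a plain dict; delete_key, its return value, and its handling of a missing key are hypothetical and only illustrate the default-safe behaviour described above.

```python
def delete_key(store: dict, key: str, dry_run: bool = True) -> bool:
    """Delete key from a plain dict unless dry_run is True; return True if deleted."""
    if key not in store:
        raise KeyError(key)  # assumption: behaviour for a missing key is not documented
    if dry_run:
        print(f"dry run: would delete {key!r}")
        return False
    del store[key]
    return True


toy_store = {"foo": [1, 2, 3]}
delete_key(toy_store, "foo")                 # default dry_run=True: nothing is removed
still_there = "foo" in toy_store
delete_key(toy_store, "foo", dry_run=False)  # explicit opt-in performs the deletion
gone = "foo" not in toy_store
```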

get(key, serializer=None, load_kwargs=None)[source]#

Returns the value for the key if the key is in the cache store.

Parameters
  • key (str) – Key to get from the cache store.

  • serializer (str) – The name of the serializer you want to use. The built-in serializers are:

    • ‘auto’ (default): automatically choose the serializer based on the type of the value

    • ‘xarray.netcdf’: requires xarray and netCDF4

    • ‘xarray.zarr’: requires xarray and zarr

    • ‘pandas.csv’: requires pandas

    • ‘pandas.parquet’: requires pandas and pyarrow or fastparquet

    You can also register your own serializer via the @xpersist.registry.serializers.register decorator.

  • load_kwargs (dict) – Additional keyword arguments to pass to the serializer when loading artifact from the cache store.

Returns

value – the value for the key if the key is in the cache store.

Examples

>>> from xpersist import CacheStore
>>> store = CacheStore("/tmp/my-cache")
>>> store.keys()
['foo']
>>> store.get("foo")
[1, 2, 3]

get_artifact(key)[source]#

Returns the artifact corresponding to the key.

Parameters

key (str) – Key to get from the cache store.

Returns

artifact (Artifact) – The artifact corresponding to the key.

Raises

KeyError – If the key is not in the cache store.

keys()[source]#

Returns a list of keys in the cache store.

put(key, value, serializer='auto', dump_kwargs=None, additional_metadata=None)[source]#

Records and serializes key with its corresponding value in the cache store.

Parameters
  • key (str) – Key to put in the cache store.

  • value (typing.Any) – Value to put in the cache store.

  • serializer (str) – The name of the serializer you want to use. The built-in serializers are:

    • ‘auto’ (default): automatically choose the serializer based on the type of the value

    • ‘xarray.netcdf’: requires xarray and netCDF4

    • ‘xarray.zarr’: requires xarray and zarr

    • ‘pandas.csv’: requires pandas

    • ‘pandas.parquet’: requires pandas and pyarrow or fastparquet

    You can also register your own serializer via the @xpersist.registry.serializers.register decorator.

  • dump_kwargs (dict) – Additional keyword arguments to pass to the serializer when dumping artifact to the cache store.

  • additional_metadata (dict) – A dict with types that serialize to json. These fields can be used for searching artifacts in the metadata store.
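
Since additional_metadata must contain types that serialize to JSON, it can help to check a dict before passing it to put. The helper below is a minimal sketch using only the standard library; is_json_serializable and the sample dicts are hypothetical and not part of xpersist.

```python
import json


def is_json_serializable(metadata: dict) -> bool:
    """Return True if json.dumps accepts the metadata dict."""
    try:
        json.dumps(metadata)
    except (TypeError, ValueError):
        return False
    return True


good = {"experiment": "run-1", "resolution_km": 25, "tags": ["ocean", "monthly"]}
bad = {"experiment": "run-1", "members": {1, 2, 3}}  # a set is not JSON-serializable

ok_good = is_json_serializable(good)
ok_bad = is_json_serializable(bad)
```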

Returns

value (typing.Any) – Reference to the value that was put in the cache store.

Examples

>>> from xpersist import CacheStore
>>> store = CacheStore("/tmp/my-cache")
>>> store.keys()
[]
>>> store.put("foo", [1, 2, 3])
[1, 2, 3]
>>> store.keys()
['foo']

pydantic model xpersist.cache.Artifact[source]#

A pydantic model for representing an artifact in the cache.

JSON schema
{
   "title": "Artifact",
   "description": "A pydantic model for representing an artifact in the cache.",
   "type": "object",
   "properties": {
      "key": {
         "title": "Key",
         "type": "string"
      },
      "serializer": {
         "title": "Serializer",
         "type": "string"
      },
      "load_kwargs": {
         "title": "Load Kwargs",
         "type": "object"
      },
      "dump_kwargs": {
         "title": "Dump Kwargs",
         "type": "object"
      },
      "additional_metadata": {
         "title": "Additional Metadata",
         "type": "object"
      },
      "created_at": {
         "title": "Created At",
         "type": "string",
         "format": "date-time"
      }
   },
   "required": [
      "key",
      "serializer"
   ]
}

Config
  • validate_assignment: bool = True

Fields
field additional_metadata [Optional]#
field created_at [Optional]#
field dump_kwargs [Optional]#
field key [Required]#
field load_kwargs [Optional]#
field serializer [Required]#
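
The field list above can be mirrored with a standard-library dataclass to show which fields are required and which are optional. This is only a sketch: ToyArtifact is hypothetical, it performs no pydantic validation, and the default values for the optional fields are assumptions, since this reference does not state them.

```python
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone
from typing import Any, Dict


@dataclass
class ToyArtifact:
    """Hypothetical mirror of the Artifact fields listed above."""

    key: str          # required
    serializer: str   # required
    # Optional fields; the empty-dict and "now" defaults are assumptions.
    load_kwargs: Dict[str, Any] = field(default_factory=dict)
    dump_kwargs: Dict[str, Any] = field(default_factory=dict)
    additional_metadata: Dict[str, Any] = field(default_factory=dict)
    created_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))


artifact = ToyArtifact(key="foo", serializer="auto")
record = asdict(artifact)
# Store the timestamp as an ISO 8601 string, matching the schema's date-time format.
record["created_at"] = artifact.created_at.isoformat()
```

Omitting key or serializer when constructing ToyArtifact raises a TypeError, which loosely parallels the "required" entries in the JSON schema.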

Serializers#

xpersist.serializers.Serializer

Pydantic model for defining a serializer.

xpersist.serializers.pick_serializer()

Returns the id of the appropriate serializer.

pydantic model xpersist.serializers.Serializer[source]#

Pydantic model for defining a serializer.

JSON schema
{
   "title": "Serializer",
   "description": "Pydantic model for defining a serializer.",
   "type": "object",
   "properties": {
      "name": {
         "title": "Name",
         "type": "string"
      }
   },
   "required": [
      "name"
   ]
}

Fields
field dump [Required]#
field load [Required]#
field name [Required]#

xpersist.serializers.pick_serializer(obj)[source]#

Returns the id of the appropriate serializer.

Parameters

obj (any Python object) – The object to pick a serializer for.

Returns

id (str) – Id of the serializer
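
Choosing a serializer id from the type of an object can be pictured as a small dispatch table. The sketch below shows the general idea only: the rules, the "toy.*" ids, and pick_toy_serializer are hypothetical, and the actual rules used by pick_serializer are not described in this reference.

```python
# Ordered (types, serializer id) rules; the first matching rule wins.
TOY_RULES = [
    (str, "toy.text"),
    ((dict, list), "toy.json"),
]
FALLBACK_ID = "toy.pickle"  # used when no rule matches


def pick_toy_serializer(obj) -> str:
    """Return a hypothetical serializer id based on the type of obj."""
    for types, serializer_id in TOY_RULES:
        if isinstance(obj, types):
            return serializer_id
    return FALLBACK_ID


text_id = pick_toy_serializer("hello")
json_id = pick_toy_serializer([1, 2, 3])
other_id = pick_toy_serializer(3.5)
```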

Prefect Caching#

xpersist.prefect.result.XpersistResult(...)

A result class used to store the results of a task in an xpersist cache store.

class xpersist.prefect.result.XpersistResult(cache_store, serializer='auto', serializer_dump_kwargs=None, serializer_load_kwargs=None, kwargs=None)[source]#

A result class used to store the results of a task in an xpersist cache store.

Parameters
  • cache_store (xpersist.cache.CacheStore) – The cache store to use for storing the result.

  • serializer (str) – The serializer to use for storing the result. Valid options are:

    • ‘auto’ (default): automatically chooses the serializer based on the type of the value

    • ‘xarray.netcdf’: requires xarray and netCDF4

    • ‘xarray.zarr’: requires xarray and zarr

    • ‘pandas.csv’: requires pandas

    • ‘pandas.parquet’: requires pandas and pyarrow or fastparquet

  • serializer_dump_kwargs (dict) – The keyword arguments to pass to the serializer’s dump method.

  • serializer_load_kwargs (dict) – The keyword arguments to pass to the serializer’s load method.

  • kwargs (dict) – Any additional keyword arguments to pass to the Result class.

exists(location, **kwargs)[source]#

Checks whether the target result exists in the cache store.

Does not validate whether the result is valid, only that it is present.

Parameters
  • location (str) – Location of the result in the specific result target. This is the location whose existence is checked.

  • kwargs (dict) – String format arguments for location.

Returns

_ (bool) – Whether the target result exists.

read(location)[source]#

Reads a result from the cache store and returns the corresponding Result instance.

Parameters

location (str) – The location to read from.

Returns

result (Result) – A new Result instance with the data represented by the location.

write(value_, **kwargs)[source]#

Writes the result to a location in the cache store and returns a new Result object with the result’s location.

Parameters
  • value_ (typing.Any) – The value to write. It is stored as the value attribute of the returned Result instance.

  • kwargs (dict) – If provided, used to format the location template that determines where the result is written.

Returns

result (Result) – A new Result instance with the location of the written result.
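
The role of kwargs in formatting a location template can be sketched with plain string formatting. This assumes the template uses str.format-style placeholders; resolve_location, the template, and the flow_name and task_name keys are hypothetical examples, not names taken from xpersist or Prefect.

```python
def resolve_location(template: str, **kwargs) -> str:
    """Fill a location template with keyword arguments using str.format."""
    return template.format(**kwargs)


# Hypothetical template and values, for illustration only.
location = resolve_location(
    "{flow_name}/{task_name}.result", flow_name="etl", task_name="load"
)
```

A placeholder with no matching keyword argument raises a KeyError in this sketch, so a template and the arguments supplied for it need to agree.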

Registry#

xpersist.registry.registry()

xpersist’s global registry entrypoint.

class xpersist.registry.registry[source]#

xpersist’s global registry entrypoint.

This is used to register serializers and other components that are used by xpersist.

classmethod create(registry_name, entry_points=False)[source]#

Create a new custom registry.

classmethod get(registry_name, func_name)[source]#

Get a registered function from a given registry.

Parameters
  • registry_name (str) – The name of the registry to get the function from.

  • func_name (str) – The name of the function to get.

Returns

func (typing.Callable) – The function from the registry.

classmethod has(registry_name, func_name)[source]#

Check whether a function is available in a registry.

Parameters
  • registry_name (str) – The name of the registry to check.

  • func_name (str) – The name of the function to check.

Returns

bool – Whether the function is available in the registry.
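
The create, get, and has operations above follow a common named-registry pattern, which the standalone sketch below imitates with nested dicts and a decorator. It is not xpersist's implementation: the module-level functions, the decorator returned by create, the error types, and the "toy.upper" entry are all assumptions made for illustration.

```python
from typing import Callable, Dict

# registry name -> {function name -> function}
_REGISTRIES: Dict[str, Dict[str, Callable]] = {}


def create(registry_name: str) -> Callable[[str], Callable]:
    """Create an empty named registry and return a decorator factory for it."""
    if registry_name in _REGISTRIES:
        raise ValueError(f"registry {registry_name!r} already exists")
    table = _REGISTRIES[registry_name] = {}

    def register(func_name: str) -> Callable:
        def decorator(func: Callable) -> Callable:
            table[func_name] = func  # record the function under its name
            return func

        return decorator

    return register


def has(registry_name: str, func_name: str) -> bool:
    """Check whether a function is available in a registry."""
    return func_name in _REGISTRIES.get(registry_name, {})


def get(registry_name: str, func_name: str) -> Callable:
    """Return a registered function, or raise KeyError if it is missing."""
    if not has(registry_name, func_name):
        raise KeyError(f"{func_name!r} not found in registry {registry_name!r}")
    return _REGISTRIES[registry_name][func_name]


register_serializer = create("serializers")


@register_serializer("toy.upper")
def upper_text(text: str) -> str:
    return text.upper()


found = has("serializers", "toy.upper")
result = get("serializers", "toy.upper")("abc")
```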