API Reference#
This page provides an auto-generated summary of xpersist’s API. For more details and examples, refer to the relevant chapters in the main part of the documentation.
CacheStore#
xpersist.cache.CacheStore | Implements caching functionality using fsspec backends (local, s3fs, gcsfs, etc.). |
xpersist.cache.Artifact | A pydantic model for representing an artifact in the cache. |
- class xpersist.cache.CacheStore(path='/tmp', readonly=False, on_duplicate_key='skip', storage_options=None)[source]#
Implements caching functionality using fsspec backends (local, s3fs, gcsfs, etc.).
Some backends require additional dependencies. For example, an S3 cache store requires s3fs.
- Parameters
path (str) – The path to the cache store. This can be a local directory or a cloud storage bucket. By default, the path is set to the temporary local directory.
storage_options (dict) – fsspec parameters passed to the backend file system, such as Google Cloud Storage or Amazon Web Services S3.
readonly (bool) – If True, the cache store is read-only. If False, the cache store is writable.
on_duplicate_key (DuplicateKeyEnum) – The behavior when a key is duplicated in the cache store. Valid options are:
‘skip’ (default): do nothing
‘overwrite’: overwrite the existing artifact
‘raise_error’: raise an error if the key is already in the cache store
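The three on_duplicate_key options can be illustrated with a minimal in-memory sketch. This is not xpersist's implementation: DuplicatePolicy and put_with_policy are hypothetical names, a plain dict stands in for the cache store, and the exception type used for 'raise_error' is an assumption.

```python
from enum import Enum


class DuplicatePolicy(str, Enum):
    """Hypothetical stand-in mirroring the three documented option names."""

    skip = "skip"
    overwrite = "overwrite"
    raise_error = "raise_error"


def put_with_policy(cache: dict, key: str, value, policy: str = "skip") -> None:
    """Store value under key in a plain dict, honouring the duplicate-key policy."""
    policy = DuplicatePolicy(policy)  # unknown option names raise ValueError
    if key in cache:
        if policy is DuplicatePolicy.skip:
            return  # keep the existing entry untouched
        if policy is DuplicatePolicy.raise_error:
            # The exception type is an assumption made for this sketch.
            raise ValueError(f"key {key!r} is already in the cache")
    cache[key] = value  # new key, or policy is 'overwrite'


cache = {}
put_with_policy(cache, "foo", [1, 2, 3])
put_with_policy(cache, "foo", [4, 5, 6], policy="skip")
after_skip = cache["foo"]  # still the first value
put_with_policy(cache, "foo", [4, 5, 6], policy="overwrite")
after_overwrite = cache["foo"]  # replaced by the second value
```

With 'skip', repeated writes to the same key are silently ignored, so 'overwrite' or 'raise_error' may be a better fit when stale entries would be a problem.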
- delete(key, dry_run=True)[source]#
Deletes the key and corresponding artifact from the cache store.
- Parameters
key (str) – Key to delete from the cache store.
dry_run (bool) – If True, the key is not deleted from the cache store. This is useful for debugging.
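Because dry_run defaults to True in the signature above, a bare delete(key) call leaves the store unchanged. The sketch below shows that calling pattern against a hypothetical in-memory stand-in (DemoStore is not part of xpersist); only the delete(key, dry_run=True) signature is taken from this page.

```python
class DemoStore:
    """Hypothetical in-memory stand-in exposing only keys() and delete()."""

    def __init__(self, items):
        self._items = dict(items)

    def keys(self):
        return list(self._items)

    def delete(self, key, dry_run=True):
        # Mirrors the documented signature: nothing is removed on a dry run.
        if not dry_run:
            del self._items[key]


store = DemoStore({"foo": [1, 2, 3]})
store.delete("foo")                 # dry_run defaults to True: nothing removed
keys_after_dry_run = store.keys()
store.delete("foo", dry_run=False)  # explicit opt-in performs the deletion
keys_after_delete = store.keys()
```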
- get(key, serializer=None, load_kwargs=None)[source]#
Returns the value for the key if the key is in the cache store.
- Parameters
key (str) – Key to get from the cache store.
serializer (str) – The name of the serializer you want to use. The built-in serializers are:
‘auto’ (default): automatically choose the serializer based on the type of the value
‘xarray.netcdf’: requires xarray and netCDF4
‘xarray.zarr’: requires xarray and zarr
‘pandas.csv’: requires pandas
‘pandas.parquet’: requires pandas and pyarrow or fastparquet
You can also register your own serializer via the @xpersist.registry.serializers.register decorator.
load_kwargs (dict) – Additional keyword arguments to pass to the serializer when loading artifact from the cache store.
- Returns
value – the value for the key if the key is in the cache store.
Examples
>>> from xpersist import CacheStore
>>> store = CacheStore("/tmp/my-cache")
>>> store.keys()
['foo']
>>> store.get("foo")
[1, 2, 3]
- get_artifact(key)[source]#
Returns the artifact corresponding to the key.
- Parameters
key (str) – Key to get from the cache store.
- Returns
artifact (Artifact) – The artifact corresponding to the key.
- Raises
KeyError – If the key is not in the cache store.
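Since get_artifact raises KeyError for a missing key, callers that prefer a sentinel can wrap the call. The helper below is a sketch: artifact_or_none and DemoStore are hypothetical names, and the stand-in store exists only so the snippet runs without a configured cache.

```python
def artifact_or_none(store, key):
    """Return the artifact for key, or None when the store raises KeyError."""
    try:
        return store.get_artifact(key)
    except KeyError:
        return None


class DemoStore:
    """Hypothetical stand-in exposing only get_artifact()."""

    def __init__(self, artifacts):
        self._artifacts = dict(artifacts)

    def get_artifact(self, key):
        return self._artifacts[key]  # a dict lookup raises KeyError when missing


store = DemoStore({"foo": {"key": "foo", "serializer": "pandas.csv"}})
found = artifact_or_none(store, "foo")
missing = artifact_or_none(store, "bar")
```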
- put(key, value, serializer='auto', dump_kwargs=None, additional_metadata=None)[source]#
Records and serializes key with its corresponding value in the cache store.
- Parameters
key (str) – Key to put in the cache store.
value (typing.Any) – Value to put in the cache store.
serializer (str) – The name of the serializer you want to use. The built-in serializers are:
‘auto’ (default): automatically choose the serializer based on the type of the value
‘xarray.netcdf’: requires xarray and netCDF4
‘xarray.zarr’: requires xarray and zarr
‘pandas.csv’: requires pandas
‘pandas.parquet’: requires pandas and pyarrow or fastparquet
You can also register your own serializer via the @xpersist.registry.serializers.register decorator.
dump_kwargs (dict) – Additional keyword arguments to pass to the serializer when dumping artifact to the cache store.
additional_metadata (dict) – A dict with types that serialize to json. These fields can be used for searching artifacts in the metadata store.
- Returns
value (typing.Any) – Reference to the value that was put in the cache store.
Examples
>>> from xpersist import CacheStore
>>> store = CacheStore("/tmp/my-cache")
>>> store.keys()
[]
>>> store.put("foo", [1, 2, 3])
>>> store.keys()
['foo']
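A common way to combine keys(), get() and put() is a compute-on-miss helper. The sketch below is written against those three documented methods only; cached and DemoStore are hypothetical names, and the in-memory stand-in replaces a real CacheStore so the snippet is self-contained.

```python
class DemoStore:
    """Hypothetical in-memory stand-in exposing keys(), get() and put()."""

    def __init__(self):
        self._items = {}

    def keys(self):
        return list(self._items)

    def get(self, key):
        return self._items[key]

    def put(self, key, value):
        self._items[key] = value
        return value


def cached(store, key, compute):
    """Return the cached value for key, computing and storing it on a miss."""
    if key in store.keys():
        return store.get(key)
    value = compute()
    store.put(key, value)
    return value


calls = []


def expensive():
    calls.append(1)  # record each real computation
    return [1, 2, 3]


store = DemoStore()
first = cached(store, "foo", expensive)   # miss: computes and stores
second = cached(store, "foo", expensive)  # hit: read back from the store
```

Passing a real CacheStore instead of DemoStore should work the same way, assuming the store is writable (readonly=False).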
- pydantic model xpersist.cache.Artifact[source]#
A pydantic model for representing an artifact in the cache.
JSON schema:
{
  "title": "Artifact",
  "description": "A pydantic model for representing an artifact in the cache.",
  "type": "object",
  "properties": {
    "key": { "title": "Key", "type": "string" },
    "serializer": { "title": "Serializer", "type": "string" },
    "load_kwargs": { "title": "Load Kwargs", "type": "object" },
    "dump_kwargs": { "title": "Dump Kwargs", "type": "object" },
    "additional_metadata": { "title": "Additional Metadata", "type": "object" },
    "created_at": { "title": "Created At", "type": "string", "format": "date-time" }
  },
  "required": [ "key", "serializer" ]
}
- Config
validate_assignment: bool = True
- Fields
- field additional_metadata [Optional]#
- field created_at [Optional]#
- field dump_kwargs [Optional]#
- field key [Required]#
- field load_kwargs [Optional]#
- field serializer [Required]#
Serializers#
xpersist.serializers.Serializer | Pydantic model for defining a serializer. |
xpersist.serializers.pick_serializer | Returns the id of the appropriate serializer. |
- pydantic model xpersist.serializers.Serializer[source]#
Pydantic model for defining a serializer.
JSON schema:
{
  "title": "Serializer",
  "description": "Pydantic model for defining a serializer.",
  "type": "object",
  "properties": {
    "name": { "title": "Name", "type": "string" }
  },
  "required": [ "name" ]
}
- field dump [Required]#
- field load [Required]#
- field name [Required]#
- xpersist.serializers.pick_serializer(obj)[source]#
Returns the id of the serializer appropriate for the given object.
- Parameters
obj (any Python object)
- Returns
id (str) – Id of the serializer
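Choosing a serializer id from the type of an object can be sketched as an ordered list of type rules with a fallback. This illustrates the general idea only: the function name, the rule table and the "demo.*" ids below are all hypothetical and are not xpersist's actual rules or serializer names.

```python
# Hypothetical rule table: (type or tuple of types, serializer id).
_RULES = [
    (str, "demo.text"),
    ((dict, list), "demo.json"),
]
_FALLBACK = "demo.pickle"  # hypothetical id used when no rule matches


def pick_serializer_sketch(obj) -> str:
    """Return the id of the first rule whose type matches obj."""
    for types, serializer_id in _RULES:
        if isinstance(obj, types):
            return serializer_id
    return _FALLBACK


text_id = pick_serializer_sketch("hello")
json_id = pick_serializer_sketch({"a": 1})
other_id = pick_serializer_sketch(3.14)
```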
Prefect Caching#
xpersist.prefect.result.XpersistResult | A result class used to store the results of a task in an xpersist cache store. |
- class xpersist.prefect.result.XpersistResult(cache_store, serializer='auto', serializer_dump_kwargs=None, serializer_load_kwargs=None, kwargs=None)[source]#
A result class used to store the results of a task in an xpersist cache store.
- Parameters
cache_store (xpersist.cache.CacheStore) – The cache store to use for storing the result.
serializer (str) – The serializer to use for storing the result. Valid options are:
‘auto’ (default): automatically chooses the serializer based on the type of the value
‘xarray.netcdf’: requires xarray and netCDF4
‘xarray.zarr’: requires xarray and zarr
‘pandas.csv’: requires pandas
‘pandas.parquet’: requires pandas and pyarrow or fastparquet
serializer_dump_kwargs (dict) – The keyword arguments to pass to the serializer’s dump method.
serializer_load_kwargs (dict) – The keyword arguments to pass to the serializer’s load method.
kwargs (dict) – Any additional keyword arguments to pass to the Result class.
- exists(location, **kwargs)[source]#
Checks whether the target result exists in the cache store.
Does not validate whether the result is valid, only that it is present.
- Parameters
location (str) – Location of the result in the result target. The method checks whether this location exists.
kwargs (dict) – String format arguments for location.
- Returns
_ (bool) – Whether the target result exists.
- read(location)[source]#
Reads a result from the cache store and returns the corresponding Result instance.
- Parameters
location (str) – the location to read from
- Returns
result (Result) – a new result instance with the data represented by the location
- write(value_, **kwargs)[source]#
Writes the result to a location in the cache store and returns a new Result object with the result’s location.
- Parameters
value_ (typing.Any) – the value to write; will then be stored as the value attribute of the returned Result instance
kwargs (dict) – if provided, will be used to format the location template to determine the location to write to
- Returns
result (Result) – A new Result instance with the location of the written result.
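The kwargs passed to exists and write are described as format arguments for the location template. A minimal sketch of that resolution step follows, assuming the template is filled with Python's str.format; resolve_location and the template field names are hypothetical and chosen only for illustration.

```python
def resolve_location(template: str, **kwargs) -> str:
    """Fill a location template with keyword arguments (str.format is assumed)."""
    return template.format(**kwargs)


# Hypothetical template and field names.
location = resolve_location(
    "{task_name}/{run_date}.csv",
    task_name="clean",
    run_date="2021-06-01",
)
```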
Registry#
xpersist.registry.registry | xpersist's global registry entrypoint. |
- class xpersist.registry.registry[source]#
xpersist’s global registry entrypoint.
This is used to register serializers and other components that are used by xpersist.
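Decorator-based registration, as referenced by @xpersist.registry.serializers.register on this page, generally follows the pattern sketched below. This is a generic illustration and not xpersist's implementation: the Registry class, its get method, the "demo.json" name and the dict returned by the factory are all assumptions. The dict keys echo the name, load and dump fields of the Serializer model.

```python
import json


class Registry:
    """Hypothetical name-to-function registry with a decorator interface."""

    def __init__(self):
        self._items = {}

    def register(self, name):
        def decorator(func):
            if name in self._items:
                raise ValueError(f"{name!r} is already registered")
            self._items[name] = func
            return func  # the decorated function stays usable as-is

        return decorator

    def get(self, name):
        return self._items[name]


serializers = Registry()


@serializers.register("demo.json")
def demo_json():
    # A plain dict stands in for a Serializer instance in this sketch.
    return {"name": "demo.json", "dump": json.dumps, "load": json.loads}


entry = serializers.get("demo.json")()
roundtrip = entry["load"](entry["dump"]({"a": 1}))
```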