core - I/O and Models

The core module provides fundamental I/O operations and model abstractions. This is Layer 1, built on top of the lib layer.

Key Features

  • Unified I/O API: Read/write any data source with a consistent URI interface

  • URI handling: Support for the file:, python:, and http: / https: schemes

  • Model abstractions: Converters, extractors, and data models

I/O API

The io submodule provides the primary interface for reading and writing data.

Unified I/O API

This module provides a unified interface for reading and writing data from any URI scheme. It abstracts away the details of different data sources (files, URLs, Python objects) behind a consistent API.

Supported URI Schemes

  • file: - Local filesystem (e.g., file:./data.csv)

  • python: - Python objects by import path (e.g., python://mymodule.MyClass)

  • http: / https: - Remote URLs

Example

>>> from pyswark.core.io import api as io
>>>
>>> # Read from various sources
>>> config = io.read('file:./config.yaml')
>>>
>>> # Write data
>>> io.write(df, 'file:./output.csv')
>>>
>>> # Control logging verbosity (can be set/unset at runtime)
>>> io.set_verbosity('WARNING')  # Suppress INFO messages
>>> io.set_verbosity('INFO')    # Show I/O operations (default)
>>>
>>> # Temporarily change verbosity for specific operations
>>> with io.verbosity('INFO'):
...     df = io.read('file:./important.csv')  # Shows logging
>>> # Verbosity automatically restored
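
The temporary-verbosity behaviour shown above can be sketched with the standard logging module. This is only an illustration of the save-and-restore idea, not pyswark's implementation; the logger name "io_sketch" and the function body are assumptions made for this example.

```python
import logging
from contextlib import contextmanager

# Hypothetical logger name, chosen only for this sketch.
logger = logging.getLogger("io_sketch")
logger.setLevel(logging.INFO)


@contextmanager
def verbosity(level):
    """Temporarily switch the logger level, restoring it on exit."""
    previous = logger.level
    logger.setLevel(level)
    try:
        yield logger
    finally:
        # Runs on normal exit and also when the body raises.
        logger.setLevel(previous)


with verbosity("WARNING"):
    level_inside = logger.level

level_after = logger.level
```

Restoring the level in a finally clause is what makes the "automatically restored" guarantee hold even when the wrapped operation fails.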

pyswark.core.io.api.acquire(uri, datahandler=None)

Acquire a file handle or connection for a URI without reading.

Useful for streaming or when you need direct access to the underlying resource.

Parameters:
  • uri (str) – The URI to acquire.

  • datahandler (str, optional) – Override the automatic datahandler selection.

Returns:

A file-like object or connection handle.

Return type:

Any

pyswark.core.io.api.isUri(uri)

Check if a string is a valid pyswark URI.

Parameters:

uri (str) – The string to validate.

Returns:

True if the string is a recognized URI scheme, False otherwise.

Return type:

bool

Example

>>> from pyswark.core.io.api import isUri
>>> isUri('file:./data.csv')
True
>>> isUri('/plain/path/data.csv')
False
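
A check of this kind could be approximated with the standard library as follows. The set of schemes is taken from the list documented in this section; the helper name looks_like_uri is hypothetical and the real isUri may apply different rules.

```python
from urllib.parse import urlsplit

# Assumed set of schemes, taken from the schemes listed in this section.
KNOWN_SCHEMES = {"file", "python", "http", "https"}


def looks_like_uri(text):
    """Return True when text starts with one of the known schemes."""
    if not isinstance(text, str):
        return False
    return urlsplit(text).scheme.lower() in KNOWN_SCHEMES


with_scheme = looks_like_uri("file:./data.csv")
without_scheme = looks_like_uri("/plain/path/data.csv")
```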

pyswark.core.io.api.read(uri, datahandler=None, **kw)

Read data from any supported URI.

This is the primary entry point for loading data in pyswark. The function automatically determines the appropriate handler based on the URI scheme and file extension.

Parameters:
  • uri (str) –

    The URI to read from. Supports multiple schemes:

    • file:./path/to/file.csv - Local file

    • python://module.Class - Python object

    • https://example.com/data.json - Remote URL

  • datahandler (str, optional) – Override the automatic datahandler selection.

  • **kw – Additional keyword arguments passed to the underlying reader (e.g., index_col=0 for pandas).

Returns:

The loaded data (type depends on the file format).

Return type:

Any

Example

>>> from pyswark.core.io.api import read
>>> config = read('file:./config.yaml')
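
To make the handler selection described for read more concrete, here is a minimal sketch of choosing a handler from the URI scheme and file extension, with an explicit override taking priority. The handler names and the extension table are assumptions for illustration; pyswark's actual registry is not shown in this document.

```python
from pathlib import PurePosixPath
from urllib.parse import urlsplit

# Hypothetical extension-to-handler table; the real pyswark names may differ.
HANDLERS_BY_EXTENSION = {".csv": "csv", ".json": "json", ".yaml": "yaml"}


def choose_handler(uri, datahandler=None):
    """Pick a handler name: explicit override, then scheme, then extension."""
    if datahandler is not None:
        return datahandler
    parts = urlsplit(uri)
    if parts.scheme == "python":
        return "python"
    suffix = PurePosixPath(parts.path).suffix.lower()
    try:
        return HANDLERS_BY_EXTENSION[suffix]
    except KeyError:
        raise ValueError(f"no handler known for {uri!r}") from None


yaml_handler = choose_handler("file:./config.yaml")
forced_handler = choose_handler("file:./data.txt", datahandler="csv")
```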

pyswark.core.io.api.write(data, uri, datahandler=None, **kw)

Write data to any supported URI.

Parameters:
  • data (Any) – The data to write.

  • uri (str) – The destination URI. Supports the file: scheme for local files.

  • datahandler (str, optional) – Override the automatic datahandler selection.

  • **kw – Additional keyword arguments passed to the underlying writer.

Returns:

Result from the write operation (typically None or the path).

Return type:

Any

Example

>>> from pyswark.core.io.api import write
>>> write(df, 'file:./output.csv', index=False)
>>> write(config, 'file:./config.yaml')
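
The **kw pass-through can be illustrated with a small stand-in that forwards extra keyword arguments to the underlying serializer. The helpers write_json and read_json are hypothetical and use the standard json module; they are not part of pyswark.

```python
import json
import tempfile
from pathlib import Path


def write_json(data, path, **kw):
    """Forward extra keyword arguments to json.dumps unchanged."""
    Path(path).write_text(json.dumps(data, **kw), encoding="utf-8")
    return path


def read_json(path, **kw):
    """Forward extra keyword arguments to json.loads unchanged."""
    return json.loads(Path(path).read_text(encoding="utf-8"), **kw)


with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "config.json"
    # sort_keys and indent are json.dumps options, not options of write_json.
    write_json({"b": 1, "a": 2}, target, sort_keys=True, indent=2)
    text = target.read_text(encoding="utf-8")
    restored = read_json(target)
```

The wrapper itself stays format-agnostic: options such as index=False or index_col=0 in the examples above would reach the underlying writer or reader in the same way.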

URI Schemes

pyswark supports multiple URI schemes:

Scheme           Example                          Description
---------------  -------------------------------  -----------------------------
file:            file:./data.csv                  Local filesystem
python:          python://module.Class            Python objects by import path
http: / https:   https://example.com/data.json    Remote URLs
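
Resolving a python: URI to an object "by import path" might look like the following sketch. It assumes the URI has the form python://package.module.Name and uses importlib; the function name load_object is hypothetical, and how pyswark actually parses this scheme is not specified here.

```python
import importlib

PREFIX = "python://"


def load_object(uri):
    """Import the object named by a 'python://package.module.Name' URI."""
    if not uri.startswith(PREFIX):
        raise ValueError(f"not a python: URI: {uri!r}")
    module_path, _, attribute = uri[len(PREFIX):].rpartition(".")
    if not module_path:
        raise ValueError(f"expected 'module.Name' in {uri!r}")
    module = importlib.import_module(module_path)
    return getattr(module, attribute)


ordered_dict_cls = load_object("python://collections.OrderedDict")
```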

Usage Examples

Reading Data

from pyswark.core.io import api as io

# Read from local file
config = io.read('file:./config.yaml')

# Read from URL
data = io.read('https://example.com/api/data.json')

# Read with custom options (passed to pandas)
df = io.read('file:./data.csv', index_col=0)

Writing Data

from pyswark.core.io import api as io

# Write a DataFrame to CSV (df, config, and model are assumed to already exist)
io.write(df, 'file:./output.csv', index=False)

# Write config to YAML
io.write(config, 'file:./config.yaml')

# Write model to JSON with type info
io.write(model, 'file:./model.json')

Validating URIs

from pyswark.core.io import api as io

io.isUri('file:./data.csv')      # True
io.isUri('/plain/path.csv')      # False

Core Models

The models submodule provides base classes for data handling, database operations, and model patterns.

Input/Output Base Classes

class pyswark.core.models.xputs.BaseInputs(*args)

Bases: _BaseXputs

Base class for model inputs.

model_config: ClassVar[ConfigDict] = {'extra': 'forbid'}

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

class pyswark.core.models.xputs.BaseOutputs(*args)

Bases: _BaseXputs

Base class for model outputs.

model_config: ClassVar[ConfigDict] = {'extra': 'forbid'}

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.
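
In pydantic, setting 'extra': 'forbid' makes a model reject fields that were not declared. The sketch below reproduces that behaviour with standard-library dataclasses so that it runs without pydantic installed; the helper strict and the class PriceInputs are hypothetical and only stand in for a BaseInputs subclass.

```python
from dataclasses import dataclass, fields


def strict(cls, **values):
    """Build cls from values, rejecting any key that is not a declared field."""
    allowed = {f.name for f in fields(cls)}
    extra = sorted(set(values) - allowed)
    if extra:
        raise ValueError(f"extra fields not permitted: {extra}")
    return cls(**values)


@dataclass(frozen=True)
class PriceInputs:  # hypothetical inputs class
    ticker: str
    shares: int


ok = strict(PriceInputs, ticker="AAPL", shares=10)

try:
    strict(PriceInputs, ticker="AAPL", shares=10, currency="USD")
except ValueError as err:
    rejected = str(err)
```

Rejecting unknown fields early tends to surface misspelled parameter names at construction time instead of letting them pass silently.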

Collection Models

Serializable collections that preserve type information:

class pyswark.core.models.collection.Dict(inputs)

Bases: List

asDict()
extract()
inputs: dict | list
model_config: ClassVar[ConfigDict] = {'extra': 'forbid'}

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

class pyswark.core.models.collection.List(inputs)

Bases: Base

asDict()
inputs: list | tuple
model_config: ClassVar[ConfigDict] = {'extra': 'forbid'}

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

class pyswark.core.models.collection.Set(inputs)

Bases: List

extract()
model_config: ClassVar[ConfigDict] = {'extra': 'forbid'}

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

class pyswark.core.models.collection.Tuple(inputs)

Bases: List

extract()
model_config: ClassVar[ConfigDict] = {'extra': 'forbid'}

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.
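
Why a collection needs to "preserve type information" is easiest to see with JSON, which has only one array type: a tuple or set written as plain JSON comes back as a list. The sketch below stores a type tag next to the items and uses it when loading. The tag format and the dump/load helpers are assumptions for illustration, not the format pyswark writes.

```python
import json

# Tag-to-constructor table used by this sketch only.
CONSTRUCTORS = {"list": list, "tuple": tuple, "set": set}


def dump(collection):
    """Serialize a list, tuple, or set together with its type name."""
    kind = type(collection).__name__
    if kind not in CONSTRUCTORS:
        raise TypeError(f"unsupported collection type: {kind}")
    # Sets are sorted for stable output; this assumes orderable items.
    items = sorted(collection) if kind == "set" else list(collection)
    return json.dumps({"type": kind, "items": items})


def load(text):
    """Rebuild the collection with the type recorded by dump."""
    payload = json.loads(text)
    return CONSTRUCTORS[payload["type"]](payload["items"])


plain = json.loads(json.dumps((1, 2, 3)))  # the tuple comes back as a list
tagged = load(dump((1, 2, 3)))             # the tuple comes back as a tuple
unique = load(dump({"b", "a"}))            # the set comes back as a set
```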

Model Patterns

Base classes for different model patterns:

class pyswark.core.models.converter.ConverterModel(inputs=None)

Bases: BaseModel

Use when you want to convert to an output that pydantic/typing doesn't natively support, e.g. an np.array.

classmethod convert(inputs: BaseInputs) -> Any
inputs: BaseInputs
model_config: ClassVar[ConfigDict] = {'extra': 'forbid'}

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

property outputs
classmethod validate(inputs: BaseInputs)

class pyswark.core.models.function.FunctionModel(inputs=None, *, outputs: BaseOutputs = None)

Bases: BaseModel

static function(inputs)
inputs: BaseInputs
model_config: ClassVar[ConfigDict] = {'extra': 'forbid'}

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

outputs: BaseOutputs
classmethod validate(inputs: BaseInputs)
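
One plausible reading of the two patterns above is sketched here with plain Python classes. In the converter pattern, simple validated inputs are kept and a richer output is built on demand through convert and outputs; in the function pattern, a static function maps inputs to stored outputs. All class names are hypothetical, array.array stands in for a type such as a NumPy array, and the real base classes are pydantic models whose internals may differ.

```python
from array import array
from dataclasses import dataclass


@dataclass(frozen=True)
class Inputs:  # hypothetical stand-in for a BaseInputs subclass
    values: tuple


class ArrayConverter:
    """Converter pattern: keep plain inputs, build the richer output on demand."""

    def __init__(self, inputs):
        self.inputs = inputs

    @classmethod
    def convert(cls, inputs):
        # array.array stands in for a type such as a NumPy array.
        return array("d", inputs.values)

    @property
    def outputs(self):
        return self.convert(self.inputs)


class TotalFunction:
    """Function pattern: a static function maps inputs to stored outputs."""

    def __init__(self, inputs):
        self.inputs = inputs
        self.outputs = self.function(inputs)

    @staticmethod
    def function(inputs):
        return sum(inputs.values)


inputs = Inputs(values=(1.0, 2.5))
converted = ArrayConverter(inputs).outputs
total = TotalFunction(inputs).outputs
```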

Database Models

Base classes for database operations:

class pyswark.core.models.db.Db(*, records: list[pyswark.core.models.record.Record] = <factory>, url: str = '', datahandler: str = 'pjson', engine_url: str = 'sqlite:///:memory:', persist: bool = False)

Bases: MixinDb, MixinPost

AllowedInstances: ClassVar[list[Union[str, type]]] = []
AllowedTypes: ClassVar[list[Union[str, type]]] = []
model_config: ClassVar[ConfigDict] = {'extra': 'forbid'}

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

class pyswark.core.models.db.MixinDb(*, records: list[pyswark.core.models.record.Record] = <factory>, url: str = '', datahandler: str = 'pjson', engine_url: str = 'sqlite:///:memory:', persist: bool = False)

Bases: BaseModel, MixinName

Base class for a database.

AllowedInstances: ClassVar[list[Union[str, type]]] = []
AllowedTypes: ClassVar[list[Union[str, type]]] = []
asSQLModel(url=None, **kw)
classmethod connect(url, datahandler='', persist=False)

Connect to a .gluedb file, loading its records if the file exists. SQLModel operations use engine_url, and the records are optionally written back to the .gluedb file on context exit.

Parameters:
  • url (str) – URI to a db file (e.g. file:./data.gluedb). If it exists, records are loaded. If None, no load or persist is done.

  • persist (bool, optional) – If True and url is set, write self to url on successful context exit. Default False.

Returns:

An instance usable as a context manager. Use with Db.connect(...) as db:.

Return type:

Db

Example

>>> with Db.connect('file:./catalog.gluedb', persist=True) as db:
...     db.post(ticker, name='AAPL')
...     # On exit: writes to file:./catalog.gluedb
...
>>> with Db.connect('file:./catalog.gluedb', engine_url='sqlite:///:memory:') as db:
...     # Loaded from .gluedb; SQL runs in memory; no persist
...     db.getByName('AAPL')
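
The load-on-connect and persist-on-exit behaviour described for connect can be sketched with a small context manager. JsonStore is a hypothetical toy class that keeps records in a JSON file; it does not model the .gluedb format, the SQL engine, or the rest of the Db API.

```python
import json
import tempfile
from pathlib import Path


class JsonStore:
    """Toy record store: load on connect, optionally save on clean exit."""

    def __init__(self, path, persist=False):
        self.path = Path(path)
        self.persist = persist
        self.records = {}
        if self.path.exists():
            self.records = json.loads(self.path.read_text(encoding="utf-8"))

    def post(self, obj, name):
        self.records[name] = obj

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        # Persist only when the block finished without an exception.
        if self.persist and exc_type is None:
            self.path.write_text(json.dumps(self.records), encoding="utf-8")
        return False  # never swallow exceptions


with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "catalog.json"
    with JsonStore(target, persist=True) as db:
        db.post({"ticker": "AAPL"}, name="AAPL")
    with JsonStore(target) as db:  # reloads what the first block saved
        db.post({"ticker": "MSFT"}, name="MSFT")  # persist=False: not saved
    with JsonStore(target) as db:
        names = sorted(db.records)
```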

datahandler: str
deleteByName(name)
engine_url: str
property enum
getByName(name)
model_config: ClassVar[ConfigDict] = {'extra': 'forbid'}

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

persist: bool
persistToFile()
post(obj, name=None, **infoKw)
postAll(objs)
put(obj, name=None)
records: list[pyswark.core.models.record.Record]
url: str

class pyswark.core.models.record.Record(*, info: Info, body: Body, id: int | None = None)

Bases: BaseModel

acquire()
asSQLModel()
body: Body
id: int | None
info: Info
model_config: ClassVar[ConfigDict] = {'extra': 'forbid'}

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

class pyswark.core.models.info.Info(*, name: str, date_created: Datetime | str | dict | None = '', date_modified: Datetime | str | dict | None = '')

Bases: BaseModel

Base class for record metadata/info.

asSQLModel()

Convert to SQLModel, extracting Python datetime from Datetime instances.

clone(**kwargs)

Set multiple attributes and return a new Info object.

date_created: Datetime | str | dict | None
date_modified: Datetime | str | dict | None
model_config: ClassVar[ConfigDict] = {'extra': 'forbid'}

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

name: str

class pyswark.core.models.body.Body(*, model: str, contents: str)

Bases: BaseModel, TypeCheck

Base class for record body.

Base: ClassVar[str | type] = 'pyswark.lib.pydantic.base.BaseModel'
asSQLModel()
contents: str
extract()
model: str
model_config: ClassVar[ConfigDict] = {'extra': 'forbid'}

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.
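
Body stores a model path and its contents as two strings. A minimal sketch of how such a pair could be turned back into an object follows, under the assumption that model is a dotted import path and contents is JSON holding constructor keyword arguments. BodySketch is hypothetical, and the real extract() may work differently.

```python
import importlib
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class BodySketch:
    model: str     # dotted import path of the class to rebuild
    contents: str  # JSON text holding the constructor keyword arguments

    def extract(self):
        """Import the class named by model and build it from contents."""
        module_path, _, class_name = self.model.rpartition(".")
        cls = getattr(importlib.import_module(module_path), class_name)
        return cls(**json.loads(self.contents))


body = BodySketch(model="datetime.timedelta", contents='{"hours": 2}')
duration = body.extract()
```

Keeping both parts as strings means a record can be stored in a text file or a database column without knowing the concrete class in advance.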

Primitive Models

Type-safe wrappers for primitive values:

class pyswark.core.models.primitive.Bool(inputs)

Bases: Base

inputs: bool
model_config: ClassVar[ConfigDict] = {'extra': 'forbid'}

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

class pyswark.core.models.primitive.Float(inputs)

Bases: Base

inputs: float
model_config: ClassVar[ConfigDict] = {'extra': 'forbid'}

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

class pyswark.core.models.primitive.Int(inputs)

Bases: Base

inputs: int
model_config: ClassVar[ConfigDict] = {'extra': 'forbid'}

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

class pyswark.core.models.primitive.String(inputs)

Bases: Base

inputs: str
model_config: ClassVar[ConfigDict] = {'extra': 'forbid'}

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.
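
The idea of a type-safe wrapper around a primitive can be sketched without pydantic as a class that checks the value's type at construction. The names TypedValue, IntValue, and StringValue are hypothetical, and rejecting bool for the int wrapper is a design choice of this sketch, not a statement about how pyswark's Int behaves.

```python
class TypedValue:
    """Wrap a single value and check its type when the wrapper is built."""

    expected = object

    def __init__(self, inputs):
        # bool is a subclass of int in Python, so this sketch rejects it
        # explicitly unless the wrapper is meant to hold a bool.
        is_stray_bool = isinstance(inputs, bool) and self.expected is not bool
        if is_stray_bool or not isinstance(inputs, self.expected):
            name = type(inputs).__name__
            raise TypeError(f"expected {self.expected.__name__}, got {name}")
        self.inputs = inputs


class IntValue(TypedValue):
    expected = int


class StringValue(TypedValue):
    expected = str


count = IntValue(3)
label = StringValue("AAPL")

try:
    IntValue("3")
except TypeError as err:
    message = str(err)
```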