core - I/O and Models
The core module provides fundamental I/O operations and model abstractions.
This is Layer 1, built on top of the lib layer.
Key Features
Unified I/O API: Read/write any data source with a consistent URI interface
URI handling: Support for file:, python:, http: schemes
Model abstractions: Converters, extractors, and data models
I/O API
The io submodule provides the primary interface for reading and writing data.
Unified I/O API
This module provides a unified interface for reading and writing data from any URI scheme. It abstracts away the details of different data sources (files, URLs, Python objects) behind a consistent API.
Supported URI Schemes
- file: – Local filesystem (e.g., file:./data.csv)
- python: – Python objects by import path (e.g., python://mymodule.MyClass)
- http: / https: – Remote URLs
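The scheme prefix is what routes a URI to the right handler. Below is a minimal sketch of that routing. It assumes the scheme is everything before the first colon. `split_scheme`, `describe` and `SCHEME_KINDS` are hypothetical helpers written for illustration and are not part of pyswark:

```python
# Hypothetical sketch: route a URI by its scheme prefix (not pyswark's implementation).
SCHEME_KINDS = {
    'file': 'local file',
    'python': 'Python object',
    'http': 'remote URL',
    'https': 'remote URL',
}


def split_scheme(uri: str) -> tuple[str, str]:
    """Split 'scheme:rest' into (scheme, rest), dropping a leading '//' from rest."""
    scheme, sep, rest = uri.partition(':')
    if not sep or not scheme:
        raise ValueError(f'not a URI: {uri!r}')
    return scheme.lower(), rest.removeprefix('//')


def describe(uri: str) -> str:
    """Return a short description of what the URI points at."""
    scheme, target = split_scheme(uri)
    if scheme not in SCHEME_KINDS:
        raise ValueError(f'unsupported scheme: {scheme!r}')
    return f'{SCHEME_KINDS[scheme]}: {target}'
```

In this sketch, a plain path such as /plain/path/data.csv has no colon and is rejected with ValueError.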
Example
>>> from pyswark.core.io import api as io
>>>
>>> # Read from various sources
>>> config = io.read('file:./config.yaml')
>>>
>>> # Write data
>>> io.write(df, 'file:./output.csv')
>>>
>>> # Control logging verbosity (can be set/unset at runtime)
>>> io.set_verbosity('WARNING') # Suppress INFO messages
>>> io.set_verbosity('INFO') # Show I/O operations (default)
>>>
>>> # Temporarily change verbosity for specific operations
>>> with io.verbosity('INFO'):
...     df = io.read('file:./important.csv')  # Shows logging
>>> # Verbosity automatically restored
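Runtime verbosity control with automatic restoration follows the standard set-and-restore context-manager pattern. The sketch below implements it with the standard-library logging module. The logger name pyswark_sketch.io and both helpers are assumptions, and pyswark's own set_verbosity and verbosity may be implemented differently:

```python
import logging
from contextlib import contextmanager

# Hypothetical logger name chosen for this sketch.
logger = logging.getLogger('pyswark_sketch.io')


def set_verbosity(level: str) -> None:
    """Set the logger level persistently, e.g. 'WARNING' or 'INFO'."""
    logger.setLevel(level)


@contextmanager
def verbosity(level: str):
    """Temporarily change the level, restoring the previous one on exit (even on error)."""
    previous = logger.level
    logger.setLevel(level)
    try:
        yield
    finally:
        logger.setLevel(previous)
```

The try/finally is what guarantees restoration: the earlier level returns even if the body raises.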
- pyswark.core.io.api.acquire(uri, datahandler=None)
Acquire a file handle or connection for a URI without reading.
Useful for streaming or when you need direct access to the underlying resource.
- Parameters:
uri (str) – The URI to acquire.
datahandler (str, optional) – Override the automatic datahandler selection.
- Returns:
A file-like object or connection handle.
- Return type:
Any
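For the file: scheme, acquiring without reading amounts to opening the file and returning the handle. The caller then streams from it and closes it. The sketch below covers only file: URIs. This acquire is a hypothetical stand-in; the real function also handles other schemes and datahandlers:

```python
import tempfile
from pathlib import Path


def acquire(uri: str, mode: str = 'r'):
    """Hypothetical sketch: open a file: URI and return the handle unread."""
    scheme, _, path = uri.partition(':')
    if scheme != 'file':
        raise NotImplementedError(f'sketch only handles file:, got {scheme!r}')
    return open(Path(path), mode)


# Usage: read only the first line of a temporary CSV without loading the rest.
with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / 'data.csv'
    target.write_text('a,b\n1,2\n')
    with acquire(f'file:{target}') as fh:
        header = fh.readline().strip()
```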
- pyswark.core.io.api.isUri(uri)
Check if a string is a valid pyswark URI.
- Parameters:
uri (str) – The string to validate.
- Returns:
True if the string is a recognized URI scheme, False otherwise.
- Return type:
bool
Example
>>> isUri('file:./data.csv')
True
>>> isUri('/plain/path/data.csv')
False
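A check like this can be a single anchored pattern over the known scheme names. The sketch below assumes the recognized schemes are exactly the ones listed in this section. The real isUri may accept more:

```python
import re

# Hypothetical: recognized schemes are exactly those listed in this document.
_URI_PATTERN = re.compile(r'^(?:file|python|https?):\S+')


def isUri(uri) -> bool:
    """Return True if `uri` is a string starting with a recognized scheme."""
    return isinstance(uri, str) and _URI_PATTERN.match(uri) is not None
```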
- pyswark.core.io.api.read(uri, datahandler=None, **kw)
Read data from any supported URI.
This is the primary entry point for loading data in pyswark. The function automatically determines the appropriate handler based on the URI scheme and file extension.
- Parameters:
uri (str) –
The URI to read from. Supports multiple schemes:
- file:./path/to/file.csv – Local file
- python://module.Class – Python object
- https://example.com/data.json – Remote URL
datahandler (str, optional) – Override the automatic datahandler selection.
**kw – Additional keyword arguments passed to the underlying reader (e.g., index_col=0 for pandas).
- Returns:
The loaded data (type depends on the file format).
- Return type:
Any
Example
>>> config = read('file:./config.yaml')
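Selecting a handler by file extension can be modeled as a registry keyed by extension. In that model, datahandler acts as an explicit override and **kw is forwarded to the chosen reader. The sketch below uses only json and csv from the standard library. The registry keys and reader functions are hypothetical and are not pyswark's datahandler names:

```python
import csv
import json
import tempfile
from pathlib import Path


def _read_json(path, **kw):
    with open(path) as fh:
        return json.load(fh, **kw)


def _read_csv(path, **kw):
    with open(path, newline='') as fh:
        return list(csv.DictReader(fh, **kw))


# Hypothetical registry: handler name -> reader function.
READERS = {'json': _read_json, 'csv': _read_csv}


def read(uri, datahandler=None, **kw):
    """Sketch: resolve a file: URI, pick a reader by extension (or override), forward **kw."""
    scheme, _, path = uri.partition(':')
    if scheme != 'file':
        raise NotImplementedError(f'sketch only handles file:, got {scheme!r}')
    key = datahandler or Path(path).suffix.lstrip('.').lower()
    if key not in READERS:
        raise ValueError(f'no reader registered for {key!r}')
    return READERS[key](path, **kw)


with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, 'config.json').write_text('{"debug": true}')
    Path(tmp, 'notes.txt').write_text('{"n": 1}')
    config = read(f'file:{tmp}/config.json')
    # Override: .txt has no registered reader, so name one explicitly.
    notes = read(f'file:{tmp}/notes.txt', datahandler='json')
```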
- pyswark.core.io.api.write(data, uri, datahandler=None, **kw)
Write data to any supported URI.
- Parameters:
data (Any) – The data to write.
uri (str) – The destination URI. Supports the file: scheme for local files.
datahandler (str, optional) – Override the automatic datahandler selection.
**kw – Additional keyword arguments passed to the underlying writer.
- Returns:
Result from the write operation (typically None or the path).
- Return type:
Any
Example
>>> write(df, 'file:./output.csv', index=False)
>>> write(config, 'file:./config.yaml')
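The write side can mirror the read side. A writer is chosen by extension unless datahandler names one, and **kw reaches the writer (here, indent for json.dump). The sketch below is stdlib-only. The registry and writer functions are hypothetical, and returning the path is one of the outcomes the docstring allows, not a confirmed detail:

```python
import csv
import json
import tempfile
from pathlib import Path


def _write_json(data, path, **kw):
    with open(path, 'w') as fh:
        json.dump(data, fh, **kw)


def _write_csv(rows, path, **kw):
    rows = list(rows)
    with open(path, 'w', newline='') as fh:
        writer = csv.DictWriter(fh, fieldnames=list(rows[0]), **kw)
        writer.writeheader()
        writer.writerows(rows)


# Hypothetical registry: handler name -> writer function.
WRITERS = {'json': _write_json, 'csv': _write_csv}


def write(data, uri, datahandler=None, **kw):
    """Sketch: pick a writer by extension (or override), forward **kw, return the path."""
    scheme, _, path = uri.partition(':')
    if scheme != 'file':
        raise NotImplementedError(f'sketch only handles file:, got {scheme!r}')
    key = datahandler or Path(path).suffix.lstrip('.').lower()
    if key not in WRITERS:
        raise ValueError(f'no writer registered for {key!r}')
    WRITERS[key](data, path, **kw)
    return path


with tempfile.TemporaryDirectory() as tmp:
    cfg_path = write({'name': 'demo', 'n': 3}, f'file:{tmp}/config.json', indent=2)
    config_text = Path(cfg_path).read_text()
    rows_path = write([{'a': 1, 'b': 2}], f'file:{tmp}/out.csv')
    csv_text = Path(rows_path).read_text()
```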
URI Schemes
pyswark supports multiple URI schemes:
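| Scheme | Example | Description |
|---|---|---|
| file: | file:./data.csv | Local filesystem |
| python: | python://mymodule.MyClass | Python objects by import path |
| http: / https: | https://example.com/data.json | Remote URLs |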
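The python: scheme names an object by its dotted import path. One way to resolve such a path is to import the longest importable module prefix, then walk the remaining attributes. The sketch below takes that approach with importlib. resolve_python_uri is a hypothetical helper, and pyswark's resolver may work differently:

```python
import importlib


def resolve_python_uri(uri: str):
    """Resolve 'python://pkg.module.Attr' to the named object (hypothetical sketch)."""
    prefix = 'python://'
    if not uri.startswith(prefix):
        raise ValueError(f'not a python: URI: {uri!r}')
    parts = uri[len(prefix):].split('.')
    # Try the longest module prefix first, e.g. 'os.path.join' -> 'os.path' + ['join'].
    for i in range(len(parts), 0, -1):
        try:
            obj = importlib.import_module('.'.join(parts[:i]))
        except ModuleNotFoundError:
            continue
        for attr in parts[i:]:
            obj = getattr(obj, attr)
        return obj
    raise ImportError(f'no importable module prefix in {uri!r}')
```

Trying the longest prefix first lets a path like os.path.join resolve through the os.path module instead of stopping at os.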
Usage Examples
Reading Data
from pyswark.core.io import api as io
# Read from local file
config = io.read('file:./config.yaml')
# Read from URL
data = io.read('https://example.com/api/data.json')
# Read with custom options (passed to pandas)
df = io.read('file:./data.csv', index_col=0)
Writing Data
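import pandas as pd
from pyswark.core.io import api as io
# Write DataFrame to CSV (index=False is passed through to the underlying writer)
df = pd.DataFrame({'a': [1, 2], 'b': [3, 4]})
io.write(df, 'file:./output.csv', index=False)
# Write config to YAML
config = {'name': 'demo', 'debug': False}
io.write(config, 'file:./config.yaml')
# Write model to JSON with type info (`model` is an existing pydantic model instance)
io.write(model, 'file:./model.json')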
Validating URIs
from pyswark.core.io import api as io
io.isUri('file:./data.csv') # True
io.isUri('/plain/path.csv') # False
Core Models
The models submodule provides base classes for data handling, database operations,
and model patterns.
Input/Output Base Classes
Collection Models
Serializable collections that preserve type information:
- class pyswark.core.models.collection.Dict(inputs)
Bases: List
- asDict()
- extract()
- inputs: dict | list
- model_config: ClassVar[ConfigDict] = {'extra': 'forbid'}
Configuration for the model, should be a dictionary conforming to pydantic.config.ConfigDict.
- class pyswark.core.models.collection.List(inputs)
Bases: Base
- asDict()
- inputs: list | tuple
- model_config: ClassVar[ConfigDict] = {'extra': 'forbid'}
Configuration for the model, should be a dictionary conforming to pydantic.config.ConfigDict.
Model Patterns
Base classes for different model patterns:
- class pyswark.core.models.converter.ConverterModel(inputs=None)
Bases: BaseModel
For when you want to convert to an output type that pydantic/typing doesn't natively support, e.g. an np.array.
- classmethod convert(inputs: BaseInputs) -> Any
- inputs: BaseInputs
- model_config: ClassVar[ConfigDict] = {'extra': 'forbid'}
Configuration for the model, should be a dictionary conforming to pydantic.config.ConfigDict.
- property outputs
- classmethod validate(inputs: BaseInputs)
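The members listed above (validate, convert and the outputs property) suggest a pattern in which the inputs are validated while the output is produced on demand. In that pattern, the output never needs to be a pydantic field. The sketch below shows the pattern with the standard library only. It uses array.array in place of np.array so it runs without NumPy. All class names are hypothetical, and the real base class is pydantic-based:

```python
from array import array
from dataclasses import dataclass


@dataclass(frozen=True)
class SeriesInputs:
    """Hypothetical stand-in for a BaseInputs subclass."""
    values: tuple[float, ...]
    scale: float = 1.0


class ScaledArray:
    """Hypothetical converter: validated inputs in, a non-pydantic type (array.array) out."""

    def __init__(self, inputs):
        self.inputs = self.validate(inputs)

    @classmethod
    def validate(cls, inputs):
        # Accept a ready-made inputs object or a plain dict of its fields.
        if isinstance(inputs, dict):
            inputs = SeriesInputs(**inputs)
        if not isinstance(inputs, SeriesInputs):
            raise TypeError(f'expected SeriesInputs or dict, got {type(inputs).__name__}')
        return inputs

    @classmethod
    def convert(cls, inputs: SeriesInputs) -> array:
        return array('d', (v * inputs.scale for v in inputs.values))

    @property
    def outputs(self) -> array:
        # Computed on access, so it never has to be stored as a model field.
        return self.convert(self.inputs)
```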
- class pyswark.core.models.function.FunctionModel(inputs=None, *, outputs: BaseOutputs = None)
Bases: BaseModel
- static function(inputs)
- inputs: BaseInputs
- model_config: ClassVar[ConfigDict] = {'extra': 'forbid'}
Configuration for the model, should be a dictionary conforming to pydantic.config.ConfigDict.
- outputs: BaseOutputs
- classmethod validate(inputs: BaseInputs)
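According to its signature, FunctionModel differs from ConverterModel in one respect: outputs is a typed field (BaseOutputs) rather than a computed property. The sketch below illustrates that shape, where a static function maps validated inputs to an outputs object stored on the instance. The classes are hypothetical dataclass stand-ins. Computing outputs only when none are passed in is an assumption:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AddInputs:  # hypothetical BaseInputs stand-in
    a: float
    b: float


@dataclass(frozen=True)
class AddOutputs:  # hypothetical BaseOutputs stand-in
    total: float


class AddModel:
    """Hypothetical FunctionModel-style class: outputs = function(inputs), stored on the instance."""

    def __init__(self, inputs, *, outputs=None):
        self.inputs = self.validate(inputs)
        # Assumption: when outputs are not supplied, they are computed from the inputs.
        self.outputs = outputs if outputs is not None else self.function(self.inputs)

    @classmethod
    def validate(cls, inputs):
        return inputs if isinstance(inputs, AddInputs) else AddInputs(**inputs)

    @staticmethod
    def function(inputs: AddInputs) -> AddOutputs:
        return AddOutputs(total=inputs.a + inputs.b)
```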
Database Models
Base classes for database operations:
- class pyswark.core.models.db.Db(*, records: list[pyswark.core.models.record.Record] = <factory>, url: str = '', datahandler: str = 'pjson', engine_url: str = 'sqlite:///:memory:', persist: bool = False)
Bases: MixinDb, MixinPost
- AllowedInstances: ClassVar[list[Union[str, type]]] = []
- AllowedTypes: ClassVar[list[Union[str, type]]] = []
- model_config: ClassVar[ConfigDict] = {'extra': 'forbid'}
Configuration for the model, should be a dictionary conforming to pydantic.config.ConfigDict.
- class pyswark.core.models.db.MixinDb(*, records: list[pyswark.core.models.record.Record] = <factory>, url: str = '', datahandler: str = 'pjson', engine_url: str = 'sqlite:///:memory:', persist: bool = False)
Bases: BaseModel, MixinName
Base class for a database.
- AllowedInstances: ClassVar[list[Union[str, type]]] = []
- AllowedTypes: ClassVar[list[Union[str, type]]] = []
- asSQLModel(url=None, **kw)
- classmethod connect(url, datahandler='', persist=False)
Connect to a .gluedb file (loading it if it exists), use engine_url for SQLModel, and optionally persist to the .gluedb file on context exit.
- Parameters:
url (str) – URI to a .gluedb file (e.g. file:./data.gluedb). If it exists, records are loaded. If None, no load or persist is done.
persist (bool, optional) – If True and url is set, write self to url on successful context exit. Default False.
- Returns:
An instance usable as a context manager: use with Db.connect(...) as db:
- Return type:
Example
>>> with Db.connect('file:./catalog.gluedb', persist=True) as db:
...     db.post(ticker, name='AAPL')
...     # On exit: writes to file:./catalog.gluedb
...
>>> with Db.connect('file:./catalog.gluedb', engine_url='sqlite:///:memory:') as db:
...     # Loaded from .gluedb; SQL runs in memory; no persist
...     db.getByName('AAPL')
- datahandler: str
- deleteByName(name)
- engine_url: str
- property enum
- getByName(name)
- model_config: ClassVar[ConfigDict] = {'extra': 'forbid'}
Configuration for the model, should be a dictionary conforming to pydantic.config.ConfigDict.
- persist: bool
- persistToFile()
- post(obj, name=None, **infoKw)
- postAll(objs)
- put(obj, name=None)
- records: list[pyswark.core.models.record.Record]
- url: str
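The connect/persist lifecycle described above has three steps: load records if the file exists, work in memory, and write back on a clean context exit when persist=True. The sketch below models that lifecycle with a plain JSON file standing in for .gluedb. TinyDb and its storage format are hypothetical. The real class also keeps Record objects and a SQLModel engine, which this sketch omits. The post-versus-put semantics are an assumption, labeled in the code:

```python
import json
import tempfile
from pathlib import Path


class TinyDb:
    """Hypothetical stand-in for MixinDb: name-keyed records with optional JSON persistence."""

    def __init__(self, url: str = '', persist: bool = False):
        self.url = url
        self.persist = persist
        self.records: dict = {}

    @classmethod
    def connect(cls, url: str, persist: bool = False) -> 'TinyDb':
        db = cls(url=url, persist=persist)
        path = db._path()
        if path is not None and path.exists():
            db.records = json.loads(path.read_text())  # load existing records
        return db

    def _path(self):
        if not self.url:
            return None
        scheme, _, rest = self.url.partition(':')
        if scheme != 'file':
            raise NotImplementedError(f'sketch only handles file:, got {scheme!r}')
        return Path(rest)

    def __enter__(self) -> 'TinyDb':
        return self

    def __exit__(self, exc_type, exc, tb) -> bool:
        # Persist only on a clean exit, and only when asked to.
        if exc_type is None and self.persist and self.url:
            self.persistToFile()
        return False  # never swallow exceptions

    def persistToFile(self) -> None:
        self._path().write_text(json.dumps(self.records))

    def post(self, obj, name: str) -> None:
        # Assumption: post creates and refuses to overwrite; put creates or replaces.
        if name in self.records:
            raise KeyError(f'{name!r} already exists')
        self.records[name] = obj

    def put(self, obj, name: str) -> None:
        self.records[name] = obj

    def getByName(self, name: str):
        return self.records[name]

    def deleteByName(self, name: str) -> None:
        del self.records[name]


with tempfile.TemporaryDirectory() as tmp:
    url = f'file:{tmp}/catalog.json'
    with TinyDb.connect(url, persist=True) as db:
        db.post({'ticker': 'AAPL'}, name='AAPL')
    # Reconnect without persist: records come back from the file.
    with TinyDb.connect(url) as db2:
        reloaded = db2.getByName('AAPL')
```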
- class pyswark.core.models.record.Record(*, info: Info, body: Body, id: int | None = None)
Bases: BaseModel
- acquire()
- asSQLModel()
- id: int | None
- model_config: ClassVar[ConfigDict] = {'extra': 'forbid'}
Configuration for the model, should be a dictionary conforming to pydantic.config.ConfigDict.
- class pyswark.core.models.info.Info(*, name: str, date_created: Datetime | str | dict | None = '', date_modified: Datetime | str | dict | None = '')
Bases: BaseModel
Base class for record metadata/info.
- asSQLModel()
Convert to SQLModel, extracting Python datetime from Datetime instances.
- clone(**kwargs)
Set multiple attributes and return a new Info object.
- model_config: ClassVar[ConfigDict] = {'extra': 'forbid'}
Configuration for the model, should be a dictionary conforming to pydantic.config.ConfigDict.
- name: str
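clone returns a new object instead of mutating the original. The sketch below shows that copy-with-changes behavior using dataclasses.replace. InfoSketch is hypothetical and stores dates as plain strings. For pydantic v2 models, model_copy(update=...) gives a similar copy-with-changes, although it does not re-run validation on the updated values:

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class InfoSketch:
    """Hypothetical Info stand-in with string dates."""
    name: str
    date_created: str = ''
    date_modified: str = ''

    def clone(self, **kwargs) -> 'InfoSketch':
        # replace() builds a new instance; the original is left untouched.
        return replace(self, **kwargs)
```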
- class pyswark.core.models.body.Body(*, model: str, contents: str)
Bases: BaseModel, TypeCheck
Base class for record body.
- Base: ClassVar[str | type] = 'pyswark.lib.pydantic.base.BaseModel'
- asSQLModel()
- contents: str
- extract()
- model: str
- model_config: ClassVar[ConfigDict] = {'extra': 'forbid'}
Configuration for the model, should be a dictionary conforming to pydantic.config.ConfigDict.
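A Body pairs a type path (model) with serialized contents, and extract() rebuilds the object from them. The sketch below shows that round trip. It assumes contents is the value's str() and that the type's constructor accepts that string back. This holds for decimal.Decimal and fractions.Fraction but not in general. The real Body targets pydantic models (its Base is pyswark.lib.pydantic.base.BaseModel). BodySketch and fromObject are hypothetical:

```python
import importlib
from dataclasses import dataclass
from decimal import Decimal
from fractions import Fraction


@dataclass(frozen=True)
class BodySketch:
    model: str     # import path of the stored type, e.g. 'fractions.Fraction'
    contents: str  # serialized value

    @classmethod
    def fromObject(cls, obj) -> 'BodySketch':
        t = type(obj)
        return cls(model=f'{t.__module__}.{t.__qualname__}', contents=str(obj))

    def extract(self):
        # Re-import the type by path and rebuild the value from its string form.
        module_path, _, name = self.model.rpartition('.')
        cls = getattr(importlib.import_module(module_path), name)
        return cls(self.contents)


body = BodySketch.fromObject(Fraction(1, 3))
price = BodySketch.fromObject(Decimal('1.10'))
```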
Primitive Models
Type-safe wrappers for primitive values:
- class pyswark.core.models.primitive.Bool(inputs)
Bases: Base
- inputs: bool
- model_config: ClassVar[ConfigDict] = {'extra': 'forbid'}
Configuration for the model, should be a dictionary conforming to pydantic.config.ConfigDict.
- class pyswark.core.models.primitive.Float(inputs)
Bases: Base
- inputs: float
- model_config: ClassVar[ConfigDict] = {'extra': 'forbid'}
Configuration for the model, should be a dictionary conforming to pydantic.config.ConfigDict.
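Wrapping a primitive in a model gives it a type that can be checked and recorded. The sketch below shows strict wrappers that reject values of the wrong type, including a bool passed where a number is expected. StrictBool and StrictFloat are hypothetical. The actual classes are pydantic models whose coercion rules may be more lenient than this:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StrictBool:
    inputs: bool

    def __post_init__(self):
        if not isinstance(self.inputs, bool):
            raise TypeError(f'expected bool, got {type(self.inputs).__name__}')


@dataclass(frozen=True)
class StrictFloat:
    inputs: float

    def __post_init__(self):
        # bool is a subclass of int, so reject it explicitly before accepting ints.
        if isinstance(self.inputs, bool) or not isinstance(self.inputs, (int, float)):
            raise TypeError(f'expected float, got {type(self.inputs).__name__}')
        # Normalize ints to float; frozen dataclasses need object.__setattr__.
        object.__setattr__(self, 'inputs', float(self.inputs))
```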