For most in systems engineering, mentioning pipelines in the context of observability brings to mind the transmission of data, such as traces, metrics, and log records, to some remote endpoint in the cloud.
Pipelines are seen as something external in both execution and examination: for the most part, a structure and process that is obscure and opaque to the codebase of the instrumentation library and the application process.
Pipelines are a means to an end: the transfer of data to a far-off site owned by a third party that offers monitoring analytics once the transmission has been received and stored.
The only value such pipelines add is in dropping data when overloaded, though it is debatable whether that is useful.
Pipelines are sockets and remote endpoint addresses where data is sent to be stored. There might be many hidden stages along a pipeline, but ultimately there is a single destination store (sink).
With the traditional pipeline, access, augmentation, and analysis are never local, online, or immediate; instead, all such data processing work is remote, offline, and later. That was until now!
Modern: Local First, Remote Second
Substrates is a modern and forward-looking take on observability, where a pipeline, in the form of Circuits and Conduits, is at the core of the instrumentation toolkit and the mechanisms underlying collection, communication, and coordination within the application runtime space. Pipelines are local first and remote second.
Every observability Instrument Modlet built on top of the Substrates toolkit captures, casts, and consumes data locally in the same standard manner, which no other product or open-source project has aimed for or attempted.
Using Substrates, the fusion of multiple streams of data from multiple sources, an essential process of any monitoring and management solution, can be done in-process and in real-time. Doing so has significant benefits, as much of the data collected and sent with the traditional remote-only pipeline approach can nearly always be discarded.
When sensory fusion is done remotely, more environmental and contextual data must be duplicated across streams and events to tie things together. And with each Instrument and Event type having distinct and disjointed data publishing mechanisms, far more processing is required to reconcile another vital aspect: (clock) time.
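To make the contrast concrete, here is a minimal sketch of in-process fusion. It is not the Substrates API: `LocalPipeline`, `Fusion`, `Event`, and the source names are hypothetical, chosen only for illustration. The point is that when every source emits through the same local pipeline, all events share one ordering, so no context needs to be duplicated and no clocks need to be reconciled afterwards.

```python
import itertools
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    tick: int      # position in the single in-process ordering
    source: str    # hypothetical instrument name, e.g. "latency_ms"
    value: float


class LocalPipeline:
    """Hypothetical in-process pipeline: one ordering, many subscribers."""

    def __init__(self):
        self._ticks = itertools.count()
        self._subscribers = []

    def subscribe(self, callback):
        self._subscribers.append(callback)

    def emit(self, source, value):
        # Every source is stamped from the same counter, so events from
        # different instruments are already ordered relative to each other.
        event = Event(next(self._ticks), source, value)
        for callback in self._subscribers:
            callback(event)
        return event


class Fusion:
    """Keeps the latest event per source so they can be combined at any time."""

    def __init__(self):
        self.latest = {}

    def __call__(self, event):
        self.latest[event.source] = event

    def snapshot(self):
        return {source: event.value for source, event in self.latest.items()}


pipeline = LocalPipeline()
fusion = Fusion()
pipeline.subscribe(fusion)

pipeline.emit("queue_depth", 12.0)
pipeline.emit("latency_ms", 40.0)
last = pipeline.emit("queue_depth", 15.0)
```

After these three emits, `fusion.snapshot()` holds the most recent value from each source, taken from one shared sequence rather than stitched together from separately published streams.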
With a pipeline embedded within the process under observation, as with the Substrates toolkit, it becomes easy to reduce the volume of data published remotely to a few salient signals and to have the application respond to those signals in real time, finally making self-adaptive software a reality. What is emitted remotely then becomes only the actions taken and a synopsis of the local data (situation, signals, states) that drove the decision-making.
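The sketch below illustrates that idea under stated assumptions; it does not use or represent the Substrates API. `AdaptiveLimiter`, `Synopsis`, the signal names, and the 100 ms threshold are all hypothetical. Raw latency samples stay local, a signal is raised only when the state changes, the process adjusts its own concurrency limit in response, and only the synopsis is handed over for remote publication.

```python
from dataclasses import dataclass, field


@dataclass
class Synopsis:
    samples: int = 0                              # raw measurements seen locally
    signals: list = field(default_factory=list)   # state changes detected
    actions: list = field(default_factory=list)   # local responses taken


class AdaptiveLimiter:
    """Hypothetical component that observes, signals, and adapts in-process."""

    def __init__(self, threshold_ms=100.0, limit=8):
        self.threshold_ms = threshold_ms  # assumed overload threshold
        self.limit = limit                # assumed concurrency limit
        self.synopsis = Synopsis()
        self._overloaded = False

    def observe(self, latency_ms):
        self.synopsis.samples += 1
        overloaded = latency_ms > self.threshold_ms
        if overloaded == self._overloaded:
            return  # no state change, so nothing worth signalling
        self._overloaded = overloaded
        self.synopsis.signals.append("OVERLOADED" if overloaded else "RECOVERED")
        # Respond locally and immediately: halve the limit under overload,
        # double it again on recovery.
        self.limit = max(1, self.limit // 2) if overloaded else self.limit * 2
        self.synopsis.actions.append(f"limit={self.limit}")

    def publish(self):
        """Return the synopsis that would be sent remotely, then start a new one."""
        out, self.synopsis = self.synopsis, Synopsis()
        return out


limiter = AdaptiveLimiter()
for latency in (20.0, 30.0, 150.0, 180.0, 40.0):
    limiter.observe(latency)
report = limiter.publish()
```

Five samples were observed, but `report` carries only two signals and two actions. The raw measurements never leave the process.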