Hardware entropy is a coupled system
We embedded 58 hardware entropy sources from one Apple Silicon machine into the same vector space. A third of them produce nearly identical embeddings, despite coming from completely different hardware subsystems. They share structure because they share host state.
See the embedding space
Each dot is one 256-byte entropy window. Sources that produce similar bytes land near each other. Start with the 9-source slice, then switch to the full 58-source union.
How we ran this
All sessions were recorded with OpenEntropy, an open-source framework we built for capturing raw bytes from hardware entropy sources. The machine is a Mac Mini M4 (base model, 16 GB RAM). OpenEntropy exposes 58 sources on it, covering clock jitter, memory timing, interrupt scheduling, PLL phase, USB transport, and more. The full source catalog describes each one. One external source is included: a Crypta Labs QCicada USB quantum RNG. No conditioning was applied at the OpenEntropy layer (no Von Neumann debiasing, no SHA-256 hashing).
QCicada was recorded in its "raw noise" mode. In this mode, the device runs its built-in health tests (what the SDK exposes as repetition count and adaptive proportion flags) but does not apply cryptographic conditioning. A fully unprocessed "raw samples" mode also exists, which outputs directly from the quantum optical module with no filtering.
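The repetition count check is one of the continuous health tests NIST SP 800-90B specifies, and appears to be what the SDK's repetition count flag corresponds to. A minimal sketch (the cutoff value is illustrative; in practice it derives from the source's assessed min-entropy):

```python
def repetition_count_test(samples, cutoff=32):
    """Sketch of the SP 800-90B repetition count test: fail if any
    value repeats `cutoff` or more times in a row. cutoff=32 is an
    illustrative placeholder, not the device's actual parameter."""
    run_value = None
    run_length = 0
    for s in samples:
        if s == run_value:
            run_length += 1
            if run_length >= cutoff:
                return False  # health-test failure: stuck output
        else:
            run_value = s
            run_length = 1
    return True
```

A test like this catches a stuck or failing noise source without touching the byte distribution, which is why it can run in "raw noise" mode without counting as conditioning.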
The embedding pipeline (openentropy-embed) cuts each recording into 256-byte windows with a 128-byte stride (~6,800 windows across the full 58-source union). Each window is serialized as spaced hex and embedded with OpenAI's text-embedding-3-large (3,072 dimensions). We ran 4 deliberate stress-test campaigns across 87 sessions. Retrieval evaluation uses leave-one-session-out splits so test sessions are always unseen.
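The windowing and serialization step can be sketched as follows (function names are illustrative, not the actual openentropy-embed API):

```python
def windows(raw: bytes, size: int = 256, stride: int = 128):
    """Cut a recording into overlapping windows: 256-byte windows
    with a 128-byte stride, matching the pipeline described above."""
    for start in range(0, len(raw) - size + 1, stride):
        yield raw[start:start + size]

def to_spaced_hex(window: bytes) -> str:
    """Serialize a window as space-separated hex bytes, the text
    form handed to the embedding model."""
    return " ".join(f"{b:02x}" for b in window)

# Each hex string would then be embedded, e.g. via the OpenAI API:
#   client.embeddings.create(model="text-embedding-3-large", input=hex_str)
```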
What we found
1. A third of the sources are nearly indistinguishable
19 out of 58 sources land within cosine distance 0.01 of each other in the embedding space. They come from 10 different categories, spanning PLL oscillators, I/O, network, GPU, scheduling, microarchitecture, IPC, signal, timing, and the external quantum RNG. The human-assigned category labels do not predict which sources cluster together.
At the center of this dense core sits fsync_journal, an I/O source that flushes the entire storage stack on every call. It is the nearest embedding neighbor to four different sources from three categories (thermal, timing, scheduling). That makes physical sense: fsync timing is sensitive to every kind of system load, so it acts as a barometer of overall machine state.
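The dense core and its hub both fall out of one pairwise cosine-distance matrix over source centroids. A minimal NumPy sketch (thresholds match the numbers above; the toy data in comments is not the real measurement):

```python
import numpy as np

def cosine_distance_matrix(centroids):
    """Pairwise cosine distances between source centroids."""
    unit = centroids / np.linalg.norm(centroids, axis=1, keepdims=True)
    return 1.0 - unit @ unit.T

def dense_core(dist, threshold=0.01):
    """Indices of sources with at least one other source within
    `threshold` cosine distance (the 0.01 core above)."""
    d = dist.copy()
    np.fill_diagonal(d, np.inf)
    return np.where(d.min(axis=1) <= threshold)[0]

def nn_hub(dist):
    """Source that is the nearest neighbor of the most other
    sources -- how fsync_journal shows up as the hub."""
    d = dist.copy()
    np.fill_diagonal(d, np.inf)
    return np.bincount(d.argmin(axis=1)).argmax()
```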
We tested whether the core is held together by one dominant shared factor. PCA on the source centroids shows that PC1 explains 58.7% of the variance. After removing it, the core stays intact. Distances between core members increase by only 1.1x. The PLL sources remain nearest neighbors to each other. fsync_journal remains the hub. When we force the residual into clusters, 13-16 of the 19 sources refuse to split. The core is not an artifact of one shared variable. Either there are multiple layers of shared host state affecting the same sources, or high-entropy hex genuinely looks similar to the embedding model regardless of origin. Both explanations survive this test.
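The "remove the dominant factor" step can be sketched as projecting the centroids off their first principal component (NumPy sketch; the 58.7% and 1.1x figures come from the real data, not from this code):

```python
import numpy as np

def remove_pc1(centroids):
    """Subtract each centroid's projection onto the first principal
    component, leaving the residual structure PC1 cannot explain.
    Returns (residual, fraction of variance PC1 explained)."""
    X = centroids - centroids.mean(axis=0)
    # Rows of Vt are principal axes; squared singular values give variance
    U, S, Vt = np.linalg.svd(X, full_matrices=False)
    pc1 = Vt[0]
    explained = S[0] ** 2 / (S ** 2).sum()
    residual = X - np.outer(X @ pc1, pc1)
    return residual, explained
```

If the core dissolved after this projection, one shared variable would explain it; the observation above is that it does not.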
2. CPU pipeline sources stand apart
Not everything collapses into the core. Sources that measure CPU pipeline internals produce genuinely different byte patterns. preemption_boundary (kernel scheduler preemption timing) is the most isolated source in the entire space, sitting 0.41 cosine distance from the average. Other outliers include mach_continuous_timing, icc_atomic_contention (cross-core atomic operations), and sleep_jitter.
On the 38 sources with enough sessions for leave-one-out retrieval, the model identifies the correct source 46% of the time on unseen sessions (chance is 2.6%). The embedding picks up real structure, but the dense core makes clean separation hard for the sources inside it.
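Leave-one-session-out retrieval with a nearest-centroid classifier can be sketched like this (a simplification of the actual evaluation, assuming cosine similarity against per-source centroids):

```python
import numpy as np

def loso_accuracy(embeddings, sources, sessions):
    """Leave-one-session-out nearest-centroid retrieval: for each
    held-out session, build source centroids from all other sessions
    and predict each window's source by highest cosine similarity."""
    sources = np.asarray(sources)
    sessions = np.asarray(sessions)
    correct = 0
    total = 0
    for sess in np.unique(sessions):
        train = sessions != sess
        test = sessions == sess
        labels = np.unique(sources[train])
        # One centroid per source, built only from training sessions
        centroids = np.stack([embeddings[train & (sources == lbl)].mean(axis=0)
                              for lbl in labels])
        unit_c = centroids / np.linalg.norm(centroids, axis=1, keepdims=True)
        test_emb = embeddings[test]
        unit_x = test_emb / np.linalg.norm(test_emb, axis=1, keepdims=True)
        pred = labels[(unit_x @ unit_c.T).argmax(axis=1)]
        correct += int((pred == sources[test]).sum())
        total += int(test.sum())
    return correct / total
```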
3. Stress tests confirm the shared structure is real
We ran 4 stress-test campaigns targeting CPU scheduling, memory and network pressure, GPU/CPU ramp, and mixed software load. In every campaign, a small number of latent factors explained the majority of cross-source movement. The scheduler campaign was the cleanest: one factor alone accounted for 80% of the variance.
dram_row_buffer was the strongest responder across campaigns, moving the most under both scheduler and software pressure. The PLL sources (audio_pll_timing, display_pll, pcie_pll) were among the most temporally stable, drifting around 0.003-0.005 cosine distance between sessions. Sources that measure system-level state (usb_enumeration, process_table) drifted the most, up to 0.29 between sessions.
Each campaign deliberately stresses a different subsystem. The top factor explains 50-80% of cross-source movement.
4. An external USB quantum RNG looks like the core, but does not move with it
QCicada uses a Quantum Optics Module (QOM) with a light source and photon detector to measure quantum noise, according to the Crypta Labs spec sheet. The device runs its own health tests and post-processing (what Crypta Labs calls the "QEngine"). We recorded in "raw noise" mode, which applies health-test filtering but no cryptographic conditioning. The quantum source itself has nothing to do with host CPU state. It connects to the Mac over USB serial.
QCicada sits in the dense core of the embedding space, with 16 sources within cosine distance 0.01. Its nearest neighbors are counter_beat, iosurface_crossing, nvme_iokit_sensors, and timer_coalescing. That proximity is likely because QCicada produces high-entropy bytes, and so do many other sources in the core. A text embedding model reading hex-serialized near-random bytes may not distinguish their origins.
Under stress tests, QCicada barely moves. Its max centroid shift across conditions is about 0.01 cosine distance, ranking 12th-14th out of 17 sources. Sources like dram_row_buffer shift 10-14x more. QCicada's movement direction does not correlate with PLL source movement either (direction cosines near zero).
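Both measurements here reduce to simple centroid geometry. A sketch of the shift magnitude and the movement-direction comparison (names illustrative):

```python
import numpy as np

def cosine_distance(a, b):
    return 1.0 - float(a @ b) / (np.linalg.norm(a) * np.linalg.norm(b))

def max_centroid_shift(baseline, condition_centroids):
    """Largest cosine-distance move of one source's centroid across
    stress conditions, relative to its baseline centroid."""
    return max(cosine_distance(baseline, c) for c in condition_centroids)

def direction_cosine(base_a, cond_a, base_b, cond_b):
    """Cosine between two sources' movement vectors. Near zero means
    they move in unrelated directions under the same stress -- the
    QCicada-vs-PLL observation above."""
    da = cond_a - base_a
    db = cond_b - base_b
    return float(da @ db) / (np.linalg.norm(da) * np.linalg.norm(db))
```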
This is what you would expect from an independent quantum source: it produces consistently high-entropy bytes that do not respond to host state changes. The fact that it sits in the dense core tells us something about the embedding model (high-entropy hex looks similar regardless of origin), not about physical coupling between the QRNG and the host. The real coupling story in this data is between host-based sources like dram_row_buffer, page_fault_timing, and the PLL oscillators, which do move together under stress.

Crypta Labs QCicada USB QRNG
Why this matters
Entropy pool design. Linux and macOS mix multiple hardware sources into one pool. If some of those sources co-vary under load (which this work suggests they do, at least before conditioning), the pool gets less independent entropy than a per-stream analysis would predict. Whether production conditioning layers (SHA-256, ChaCha20) remove these correlations is an open question we have not tested.
RNG testing. Standard suites like NIST SP 800-22 and TestU01 test one stream at a time. They do not look for cross-source correlations. Embedding-space analysis catches structure that single-stream tests miss.
System monitoring. The shared factors likely correspond to the same variables that affect system performance (scheduler state, memory contention, bus activity). If confirmed, entropy drift in embedding space could serve as a proxy signal for host-state changes.
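As a concrete illustration of the RNG-testing point: a cross-source check that single-stream suites never run is to correlate windowed summary statistics of two streams. A minimal sketch using Pearson correlation on per-window byte means (illustrative; the real analysis works in embedding space):

```python
import numpy as np

def window_means(raw: bytes, size: int = 256):
    """Mean byte value of each non-overlapping window."""
    n = len(raw) // size
    arr = np.frombuffer(raw[:n * size], dtype=np.uint8)
    return arr.reshape(n, size).mean(axis=1)

def cross_source_correlation(stream_a: bytes, stream_b: bytes) -> float:
    """Pearson correlation between two sources' window means. Each
    stream can look fine to a single-stream suite on its own and
    still show nonzero correlation here if both track shared host
    state."""
    a = window_means(stream_a)
    b = window_means(stream_b)
    n = min(len(a), len(b))
    return float(np.corrcoef(a[:n], b[:n])[0, 1])
```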
What this does and does not show
This is one machine, one platform (Apple Silicon), one set of recording conditions. We have not tested cross-device transfer. We have not established that the shared factors are causal rather than merely correlated. And we have not tested whether production conditioning removes these correlations (it may well do so).
What we have shown: hardware entropy on this machine contains both a shared component and source-specific structure. Embeddings expose that coupled structure in a way that per-stream statistical tests do not.