2026-05-25

Daily Digest

World News

The common thread today is that geopolitical risk is becoming more structural and less episodic: wars are hardening into long-duration constraints, diplomatic openings are fragile and domestically contested, and even trade corridors and navigation systems are now active instruments of state competition. For Europe in particular, that means treating defence, energy, logistics resilience and policy credibility as persistent macro variables rather than temporary shocks — while globally, thinner fiscal and political buffers make any future market dislocation more likely to propagate than be cleanly contained.

Angela Merkel won’t be negotiating with Putin – but the rumour reflects a truth about the Ukraine war

Nathalie Tocci · guardian

Talk of appointing ex-leaders to negotiate with Putin is largely symbolic—there’s no imminent ceasefire—but it signals a deeper reality: Ukraine has become more resilient and self-sufficient (≈60% of its military capabilities produced domestically) while Russia has morphed into a war-centred economy and the conflict is grinding into a prolonged attritional phase. Expect sustained European geopolitical risk: higher defence spending, persistent energy-security fragility and longer-tail macro uncertainty that should factor into UK/EU portfolio allocations and assessments of regional supply-chain and policy risks.

First Thing: US and Iran inch closer to peace deal as Trump faces criticism from GOP hawks

Nicola Slawson · guardian

A US–Iran peace agreement now looks closer, which would materially lower Middle East tail-risk and the energy risk premium—reducing volatility for global markets and easing a key macro headwind. Political blowback from GOP hawks and primary defeats signal domestic instability that could shape US fiscal and regulatory trajectories ahead of the midterms, so watch shifts in risk pricing and policy uncertainty rather than treating this as purely diplomatic news.

The world is heading toward a financial crisis – the state of US politics has left us ill-prepared

Eduardo Porter · guardian

US political dysfunction raises the odds that the next financial shock — whether an AI-driven equity repricing, a sell-off in Treasuries, or a geopolitical flare-up — will be met with incoherent, politicised policy responses that amplify market damage. For portfolios this increases tail risk: expect sharper volatility, materially higher yields at times, and less reliable policy backstops, so favour liquidity, yield diversification (not just US Treasuries), and explicit hedges rather than assuming safe‑asset immunity.

‘A bridge, not an obstacle’: is Armenia a new crossroads between east and west?

Patrick Wintour in Yerevan · guardian

Armenia’s leadership is pushing a strategic pivot—opening borders with Turkey and Azerbaijan and courting the EU—to become a Eurasian “middle corridor” that could reroute overland trade between Europe and western China, reducing reliance on Russia’s northern routes and the Suez. If elections endorse Pashinyan, this could reshape regional supply chains, logistics hubs and geopolitical alignments (relevant for macro risk and portfolios), but fragile peace, displaced populations and strong pro‑Russian forces make the payoff high‑reward and high‑risk.

RAF jet carrying defence secretary has signal jammed near Russian border

bbc_world

An RAF jet carrying the UK defence secretary had its GPS signal jammed near the Russian border, forcing pilots to fall back on alternate navigation systems. This is a reminder that GNSS is a fragile single point of failure: Russian electronic-warfare jamming is being used for geopolitical signalling and operational disruption. For geospatial/ML systems, treat GNSS outages as an expected failure mode and invest in multi‑sensor fusion, redundant positioning, and adversarial testing (spoofing/jamming scenarios) in both models and production pipelines.
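One way to make "GNSS outage as an expected failure mode" concrete is a plausibility gate in front of whatever consumes the position. The sketch below is illustrative only: the function name `select_position`, the 50 m divergence threshold, and the flat x/y frame in metres are all assumptions, not anything described in the BBC report.

```python
import math
from typing import Optional, Tuple

Point = Tuple[float, float]  # local x/y in metres (assumed flat frame)


def select_position(
    gnss: Optional[Point],
    dead_reckoned: Point,
    max_divergence_m: float = 50.0,  # hypothetical threshold
) -> Tuple[Point, str]:
    """Return (position, source).

    Falls back to the dead-reckoned estimate when the GNSS fix is missing
    (e.g. jamming) or implausibly far from the inertial estimate
    (e.g. spoofing).
    """
    if gnss is None:
        return dead_reckoned, "dead_reckoning"
    divergence = math.dist(gnss, dead_reckoned)
    if divergence > max_divergence_m:
        return dead_reckoned, "dead_reckoning"
    return gnss, "gnss"
```

Returning the source label alongside the position lets downstream models and monitoring count how often the fallback path is exercised, which is also what a jamming or spoofing test scenario would assert on.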

How Saudi Arabia's spending spree reached the end of the line

bbc_world

Saudi Arabia is scaling back its Vision 2030 spending: costly megaprojects and overseas investments are being trimmed as cost overruns and oil‑revenue volatility force a turn toward fiscal discipline. For your portfolio and sector radar, expect less sovereign capital flowing into high‑risk tech, real‑estate and biotech deals, tighter liquidity for mega‑deals, and renewed sensitivity in energy markets and regional geopolitics that could affect macro allocations and startup funding in the near term.

AI & LLMs

Today’s papers all push against the same lazy default in frontier AI: that more scale, more autonomy, or more modality automatically buys better systems. The emerging pattern is more engineering than mystique — performance depends on preserving signal quality through the stack, whether that means SNR-aware scaling and optimization, structured retrieval to constrain reasoning, or modular components whose outputs are validated rather than trusted. That matters because the next tranche of gains looks increasingly architectural and systems-level, not just parametric: reusable agent skills need compatibility and gating, multimodal models need proof that vision is actually carrying semantic load, and diffusion models are finding real efficiency in routing, decoding, and data design rather than brute force. In practice, the field is converging on a more sober recipe for capability: isolate failure modes, externalize and optimize the right artifacts, and spend compute where information is preserved rather than merely amplified.

LLMs as Noisy Channels: A Shannon Perspective on Model Capacity and Scaling Laws

Xu Ouyang, Deyi Liu, Yuhang Cai, Jing Liu · hf_daily_papers

Introduces a practical capacity lens: treat LLM training as information transfer over a noisy channel where parameters ≈ bandwidth and tokens ≈ signal power. When signal-to-noise ratio (SNR) isn’t preserved, scaling model size or data can amplify noise and produce U-shaped failure modes (catastrophic overtraining, quantization loss) instead of steady gains. The Shannon Scaling Law predicts these non-monotonic regimes and extrapolates better than classical power laws, giving a quantitative way to decide whether to add parameters, add data, or improve SNR (e.g., better regularization, precision, or data quality). For someone building or deploying large models in resource- and accuracy-sensitive domains like drug discovery or geospatial stacks, this means: 1) monitor effective SNR as a diagnostic; 2) rethink compute/data allocation rather than blindly scaling; and 3) expect quantization/fine-tuning to interact with capacity in predictable, model-dependent ways you can fit and use for budgeting.
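For reference, the classical Shannon-Hartley capacity that the analogy borrows from is given below. The second line is only a loose restatement of the mapping in the summary (parameters as bandwidth, tokens as signal power); the symbols $N_{\text{params}}$, $D_{\text{tokens}}$ and $\sigma^2$ are illustrative assumptions, and the paper's actual scaling law may take a different form.

```latex
C = B \log_2\!\left(1 + \frac{S}{N}\right)
\qquad \text{(Shannon-Hartley: bandwidth } B,\ \text{signal power } S,\ \text{noise power } N\text{)}

C_{\text{model}} \;\sim\; N_{\text{params}} \, \log_2\!\left(1 + \frac{D_{\text{tokens}}}{\sigma^2}\right)
\qquad \text{(illustrative analogy only, not the paper's formula)}
```

In the classical setting with white noise, $N = N_0 B$, so widening the bandwidth at fixed signal power lowers the SNR. That is one intuition for why adding parameters without preserving SNR need not yield steady gains.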

SciAtlas: A Large-Scale Knowledge Graph for Automated Scientific Research

Shuofei Qiao, Yunxiang Wei, Jiazheng Fan, Bin Wu · hf_daily_papers

SciAtlas stitches 43M papers into a 3B‑triplet, heterogeneous knowledge graph and exposes neuro‑symbolic retrieval (tri‑path recall + graph reranking) to shift discovery from fuzzy semantic matches to topology‑aware, deterministic association finding. For you: it offers a structured substrate to ground agentic literature exploration, reduce LLM hallucinations, and cut inference costs for complex cross‑disciplinary reasoning — directly useful for mapping competitor activity, surfacing mechanistic links across biology/chemistry, and automating literature reviews or trend synthesis in drug discovery. Public interfaces mean you can prototype integrations with foundation models and R&D pipelines quickly, but expect nontrivial engineering around entity curation, update cadence, and domain‑specific filtering before using it in production workflows.

From Raw Experience to Skill Consumption: A Systematic Study of Model-Generated Agent Skills

Zisu Huang, Jingwen Xu, Yifan Yang, Ziyang Gong · hf_daily_papers

Model-generated procedural “skills” are a useful way to package and reuse agent behavior, but they’re not a plug-and-play win: on average they help, yet they can cause substantial negative transfer when the extractor and consumer don’t align. Crucially, extractor strength, consumer strength, and model scale are not reliable predictors of utility — a model can be great at extracting skills but poor at using them (or vice versa). The authors identify experience composition and specific skill properties that predict usefulness, and demonstrate a practical meta-skill extractor that consistently raises quality and cuts negative-transfer risk across domains. For building agent pipelines (e.g., reusable planners, domain-specific workflows or automated lab protocols), treat skill creation as a paired extractor–consumer design problem, validate cross-consumption empirically, and adopt targeted extraction heuristics rather than assuming larger models or more data will suffice.

SkillOpt: Executive Strategy for Self-Evolving Agent Skills

Yifan Yang, Ziyang Gong, Weiquan Huang, Qihao Yang · hf_daily_papers

SkillOpt treats agent skills as external, editable text and trains them with a controlled optimizer that issues bounded add/delete/replace edits only when a held-out validation score strictly improves. The result is reproducible, iterative improvement of skills (no one-shot hacks or flaky self-revision), large performance lifts (+19–25 points on GPT-5.5 across harnesses), and skill artifacts that transfer across model families and execution environments. Practically, it gives you a way to auto-tune prompt/agent-document skills with zero extra inference cost at deployment, deterministic accept/reject gating to avoid regressions, and a small set of stability mechanisms (textual learning rate, rejected-edit buffer, slow/meta updates). For someone running production agent loops or packaging reusable skill artifacts, SkillOpt promises lower ops cost, safer rollouts, and much better cross-model portability—though it will hinge on reliable rollout scoring and the optimizer model’s quality.
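The accept/reject gating described above can be sketched in a few lines. This is a minimal illustration, not SkillOpt's implementation: the names `apply_edit` and `optimize_skill`, the line-indexed edit tuple, and the caller-supplied `score` function (standing in for a held-out validation harness) are all hypothetical.

```python
from typing import Callable, List, Sequence, Tuple

# (op, line_index, text); op is one of "add", "delete", "replace"
Edit = Tuple[str, int, str]


def apply_edit(skill: Sequence[str], edit: Edit) -> List[str]:
    """Return a new skill document with one bounded edit applied."""
    op, idx, text = edit
    lines = list(skill)
    if op == "add":
        lines.insert(idx, text)
    elif op == "delete":
        del lines[idx]
    elif op == "replace":
        lines[idx] = text
    else:
        raise ValueError(f"unknown op: {op}")
    return lines


def optimize_skill(
    skill: Sequence[str],
    proposals: Sequence[Edit],
    score: Callable[[Sequence[str]], float],  # assumed held-out validation scorer
) -> Tuple[List[str], float, List[Edit]]:
    """Keep an edit only if the validation score strictly improves."""
    best, best_score = list(skill), score(skill)
    rejected: List[Edit] = []  # kept so an optimizer could avoid re-proposing them
    for edit in proposals:
        candidate = apply_edit(best, edit)
        candidate_score = score(candidate)
        if candidate_score > best_score:  # strict improvement, ties are rejected
            best, best_score = candidate, candidate_score
        else:
            rejected.append(edit)
    return best, best_score, rejected
```

Because the gate is a strict comparison on a fixed scorer, the same proposals always produce the same accepted skill, which is what makes regressions easy to rule out. The quality of the result still depends entirely on how trustworthy `score` is.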

Lens: Rethinking Training Efficiency for Foundational Text-to-Image Models

Dong Chen, Fangyun Wei, Ziyu Wan, Dongdong Chen · hf_daily_papers

Lens shows you can beat larger text-to-image models by redesigning data and optimization rather than scaling parameters: a 3.8B model trained on 800M GPT‑4.1–generated, dense (≈109‑word) captions plus per‑batch multi‑resolution/aspect sampling and a semantic VAE achieves parity or better than >6B models while using ~19% of Z‑Image’s training compute. They then use RL with taxonomy prompts, a reasoning prompt‑search module, and distillation to reach 1024^2 inference in 3.15s on an H100 (turbo 4‑step = 0.84s). For ML engineering: this is a compact, practical recipe—high‑quality synthetic supervision, batch design that increases visual coverage, better latents, and aggressive distillation—which transfers to resource‑constrained model development (interactive tools, multi‑scale imaging, faster prototyping). Caveat: heavy reliance on GPT‑4.1 captions raises reproducibility and data‑contamination/alignment risks.

Rethinking Cross-Layer Information Routing in Diffusion Transformers

Chao Xu, Maohua Li, Qirui Li, Yixuan Xu · hf_daily_papers

Diffusion-Adaptive Routing (DAR) reframes the Transformer residual path for diffusion models: instead of incremental residual addition, it learns timestep-adaptive, non‑incremental aggregation over past sublayer outputs, addressing forward magnitude inflation, vanishing backward gradients, and block-wise redundancy. Practically, DAR is a drop‑in swap that markedly speeds convergence (ImageNet DiT: matched quality with ~8.8× fewer iterations, +2.11 FID) and preserves high‑frequency detail during fine‑tuning/distillation, while remaining compatible with other enhancements like REPA. For you this means a low-friction architectural knob to both cut training compute and improve sample fidelity in any diffusion backbone—useful for T2I pipelines and directly transferable to molecular/protein diffusion models or domain-specific fine‑tuning where faster convergence and detail retention matter.

PiD: Fast and High-Resolution Latent Decoding with Pixel Diffusion

Yifan Lu, Qi Wu, Jay Zhangjie Wu, Zian Wang · hf_daily_papers

PiD replaces the usual decoder/upscaler pair by denoising directly in pixel space conditioned on latents via a lightweight, sigma-aware adapter — letting you terminate latent diffusion early and consolidate decoding+upsampling into one module. Distilled to a 4-step DMD2 variant, it decodes 512×512 latents to 2048×2048 in <1s on a 5090 (13 GB) and ~210 ms on a GB200, beating cascaded SR pipelines ~6× while improving fidelity. For system design that matters: it removes a common bottleneck in latent-generation stacks, trading a heavier pixel decoder for far lower latency and memory at high resolution. That opens room to allocate compute back to the latent generator (or more samples), lowers consumer-GPU prototyping costs, and is immediately relevant for high-res outputs in geospatial modeling or visualizations of protein structures where latency and fidelity matter.

ETCHR: Editing To Clarify and Harness Reasoning

Beichen Zhang, Yuhong Liu, Jinsong Li, Yuhang Zang · hf_daily_papers

ETCHR builds a dedicated, question-conditioned image editor that’s decoupled from the LLM and trained in two stages—supervised imitation of human-like edit trajectories, then VLM-derived reward tuning for edit correctness and downstream reasoning. Plug-and-play with closed- and open-source multimodal LLMs, it yields consistent +4–5.5% absolute Pass@1 lifts across diverse visual-reasoning families (fine-grained perception, charts, logic, jigsaws, 3D), showing modular editing can be a cheaper, more robust alternative to monolithic multimodal models. For you: the decoupled editor pattern maps well to production constraints (no LLM retrain), gives a practical recipe to align visual preprocessors to downstream objectives, and could directly improve image-centric drug-discovery tasks (density maps, structure highlighting) while keeping inference stacks modular and auditable.

Rethinking Muon Beyond Pretraining: Spectral Failures and High-Pass Remedies for VLA and RLVR

Chongyu Fan, Gaowen Liu, Mingyi Hong, Ramana Rao Kompella · hf_daily_papers

Muon’s uniform spectral whitening (drive all singular values to 1) helps LLM pretraining but destabilizes fine-tuning in low-rank or low‑SNR regimes (vision‑language‑action, RL with verifiable rewards) by amplifying noisy tail directions and destroying per‑head specialization. Pion replaces uniform whitening with a two‑stage high‑pass Newton‑Schulz iteration that pins dominant singular values near 1 while suppressing noisy tails toward 0, plus an inexpensive per‑head mode that preserves pretrained head heterogeneity. It’s a drop‑in, similarly cheap optimizer that empirically fixes Muon’s collapse in RL and markedly improves VLA and robot fine‑tuning. Practical takeaway: when fine‑tuning multimodal, low‑rank/low‑SNR, or head‑sensitive models (including domain‑specific or multimodal drug‑discovery heads), try Pion instead of Muon/AdamW and tune the filter strength; it’s low friction to test.
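To see what "drive all singular values to 1" means in practice, here is the textbook cubic Newton-Schulz iteration in plain Python. It is a sketch of uniform whitening only: Muon itself uses a tuned higher-order polynomial, and Pion's two-stage high-pass variant is not reproduced here because the summary does not give its coefficients. The helper names are hypothetical.

```python
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Naive dense matrix product."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def transpose(a: Matrix) -> Matrix:
    return [list(row) for row in zip(*a)]


def newton_schulz_whiten(g: Matrix, steps: int = 30) -> Matrix:
    """Cubic Newton-Schulz: X <- 1.5*X - 0.5*X*X^T*X.

    After Frobenius normalisation every singular value lies in [0, 1];
    the iteration pushes each nonzero singular value toward 1 while
    leaving the singular vectors unchanged.
    """
    norm = sum(v * v for row in g for v in row) ** 0.5
    x = [[v / norm for v in row] for row in g]
    for _ in range(steps):
        xxt_x = matmul(matmul(x, transpose(x)), x)
        x = [
            [1.5 * x[i][j] - 0.5 * xxt_x[i][j] for j in range(len(x[0]))]
            for i in range(len(x))
        ]
    return x
```

The point of the illustration is that small singular values are inflated to 1 just like large ones. That is the behaviour the summary says amplifies noisy tail directions, and it is what a high-pass filter would suppress instead.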

The Expense of Seeing: Attaining Trustworthy Multimodal Reasoning Within the Monolithic Paradigm

Karan Goyal · hf_daily_papers

Vision+LLM stacks commonly appear multimodal by leaning on language priors rather than conveying grounded visual information—what matters is not whether a model performs on a dataset but whether the visual channel actually carries semantic payload. The proposed Modality Translation Protocol and three metrics (Toll, Curse, Fallacy) plus a Semantic Sufficiency Criterion give a neutral, information-theoretic way to measure how much ‘seeing’ a model really does. Practical takeaways: don’t trust projector-only fixes or dataset ablations as proof of grounded multimodality; expect the visual bottleneck to become more salient as LLMs scale (the suggested Divergence Law), so audit encoders explicitly, adopt metrics that quantify visual contribution, and prioritize encoder/architectural changes that force semantic sufficiency—especially important for high-stakes multimodal applications like drug-discovery imaging or geospatial inference.

Pharma & Drug Discovery

This item is a useful reminder that a large share of therapeutic value is created after the drug or intervention is chosen: outcomes in chronic oncology sequelae often hinge on longitudinal behavior change, timing, and trust rather than on protocol alone. For AI-enabled pharma, that matters because the frontier is not just better target selection or molecule design, but integrating patient support, real-world signal capture, and clinically meaningful behavioral endpoints into the product and evidence stack.

A qualitative study of supported self-care in women with lymphoedema associated with breast cancer

Anne Williams · openalex

Lymphoedema following breast cancer benefits less from one-off clinical instruction and more from an anticipatory, relationship-aware model of supported self-care: patients need timely, individualized touchpoints that scaffold reflexivity and gradual self-management rather than only biomedical guidance. Practitioners operating in acute, protocol-driven settings create access and trust gaps; local, tailored information and prompts at predictable trajectory milestones materially affect adaptation and distress. For product and research teams this flags concrete opportunities: build longitudinal, context-aware patient experiences (scheduled check-ins, reflective prompts, just-in-time education) and instruments that capture psychosocial adaptation as a clinical outcome. Those features can improve adherence, reduce downstream costs, and generate richer real-world signals for ML models or digital therapeutics — useful for clinical partnerships or health-tech spinouts targeting chronic cancer sequelae.

Finance & FIRE

The backdrop for FIRE investors is shifting from “TINA” to actual portfolio choice: with long-end yields back to levels that can matter, the opportunity cost of being all-in on expensive equities is no longer trivial. The practical implication isn’t to chase themes or make macro hero calls, but to revisit whether your allocation, duration exposure, and withdrawal assumptions still make sense in a regime where real bonds once again offer a credible alternative to valuation-dependent equity returns.

Top clicks this week on Abnormal Returns

abnormal_returns

Top-clicks coalesce around one macro signal: complacent equity markets face a shifting backdrop — G7 yields at two-decade highs, a normalizing Treasury curve, and concerns about rising long-term rates after several strong equity years. For a FIRE-oriented, index-focused investor this is actionable: higher yields restore fixed income’s role as income and hedge, so consider opportunistic rotation of idle cash into high-quality bonds (but watch duration exposure). Re-evaluate diversification assumptions — assets that protected in the low-rate regime may correlate in a rising-rate one — and stress-test your allocations and withdrawal plans for sequence-of-returns risk. Use tax-efficient wrappers (ISA/SIPP) to rebalance or harvest gains without extra tax drag.
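Sequence-of-returns risk is easy to demonstrate with arithmetic: the same set of annual returns gives a different ending balance once withdrawals are involved, depending on whether the losses come first or last. The figures below (starting balance, withdrawal, and return series) are made-up inputs for illustration, not forecasts.

```python
from typing import Sequence


def ending_balance(
    start: float, annual_withdrawal: float, returns: Sequence[float]
) -> float:
    """Withdraw at the start of each year, then apply that year's return.

    The balance is floored at zero once the portfolio is exhausted.
    """
    balance = start
    for r in returns:
        balance = max(balance - annual_withdrawal, 0.0)
        balance *= 1.0 + r
    return balance


# Hypothetical five-year return path and its mirror image.
bad_first = [-0.20, -0.10, 0.05, 0.15, 0.25]
good_first = list(reversed(bad_first))

if __name__ == "__main__":
    for label, path in (("losses first", bad_first), ("gains first", good_first)):
        print(label, round(ending_balance(1_000_000, 40_000, path), 2))
```

With no withdrawals the two orderings end at the same balance, since multiplication commutes. With withdrawals, the losses-first path ends lower because early withdrawals are taken from a shrunken portfolio.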

Sunday links: energy levels

abnormal_returns

The AI-driven rally has powered much of the recent market gains, but stretched multiples, optimistic 2028 EPS trajectories, and rising rates mean the next leg of upside is likely to favor earnings durability over narrative. Rising real yields (~3% on long TIPS) are a practical opportunity to lock in real returns for the bond sleeve and to shorten duration risk in portfolios. The ETF scene is fragmenting, with lots of new, actively managed, high-fee products, so preserve low-cost, liquid core exposure in your ISA/SIPP and treat niche launches (space ETFs, thematic funds) as concentrated, event-driven plays rather than core holdings. SpaceX’s coming IPO should be read as a leadership/vision bet; the S‑1’s AI-infrastructure mentions are an interesting signal, but IPOs and thematic ETFs often disappoint short-term, so size exposure accordingly.

Startup Ecosystem

The through-line here is that the AI startup stack is maturing from demo-first to constraint-first: memory economics, inference plumbing, credential hygiene, and control-plane safety are becoming more decisive than another layer of product varnish. That raises the bar for what counts as a real company in the sector — not just a model or a slick workflow, but an organisation that can turn autonomous systems into reliable products under hard operational and cost constraints. The flip side is that cheap LLM leverage is making it easier to fake architectural depth and easier to drift into “Sloptember” mode, where velocity masks weak systems and weak measurement. In this environment, the durable edge for early-stage teams is increasingly disciplined engineering: reproducibility, observability, tight feedback loops, and clear ownership of failure modes.

Memory has grown to nearly two-thirds of AI chip component costs

hacker_news

Memory, not compute, now accounts for roughly two-thirds of component cost in AI accelerators. That flips the optimization calculus: reducing FLOPs without cutting memory footprint won’t move the needle on TCO. For an ML platform engineer, this means prioritising memory-centric levers (activation checkpointing, activation/weight compression, aggressive quantization, LoRA/low-rank adapters, model sparsity, sharded training that minimises peak activation), re-evaluating hardware choices for HBM capacity and bandwidth, and monitoring CXL/disaggregated memory and in‑memory compute startups as structural shifts. For drug-discovery models with large contexts and high-res inputs, memory engineering is now the primary route to cheaper training/inference and competitive speedups: profile memory hot spots first, then apply software + hardware tradeoffs.
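A back-of-envelope calculation shows why quantization is among the first memory levers to consider. The helper below is a rough sketch covering weights only; the function name and the 7-billion-parameter example are assumptions for illustration, and real footprints also include activations, optimizer state and KV cache.

```python
# Bytes per parameter for common weight formats.
BYTES_PER_PARAM = {"fp32": 4.0, "fp16": 2.0, "int8": 1.0, "int4": 0.5}


def weight_memory_gb(n_params: float, dtype: str) -> float:
    """Memory for model weights only, in GB (10^9 bytes).

    Excludes activations, optimizer state and KV cache.
    """
    return n_params * BYTES_PER_PARAM[dtype] / 1e9


if __name__ == "__main__":
    for dtype in BYTES_PER_PARAM:
        print(dtype, weight_memory_gb(7e9, dtype), "GB")
```

Under these assumptions a hypothetical 7B-parameter model needs 14 GB for fp16 weights and 3.5 GB at int4, a 4x reduction that involves no change to the FLOP count.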

The Eternal Sloptember

hacker_news

“Sloptember” names a recurring startup equilibrium: teams sustain life by small, safe releases and continual mini-fundraises, which normalizes sloppy engineering, feature bloat, and incentives to optimize runway over durable product metrics. For evaluating early-stage AI/ML or biotech startups, this is a red flag — it correlates with high technical debt, unreproducible model training, brittle inference pipelines, and weak measurement systems that will block scaling or partnerships. Practical checks: insist on reproducible training artifacts and CI for data/model changes, ask for concrete product usage and retention metrics (not demo cadence), and verify experiment tracking, rollback paths, and inference monitoring. Prioritize teams that trade short-term polish for instrumentation and measurable outcomes when considering hiring, partnering, or investing.

AI agents are quietly generating chaos engineering failures enterprises don’t track yet

venturebeat

Agentic remediation is creating a new, untracked failure class: agents take technically valid actions from a narrow context (restarts, reroutes, scale-ups) without the human judgment call that checks SLO burn, blast radius, or transient dependency load, and those actions can cascade into broad outages. Treat agents as first-class chaos actors: add pre-action gating (SLO/uncertainty checks), require cross-service state awareness or coordination, and include agent behavior in chaos game days and postmortem taxonomies. Short-term mitigations: instrument agents to record intent+context, enforce preconditions/contracts before high-impact actions, simulate agent decisions under realistic system-wide load, and route high-risk actions through human-in-the-loop approval. This is a platform-level problem for anyone running autonomous controllers on shared infrastructure.
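The pre-action gating idea can be sketched as a small policy function that every agent action passes through before execution. Everything here is hypothetical: the `ProposedAction` fields, the thresholds, and the three verdict strings are illustrative choices rather than an established interface.

```python
from dataclasses import dataclass


@dataclass
class ProposedAction:
    name: str                      # e.g. "restart", "reroute", "scale_up"
    services_affected: int         # estimated blast radius
    error_budget_remaining: float  # fraction of SLO error budget left, 0..1
    context: str = ""              # agent's stated intent, recorded for postmortems


def gate(
    action: ProposedAction,
    min_budget: float = 0.2,      # assumed threshold
    max_blast_radius: int = 3,    # assumed threshold
) -> str:
    """Return "deny", "needs_human_approval" or "allow" for a proposed action."""
    if action.error_budget_remaining < min_budget:
        # SLO budget nearly burned: automated remediation is too risky.
        return "deny"
    if action.services_affected > max_blast_radius:
        # High-impact action: route through a human.
        return "needs_human_approval"
    return "allow"
```

Logging the `ProposedAction` together with the verdict gives the intent-plus-context record the summary recommends, and the same function can be exercised during chaos game days.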

Claude is not your architect. Stop letting it pretend

hacker_news

Teams are treating LLMs as automatic system architects, accepting confident, high-level designs without enforcing engineering rigor. That yields brittle designs, hidden technical debt, and security/operational blind spots because LLMs hallucinate, ignore nonfunctional constraints, and can’t validate trade-offs or deployment failure modes. For you: don’t let product managers or juniors accept an LLM’s architecture as finished work; use models for ideation and checklist-driven alternatives, then require formal HLD/LLD reviews, benchmarks, failure-mode tests, and small, reproducible prototypes before production. Operationalize guardrails: measurable acceptance criteria, retrieval-grounded context, deterministic design templates, and CI that surfaces mismatches between spec and implementation. This reduces risk, long-term maintenance cost, and misaligned incentives when deciding stack and scale choices.

Greg Brockman interview [video]

hacker_news

Brockman lays out a pragmatic playbook for scaling frontier AI: prioritize product-led API revenue and low-friction developer experience while investing heavily in efficient inference, specialized infra, and alignment research early — not as PR, but as core engineering constraints. He signals that differentiated serving stacks and fine-tuning ecosystems are the durable business moat, and that teams should hire cross-functional generalists who move quickly on iterating products and safety guardrails. For founders, that means verticalized applications and tooling (not another giant base model) win early. For engineering orgs like yours, the clear priority is optimizing the inference/fine-tuning pipeline, observability for model behavior, and embedding alignment practices into the delivery lifecycle.

Most organisations still store their passwords wrong. Here is what actually works.

the_next_web

Password hygiene remains the single biggest operational risk: reused or poorly stored credentials are still the most common breach vector, not exotic zero-days. For engineering teams and early-stage startups that handle sensitive models, datasets, or cloud infra, a password manager plus basic MFA is necessary but insufficient. Prioritize phishing‑resistant authentication (FIDO2/hardware keys or passkeys), SSO with SCIM provisioning and automatic deprovisioning, short-lived IAM credentials for services, automated secret scanning and rotation, and strict least‑privilege roles for model/data stores. Also ensure CI/CD and notebooks don’t bake secrets into images or repos. These measures reduce blast radius from compromised human accounts and are cheaper than cleanup after an IP or dataset leak.
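Automated secret scanning can start very small. The sketch below checks text for two patterns: the well-known `AKIA` prefix format of AWS access key IDs and a naive hard-coded `password = "..."` assignment. The pattern set and function name are assumptions for illustration; a dedicated scanner in CI will cover far more cases and produce fewer false positives.

```python
import re
from typing import Dict, List, Pattern, Tuple

# Deliberately minimal, illustrative pattern set.
PATTERNS: Dict[str, Pattern[str]] = {
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "hardcoded_password": re.compile(r"(?i)\bpassword\s*=\s*['\"][^'\"]+['\"]"),
}


def scan_text(text: str) -> List[Tuple[int, str]]:
    """Return (line_number, pattern_name) for every suspicious line."""
    findings: List[Tuple[int, str]] = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for name, pattern in PATTERNS.items():
            if pattern.search(line):
                findings.append((lineno, name))
    return findings
```

Run against notebook exports and image build contexts as well as the repository itself, since the summary flags those as places where secrets get baked in.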