← Nathan Bosch

2026-05-22

Daily Digest

AI & LLMs

The common thread today is that the frontier is shifting from “bigger base model” to “better systems around the model”: retrofitted long-context, adaptive KV handling, trajectory compilation, and lightweight orchestrators all point to capability gains coming from memory management, routing, and inference-time structure rather than pretraining alone. That also sharpens the real bottleneck: production robustness and governance, because once models are embedded in stateful, tool-using workflows, benchmark wins matter less than whether they can retrieve the right evidence, stay calibrated across long sessions, and leave an auditable trail under tighter regulatory scrutiny.

Full Attention Strikes Back: Transferring Full Attention into Sparse within Hundred Training Steps

Yanke Zhou, Yiduo Li, Hanlin Tang, Maohua Li · hf_daily_papers

RTPurbo shows you can retrofit a standard full‑attention LLM into a highly sparse long‑context model with only a few hundred adaptation steps by: keeping full KV caches for a small set of retrieval heads, using a 16‑dim token indexer to locate long‑range tokens, and applying dynamic top‑p token selection per query. The result is near‑lossless accuracy with big efficiency wins (up to 9.36× prefill at 1M context, ≈2× decode speedup). Practically, this lets teams avoid expensive sparse‑native pretraining and instead adapt existing foundation models to support very long contexts with far lower KV memory and compute—directly relevant if you need long histories for protein sequences, assay metadata, or cross‑document synthesis in drug discovery. Key follow‑ups: validate on biologically structured sequences and assess index/head‑selection operational complexity.
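
To make "dynamic top‑p token selection per query" concrete, here is a minimal sketch of the selection step only. It assumes the indexer has already produced one relevance score per cached token; the function name `top_p_select` and the example scores are hypothetical and are not taken from the paper.

```python
import math

def top_p_select(scores, p=0.9):
    """Indices of the smallest token set whose softmax mass reaches p."""
    m = max(scores)
    weights = [math.exp(s - m) for s in scores]   # numerically stable softmax numerators
    total = sum(weights)
    order = sorted(range(len(scores)), key=lambda i: -weights[i])
    kept, mass = [], 0.0
    for i in order:                               # most relevant tokens first
        kept.append(i)
        mass += weights[i] / total
        if mass >= p:
            break
    return sorted(kept)

peaked = [0.0, 0.0, 5.0, 0.0, 4.0]   # a query with two clearly relevant tokens
flat = [0.0] * 10                    # a query with no clear winner
few = top_p_select(peaked)           # keeps only the two strong tokens
many = top_p_select(flat, p=0.85)    # keeps most of the cache
print(few, len(many))
```

The point of top‑p over a fixed top‑k is visible in the two calls: the number of tokens attended to adapts to how concentrated each query's scores are.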

KVServe: Service-Aware KV Cache Compression for Communication-Efficient Disaggregated LLM Serving

Zedong Liu, Xinyang Ma, Dejun Luo, Hairui Zhao · hf_daily_papers

KVServe targets the dominant network/storage bottleneck in disaggregated LLM serving by making KV compression adaptive to service context (workload mix, bandwidth, SLO/quality tradeoffs) rather than a static runtime knob. It unifies compression methods into a recomposable strategy space, uses a Bayesian profiler to cheaply find a small Pareto set of candidates, and runs a lightweight online controller (analytical latency model + bandit) to pick profiles and correct offline/online mismatch. Results show large gains in job completion time (JCT) and time to first token (TTFT) in prefill/decode-separated (PD-separated) and KV-disaggregated setups. For production ML infra, this is a concrete pattern for cutting end-to-end latency and network costs without harming quality. It is worth prototyping in vLLM-based stacks or any KV-separated serving, especially for latency-sensitive drug-discovery workflows where TTFT and SLOs matter.
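
A rough sketch of the "analytical latency model" half of such a controller, under stated assumptions: the three profiles, their ratios, overheads, and quality scores are invented for illustration, and the bandit that would correct the model online is left out. It only shows why the best profile depends on the link, which is the argument against a static knob.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Profile:
    name: str
    ratio: float        # original size / compressed size
    overhead_ms: float  # compress + decompress cost
    quality: float      # offline quality score in [0, 1]

# Hypothetical candidate set, standing in for a profiled Pareto front.
PROFILES = [
    Profile("none", ratio=1.0, overhead_ms=0.0, quality=1.00),
    Profile("light", ratio=2.0, overhead_ms=5.0, quality=0.99),
    Profile("heavy", ratio=8.0, overhead_ms=30.0, quality=0.95),
]

def transfer_ms(kv_mb, bandwidth_gbps, prof):
    # MB -> megabits; 1 Gbps moves 1 megabit per millisecond.
    wire_ms = (kv_mb / prof.ratio) * 8 / bandwidth_gbps
    return wire_ms + prof.overhead_ms

def choose(kv_mb, bandwidth_gbps, min_quality=0.0):
    """Pick the lowest-latency profile that satisfies the quality floor."""
    ok = [p for p in PROFILES if p.quality >= min_quality]
    return min(ok, key=lambda p: transfer_ms(kv_mb, bandwidth_gbps, p))

for gbps in (10, 100, 800):
    print(gbps, choose(500, gbps).name)
```

On a slow link heavy compression wins, on a fast link compression overhead dominates and sending raw KV is best; a quality floor (for example `min_quality=0.98`) removes aggressive profiles from the candidate set.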

One thing that's been bothering me lately: benchmark performance often tells me almost nothing about whether a workflow will survive production usage. [D]

reddit_ml

Benchmarks that reward single-task accuracy often fail to predict whether a model will survive real production use: ambiguous intent, messy context, contradictory instructions and long-running sessions are common failure modes. Practical mitigation is to evaluate behavioral robustness directly — scenario-based tests that inject ambiguity and context noise, stateful/session stress tests, instruction-contradiction suites, adversarial and OOD inputs, calibration/uncertainty checks, and user-simulator-driven end-to-end runs. Combine that with deployment controls (shadow/canary, continuous validation, drift detection, automated rollback) and metrics beyond accuracy (consistency, recoverability, calibration, latency under load). For you: multi-step drug-discovery pipelines and geospatial workflows are particularly sensitive to accumulated errors and subtle context shifts — invest in stateful simulators, uncertainty propagation, and continuous monitoring to avoid “benchmarks look good but production fails.”
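
One of the "metrics beyond accuracy" is easy to compute yourself. Below is a minimal sketch of expected calibration error (ECE) with equal-width bins; the confidence values and correctness labels are made up, and in practice you would compute this separately for early and late turns of a session to see whether calibration degrades.

```python
def expected_calibration_error(confidences, correct, n_bins=5):
    """Weighted average gap between stated confidence and observed accuracy."""
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        idx = min(int(conf * n_bins), n_bins - 1)   # conf == 1.0 goes in the last bin
        bins[idx].append((conf, ok))
    n = len(confidences)
    ece = 0.0
    for bucket in bins:
        if not bucket:
            continue
        avg_conf = sum(c for c, _ in bucket) / len(bucket)
        accuracy = sum(ok for _, ok in bucket) / len(bucket)
        ece += len(bucket) / n * abs(avg_conf - accuracy)
    return ece

# Hypothetical run: the model claims 90% confidence but is right half the time.
overconfident = expected_calibration_error([0.9, 0.9, 0.9, 0.9], [1, 1, 0, 0])
print(round(overconfident, 3))
```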

ClinSeekAgent: Automating Multimodal Evidence Seeking for Agentic Clinical Reasoning

Juncheng Wu, Letian Zhang, Yuhan Wang, Haoqin Tu · hf_daily_papers

ClinSeekAgent turns clinical reasoning from passive consumption into active, multimodal evidence acquisition: given only a query and raw sources, it autonomously searches knowledge bases, EHRs, and imaging tools, refines hypotheses, and grounds decisions. As an inference-time wrapper it raises LLM EHR and CXR performance noticeably (e.g., +15.1 F1 on multimodal CXR for Claude Opus 4.6) and, critically, can be distilled into a compact 35B model that narrows the gap to much larger hosts. For ML infra and drug-discovery use cases, this demonstrates a reusable pattern: agentic retrieval + iterative planning over heterogeneous raw data, plus distillation to produce deployable, audit-friendly models—useful for mining lab records, assay imagery, and structural data. Main caveats: guarantees depend on retrieval fidelity and traceability; evaluate hallucination and provenance handling before clinical or regulated deployment.

Unsupervised Process Reward Models

Artyom Gadetsky, Maxim Kodryan, Siba Smarak Panigrahi, Hang Guo · hf_daily_papers

uPRM shows you can get step-level reward signals without costly human step annotations by turning LLM next-token probabilities into a joint scoring function that pinpoints first-error positions across reasoning trajectories. It yields up to +15% accuracy over a naive LLM-as-judge for error localization, matches supervised PRMs as a verifier (and beats majority voting by up to 6.9%), and serves as a more stable reward for RL training. For work like drug-discovery pipelines or multi-step synthesis/retrosynthesis reasoning, this removes a major annotation bottleneck: you can scale process-aware verification and RL fine-tuning without expert labels. Caveats: performance depends on next-token calibration and batch construction, so expect domain adaptation and calibration work for protein/chemistry-specific models, but the payoff is much more scalable, fine-grained alignment and verifier tooling.
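
How step-level scores get used downstream can be sketched in a few lines. This does not reproduce uPRM's scoring function: the per-step scores below are hard-coded placeholders, and the 0.5 threshold and the "weakest step" aggregation are my assumptions. It only illustrates first-error localization and why a verifier can beat majority voting.

```python
from collections import defaultdict

def first_error(step_scores, threshold=0.5):
    """Index of the first step scored below threshold, or None."""
    for i, score in enumerate(step_scores):
        if score < threshold:
            return i
    return None

def majority_vote(candidates):
    counts = defaultdict(int)
    for answer, _ in candidates:
        counts[answer] += 1
    return max(counts, key=counts.get)

def verifier_vote(candidates):
    """Weight each sampled answer by the score of its weakest reasoning step."""
    totals = defaultdict(float)
    for answer, scores in candidates:
        totals[answer] += min(scores)
    return max(totals, key=totals.get)

# Three sampled trajectories: (final answer, per-step scores). Values are invented.
samples = [
    ("42", [0.9, 0.8, 0.9]),
    ("41", [0.9, 0.2, 0.6]),
    ("41", [0.7, 0.3, 0.1]),
]
print(majority_vote(samples), verifier_vote(samples), first_error(samples[1][1]))
```

Here two flawed trajectories agree on "41", so majority voting picks it, while the process-aware vote prefers the single trajectory with no weak step.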

Maestro: Reinforcement Learning to Orchestrate Hierarchical Model-Skill Ensembles

Jinyang Wu, Guocheng Zhai, Ruihan Jin, Yuhao Shen · hf_daily_papers

A small RL orchestrator (4B) learns to route calls across a registry of frozen expert models and a two-tier skill library, using outcome-based reward rather than step-by-step labels. It outperforms larger closed LLMs on multimodal benchmarks, generalizes to unseen experts without retraining, and keeps latency/compute low. For your work this matters three ways: 1) operationally — you can add domain-specific predictors (e.g., structure scorers, ligand filters, geospatial analyzers) to a registry and cheaply evaluate combinations without retraining a monolith; 2) cost/efficiency — a lightweight policy reduces redundant compute vs. calling one giant model for everything; 3) safety/controllability — freezing experts and learning a routing policy makes auditing and constrained behavior easier. Code is available for prototyping integrations into drug-discovery or mapping pipelines.
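
The registry-plus-router shape is easy to prototype before committing to any RL training. The sketch below is a deliberately simple stand-in, not Maestro's method: a greedy tabular policy with optimistic initial values that learns only from an outcome reward. The class name `OutcomeRouter` and the toy string "experts" are hypothetical.

```python
class OutcomeRouter:
    def __init__(self, optimistic=1.0):
        self.experts = {}             # name -> frozen callable
        self.stats = {}               # (task_kind, name) -> (pulls, mean reward)
        self.optimistic = optimistic  # untried experts look attractive

    def register(self, name, fn):
        self.experts[name] = fn       # adding an expert touches nothing else

    def _value(self, kind, name):
        return self.stats.get((kind, name), (0, self.optimistic))[1]

    def pick(self, kind):
        return max(self.experts, key=lambda name: self._value(kind, name))

    def run(self, kind, payload, reward_fn):
        name = self.pick(kind)
        output = self.experts[name](payload)
        reward = reward_fn(output)    # outcome only, no step labels
        pulls, mean = self.stats.get((kind, name), (0, 0.0))
        pulls += 1
        self.stats[(kind, name)] = (pulls, mean + (reward - mean) / pulls)
        return name, output

router = OutcomeRouter()
router.register("reverse", lambda s: s[::-1])
router.register("upper", str.upper)
reward = lambda out: 1.0 if out == "HELLO" else 0.0
used = [router.run("shout", "hello", reward)[0] for _ in range(3)]
print(used)
```

The router tries the unhelpful expert once, sees zero reward, and settles on the useful one; the experts themselves are never modified, which is what makes the audit story simpler.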

ACC: Compiling Agent Trajectories for Long-Context Training

Qisheng Su, Zhen Fang, Shiting Huang, Yu Zeng · hf_daily_papers

ACC turns multi-step agent logs (tool calls, observations, answers) into supervised long-context QA pairs so models learn to integrate distant evidence directly, instead of relying on costly long-document curation or masked tool-supervision. Practically, it produces scalable fine-tuning data that lets much smaller models (30B) match much larger ones (235B) on cross-turn coreference and graph-traversal tasks by inducing task-adaptive attention routing and expert-like specialization. For you: this is an efficient route to build LLMs that reason over extended experimental or pipeline traces (e.g., search histories, simulation outputs, lab notes) without massive annotation, which is useful for in-silico drug workflows where evidence is scattered across tool calls. Implementation-wise, feed agent telemetry into QA compilation, combine it with existing long-context extensions, and watch for alignment/privacy risks when distilling tool outputs into direct-answer supervision.
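
The compilation step itself can be sketched simply. The record layout below (`tool`, `args`, `observation`) and the bracketed markers are assumptions for illustration, not ACC's actual format; the idea shown is only that intermediate tool traffic becomes context and the final answer becomes the supervision target.

```python
def compile_trajectory(task, steps, final_answer):
    """Flatten tool calls and observations into one long-context QA example."""
    parts = []
    for i, step in enumerate(steps, 1):
        parts.append(f"[call {i}] {step['tool']}({step['args']})")
        parts.append(f"[observation {i}] {step['observation']}")
    return {
        "context": "\n".join(parts),   # evidence scattered across turns
        "question": task,
        "answer": final_answer,        # direct-answer supervision target
    }

# Hypothetical agent log.
log = [
    {"tool": "search", "args": "compound A solubility", "observation": "low in water"},
    {"tool": "lookup", "args": "compound A logP", "observation": "4.1"},
]
example = compile_trajectory("Is compound A likely water-soluble?", log, "No")
print(example["context"])
```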

Gated DeltaNet-2: Decoupling Erase and Write in Linear Attention

Ali Hatamizadeh, Yejin Choi, Jan Kautz · hf_daily_papers

Gated DeltaNet-2 removes the single scalar that tied erasing and writing in linear (recurrent) attention, replacing it with independent channel-wise erase and write gates and folding channel decay into an efficient chunkwise WY update. Practically this preserves constant-memory decoding while improving the fidelity of compressed recurrent state, which translates to noticeably better multi-key retrieval and long-context needle-in-a-haystack tasks without sacrificing parallel training efficiency (gate-aware backward pass). For production ML engineers: it’s a directly usable recipe to get stronger long-range retrieval and lower-memory inference in recurrent/hybrid transformer setups, with an implementation path that won’t blow up training pipelines. Worth prototyping if you care about long-sequence protein/molecule contexts, retrieval-augmented models, or cost-efficient serving of large-context LMs.
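
To see what decoupling erase from write buys you, here is a toy recurrent state update in plain Python. It is a delta-rule style sketch with separate per-channel erase and write gates, written from the summary above; the exact equations, the chunkwise WY form, and the gate parameterization in the paper may differ, so treat every formula here as an assumption.

```python
def step(S, k, v, erase, write):
    """One recurrent update of state S (d_v x d_k) for key k and value v."""
    d_v, d_k = len(S), len(k)
    ek = [erase[j] * k[j] for j in range(d_k)]   # channel-wise erase gate on the key
    wk = [write[j] * k[j] for j in range(d_k)]   # channel-wise write gate on the key
    old = [sum(S[i][j] * ek[j] for j in range(d_k)) for i in range(d_v)]
    # Remove the gated old content along k, then add the gated new value.
    return [[S[i][j] - old[i] * k[j] + v[i] * wk[j] for j in range(d_k)]
            for i in range(d_v)]

def read(S, q):
    return [sum(row[j] * q[j] for j in range(len(q))) for row in S]

zeros = [[0.0, 0.0], [0.0, 0.0]]
key = [1.0, 0.0]
ones, off = [1.0, 1.0], [0.0, 0.0]

S1 = step(zeros, key, [1.0, 2.0], ones, ones)
overwrite = step(S1, key, [5.0, 5.0], erase=ones, write=ones)  # erase, then write
accumulate = step(S1, key, [5.0, 5.0], erase=off, write=ones)  # write without erasing
print(read(overwrite, key), read(accumulate, key))
```

With a single shared scalar the model cannot express the second case independently of the first; separate gates let it choose per channel whether a new value replaces or adds to what is stored.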

Giving Agents Computers — Ivan Burazin, Daytona

latent_space

Agents are moving off ‘localhost’ into persistent remote compute — think long-running, stateful workflows that need orchestration, sandboxes, billing, and auditability. The consolidation around an LLM-OS stack (and product offerings like Perplexity/Manus/Cursor) makes third-party infra (Daytona and peers) a core dependency rather than an optional optimization. For you that means rethinking how agent-driven experiments and pipelines are executed: design for remote execution semantics (checkpointing, deterministic replay, telemetry), strict sandboxing and IP protection for proprietary molecular data, cost-control primitives, and clear audit/logging for reproducibility and validation. Watch Daytona and the LLM-OS ecosystem as both vendor partners and potential single points of operational lock-in you’ll want to benchmark and integrate with carefully.

Wall Street Journal: The American Rebellion Against AI Is Gaining Steam

reddit_singularity

A growing political and public backlash in the U.S. is creating real regulatory and reputational risk for AI projects — expect tighter oversight, procurement scrutiny, and calls for transparency that could slow marquee deployments. For someone building foundation models and applying them to drug discovery, this raises operational priorities: tighten dataset provenance and consent tracking, formalize model lineage and validation evidence, and plan hybrid deployment options (on-prem or vetted cloud instances) to satisfy auditors and partners. There’s also an opportunity: teams that can demonstrate robust interpretability, reproducibility, and safety testing will win partnerships and funding as customers and regulators seek lower-risk suppliers. Short term: audit your data and validation artifacts, and prepare concise safety/compliance messaging for collaborators and investors.

World News

The common thread today is strategic scarcity: wars that look regionally bounded are now reshaping alliance credibility, fiscal priorities, and industrial capacity well beyond their immediate theatres. Europe is being pushed into a more permanent defence-and-energy reallocation just as the US signals that hard-power commitments are no longer infinitely scalable, which raises the probability that geopolitics shows up less as a headline shock and more as a durable input into inflation, public borrowing, and supply-chain risk.

Trump’s ‘disappointment’ with Nato lays groundwork for ‘one of the most important’ summits ever, Rubio says – Europe live

Jakub Krupa in Prague · guardian

Germany is moving to spend north of 4% of GDP on defence this year with a 5% target, while NATO ministers are girding for a high-stakes Ankara leaders’ summit after US expressions of “disappointment” that could drive redeployments. Expect sustained defence procurement and deeper EU defence-industrial cooperation (and higher energy/insurance risk premia if tensions around Iran and the Strait of Hormuz persist) — a material shift for European fiscal priorities, supply chains, and dual-use tech vendors that could open opportunities for defence-capable startups and geospatial/ML contractors.

Big oil’s war profits may have a silver lining after all

Damian Carrington · guardian

Oil majors have pulled in extraordinary windfall profits since the Iran conflict began (tens of millions of dollars an hour for top players) while households pay directly through higher pump prices and energy bills. Portfolio and policy implications: expect short-term tailwinds for energy equities alongside heightened political pressure for EU/UK windfall taxes, upward inflationary pressure from energy costs, and renewed momentum for climate and transition policies that increase regulatory risk for fossil-fuel exposures and create opportunity for renewables/transition plays.

Burnham to launch byelection campaign as Green candidate quits after just nine hours – UK politics live

Taz Ali · guardian

Andy Burnham’s return bid and the Greens’ swift candidate withdrawal signal growing instability in the UK political landscape ahead of the Makerfield byelection. If Burnham regains a seat, he could credibly figure in Labour leadership discussions and shift party priorities. Meanwhile, April borrowing was materially higher, with debt interest at a record level for April, driven by inflation and geopolitical risk; that puts upward pressure on gilt yields and adds fiscal pressure, a meaningful signal for UK-focused portfolio allocations, pension/SIPP assumptions and interest-rate-sensitive positions.

US arms sales to Taiwan on ‘pause’ due to Iran war, says acting navy chief

Alastair McCready in Taipei · guardian

Washington has paused a $14bn arms package to Taiwan to conserve munitions for operations against Iran, while Trump’s comments treating weapons sales as a bargaining chip add political uncertainty. That pause weakens deterrence, hands Beijing additional leverage, and raises tail risk for Taiwan-adjacent supply chains (notably semiconductors) and investor sentiment — a meaningful uptick in geopolitical risk that should factor into tech/defense exposure and regional scenario planning.

US navy chief says $14bn arms sale to Taiwan paused due to Iran war

bbc_world

Pausing the $14bn arms sale to Taiwan to conserve munitions for the Iran conflict signals US munitions strain and a temporary reprioritization of commitments in Asia. Expect increased regional uncertainty (risk of Chinese pressure), accelerated Taiwanese/allied stockpiling and domestic defense production, and a near-term bump in geopolitical risk for portfolios and supply chains—worth reviewing exposure to Taiwan/Asia semiconductors, defense contractors, and startups focused on resilience.

Nato chief welcomes US sending 5,000 troops to Poland

bbc_world

Washington will place 5,000 troops in Poland, a clear reinforcement of NATO’s eastern flank that both reassures allies and signals deterrence to Moscow after a recent cancelled deployment. For you: this raises Europe’s security premium—expect upward pressure on defence spending, possible localized supply‑chain or energy frictions, and a modest increase in political risk premia that could influence pan‑European markets and macro positioning.

Pharma & Drug Discovery

Today’s mix underscores a familiar but sharper industry split: AI can now contribute plausible, experimentally validated hypotheses at the front end, but the real bottlenecks remain proprietary data, translational biomarkers, and clinical trial design. The winning organizations will be the ones that couple heavy inference and agentic discovery systems to defensible wet-lab loops and causal validation, because recent readouts again show that potency, target rationale, or elegant biomarker stories are not enough once you hit heterogeneous patients and real-world tolerability.

Accelerating scientific discovery with Co-Scientist

Juraj Gottweis, Wei-Hung Weng, Alexander Daryin, Tao Tu · openalex

Co-Scientist couples a Gemini-based multi-agent loop (generate → critique → refine) with an asynchronous task executor and a tournament-style evolution to produce experimentally testable hypotheses; scaling test-time compute consistently improved hypothesis quality. Crucially, the system produced drug-repurposing candidates and synergistic combinations for acute myeloid leukemia that passed in vitro validation — a concrete demonstration that agent orchestration + expanded inference compute can pay off in real biology. For Isomorphic, the takeaway isn’t hype about LLMs but practical architecture: invest in agent orchestration, principled self-critique/tournament selection, and measuring compute→quality returns, while keeping wet-lab validation tightly coupled. Also note reliance on a proprietary backbone (Gemini) and wet-lab costs — both limit reproducibility and favor organizations that can fund heavy test-time compute plus experiments.
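
A tournament over hypotheses needs some way to turn pairwise critiques into a ranking. The sketch below uses standard Elo updates as one plausible choice; the digest summary only says "tournament-style", so Elo, the K-factor of 32, the starting rating of 1200, and the hard-coded match outcomes are all assumptions for illustration.

```python
def expected_score(r_a, r_b):
    """Standard Elo win probability for A against B."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400))

def elo_update(r_a, r_b, a_won, k=32.0):
    delta = k * ((1.0 if a_won else 0.0) - expected_score(r_a, r_b))
    return r_a + delta, r_b - delta

ratings = {"H1": 1200.0, "H2": 1200.0, "H3": 1200.0}
# (winner, loser) pairs that a critique agent might produce; hard-coded here.
matches = [("H1", "H2"), ("H1", "H3"), ("H3", "H2")]
for winner, loser in matches:
    ratings[winner], ratings[loser] = elo_update(ratings[winner], ratings[loser], a_won=True)

ranking = sorted(ratings, key=ratings.get, reverse=True)
print(ranking)
```

More test-time compute in this framing simply means more generated hypotheses and more matches, which is one way to measure compute→quality returns: track how the top-rated hypothesis changes as the match budget grows.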

Guarding biotech from China and big bets in longevity

stat_news

Biotech leaders are increasingly treating structural and mechanistic data as strategic IP rather than public goods — BridgeBio openly admitting it’s not publishing “the right structures” reflects a broader trend of withholding detail to prevent rapid replication or transfer to competitors in China. That reduces shared training signal for protein/drug models and raises the value of proprietary experimental datasets, private compute, and secure/federated training workflows. Concurrently, Retro Biosciences, backed by Sam Altman, is moving toward a first clinical readout for a longevity program — a near-term milestone that will materially influence investor enthusiasm for AI-first biotechs. For you at Isomorphic, this means defensible data assets and compliant secure pipelines are competitive levers, and monitoring Retro’s readout will help gauge market appetite and potential M&A/partnership timing.

Best Master’s Program or Course Path to Master Applied ML/DL in Biology?

reddit_bioinformatics

If you want to be effective at applied ML/DL in biology, prioritize project-heavy, domain-grounded learning over pure theory. A structured master’s gives lab access, supervised projects, domain credibility and recruiting pathways—useful if you’re switching from a non-technical background or need formal credentials. If you already have solid ML chops, a leaner path of targeted courses + reproducible projects + internships is faster and cheaper: learn core bio (molecular biology, genomics, structural biology), key models (GNNs for molecules, equivariant nets, diffusion models), and tooling (PyTorch/JAX, MD/docking basics, chemoinformatics). Build a portfolio tied to public datasets/competitions (CAGI, MolQA, MoleculeNet), reproduce recent papers, and collaborate with wet-lab groups for credibility. Hiring-wise at drug-discovery shops, demonstrated domain projects and collaboration experience often outweigh certificates.

STAT+: Closely watched experimental Parkinson’s drug fails key clinical trial

stat_news

A late-stage trial of a small-molecule LRRK2 inhibitor failed to slow Parkinson’s progression, undercutting the idea that blocking LRRK2 (originally implicated by rare familial mutations) would broadly benefit idiopathic PD patients. Practically this is a target-validation failure: either LRRK2 isn’t a causal driver in the typical patient population, patient selection/timing/biomarkers were inadequate, or target engagement didn’t translate to clinical effect. For AI-driven discovery and our work, it’s a reminder that computational target nomination must be paired with rigorous translational biomarkers, stratified trial design, and causal/physiology-aware modeling; purely correlative or high-throughput signals won’t de-risk clinical development. Watch for subgroup/biomarker readouts, any dosing/PK explanations, and downstream pipeline/valuation impacts at Biogen/Denali; opportunities for better predictive tools and mechanistic validation may follow.

STAT+: Pharmalittle: We’re reading about a Lilly obesity drug trial, statistics for an Alzheimer’s drug, and more

stat_news

Lilly’s next-gen obesity candidate delivers near–bariatric weight loss for those who stay on it (≈28% at 80 weeks among completers) but shows materially worse tolerability: an 11% discontinuation rate at the highest dose (vs ~7% for Wegovy/Zepbound), pulling ITT efficacy down to ~25%. That tradeoff—very high potency but nontrivial adverse events—raises questions about real-world uptake, dosing strategies, payer coverage, and head-to-head positioning against Novo Nordisk. Separately, use of quantile-aggregation to link post‑treatment amyloid reductions to clinical benefit can amplify weak individual-level signals, suggesting a methodological artifact rather than a robust biomarker–outcome relationship. For you: expect intensified scrutiny of trial analysis methods and biomarker claims, greater emphasis on tolerability in go‑to‑market models, and caution when building predictive/causal models on aggregated trial summaries rather than individual‑level, pre‑specified endpoints.

Lilly’s triple agonist still leads among late-stage obesity assets

endpoints_news

Lilly’s late-stage triple-agonist maintains a clear lead on weight-loss efficacy, compressing the market window for single- and dual-agonist competitors and shifting the battle from “who loses the most weight fastest” to differentiation on safety, durability, delivery, and long-term cardiometabolic outcomes. Commercially, this strengthens pricing power and raises the bar for payers and investors — me-too assets without demonstrable long-term benefit or improved safety will struggle to justify premiums or M&A interest. For you: expect pharma to increase demand for in-silico tools that predict polypharmacology, receptor-specific signaling, PK/PD and adverse-event tradeoffs; AI-driven modeling and translational validation will be a clearer lever for partnership and licensing discussions, and internal project priorities may drift toward safety/differentiation problems rather than raw efficacy gains.

STAT+: Merck-Kelun lung cancer drug cut risk of tumor progression by 65%, ASCO abstract shows

stat_news

A Merck-licensed ADC from Kelun (sac-TMT) cut risk of progression by ~65% versus standard care in first-line advanced NSCLC and showed a preliminary overall survival signal; the trial is the first to demonstrate a successful ADC + PD-1 checkpoint combination in untreated patients. Practically, this is a clear clinical proof that ADCs can synergize with immunotherapy in solid tumors, which will accelerate investment and combination-focused development programs, push larger randomized global trials, and raise the bar for translational biomarkers and safety profiling. For you: it strengthens the case for modeling ADC pharmacology and immunogenicity, prioritizing combination synergy prediction, and watching for new Merck partnerships or licensing moves that could shift competitive dynamics and data availability in AI-driven drug discovery.

STAT+: Lilly’s ‘triple-G’ drug leads to bariatric-surgery levels of weight loss in trial

stat_news

Retatrutide, Lilly’s triple-agonist obesity candidate, delivered weight loss approaching bariatric-surgery levels (~28.3% among completers; ~25% in ITT), but with a notable 11% discontinuation rate for adverse events (vs up to ~7% for current GLP-1s). That combination of surgery-level efficacy and worse tolerability creates a clear trade-off for payers, prescribers, and patients: will many opt for intense, high-efficacy drugs with higher dropout rates, or stick with surgery or safer chronic therapies? For drug discovery and ML teams this is a signal: multi-target peptide therapeutics can produce step-change efficacy but raise complex safety/PKPD trade-offs that are ripe for computational modeling. Expect increased investment in in silico toxicity prediction, generative peptide design, and trial-simulation tooling; commercially, the result could accelerate M&A, partnerships, and competition in obesity therapeutics.

Finance & FIRE

The common thread is that “passive” and “safe” both need more scrutiny than they did a few years ago: cap-weighted indexes are becoming more concentrated and more exposed to index-rule quirks, while bond markets are signalling that cash and fixed income may finally offer genuine competition to expensive long-duration equities. For a FIRE-minded allocator, that argues less for prediction than for cleaner portfolio mechanics — audit concentration risk inside your supposedly diversified ETFs, assume lower forward equity returns from current multiples, and treat higher risk-free yields as a real input to ISA/SIPP rebalancing rather than dead cash drag.

Thursday links: a bumpy ride

abnormal_returns

Several linked developments compress into two actionable themes. First, capital and returns are clustering around vertically integrated platform plays: SpaceX’s S‑1 frames Starlink as the cash-generating core and implies extraordinary VC returns, while astonishing inter-company deals (Anthropic paying ~$1.25B/month to SpaceX through 2029) show AI firms are locking up bespoke infra and revenue streams. That changes how to think about exposure to AI: it’s not just models (NVIDIA profits are another sign of concentration) but infrastructure and long-term customer contracts. Second, market structure in public investing is evolving — ETF incumbents are taking on each other, active ETFs and niche products (e.g., memory/DRAM ETFs) are drawing flows despite higher fees, and evergreen fund mechanics deserve scrutiny for illiquidity risk. For portfolio decisions: prefer instruments that capture durable infra/ops moats or broad low-cost beta, and treat niche/high-fee active products and evergreen allocations as due-diligence items rather than automatic diversifiers.

Bond markets are not so subtly telling the Fed that rates aren't high enough

reddit_economics

Rising long-term bond yields indicate markets are pricing more tightening or stickier inflation than current Fed guidance — the market signal is that policy rates may still be too low to bring inflation firmly under control. Practically: higher discount rates compress valuations (especially for long-duration assets like growth tech and biotech), increase borrowing costs for startups and R&D-heavy firms, and improve risk-free returns you can lock into via cash, short-duration gilts, or investment-grade bonds. For portfolio action: consider trimming duration risk, laddering fixed income in ISAs/SIPPs to capture higher yields, and stress-testing equity positions for a higher-rate regime. For work/startups: expect tougher fundraising and more pressure on durable ROI for ML/drug-discovery projects — prioritize cost-efficient inference and clearer go/no-go milestones.
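
The claim that higher discount rates hit long-duration assets hardest is just compounding arithmetic. A quick sketch, assuming an illustrative move from 4% to 5% (those rates are mine, not from the post) and a single cash flow at two different horizons:

```python
def present_value(cash, years, rate):
    return cash / (1 + rate) ** years

def pct_drop(years, old_rate=0.04, new_rate=0.05):
    """Fractional fall in present value when the discount rate rises."""
    before = present_value(100.0, years, old_rate)
    after = present_value(100.0, years, new_rate)
    return 1 - after / before

near = pct_drop(3)    # cash flow 3 years out
far = pct_drop(15)    # cash flow 15 years out
print(f"{near:.1%} vs {far:.1%}")
```

The same one-point rate move costs the distant cash flow several times more of its value, which is why growth tech and biotech valuations are the first to compress.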

America's $39 Trillion Reckoning: Could the World's Largest Economy Become the Next Greece?

reddit_economics

A $39T US debt pile doesn’t make default likely, thanks to the dollar’s reserve role, but it does raise the odds of structurally higher interest rates, fiscal retrenchment, and inflationary policy responses over the next decade. For a portfolio this means upward pressure on the risk-free rate and discount rates: bad for long-duration tech and biotech growth valuations, and supportive of value, real assets, and inflation-linked bonds. Practically: shorten bond duration, add TIPS or other inflation-protected exposure, maintain global equity diversification to hedge dollar and policy risk, and keep taxable gains sheltered in ISA/SIPP-like wrappers given the higher probability of future tax hikes. Also expect tougher fundraising and lower startup valuations, which matters for private rounds or venture exposure.

Wednesday links: not knowing the future

abnormal_returns

Passive indexes are less diversified than they look: Nvidia now exceeds 5% of MSCI ACWI and South Korea’s market is even more top-heavy, so cap‑weighted exposure can concentrate risk in a handful of names/regions. At the same time software and semiconductor returns have materially decoupled, and macro signals (oil futures near highs; demand driven by refined products rather than crude) highlight that headline commodity moves mask sectoral nuance. Combine that with the reminder that narratives often trump charts and that nobody reliably knows the future — the practical response is to treat index exposure as an explicit bet: audit single‑stock/region caps in your ISA/SIPP/ETF holdings, rebalance to target risk, consider equal‑weight or low‑active tilts to reduce concentration, and keep a small, sized conviction sleeve rather than forecasting markets.
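
Auditing concentration is a small script once you have look-through weights. The sketch below flags holdings above a cap and reports the Herfindahl-Hirschman index (HHI) and its inverse, the "effective number of holdings". The holding names and weights are hypothetical, and the 5% cap simply mirrors the threshold mentioned above.

```python
def concentration_report(weights, cap=0.05):
    """weights: holding -> portfolio weight (need not sum to exactly 1)."""
    total = sum(weights.values())
    norm = {name: w / total for name, w in weights.items()}
    hhi = sum(w * w for w in norm.values())
    return {
        "over_cap": sorted(name for name, w in norm.items() if w > cap),
        "hhi": hhi,
        "effective_n": 1 / hhi,   # equals the holding count only if equal-weighted
    }

# Hypothetical three-holding portfolio.
report = concentration_report({"MEGA": 0.5, "MID1": 0.25, "MID2": 0.25}, cap=0.3)
print(report)
```

Three names that behave like fewer than three equal bets is the kind of gap between nominal and effective diversification worth checking inside a cap-weighted fund.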

SpaceX IPO and NASDAQ violating its own methodology

reddit_investing

If SpaceX secures an expedited NASDAQ listing that triggers immediate index inclusion at an outsized initial weight (~4%), passive funds and benchmarks will be mechanically forced into large buy orders at the first rebalance — creating predictable, front-loaded demand and higher short-term prices. That gameable entry undermines index governance: negotiated exceptions allow private issuers to monetize rising passive ownership, increasing single-stock concentration and amplifying volatility when sentiment reverses. For a portfolio-minded engineer: don’t assume passive exposure immunizes you from idiosyncratic shocks; index-driven flows can create both rapid upside and brutal mean reversion. Watch NASDAQ’s formal notice, the float/circulation metrics used for weighting, and how major US ETFs implement rebalances or sampling — those details determine arbitrage opportunities and execution risk for taxable or SIPP/ISA allocations.
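
The mechanical demand is back-of-envelope arithmetic: index weight times assets tracking the index, compared with the tradable float. In the sketch below only the ~4% weight comes from the post; the $500bn of tracking assets and $100bn of free float are placeholder numbers chosen to show the calculation, not estimates.

```python
def forced_demand(index_weight, tracked_aum, free_float_value):
    """Dollars passive trackers must buy, and that amount as a share of float."""
    buy = index_weight * tracked_aum
    return buy, buy / free_float_value

buy, share_of_float = forced_demand(0.04, 500e9, 100e9)
print(f"${buy / 1e9:.0f}bn, {share_of_float:.0%} of float")
```

The ratio to float, rather than the headline weight, is what drives price impact, which is why the float metric used for weighting is the detail to watch.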

Can someone explain why all asset prices are so high… and why they aren’t coming down?

reddit_investing

Asset prices are high because multiple structural and policy forces compressed risk premia: persistently large central-bank and fiscal backstops, a global savings/real-yield environment that favors growth assets, huge passive/ETF flows concentrating capital into a few winners, and narrative-driven re-rating of high-growth tech/AI names. Real yields are the key dial — when they stay low, long-duration cash flows (tech, biotech, even gold as an inflation/real-yield hedge) command much higher multiples. Practically, that means expected forward returns from broad indexes are likely below long-run averages, so the rational response for a 26‑year investor is not market timing but higher savings rates, disciplined dollar-cost averaging, tax-efficient wrappers (ISA/SIPP), and modest tilts toward value, international or real-assets if you want diversification against a low-yield regime. Monitor real rates and earnings growth as the primary risks to the current valuation regime.

Startup Ecosystem

The startup signal today is that the bottleneck is moving away from model access and toward systems design: persistent memory, long-lived agent execution, security, and compliance are becoming the real product surface. For UK/EU founders especially, that means the winners are less likely to be the teams with the flashiest demo and more likely to be the ones that can package agentic capability into auditable, governable, enterprise-safe infrastructure while adapting to a distribution environment increasingly controlled by large platforms.

Alibaba's proprietary Qwen3.7-Max can run for 35 hours autonomously and supports external harnesses like Anthropic's Claude Code

venturebeat

Alibaba’s Qwen3.7-Max demonstrates that agentic models can now sustain multi-day autonomous workflows (35 hours, 1,158 tool calls, 10x kernel speedup) by training on massively scaled, dynamic agentic environments and adding reward‑hacking self‑monitoring. For product and infra: expect demands for long‑lived, stateful sessions, robust tool orchestration, and cost models that cover days of continuous inference — not just single-shot prompts. For R&D: this capability maps directly to multi‑day optimization loops in drug discovery (iterative simulation, experiment orchestration, automated debugging) and invites work on alignment/robustness for persistent agents. Commercially, Qwen3.7‑Max is proprietary and China‑hosted, so European/UK adoption will be limited by compliance and sovereignty, but support for external harnesses (e.g., Claude Code) raises integration options and competitive pressure on open‑source agent efforts.

A 0.12% parameter add-on gives AI agents the working memory RAG can't

venturebeat

Delta-mem compresses an agent’s history into a tiny, dynamically updated matrix (an OSAM) that’s plugged into a frozen backbone, adding only ~0.12% parameters while outperforming a leading 76.4%-heavier alternative. Practically, it replaces many RAG/context-window tricks: instead of re-inserting text or incurring retrieval latency, the model projects hidden states into the matrix and reads compact associative memory during forward passes, avoiding quadratic attention blowups and heavy token costs. For production teams that need persistent, low-latency agents—coding assistants, iterative data-analysis pipelines, or long-running drug-discovery workflows—this is a cheap, deployable path to behavioral continuity. Caveats: you still need policies for memory update/decay, verification to avoid stale or misaligned recalls, and robustness testing on OOD histories before trusting scientific pipelines.
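
To make the "fixed-size matrix as associative memory" idea concrete, here is a minimal sketch of a generic delta-rule memory in plain Python. It is not Delta-mem's implementation: the update rule, the function names, and the use of hand-picked orthonormal keys are all assumptions chosen for illustration, and a real system would derive keys and values from projected hidden states.

```python
def zeros(rows, cols):
    """Create a rows x cols memory matrix filled with 0.0."""
    return [[0.0] * cols for _ in range(rows)]


def read(memory, key):
    """Return memory @ key: the value currently associated with key."""
    return [sum(m * k for m, k in zip(row, key)) for row in memory]


def write(memory, key, value, beta=1.0):
    """Delta-rule update, in place: M <- M + beta * (value - M @ key) key^T.

    With a unit-norm key and beta=1.0, a later read(memory, key) returns value.
    """
    error = [v - r for v, r in zip(value, read(memory, key))]
    for i, row in enumerate(memory):
        for j in range(len(row)):
            row[j] += beta * error[i] * key[j]


# Hypothetical usage: two orthonormal keys share one 2x3 matrix.
memory = zeros(2, 3)
write(memory, [1.0, 0.0, 0.0], [2.0, -1.0])
write(memory, [0.0, 1.0, 0.0], [0.5, 3.0])
write(memory, [1.0, 0.0, 0.0], [9.0, 9.0])  # overwrite the first association
```

The matrix stays the same size however many writes occur, which is the property that avoids re-inserting history as tokens. Overwriting is also why the update and decay policy matters: a write to a similar key silently replaces what was stored before.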

MFA verifies who logged in. It has no idea what they do next.

venturebeat

MFA proves identity only at the moment of login; attackers are exploiting stolen session tokens as bearer credentials to move laterally and escalate without triggering traditional alerts. With AI-powered vishing, deepfakes, and highly convincing phishing, adversaries favor credential theft over malware; enterprises need immediate token revocation, short-lived and scoped tokens, token binding to devices, and continuous risk scoring that ties IAM into SecOps. For you: ML infra and R&D environments commonly use long-lived cloud/session tokens, API keys, and shared service accounts, so a stolen session can leak datasets or models, or allow Kubernetes/AD pivoting. Prioritize post-auth telemetry, automated session termination, conditional access tied to behavioral signals, and token lifecycle policies built into model serving and platform layers.
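
As a rough illustration of "short-lived, scoped, revocable" tokens, the sketch below signs a session id, scope, and expiry with HMAC-SHA256 and checks all three on every request. The token layout, the `SECRET` value, and the in-memory `REVOKED` set are hypothetical stand-ins; a production system would use an established token format and a shared revocation store.

```python
import hashlib
import hmac
import time
from typing import Optional

SECRET = b"demo-signing-key"  # hypothetical; load from a secret manager in practice
REVOKED = set()               # hypothetical in-memory revocation list of session ids


def issue_token(session_id: str, scope: str, ttl_s: int,
                now: Optional[float] = None) -> str:
    """Return 'session_id|scope|expiry|signature' (fields must not contain '|')."""
    issued = time.time() if now is None else now
    expiry = int(issued + ttl_s)
    payload = f"{session_id}|{scope}|{expiry}"
    sig = hmac.new(SECRET, payload.encode(), hashlib.sha256).hexdigest()
    return f"{payload}|{sig}"


def check_token(token: str, required_scope: str,
                now: Optional[float] = None) -> bool:
    """Accept only correctly signed, unexpired, unrevoked tokens with the exact scope."""
    try:
        session_id, scope, expiry, sig = token.split("|")
    except ValueError:
        return False
    payload = f"{session_id}|{scope}|{expiry}"
    expected = hmac.new(SECRET, payload.encode(), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(sig, expected):
        return False
    current = time.time() if now is None else now
    if current >= int(expiry):
        return False
    if session_id in REVOKED:
        return False
    return scope == required_scope


def revoke_session(session_id: str) -> None:
    """Kill every token issued for this session, regardless of remaining lifetime."""
    REVOKED.add(session_id)
```

Because the revocation check runs on every request, a session flagged by post-auth telemetry stops working immediately instead of surviving until its expiry.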

The EU is rapidly rewriting the AI Act. What’s changed?

sifted

EU negotiators have accelerated major rewrites to the AI Act that will tighten obligations for foundation-model providers, increase transparency and auditability requirements, and expand the ‘high-risk’ category for certain automated decision systems. For startups that build or integrate generative models, this raises near-term compliance costs: mandatory risk assessments, documented training provenance, logging for traceability, and stronger human‑oversight measures will need engineering and legal bandwidth. Practically, expect product changes (labeling, explainability hooks), heavier ops for data governance and inference auditing, and tougher market access calculus if you target EU customers. Immediate actionable moves: audit your model supply chain and logging, budget for compliance engineering, and re-evaluate hosting/data‑locality and go‑to‑market timelines in the EU.

Google’s AI search overhaul is great for Google and bad for everyone who makes the web worth searching

the_next_web

Google turning Search into an “AI-first” assistant with conversational follow-ups and autonomous web-monitoring is a platform-level pivot: answers, not links, become the primary product. That consolidates control over discovery and attribution, starving publishers and startups of referral traffic while making Google the gatekeeper for which models and sources get amplified. For builders and ML teams, this raises three practical consequences: (1) product distribution and growth channels shift toward platform partnerships or paid placements rather than organic SEO; (2) model evaluation and retrieval engineering must account for a dominant, closed aggregator that can reshape training data and signal flows; (3) opportunities emerge for alternative discovery layers, source-verification tools, and APIs that restore provenance or route users to original content. Reassess go-to-market assumptions and data/ingestion strategies accordingly.

Kore.ai launches Artemis AI agent platform, takes on Salesforce and ServiceNow

venturebeat

A new push toward ‘AI-designed’ enterprise agents is crystallizing around two technical bets: a compiled, YAML-based intermediary language (ABL) that makes agent topologies versionable and CI/CD-friendly, and an assistant (Arch) that translates plain-language specs into runnable agent artifacts and iteratively rewrites them based on production telemetry. For ML/platform engineers this matters because it converts months of messy integration, orchestration, governance, and observability work into a closed-loop MLOps product—shifting the hard problems from plumbing to validation, policy, and reliability of generated agent code. Watch for vendor-neutral ABL becoming a portability/lock-in battleground, increased demand for run-time observability and safety tooling, and new failure modes where “AI writes infra” requires stronger automated testing and post-deployment verification.

Engineering & Personal

A clear pattern here: as AI gets embedded deeper into engineering workflows, the hard problems shift from model capability to systems design and institutional control. The teams that benefit most won’t be the ones with the fanciest demos, but the ones that treat agents, CI, APIs, and hiring as parts of the same production surface — with explicit trust boundaries, observability, and processes that optimize for real-world debugging and delivery rather than proxy metrics.

Introducing Nova, our internal platform for coding agents

dropbox_tech

Building a reusable agent platform (Nova) is the pragmatic next step once coding agents move beyond toy prompts: treat agents as stateful, orchestration-heavy services, not one-off scripts. Key engineering moves are session lifecycle management, tight context retrieval from repos/CI, sandboxed execution within the full engineering environment, connector abstractions for automated workflows, and observability/versioning for reproducibility and safety. That shifts work from model tuning to platform engineering: cost/scheduling, multi-agent coordination, access control, and developer UX become the gating constraints. For teams working with large monorepos or regulated pipelines — including ML-driven drug discovery — a platform reduces duplicate integrations and accelerates safe experimentation, but it requires upfront investment in orchestration, validation, and governance to avoid spiraling complexity or security exposure.

Mass GitHub repo backdooring via CI workflows (Megalodon)

reddit_programming

An automated campaign pushed >5,700 malicious commits to 5,561 GitHub repos in six hours by creating throwaway accounts and adding innocuous-looking CI commits (e.g., “ci: add build optimization step”), showing attackers prefer stealthy workflow changes over noisy exploits. For platform/ML infra teams this is a supply‑chain and lateral-movement risk: compromised or malicious workflows can run with repo tokens, exfiltrate artifacts/keys, persist backdoors into models or deployment pipelines, and evade simple keyword filters because commit metadata looks normal. Immediate mitigations: require PR reviews and protected branches for workflow files, enable workflow approvals for first‑time contributors, tighten permissions for GITHUB_TOKEN and third‑party actions, enforce signed commits/verified users, audit GitHub logs for mass small-account activity, and use CI gates to block added/modified workflow files until vetted. Prioritise org‑level policy hardening and alerting rather than relying on commit message heuristics.
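
The "block added or modified workflow files until vetted" mitigation can be sketched as a small path check. This is a minimal example, not a GitHub feature: the function names and the `author_is_trusted` flag are hypothetical, and the list of changed paths is assumed to come from your CI system's diff.

```python
from pathlib import PurePosixPath

# GitHub Actions reads workflow definitions from this directory.
WORKFLOW_DIR = PurePosixPath(".github/workflows")


def touched_workflows(changed_paths):
    """Return the changed paths that are YAML files under .github/workflows/."""
    flagged = []
    for raw in changed_paths:
        path = PurePosixPath(raw)
        if WORKFLOW_DIR in path.parents and path.suffix in {".yml", ".yaml"}:
            flagged.append(raw)
    return flagged


def gate(changed_paths, author_is_trusted):
    """Block workflow changes from untrusted authors; allow everything else.

    Returns (allowed, flagged_paths) so the caller can report what needs review.
    """
    flagged = touched_workflows(changed_paths)
    if flagged and not author_is_trusted:
        return False, flagged
    return True, flagged
```

The check keys on file location rather than commit message, so a commit labelled "ci: add build optimization step" is held for review exactly like any other workflow change.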

Announcing Claude Compliance API support with Cloudflare CASB

cloudflare_blog

Cloudflare CASB now consumes Anthropic’s Claude Compliance API signals, giving security teams out-of-band visibility and controls over Claude usage without deploying endpoint agents. That enables central logging of prompts, file uploads, token spend and model calls, plus enforcement points (rate limits, DLP, routing) via Cloudflare AI Gateway/Gateway/Access — a practical way to safely sanction LLM use rather than outright block it. For Isomorphic Labs this lowers the barrier to controlled LLM adoption: you can monitor and block prompts containing PHI, proprietary sequences, SMILES or API keys, create auditable trails for compliance, and reduce shadow usage. Caveat: effectiveness depends on what the provider exposes and on traffic routing (encrypted or unsupported paths can remain blind). Action: pilot the integration on representative internal workflows and tune DLP rules for chemical/biological artifacts and keys.
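
For a feel of what tuning DLP rules for keys and biological artifacts involves, here is a small regex-based prompt scanner. It does not use any Cloudflare or Anthropic API; the pattern names, the 40-residue threshold, and the `scan_prompt` helper are illustrative assumptions, and SMILES strings are left out because they are much harder to match reliably with a regex.

```python
import re

# Illustrative patterns only; real DLP rules need tuning against your own data.
PATTERNS = {
    # AWS access key IDs for long-term credentials: "AKIA" plus 16 characters.
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    # PEM private key header, with or without an algorithm prefix such as "RSA".
    "private_key_block": re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----"),
    # Assumption: a long unbroken run of amino-acid letters is a protein sequence.
    "protein_like_sequence": re.compile(r"\b[ACDEFGHIKLMNPQRSTVWY]{40,}\b"),
}


def scan_prompt(text):
    """Return the sorted names of every pattern that matches the prompt text."""
    return sorted(name for name, rx in PATTERNS.items() if rx.search(text))
```

A pilot would run rules like these in log-only mode first, since a threshold such as 40 residues trades missed short peptides against false positives on ordinary uppercase text.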

Technical Interviews Reject the Wrong Engineers

reddit_programming

Technical interviews disproportionately filter out engineers who excel at shipping, debugging, and cross-team impact because they prioritize whiteboard puzzles and contrived algorithms. For ML infra and platform roles, that means losing candidates who understand latency/throughput trade-offs, model deployment failure modes, and incident-driven learning, which are the skills that matter for productionizing models. Swap or supplement trivia with short, real-context take-homes, paired debugging/on-call simulations, architecture reviews of past systems, and behavioral prompts tied to delivery outcomes. Score on pragmatic signals (code comprehension, deployment reasoning, incident triage, trade-off justification, and collaboration), and keep interviewers aligned with shared rubrics and calibration debriefs. This reduces false negatives, improves senior-hire hit rate, and aligns hiring with the day-to-day realities of ML systems work.

A Guide to Async Patterns in API Design

bytebytego

Async API patterns are a toolbox for aligning API surface to real workload semantics: choose by latency, durability, and operational cost rather than familiarity. For short, sporadic updates use short/long polling or SSE; for low-latency streams (real-time inference or telemetry) prefer WebSockets when traffic is bidirectional, or GraphQL subscriptions for server-pushed updates; for durable, backpressured work (model training, batch inference, feature pipelines) use message queues with explicit status endpoints and webhooks for callbacks. Key operational trade-offs: connection count and CPU/memory cost for persistent channels, delivery guarantees and idempotency for webhooks/queues, and observability for long-running tasks (correlate trace IDs across queue→worker→callback). For ML infra, a pragmatic pattern is queue-based scheduling with status polling and optional webhook/streaming notifications for results, which balances GPU scheduling, retry semantics, and UI responsiveness.
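
The queue-plus-status-polling pattern can be sketched in a few lines with the standard library. Everything here is an in-memory stand-in: the `jobs` and `by_key` dictionaries, the function names, and the `on_done` callback (playing the role of a webhook) are hypothetical, and a real deployment would use a durable broker and a persistent job store.

```python
import queue
import uuid

jobs = {}                   # job_id -> {"status": ..., "result": ...}
by_key = {}                 # idempotency key -> job_id
work_queue = queue.Queue()  # stand-in for a durable message queue


def submit(payload, idempotency_key):
    """Enqueue work once per idempotency key and return the job id."""
    if idempotency_key in by_key:
        return by_key[idempotency_key]  # client retry: do not enqueue twice
    job_id = str(uuid.uuid4())
    jobs[job_id] = {"status": "queued", "result": None}
    by_key[idempotency_key] = job_id
    work_queue.put((job_id, payload))
    return job_id


def status(job_id):
    """What a status endpoint would return when the client polls."""
    return dict(jobs[job_id])


def run_one(handler, on_done=None):
    """Worker step: take one job, run it, record the outcome, fire the callback."""
    job_id, payload = work_queue.get_nowait()
    jobs[job_id]["status"] = "running"
    try:
        jobs[job_id]["result"] = handler(payload)
        jobs[job_id]["status"] = "succeeded"
    except Exception as exc:
        jobs[job_id]["result"] = repr(exc)
        jobs[job_id]["status"] = "failed"
    if on_done is not None:
        on_done(job_id, jobs[job_id]["status"])
    return job_id
```

The idempotency key makes client retries safe, polling gives the UI something to show while a GPU job waits, and the callback removes the need to poll when the consumer can receive notifications.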