cuppa

today's signal · no scroll

brewed 03:27 AM

Wednesday, May 27, 2026

the brief

Agent workflows took center stage: Anthropic shipped security guidance inside Claude Code, Microsoft introduced Webwright to turn Playwright sessions into reusable agent workflows, and Vercel’s Agent Marketplace added Firecrawl. Benchmarks and safety kept pace with DeepSWE for agentic coding and Anthropic’s sandboxing guidance, while a Copilot Cowork exfiltration demo underscored why permissions matter.

the pour · 11 items

alerts

(01)

simonw/blog · Analysis · May 26, 03:36 PM
Copilot Cowork enables data exfiltration
PromptArmor shows that Microsoft’s Copilot Cowork can be steered to leak files across connected tools. Agent builders should harden scopes, audit actions, and isolate sensitive data paths.
Microsoft Copilot Cowork Exfiltrates Files (https://www.promptarmor.com/resources/microsoft-copilot-cowork-exfiltrates-files): The biggest challenge in designing agentic systems continues to be preventing them from enabling attackers to exfiltrate data. In this case Microsoft Copilot Cowork (yes, that's https://www.microsoft.com/en-us/microsoft-365/blog/2026/03/09/copilot-cowork-a-new-way-of-gettin...
signal 8 · hype 1 · security · agent_safety · data_exfiltration · technical
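One way to picture "harden scopes, audit actions" is an egress check in front of every outbound tool call. The sketch below is illustrative only: it is not PromptArmor's recommendation or Microsoft's fix, and the host names, `ALLOWED_HOSTS`, `is_egress_allowed`, and `audit_tool_call` are all hypothetical names chosen for this example.

```python
from urllib.parse import urlparse

# Hypothetical allowlist: the only hosts this agent may send data to.
ALLOWED_HOSTS = {"api.internal.example.com", "files.example.com"}


def is_egress_allowed(url: str, allowed_hosts: set[str] = ALLOWED_HOSTS) -> bool:
    """Return True only for https URLs whose host is exactly on the allowlist."""
    parsed = urlparse(url)
    host = (parsed.hostname or "").lower()
    return parsed.scheme == "https" and host in allowed_hosts


def audit_tool_call(tool: str, url: str, log: list[dict]) -> bool:
    """Record every outbound request, allowed or not, and return the decision."""
    allowed = is_egress_allowed(url)
    log.append({"tool": tool, "url": url, "allowed": allowed})
    return allowed
```

Matching the exact host, rather than a substring or suffix, means look-alike domains such as `files.example.com.evil.net` are rejected, and the log keeps a record of denied attempts for later review.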

pulse

(07)

@ClaudeDevs · X · May 26, 09:24 PM
Claude Code adds security guidance
New plugin scans edits, turns, and commits for risky patterns, enforces org security rules, and reportedly cuts security review comments by ~30–40% in internal use
We’ve shipped a security-guidance plugin for Claude Code that helps identify and fix vulnerabilities as you’re writing code. Available for all Claude Code users. Install from the plugin marketplace (/plugins).
signal 8 · hype 2 · claude_code · plugin · security · launch
anthropics/claude-code · First-party · May 27, 01:30 AM
Claude Code v2.1.152 ships updates
Adds /code-review --fix to apply suggestions automatically, a smarter /simplify, skill-level disallowed-tools, /reload-skills, and SessionStart hooks that can trigger skill rescans
v2.1.152: What's changed
- /code-review --fix now applies review findings to your working tree after the review, surfacing reuse, simplification, and efficiency suggestions; /simplify now invokes /code-review --fix
- Skills and slash commands can now set disallowed-tools in frontmatter to remove tools from the model while the skill is active
- Added /reload-skills command to re-scan skill directories without restarting the session
- SessionStart hooks can now return reloadSkills: true to re-scan ski...
signal 9 · hype 1 · release_notes · claude_code · dev_tooling · launch
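A SessionStart hook that asks for a skill rescan might look roughly like the sketch below. The only detail taken from the release notes is the `reloadSkills: true` field; placing it at the top level of the JSON printed to stdout is an assumption, and `build_session_start_output` is a hypothetical helper name. Check the Claude Code hooks documentation for the actual output schema before relying on this.

```python
import json
import sys


def build_session_start_output(reload_skills: bool = True) -> dict:
    """Build the hook's JSON response.

    "reloadSkills" is the field named in the v2.1.152 notes; its position
    at the top level of the object is an assumption made for this sketch.
    """
    return {"reloadSkills": reload_skills}


def main() -> None:
    # Print the response as JSON on stdout for the hook runner to read.
    json.dump(build_session_start_output(), sys.stdout)


if __name__ == "__main__":
    main()
```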
@mr_r0b0t · X · May 25, 09:37 PM
Microsoft ships Webwright for agents
Playwright’s new agent-focused workflows turn browser sessions into reusable recipes, accelerating autonomous web automation; repo even includes a Hermes Agent skill out‑of‑the‑box
Microsoft dropping a massive Playwright update geared specifically for agents, Webwright! This is an absolute game changer for agentic browser use as every session becomes a reusable workflow. The repo includes a @NousResearch Hermes Agent skill 😍 microsoft.github.io/Webwright/
signal 7 · hype 5 · playwright · agents · browser_automation · launch
@vercel_dev · X · May 26, 07:43 PM
Firecrawl lands on Vercel Agents
Give agents first‑class, structured web scraping, search, and dynamic interaction directly inside Vercel’s Agent Marketplace, reducing glue code for production LLM apps
Firecrawl is now available on Agent Marketplace. Give your agents and AI apps access to reliable, structured web data directly from Vercel, including scraping, search, and interaction with dynamic websites. vercel.com/changelog/fire…
signal 6 · hype 2 · vercel · agent_marketplace · integration · launch
@mattpocockuk · X · May 26, 03:53 PM
Sandcastle 0.6.1 ships structured output
The agent framework adds Output.object for typed returns plus CLI support for Cursor and GitHub Copilot, smoothing integration and testability of agent workflows
Monster day on Sandcastle today:
- Agents can now return structured output via Output.object
- Added support for @cursor_ai CLI
- Added support for @github Copilot CLI
- Fixed a metric ton of bugs
Check out 0.6.1
signal 7 · hype 2 · agent_framework · release_notes · structured_output · launch
@testingcatalog · X · May 27, 12:15 AM
iOS app controls self-hosted agent
Atomic Bot released an iOS controller for Hermes Agent that manages a 24/7 VPS agent via Tailscale, Cloudflare Tunnel, or ngrok. It is open source and private by default.
Atomic Bot released an iOS app for Hermes Agent, bringing mobile control to a self-hosted agent running 24/7 on your own VPS.
> Remote access via Tailscale, Cloudflare, or ngrok
> Open source and private by default
x.com/atomicbot_ai/s…
signal 6 · hype 1 · agent_framework · ios_app · self_hosted · launch
@unknown · X · May 26, 07:56 PM
Marlin-2B, tiny open video VLM
An Apache‑2.0 2B‑parameter model that timestamps events and locates moments in video by natural language, enabling lightweight, on‑device video understanding
cool new release: a tiny open video VLM that understands what happens in videos and when 👀 Marlin-2B (Apache 2.0!) can caption clips into timestamped events, or find a natural-language moment inside the video (can see a ton of cool use cases with it) Made a Hugging Face demo
signal 6 · hype 3 · model_release · video_vlm · open_source · launch

findings

(02)

@unknown · X · May 26, 04:18 PM
DeepSWE benchmarks agentic coding gaps
New benchmark surfaces where top coding models truly diverge on real developer workflows, offering a sharper lens than public leaderboards for agent performance
Today we’re releasing DeepSWE, a new standard for agentic coding benchmarks. On public leaderboards, top models often look relatively close in capability. DeepSWE shows where they actually diverge, reflecting the realistic experience of developers in their day-to-day work.
signal 8 · hype 2 · benchmark · agentic_coding · evaluation · launch
@AnthropicAI · X · May 26, 07:09 PM
Sandboxing as evolving agent permissioning
Anthropic details how to constrain capable agents with scoped sandboxes and granular tool access, aligning permissions with model ability to limit destructive actions
New on the Engineering Blog: The access and permissions we grant agents should evolve with their capabilities. In our own products, we set these parameters through sandboxing, which limits the scope of any potentially destructive actions. Read more: anthropic.com/engineering/ho…
signal 7 · hype 1 · agents · sandboxing · agent_permissions · technical
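To make "limits the scope of any potentially destructive actions" concrete, here is a minimal sketch of one common scoping technique: confining an agent's file access to a single root directory. It is a generic illustration, not Anthropic's implementation, and `resolve_in_sandbox` is a hypothetical name. It needs Python 3.9+ for `Path.is_relative_to`.

```python
from pathlib import Path


def resolve_in_sandbox(root: str, requested: str) -> Path:
    """Resolve `requested` under `root`, refusing paths that escape the sandbox."""
    root_path = Path(root).resolve()
    # Resolving collapses ".." segments and symlinks before the containment check.
    candidate = (root_path / requested).resolve()
    if not candidate.is_relative_to(root_path):
        raise PermissionError(f"{requested!r} escapes sandbox {root_path}")
    return candidate
```

Checking the resolved path rather than the raw string catches both `../` traversal and absolute paths such as `/etc/passwd`, since joining an absolute path discards the sandbox root. A path check like this only covers file access; real sandboxes typically also restrict network and process permissions.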

voices

(01)

@unknown · X · May 26, 08:32 PM
Learning curve pays off with GPT‑5.5
After two months refining prompts and agents.md, this developer says GPT‑5.5 is now the only model they can use for coding. The claim is anecdotal but consistent with the shift toward agent‑centric workflows.
It took me like 2 months, but I've grown to love gpt-5.5. You have to prompt entirely differently and put some time into your agents[.]md. Now that I'm over the hump, I can't really use any other model for code.
signal 4 · hype 3 · model_opinion · prompting · agents · cultural
