the brief

Agent workflows took center stage: Anthropic shipped a security-guidance plugin for Claude Code, Microsoft introduced Webwright to turn Playwright sessions into reusable agent workflows, and Vercel’s Agent Marketplace added Firecrawl. Benchmarks and safety kept pace: DeepSWE targets agentic coding and Anthropic published sandboxing guidance, while a Copilot Cowork exfiltration demo underscored why permissions matter.

the pour · 11 items

alerts

(01)

pulse

(07)
  • @ClaudeDevs · May 26, 09:24 PM

    Claude Code adds security guidance

    New plugin scans edits, turns, and commits for risky patterns, enforces org security rules, and reportedly cuts security review comments by ~30–40% in internal use

    We’ve shipped a security-guidance plugin for Claude Code that helps identify and fix vulnerabilities as you’re writing code. Available for all Claude Code users. Install from the plugin marketplace (/plugins). pic.x.com/LprgC4m6Kf

  • anthropics/claude-code · feed · May 27, 01:30 AM

    Claude Code v2.1.152 ships updates

    Adds /code-review --fix to apply review findings to the working tree, routes /simplify through /code-review --fix, and adds skill-level disallowed-tools, /reload-skills, and SessionStart hooks that can trigger skill rescans

    v2.1.152: What's changed. /code-review --fix now applies review findings to your working tree after the review, surfacing reuse, simplification, and efficiency suggestions; /simplify now invokes /code-review --fix. Skills and slash commands can now set disallowed-tools in frontmatter to remove tools from the model while the skill is active. Added /reload-skills command to re-scan skill directories without restarting the session. SessionStart hooks can now return reloadSkills: true to re-scan ski...

    signal 9 · hype 1 · release_notes · claude_code · dev_tooling · source ↗
  • @mr_r0b0t · May 25, 09:37 PM

    Microsoft ships Webwright for agents

    Playwright’s new agent-focused workflows turn browser sessions into reusable recipes, accelerating autonomous web automation; repo even includes a Hermes Agent skill out‑of‑the‑box

    Microsoft dropping a massive Playwright update geared specifically for agents, Webwright! This is an absolute game changer for agentic browser use as every session becomes a reusable workflow. The repo includes a @NousResearch Hermes Agent skill 😍 microsoft.github.io/Webwright/ pic.x.com/rwlKmbHPnR

    signal 7 · hype 5 · playwright · agents · browser_automation · source ↗
  • @vercel_dev · May 26, 07:43 PM

    Firecrawl lands on Vercel Agents

    Give agents first‑class, structured web scraping, search, and dynamic interaction directly inside Vercel’s Agent Marketplace, reducing glue code for production LLM apps

    Firecrawl is now available on Agent Marketplace. Give your agents and AI apps access to reliable, structured web data directly from Vercel, including scraping, search, and interaction with dynamic websites. vercel.com/changelog/fire…

    signal 6 · hype 2 · vercel · agent_marketplace · integration · source ↗
  • @mattpocockuk · May 26, 03:53 PM

    Sandcastle 0.6.1 ships structured output

    The agent framework adds Output.object for typed returns plus CLI support for Cursor and GitHub Copilot, smoothing integration and testability of agent workflows

    Monster day on Sandcastle today:
    - Agents can now return structured output via Output.object
    - Added support for @cursor_ai CLI
    - Added support for @github Copilot CLI
    - Fixed a metric ton of bugfixes
    Check out 0.6.1

    signal 7 · hype 2 · agent_framework · release_notes · structured_output · source ↗
  • @testingcatalog · May 27, 12:15 AM

    iOS app controls self-hosted agent

    Atomic Bot’s new iOS app controls a self-hosted Hermes Agent running 24/7 on your own VPS, with remote access via Tailscale, Cloudflare Tunnel, or ngrok; it is open source and private by default

    Atomic Bot released an iOS app for Hermes Agent, bringing mobile control to a self-hosted agent running 24/7 on your own VPS.
    > Remote access via Tailscale, Cloudflare, or ngrok
    > Open source and private by default
    pic.x.com/BN27TrhWU4 x.com/atomicbot_ai/s…

    signal 6 · hype 1 · agent_framework · ios_app · self_hosted · source ↗
  • @unknown · May 26, 07:56 PM

    Marlin-2B, tiny open video VLM

    An Apache‑2.0, 2B‑parameter model that captions clips into timestamped events and locates moments in video from natural-language queries; its small size suggests lightweight, possibly on‑device, video understanding

    cool new release: a tiny open video VLM that understands what happens in videos and when 👀 Marlin-2B (Apache 2.0!) can caption clips into timestamped events, or find a natural-language moment inside the video (can see a ton of cool use cases with it) Made a Hugging Face demo pic.x.com/nFkwkGuOZ7

    signal 6 · hype 3 · model_release · video_vlm · open_source · source ↗

findings

(02)
  • @unknown · May 26, 04:18 PM

    DeepSWE benchmarks agentic coding gaps

    New benchmark surfaces where top coding models truly diverge on real developer workflows, offering a sharper lens than public leaderboards for agent performance

    Today we’re releasing DeepSWE, a new standard for agentic coding benchmarks. On public leaderboards, top models often look relatively close in capability. DeepSWE shows where they actually diverge, reflecting the realistic experience of developers in their day-to-day work. pic.x.com/HCDcjNuTFK

    signal 8 · hype 2 · benchmark · agentic_coding · evaluation · source ↗
  • @AnthropicAI · May 26, 07:09 PM

    Sandboxing as evolving agent permissioning

    Anthropic describes using sandboxing to limit the scope of potentially destructive agent actions, with access and permissions that evolve as model capabilities do

    New on the Engineering Blog: The access and permissions we grant agents should evolve with their capabilities. In our own products, we set these parameters through sandboxing, which limits the scope of any potentially destructive actions. Read more: anthropic.com/engineering/ho…

    signal 7 · hype 1 · agents · sandboxing · agent_permissions · source ↗
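
The sandboxing item above describes permissions that widen as an agent earns more trust. A minimal sketch of that idea follows, assuming a deny-by-default gate keyed on capability tiers. The tier names, tool names, and the `is_allowed` helper are all hypothetical and are not taken from Anthropic's post or any Anthropic product.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Sandbox:
    """A scoped sandbox: the set of tools an agent may call at one tier."""
    tier: str
    allowed_tools: frozenset


# Hypothetical tool sets; each tier is a superset of the one before it.
_READ = frozenset({"read_file", "list_dir"})
_WRITE = _READ | {"write_file"}
_NETWORK = _WRITE | {"http_get"}

# Hypothetical policy table mapping a tier name to its sandbox.
POLICY = {
    "read_only": Sandbox("read_only", _READ),
    "workspace_write": Sandbox("workspace_write", _WRITE),
    "network": Sandbox("network", _NETWORK),
}


def is_allowed(tier: str, tool: str) -> bool:
    """Deny by default: unknown tiers and unlisted tools are rejected."""
    sandbox = POLICY.get(tier)
    return sandbox is not None and tool in sandbox.allowed_tools


if __name__ == "__main__":
    print(is_allowed("read_only", "read_file"))   # True
    print(is_allowed("read_only", "write_file"))  # False
    print(is_allowed("network", "http_get"))      # True
```

Keeping the policy as data rather than branching logic means a tier can be widened by editing one table entry, which matches the idea of permissions that change over time.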

voices

(01)
  • @unknown · May 26, 08:32 PM

    Learning curve pays off with GPT‑5.5

    After two months refining prompts and agents.md, one developer says GPT‑5.5 is now the only model they can use for coding; the report is anecdotal but consistent with the shift toward agent‑centric workflows

    It took me like 2 months, but I've grown to love gpt-5.5. You have to prompt entirely different and put some time into your agents[.]md. Now that I'm over the hump, I can't really use any other model for code.

    signal 4 · hype 3 · model_opinion · prompting · agents · source ↗
Cuppa · Wednesday, May 27, 2026