The Last Mile of Headless Mac Agents Is the Screen

Most of my agent work now happens on machines I am not sitting in front of.

Nix runs on a NAS. Max runs on a Mac Mini. Claude Code can ship changes there. Codex can inspect repos there. The whole setup is designed around the idea that the machine doing the work does not have to be the machine in my lap.

For most tasks, SSH is enough.

Build a Swift package. Run tests. Restart a LaunchAgent. Tail logs. Push a branch. Query a local database. All of that works from a phone, an iPad, or another machine. The terminal is old, boring, and exactly right for it.

But the terminal has a hard edge.

macOS still has moments where the system wants a human at a screen. Screen Recording permission. Accessibility permission. A keychain prompt. An Apple developer login. Xcode signing state. A simulator that needs one native UI interaction before the automation path works again.

That is where the headless story breaks.

Not because the agent cannot write the code. Because the Mac still occasionally needs a hand on the glass.

# The Annoying Part Was Not SSH

I have been using the Mac Mini as a remote build and automation box for a while. It is the right machine for the job: always on, physically in the Apple ecosystem, able to run Xcode, simulators, browser profiles, and local desktop automation.

The happy path feels great. I can ask an agent to make a change to an iOS app, build it on the Mac Mini, and install it on a device without opening my laptop.

Then something expires.

The Apple developer account wants a re-login. Xcode wants signing fixed. macOS asks for a permission grant. A simulator is stuck in a state that is obvious visually but awkward to diagnose from logs.

At that point, I usually have three options:

Get the MacBook.
Use native screen sharing or VNC from another desktop.
Open a third-party remote desktop app on the iPad or iPhone and hope it does not feel sluggish, gated, or weirdly priced for a thing I only need in short bursts.

None of these are impossible. They are just enough friction to break the flow.

I did not want to reinvent remote desktop. I wanted the boring version I could trust inside my own network: low-latency screen sharing to the Mac Mini, native on iPad and iPhone, over LAN or Tailscale, with no public relay.

That became Mirador.

# What Mirador Is

Mirador is a small remote screen-sharing system for macOS.

The Mac host captures the screen with ScreenCaptureKit, hardware-encodes it with VideoToolbox H.264, streams it over a WebSocket, and accepts input events back over another WebSocket. The iOS app is a native SwiftUI viewer for iPhone and iPad.

The project is open source here: github.com/arniesaha/mirador.

The architecture is intentionally plain:

ScreenCaptureKit
  -> VideoToolbox H.264
  -> /ws/video
  -> iPhone / iPad native decoder

iPhone / iPad input
  -> /ws/input
  -> CGEvent injection on the Mac
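The input leg of that diagram needs some message shape on the wire. A minimal sketch, assuming JSON text frames and normalized coordinates; the `InputEvent` type, its field names, and the helper functions are hypothetical illustrations, not Mirador's actual protocol:

```swift
import Foundation

// Hypothetical wire format for /ws/input. Field names are assumptions
// made for this sketch, not the project's real schema.
struct InputEvent: Codable, Equatable {
    enum Kind: String, Codable {
        case pointerMove, pointerDown, pointerUp, scroll, keyDown, keyUp
    }
    var kind: Kind
    var x: Double?          // normalized 0...1 across the host display
    var y: Double?          // normalized 0...1 down the host display
    var keyCode: UInt16?    // virtual key code, only for key events
    var modifiers: [String]?
}

// Serialize an event as a JSON text frame. Nil fields are omitted.
func encode(_ event: InputEvent) throws -> String {
    let encoder = JSONEncoder()
    encoder.outputFormatting = [.sortedKeys]
    return String(decoding: try encoder.encode(event), as: UTF8.self)
}

// Parse a JSON text frame back into an event on the host side.
func decode(_ json: String) throws -> InputEvent {
    try JSONDecoder().decode(InputEvent.self, from: Data(json.utf8))
}

// Convert the normalized pointer position into host display pixels,
// clamping so a stray value can never land outside the display.
func hostPoint(for event: InputEvent,
               displayWidth: Double,
               displayHeight: Double) -> (x: Double, y: Double)? {
    guard let nx = event.x, let ny = event.y else { return nil }
    let cx = min(max(nx, 0), 1)
    let cy = min(max(ny, 0), 1)
    return (cx * displayWidth, cy * displayHeight)
}
```

Sending normalized coordinates rather than pixels means the viewer never needs to know the host's resolution; the host would turn the result of `hostPoint` into a CGEvent at the last step.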

There is also a browser viewer with WebCodecs and an MJPEG fallback, but the native app is the part I care about most. The point is not to build another general-purpose remote desktop product. The point is to make my Mac reachable from the devices I actually have in my hand when an agent workflow hits a UI wall.

The security model is deliberately constrained:

LAN and Tailscale first.
Capability token on every endpoint.
No public relay.
No attempt to make this a hosted product.

Input injection is real control of a Mac. I do not want that casually exposed to the internet.
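As a rough illustration of "capability token on every endpoint", here is a minimal sketch of a gate that every request could pass through. It assumes the token arrives as a `token` query parameter, which is an assumption for this example; the function names are hypothetical and this is not Mirador's actual check:

```swift
import Foundation

// Compare two byte sequences without returning early on the first
// mismatch, so timing does not reveal how much of a guess was right.
func constantTimeEquals(_ a: [UInt8], _ b: [UInt8]) -> Bool {
    guard a.count == b.count else { return false }
    var diff: UInt8 = 0
    for i in 0..<a.count {
        diff |= a[i] ^ b[i]
    }
    return diff == 0
}

// Hypothetical gate: reject any request whose URL does not carry the
// expected token. An empty expected token never authorizes anything.
func isAuthorized(requestURL: String, expectedToken: String) -> Bool {
    guard !expectedToken.isEmpty,
          let components = URLComponents(string: requestURL),
          let presented = components.queryItems?
              .first(where: { $0.name == "token" })?.value
    else { return false }
    return constantTimeEquals(Array(presented.utf8), Array(expectedToken.utf8))
}
```

The same function would guard `/ws/video`, `/ws/input`, and any HTTP route, so there is no endpoint that is accidentally left open.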

# The First Version Was the Easy Part

The first version was mostly plumbing.

Can the Mac host run as a LaunchAgent? Can it bind on a local port? Can it stream frames? Can the viewer connect? Can the input path send a click and get an acknowledgement?

That got working quickly enough.

Then the actual product work began.

Remote desktop is not useful because pixels arrive. It is useful when the interaction feels direct enough that you stop thinking about the transport.

On the Mac host, that meant a few non-obvious details:

Do not start capture or encoding until a viewer connects.
Tear everything down when the last viewer disconnects.
Keep idle memory low enough that the service can stay resident.
Keep the display awake while streaming.
Wake the display when a viewer connects.
In one case, nudge the cursor to reliably wake the display.

That last one is the kind of detail that makes a project feel less like an architecture diagram and more like a real tool. The elegant solution is "the display wakes when capture starts." The practical solution is sometimes "move the cursor enough for macOS to notice a human might be here."

I have learned to respect that layer.
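The connect and disconnect rules in that list reduce to reference counting on viewers. A minimal sketch, assuming a hypothetical `CapturePipeline` protocol that stands in for the ScreenCaptureKit and VideoToolbox setup; the type and method names are illustrative, not taken from the repo:

```swift
import Foundation

// Hypothetical abstraction over capture + encode. In a real host,
// start() is also where the display wake and keep-awake logic would go.
protocol CapturePipeline {
    func start()
    func stop()
}

final class ViewerSessionManager {
    private let pipeline: CapturePipeline
    private let lock = NSLock()
    private var viewers = Set<UUID>()

    init(pipeline: CapturePipeline) {
        self.pipeline = pipeline
    }

    // Start the pipeline only on the transition from zero to one viewer.
    func viewerConnected(_ id: UUID) {
        lock.lock(); defer { lock.unlock() }
        let wasIdle = viewers.isEmpty
        viewers.insert(id)
        if wasIdle { pipeline.start() }
    }

    // Stop the pipeline only when the last known viewer leaves.
    // Unknown or duplicate disconnects are ignored.
    func viewerDisconnected(_ id: UUID) {
        lock.lock(); defer { lock.unlock() }
        guard viewers.remove(id) != nil else { return }
        if viewers.isEmpty { pipeline.stop() }
    }

    var viewerCount: Int {
        lock.lock(); defer { lock.unlock() }
        return viewers.count
    }
}
```

Keeping the expensive work behind the zero-to-one transition is what lets the service sit idle cheaply when nobody is connected.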

# H.264 Changed the Feel

The early MJPEG path was good enough to prove the shape, but it was not the right long-term default. MJPEG is simple, debuggable, and expensive. Under motion it used far more bandwidth than it needed to, and the browser path was doing more work than necessary.

The H.264 path made the project feel real.

The measured loopback baseline after the VideoToolbox work was roughly:

1080p at about 30 fps.
Around 10 Mbit/s under synthetic motion.
Around 6.5x less bandwidth than the MJPEG path at equal fps and resolution.
Frame receive age in the low milliseconds on loopback.
Encode latency around 8 ms per frame.
Idle service memory around 12 MB when nobody is connected.
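To make those numbers more concrete, here is a small back-of-the-envelope calculation. It treats the figures above as round numbers and assumes the 6.5x figure means MJPEG used 6.5 times the bandwidth; the variable names are just for this sketch:

```swift
import Foundation

// Figures quoted above, treated as round numbers rather than exact measurements.
let h264Mbps = 10.0
let fps = 30.0
let mjpegFactor = 6.5   // assumption: 6.5x the bandwidth of the H.264 path
let encodeMs = 8.0

// Average payload per frame for a given bitrate and frame rate.
func bytesPerFrame(megabitsPerSecond: Double, fps: Double) -> Double {
    megabitsPerSecond * 1_000_000 / 8 / fps
}

// Time available per frame at a given frame rate.
func frameIntervalMs(fps: Double) -> Double {
    1000 / fps
}

let h264FrameBytes = bytesPerFrame(megabitsPerSecond: h264Mbps, fps: fps)
let mjpegMbps = h264Mbps * mjpegFactor
let mjpegFrameBytes = bytesPerFrame(megabitsPerSecond: mjpegMbps, fps: fps)
let encodeShare = encodeMs / frameIntervalMs(fps: fps)

print(String(format: "H.264: %.1f kB per frame", h264FrameBytes / 1000))
print(String(format: "MJPEG (implied): %.1f Mbit/s, %.1f kB per frame",
             mjpegMbps, mjpegFrameBytes / 1000))
print(String(format: "Encode uses %.0f%% of the frame interval", encodeShare * 100))
```

Under those assumptions, the H.264 stream averages about 41.7 kB per frame, the implied MJPEG rate is about 65 Mbit/s, and an 8 ms encode takes roughly a quarter of the 33.3 ms frame interval.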

The bigger change was subjective. On iPad, it stopped feeling like I was watching a periodically refreshed screenshot and started feeling like I had a usable remote screen.

That matters because the use case is not "watch the Mac." The use case is "clear the one UI blocker that stopped the agent workflow."

Latency compounds frustration. If the screen is sluggish, I will avoid using it and go back to the laptop. If the screen feels direct, the remote Mac stays part of the agent loop.

# The iPhone Was a Different Input Problem

The iPad can behave like a small laptop. The Magic Keyboard trackpad gives you a pointer. Two-finger scrolling maps naturally. The screen is large enough that absolute touch works reasonably well.

The iPhone is different.

A direct tap model is clumsy when the target screen is a full Mac desktop compressed into a phone. I needed a virtual cursor, pinch-to-zoom, and a software keyboard that could still send real Mac-style key events.

That led to a split input model:

iPad keeps the larger-screen absolute model.
iPhone uses a relative virtual cursor.
Pinch-to-zoom changes the visual scale without breaking coordinate mapping.
A key bar exposes the keys iOS hides: Esc, Tab, arrows, and sticky Command/Control/Option modifiers.
The software keyboard can type text, but modifier chords can still become proper keyDown/keyUp events.
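The two pointer models differ mainly in how a touch becomes a host coordinate. A minimal sketch, assuming the video fills the view with no letterboxing and that zoom is a simple scale plus offset transform; `NormalizedPoint`, `absolutePoint`, and `VirtualCursor` are hypothetical names for illustration, not the app's real types:

```swift
import Foundation

struct NormalizedPoint: Equatable {
    var x: Double   // 0 = left edge of the host display, 1 = right edge
    var y: Double   // 0 = top edge, 1 = bottom edge
}

func clamp01(_ v: Double) -> Double {
    min(max(v, 0), 1)
}

// Absolute model (iPad-style): the touch location itself is the target.
// The content is drawn at `zoom` scale and shifted by (offsetX, offsetY),
// so the transform is undone before normalizing.
func absolutePoint(touchX: Double, touchY: Double,
                   viewWidth: Double, viewHeight: Double,
                   zoom: Double, offsetX: Double, offsetY: Double) -> NormalizedPoint {
    let z = max(zoom, 1)
    return NormalizedPoint(x: clamp01((touchX - offsetX) / z / viewWidth),
                           y: clamp01((touchY - offsetY) / z / viewHeight))
}

// Relative model (iPhone-style): finger travel nudges a cursor that
// lives in host space. Dividing by zoom keeps the cursor tracking the
// finger 1:1 on screen, which makes zoomed-in movement finer.
struct VirtualCursor {
    var position = NormalizedPoint(x: 0.5, y: 0.5)

    mutating func drag(dx: Double, dy: Double,
                       viewWidth: Double, viewHeight: Double,
                       zoom: Double) {
        let z = max(zoom, 1)
        position.x = clamp01(position.x + dx / (viewWidth * z))
        position.y = clamp01(position.y + dy / (viewHeight * z))
    }
}
```

Because both paths end in the same normalized space, pinch-to-zoom only changes the inputs to these functions and never the meaning of the coordinates sent to the host.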

This is where the project stopped being just "VNC, but mine."

The hard part was not screen capture. It was making remote control usable on the device in my pocket.

That is also the agent lesson.

Agents do not only need compute. They need escape hatches into the physical and OS-specific parts of the world. Sometimes that escape hatch is a shell. Sometimes it is a browser. Sometimes it is a native iPhone keyboard controlling a Mac Mini over Tailscale because Xcode wants a login.

# Tailscale Exposed a Nice Apple-Specific Trap

The native app connects to the Mac over plain ws://. That is intentional. On a trusted LAN or Tailscale network, I do not need a public HTTPS hop between my iPad and the Mac Mini. The browser viewer has different constraints because WebCodecs requires a secure context, so HTTPS matters there.

For the native app, the trap was App Transport Security.

I had the app configured to allow arbitrary loads, but adding NSAllowsLocalNetworking changed the behavior in a subtle way: iOS treated local networking as the exemption and stopped applying the broader arbitrary-load allowance. A LAN address worked. A Tailscale address did not, because Tailscale assigns addresses in the 100.64.0.0/10 shared address space (RFC 6598), which sits outside the RFC 1918 private ranges, and iOS did not treat it as local networking.

The fix was small: keep the local network privacy description, but do not add NSAllowsLocalNetworking. Let the prototype use NSAllowsArbitraryLoads for the LAN/Tailscale ws:// path.

This is exactly the kind of thing I want written down in the repo because future me will absolutely forget it.
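The address-range distinction behind that trap is easy to sketch. The following only illustrates which block an IPv4 address falls into (the three RFC 1918 private ranges versus the 100.64.0.0/10 range Tailscale uses); it makes no claim about how App Transport Security decides internally, and the type and function names are hypothetical:

```swift
import Foundation

enum AddressClass: Equatable {
    case rfc1918Private     // 10/8, 172.16/12, 192.168/16
    case sharedCGNAT        // 100.64.0.0/10, where Tailscale addresses live
    case other
}

// Parse a dotted-quad IPv4 string into a 32-bit value, or nil if malformed.
func ipv4Value(_ address: String) -> UInt32? {
    let parts = address.split(separator: ".", omittingEmptySubsequences: false)
    guard parts.count == 4 else { return nil }
    var value: UInt32 = 0
    for part in parts {
        guard let octet = UInt8(part) else { return nil }
        value = (value << 8) | UInt32(octet)
    }
    return value
}

// True when `ip` shares its top `prefix` bits with `base`.
func inRange(_ ip: UInt32, base: String, prefix: UInt32) -> Bool {
    guard let baseValue = ipv4Value(base), prefix >= 1, prefix <= 32 else { return false }
    let mask = ~UInt32(0) << (32 - prefix)
    return (ip & mask) == (baseValue & mask)
}

func classify(_ address: String) -> AddressClass? {
    guard let ip = ipv4Value(address) else { return nil }
    if inRange(ip, base: "10.0.0.0", prefix: 8)
        || inRange(ip, base: "172.16.0.0", prefix: 12)
        || inRange(ip, base: "192.168.0.0", prefix: 16) {
        return .rfc1918Private
    }
    if inRange(ip, base: "100.64.0.0", prefix: 10) {
        return .sharedCGNAT
    }
    return .other
}
```

A typical home LAN address such as 192.168.1.20 classifies as private, while a tailnet address such as 100.101.102.103 lands in the shared range, which matches the "LAN works, Tailscale does not" symptom.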

# Why This Belongs in the Agent Stack

Mirador is not an AI project in the obvious sense. There is no model in the loop. No prompt. No tool-calling schema.

But it exists because of agents.

Once you start running real coding agents on remote machines, the bottleneck moves. At first, the bottleneck is "can the agent edit files?" Then it is "can it run tests?" Then it is "can it deploy?" Then, eventually, it is something much more mundane:

Can I complete the one GUI step the operating system refuses to expose cleanly through automation?

That step matters more than it looks.

If I have to get up, find the MacBook, switch contexts, open screen sharing, authenticate, click one permission dialog, and then come back to the conversation, the agent workflow has already lost momentum.

If I can open a native app on the iPad, connect over Tailscale, grant the permission, and return to the agent thread, the workflow stays intact.

That is the difference Mirador is trying to make.

Not replacing SSH.

Completing it.

# What I Would Not Claim Yet

Mirador is still experimental.

It is built for my LAN and Tailscale setup. It is token-protected, but it is not a hardened remote access product. It does not have a relay, multi-user authorization, audit logs, enterprise policy, or any of the things you would want before exposing something like this broadly.

The iOS client is also where most of the polish work lives. Remote input is a pile of edge cases: hardware keyboards, software keyboards, trackpads, indirect scroll events, modifier keys, zoom transforms, and the difference between "this gesture works" and "this gesture feels right."

I am fine with that.

The project has a clear job. Let me control the Mac Mini from the devices I actually use, with low enough latency and good enough input fidelity that it stays in the agent workflow instead of becoming another chore.

# The Takeaway

Headless agents are not purely headless in practice.

The more useful they become, the more they run into the boundaries of the machines they operate. Permissions. Signing. Browser sessions. Native UI state. Desktop automation. The parts of computing that never became clean APIs.

Mirador is my answer to one slice of that problem: a native, low-latency screen and input path back into the Mac.

The terminal got me most of the way there.

The last mile needed a screen.