Images. Video. 3D. Audio. Chat. Just enough. Nothing more.
Images, video, music, voice, chat, code, OCR, training: everything runs locally on Apple Silicon. No cloud. No waiting.
Two model families. LoRA training. Image-to-image. Custom styles.
mere.run image "a portrait in oil paint style"
Text-to-video and image-to-video. Native LTX pipeline on Metal.
mere.run video "timelapse of clouds over peaks"
Generate tracks, remix, cover vocals. ACE-Step with lyrics and BPM.
mere.run music --lyrics "verse one..." --bpm 120
122B-parameter LLM running locally. Code generation. Multi-turn.
mere.run chat "explain how diffusion models work"
Text-to-speech with voice cloning. Save and reuse voice profiles.
mere.run talk "Hello world" --clone voice.wav
Transcribe and translate audio. Streaming ASR. Timestamps.
mere.run listen recording.wav --timestamps
Fine-tune on your own images. Style or subject. Resume from checkpoint.
mere.run image train-lora ./my-photos --trigger "sks"
Describe images. Extract text. Caption datasets for training.
mere.run look photo.jpg "what is this?"
OpenAI-compatible endpoint. Drop into Cursor, VS Code, any client.
mere.run serve --engine q35 --port 8080
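Any OpenAI-compatible client can talk to it. As a minimal sketch, here's a raw URLSession call against the local server, assuming the standard /v1/chat/completions route and reusing the engine name as the model id (both are assumptions, not documented behavior):

import Foundation

// Minimal sketch: POST a chat completion to the local Mere server.
// The route follows the OpenAI convention; the model id mirrors the
// --engine flag above. Both are assumptions.
let url = URL(string: "http://localhost:8080/v1/chat/completions")!
var request = URLRequest(url: url)
request.httpMethod = "POST"
request.setValue("application/json", forHTTPHeaderField: "Content-Type")
request.httpBody = try JSONSerialization.data(withJSONObject: [
    "model": "q35",
    "messages": [["role": "user", "content": "explain how diffusion models work"]],
] as [String: Any])

let (data, _) = try await URLSession.shared.data(for: request)
print(String(decoding: data, as: UTF8.self))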
Every capability at your fingertips. One command away.
Mere is a first-class tool for AI agents. Claude Code, Cursor, and any OpenAI-compatible client can invoke every capability — image generation, video, music, TTS, chat — through a single CLI or API endpoint. Your Mac becomes an AI compute layer that both you and your agents share.
Mere is a single Swift CLI. Every Mac app — yours or one your agent vibe-codes mid-conversation — can drop it in via Process. Spark, below, is a real SwiftUI app. Claude wrote it in one prompt.
empty state · resolves mere.run on $PATH
image-klein-nano · 1024×768 · on this Mac
import AppKit
import Foundation

enum MereError: Error {
    case generationFailed
}

enum MereClient {
    /// Shells out to the mere.run CLI and loads the resulting PNG.
    static func generate(prompt: String) async throws -> NSImage {
        let bin = URL(fileURLWithPath: "/usr/local/bin/mere.run")
        let out = URL(fileURLWithPath: NSTemporaryDirectory())
            .appendingPathComponent("spark-\(UUID().uuidString.prefix(8)).png")

        let proc = Process()
        proc.executableURL = bin
        proc.arguments = [
            "image", "generate",
            "-m", "image-klein-nano",
            "-p", prompt,
            "-W", "1024", "-H", "768",
            "-s", "8",
            "-o", out.path,
        ]

        try proc.run()
        proc.waitUntilExit()

        guard proc.terminationStatus == 0,
              let img = NSImage(contentsOf: out) else {
            throw MereError.generationFailed
        }
        return img
    }
}
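A hypothetical SwiftUI view wiring that client up (Spark's actual UI code isn't shown; this is just one plausible shape):

import AppKit
import SwiftUI

struct SparkView: View {
    @State private var prompt = ""
    @State private var image: NSImage?

    var body: some View {
        VStack(spacing: 12) {
            TextField("Describe an image", text: $prompt)
                .textFieldStyle(.roundedBorder)
            Button("Generate") {
                // Fire the CLI off the main thread; ignore failures for brevity.
                Task { image = try? await MereClient.generate(prompt: prompt) }
            }
            if let image {
                Image(nsImage: image).resizable().scaledToFit()
            }
        }
        .padding()
    }
}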
Near-zero operating cost. No cloud. No subscription fatigue.
No server. No upload. No telemetry. Your prompts are yours.
MIT licensed. Built in Prince Edward Island. For the people who keep the world running.
From a single creative seed — a 1950s American roadside diner, just before midnight, in heavy rain — Mere generated text, a photograph, a vision analysis, narration, a full-production song with lyrics, two cinematic videos, working Swift code, and embeddings. All locally on a MacBook. In about four minutes.
"The door exhales a draft of ozone and wet asphalt, yielding to a sanctuary of humming neon and scorched lard. Inside, the air is a thick, amber suspension of tobacco smoke and percolating coffee…"
$ mere.run text chat -p "describe a 1950s diner…"
Native LTX pipeline on Metal. Text-to-video from the same scene description.
Animated directly from the generated photograph — steam, headlights, jukebox glow.
Full vocal production. Vintage rockabilly, brushed snare, twangy reverb-soaked Telecaster, doo-wop backing vocals, a soft tenor sax solo. Written and arranged from a single prompt + lyrics file.
Synthesized in a deep documentary voice, then transcribed with timestamps. The round trip never leaves the machine.
/// Computes the Mandelbrot set for a given grid of complex points.
func generateMandelbrotSet(
    width: Int,
    height: Int,
    bounds: ComplexPlaneBounds = .default,
    maxIterations: Int = 100
) -> [[Int]] {
    var mandelbrot: [[Int]] = .init(
        repeating: .init(repeating: 0, count: width),
        count: height
    )
    let xStep = (bounds.right - bounds.left) / Double(width - 1)
    let yStep = (bounds.bottom - bounds.top) / Double(height - 1)
    for y in 0..<height {
        for x in 0..<width {
            let cx = bounds.left + Double(x) * xStep
            let cy = bounds.top + Double(y) * yStep
            mandelbrot[y][x] = iterate(cx: cx, cy: cy, max: maxIterations)
        }
    }
    return mandelbrot
}
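The snippet references a ComplexPlaneBounds type and an iterate helper that aren't shown. A minimal sketch of both, with the default window chosen as the classic Mandelbrot viewport (an assumption):

/// The rectangle of the complex plane to sample. Default values are an
/// assumption: the standard full-set viewport.
struct ComplexPlaneBounds {
    var left: Double, right: Double, top: Double, bottom: Double
    static let `default` = ComplexPlaneBounds(left: -2.0, right: 0.6, top: -1.2, bottom: 1.2)
}

/// Escape-time count for c = cx + cy·i, capped at `max` iterations.
func iterate(cx: Double, cy: Double, max: Int) -> Int {
    var zx = 0.0, zy = 0.0, n = 0
    while zx * zx + zy * zy <= 4.0 && n < max {
        (zx, zy) = (zx * zx - zy * zy + cx, 2 * zx * zy + cy)
        n += 1
    }
    return n
}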
The same image fed back into Mere's vision stack: open-vocabulary object grounding with pixel-perfect masks.
$ mere.run vision ground hero.png --query "waitress"
Three sentences embedded locally. The two diner-related sentences cluster together; the cat sits alone.
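Clustering like this typically comes down to pairwise cosine similarity. A minimal sketch of the comparison (the vectors themselves would come from Mere's embedding output):

import Foundation

/// Cosine similarity: near 1.0 for same-direction vectors, near 0 for unrelated text.
func cosineSimilarity(_ a: [Double], _ b: [Double]) -> Double {
    let dot = zip(a, b).reduce(0) { $0 + $1.0 * $1.1 }
    let magA = sqrt(a.reduce(0) { $0 + $1 * $1 })
    let magB = sqrt(b.reduce(0) { $0 + $1 * $1 })
    return dot / (magA * magB)
}

// With the three embeddings above, the diner pair scores well above
// either diner-cat pair, which is the clustering shown here.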