Create anything. Locally.

Images. Video. 3D. Audio. Chat. Just enough. Nothing more.

Machines · Enriching · Regular · Experiences

One toolkit. Your Mac.

Images, video, music, voice, chat, code, OCR, training — everything runs locally on Apple Silicon.

CREATE
Images

Two model families. LoRA training. Image-to-image. Custom styles.

mere.run image "a portrait in oil paint style"
CREATE
Video

Text-to-video and image-to-video. Native LTX pipeline on Metal.

mere.run video "timelapse of clouds over peaks"
CREATE
Music

Generate tracks, remix, cover vocals. ACE-Step with lyrics and BPM.

mere.run music --lyrics "verse one..." --bpm 120
INTERACT
Chat

122B-parameter LLM running locally. Code generation. Multi-turn.

mere.run chat "explain how diffusion models work"
INTERACT
Voice

Text-to-speech with voice cloning. Save and reuse voice profiles.

mere.run talk "Hello world" --clone voice.wav
INTERACT
Listen

Transcribe and translate audio. Streaming ASR. Timestamps.

mere.run listen recording.wav --timestamps
TRAIN
LoRA Training

Fine-tune on your own images. Style or subject. Resume from checkpoint.

mere.run image train-lora ./my-photos --trigger "sks"
UNDERSTAND
Vision & OCR

Describe images. Extract text. Caption datasets for training.

mere.run look photo.jpg "what is this?"
SERVE
API Server

OpenAI-compatible endpoint. Drop into Cursor, VS Code, any client.

mere.run serve --engine q35 --port 8080
mere.

Every capability at your fingertips. One command away.

~/projects · zsh
$ mere.run image "an astronaut painting on the moon, oil on canvas"
Zeta Max · 1024x1024 · 4 steps · 9.2s → astronaut_moon.png
$ mere.run video "timelapse of a flower blooming in morning light"
LTX-2 · 768x512 · 97 frames · 42s → flower_bloom.mp4
$ mere.run talk "Welcome to Mere" --clone my-voice.wav
TTS · cloned voice · 2.1s → welcome.wav
$ mere.run chat "what's the best way to fine-tune a diffusion model?"
Q35 · 122B-A10B · streaming response...
$ mere.run music --lyrics "[verse] walking through the rain..." --bpm 90
ACE-Step · 30s duration · turbo decode → rain_walk.wav
$ mere.run serve --engine q35 --port 8080
OpenAI-compatible API running at http://localhost:8080
AGENT NATIVE

Built for humans.
Built for agents.

Mere is a first-class tool for AI agents. Claude Code, Cursor, and any OpenAI-compatible client can invoke every capability — image generation, video, music, TTS, chat — through a single CLI or API endpoint. Your Mac becomes an AI compute layer that both you and your agents share.

  • /mere-run skill for Claude Code — agents generate assets mid-task
  • mere.run serve — OpenAI-compatible API for any agent framework
  • CLI-first — every command is scriptable and composable
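As a rough illustration of what "OpenAI-compatible" means for a client, the sketch below builds a chat request against the local server. The `/v1/chat/completions` path follows the usual OpenAI convention and the model name `q35` is borrowed from the `--engine` flag; both are assumptions here, not confirmed details of `mere.run serve`. The type and function names are illustrative.

```swift
import Foundation
#if canImport(FoundationNetworking)
import FoundationNetworking  // URLRequest lives here on Linux
#endif

/// One message in an OpenAI-style chat conversation.
struct ChatMessage: Codable {
    let role: String
    let content: String
}

/// Request body in the OpenAI chat-completions shape.
struct ChatRequest: Codable {
    let model: String
    let messages: [ChatMessage]
}

/// Builds a POST request for a local OpenAI-compatible server.
/// Assumptions: the endpoint path and the default model name are
/// illustrative; check what your running server actually expects.
func makeChatRequest(
    prompt: String,
    model: String = "q35",
    baseURL: URL = URL(string: "http://localhost:8080")!
) throws -> URLRequest {
    var request = URLRequest(url: baseURL.appendingPathComponent("v1/chat/completions"))
    request.httpMethod = "POST"
    request.setValue("application/json", forHTTPHeaderField: "Content-Type")
    request.httpBody = try JSONEncoder().encode(
        ChatRequest(model: model, messages: [ChatMessage(role: "user", content: prompt)])
    )
    return request
}
```

With the server running, the request can be sent with `URLSession.shared.data(for:)`. Any existing OpenAI client library should work the same way once its base URL points at the local port.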
claude code · session
CLAUDE CODE
> Generate a hero image for the landing page
Using /mere-run to generate image...
$ mere.run image "minimal product hero, soft gradient,
clean composition, studio lighting" -W 2048
Zeta Max · 2048x1024 · 4 steps · 11.4s → hero_landing.png saved
Image generated. Adding to page layout.
Mac Native · Swift

A Swift binary.
SwiftUI ready.

Mere is a single Swift CLI. Every Mac app — yours or one your agent vibe-codes mid-conversation — can drop it in via Process. Spark, below, is a real SwiftUI app. Claude wrote it in one prompt.

Spark — idle
Spark.app idle state: empty canvas with a prompt field.
empty state · resolves mere.run on $PATH
Spark — generated · 9.2s
Spark.app showing a generated cozy reading nook image.
image-klein-nano · 1024×768 · on this Mac
MereClient.swift ~24 lines
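import Foundation
import AppKit

/// Errors surfaced by MereClient.
enum MereError: Error {
    case generationFailed
}

enum MereClient {
    /// Runs `mere.run image generate` as a subprocess and loads the PNG it writes.
    static func generate(prompt: String) async throws -> NSImage {
        // Install location of the CLI; adjust if mere.run lives elsewhere.
        let bin = URL(fileURLWithPath: "/usr/local/bin/mere.run")
        // Unique output file in the temporary directory.
        let out = URL(fileURLWithPath: NSTemporaryDirectory())
            .appendingPathComponent("spark-\(UUID().uuidString.prefix(8)).png")

        let proc = Process()
        proc.executableURL = bin
        proc.arguments = [
            "image", "generate",
            "-m", "image-klein-nano",
            "-p", prompt,
            "-W", "1024", "-H", "768",
            "-s", "8",
            "-o", out.path,
        ]
        try proc.run()
        // Blocks the calling thread until the CLI exits.
        proc.waitUntilExit()

        // Succeed only on a zero exit status and a readable image file.
        guard proc.terminationStatus == 0,
              let img = NSImage(contentsOf: out)
        else { throw MereError.generationFailed }
        return img
    }
}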
$ swift build -c release && open Spark.app
no SDKs · no dependencies · no API keys
Source: demo-app/ in this repo · vibe-coded by Claude in a single prompt · runs against the local mere.run CLI
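The Spark caption mentions resolving mere.run on $PATH, while the listing above uses a fixed path. One way such a lookup could be written is sketched below. `resolveExecutable` is a hypothetical helper, not part of the demo app, and the fallback location is simply the path from the listing.

```swift
import Foundation

/// Returns the first executable named `name` found in `searchPath`,
/// a colon-separated directory list in the style of $PATH.
func resolveExecutable(
    named name: String,
    searchPath: String = ProcessInfo.processInfo.environment["PATH"] ?? ""
) -> URL? {
    for directory in searchPath.split(separator: ":") {
        let candidate = URL(fileURLWithPath: String(directory))
            .appendingPathComponent(name)
        if FileManager.default.isExecutableFile(atPath: candidate.path) {
            return candidate
        }
    }
    return nil
}

// Fall back to the fixed location when the lookup finds nothing.
let mereBinary = resolveExecutable(named: "mere.run")
    ?? URL(fileURLWithPath: "/usr/local/bin/mere.run")
```

Apps launched from Finder typically see a shorter PATH than a terminal session does, so keeping a fallback like this is a reasonable precaution.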

Local-first.

Near-zero operating cost. No cloud. No subscription fatigue.

Nothing leaves your Mac.

No server. No upload. No telemetry. Your prompts are yours.

Free. Forever.

MIT licensed. Built in Prince Edward Island. For the people who keep the world running.

The Showcase

One scene.
Every modality.

From a single creative seed — a 1950s American roadside diner, just before midnight, in heavy rain — Mere generated text, a photograph, a vision analysis, narration, a full-production song with lyrics, two cinematic videos, working Swift code, and embeddings. All locally on a MacBook. In about four minutes.

$ bash ~/mere/run/demo.sh
9 modalities · 13 artifacts · 0 cloud calls · ~4 min on M-series
Image · klein-nano
Generated 1950s diner scene with vision-grounded bounding boxes overlaid.
waitress · 0.95
jukebox · 0.92
neon sign ×3
Text · Creative gemma4 · 0.85t
"The door exhales a draft of ozone and wet asphalt, yielding to a sanctuary of humming neon and scorched lard. Inside, the air is a thick, amber suspension of tobacco smoke and percolating coffee…"
mere.run text chat · 512 tokens
$ mere.run text chat -p "describe a 1950s diner…"
Video · LTX 768×512 · 65f · 24fps

Establishing shot, dolly-in

Native LTX pipeline on Metal. Text-to-video from the same scene description.

Video · Image-to-Video 33f · seeded by hero

The still comes to life

Animated directly from the generated photograph — steam, headlights, jukebox glow.

Music · ACE-Step G major · 88 BPM · 60s

"Honey, stay one more song with me"

Full vocal production. Vintage rockabilly, brushed snare, twangy reverb-soaked Telecaster, doo-wop backing vocals, a soft tenor sax solo. Written and arranged from a single prompt + lyrics file.

Honey stay one more song with me
Underneath the chrome and the canopy
Red vinyl shining in the smoky light
Save me from the lonely night
Speech · TTS → ASR qwen3-nano · parakeet

The text, spoken back.

Synthesized in a deep documentary voice, then transcribed with timestamps. Round-trip never leaves the machine.

[00:00 → 00:08] The door exhales a draught of ozone…
[00:08 → 00:18] Inside, the air is a thick, amber suspension…
[00:18 → 00:27] Outside, the rain hammers the plate glass…
Code · Swift qwen3-coder · streamed
mandelbrot.swift
/// Computes the Mandelbrot set for a given grid of complex points.
func generateMandelbrotSet(
    width: Int,
    height: Int,
    bounds: ComplexPlaneBounds = .default,
    maxIterations: Int = 100
) -> [[Int]] {
    var mandelbrot: [[Int]] = .init(repeating: .init(repeating: 0, count: width), count: height)
    let xStep = (bounds.right - bounds.left) / Double(width - 1)
    let yStep = (bounds.bottom - bounds.top) / Double(height - 1)
    for y in 0..<height {
        for x in 0..<width {
            let cx = bounds.left + Double(x) * xStep
            let cy = bounds.top + Double(y) * yStep
            mandelbrot[y][x] = iterate(cx: cx, cy: cy, max: maxIterations)
        }
    }
    return mandelbrot
}
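The excerpt above calls `ComplexPlaneBounds` and `iterate`, which are not shown. The sketch below is one plausible shape for them, inferred only from how the excerpt uses them. The default bounds are a common framing of the set and not necessarily what the model generated.

```swift
/// Rectangle of the complex plane to sample (hypothetical shape, inferred
/// from the `left`, `right`, `top`, `bottom`, and `.default` uses above).
struct ComplexPlaneBounds {
    var left: Double
    var right: Double
    var top: Double
    var bottom: Double

    /// Assumed default framing; the demo's actual values are unknown.
    static let `default` = ComplexPlaneBounds(
        left: -2.0, right: 1.0, top: 1.5, bottom: -1.5
    )
}

/// Escape-time iteration: counts steps of z = z^2 + c until |z| > 2,
/// returning `max` for points that never escape within the budget.
func iterate(cx: Double, cy: Double, max: Int) -> Int {
    var zx = 0.0
    var zy = 0.0
    for i in 0..<max {
        // |z|^2 > 4 is the same test as |z| > 2, without the square root.
        if zx * zx + zy * zy > 4.0 { return i }
        let nextX = zx * zx - zy * zy + cx
        zy = 2.0 * zx * zy + cy
        zx = nextX
    }
    return max
}
```

With these in place, the function from the excerpt compiles as written. Points inside the set come back as `maxIterations`, and points outside come back as the step at which they escaped.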
Vision · Falcon Perception grounding + masks

Knows what it sees.

The same image fed back into Mere's vision stack: open-vocabulary object grounding with pixel-perfect masks.

waitress · box (0.61, 0.28) → (0.74, 0.68)
jukebox · box (0.84, 0.28) → (0.96, 0.65)
neon sign · 3 detections · masks saved
$ mere.run vision ground hero.png --query "waitress"
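The box coordinates shown look like fractions of the image's width and height. Assuming that convention (the CLI's real output format is not documented here), a small conversion to pixel rectangles might look like this. `NormalizedBox`, `PixelRect`, and `pixelRect(for:imageWidth:imageHeight:)` are illustrative names.

```swift
/// A box whose corners are fractions of image width and height.
/// This is an assumed convention, not a documented output format.
struct NormalizedBox {
    var minX: Double
    var minY: Double
    var maxX: Double
    var maxY: Double
}

/// Pixel-space rectangle with an integer origin and size.
struct PixelRect: Equatable {
    var x: Int
    var y: Int
    var width: Int
    var height: Int
}

/// Scales a normalized box to pixels, rounding each corner to the
/// nearest integer before computing the size.
func pixelRect(for box: NormalizedBox, imageWidth: Int, imageHeight: Int) -> PixelRect {
    let x0 = Int((box.minX * Double(imageWidth)).rounded())
    let y0 = Int((box.minY * Double(imageHeight)).rounded())
    let x1 = Int((box.maxX * Double(imageWidth)).rounded())
    let y1 = Int((box.maxY * Double(imageHeight)).rounded())
    return PixelRect(x: x0, y: y0, width: x1 - x0, height: y1 - y0)
}

// The "waitress" box from above, on a hypothetical 1024×768 image.
let waitress = NormalizedBox(minX: 0.61, minY: 0.28, maxX: 0.74, maxY: 0.68)
let waitressRect = pixelRect(for: waitress, imageWidth: 1024, imageHeight: 768)
```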
Embeddings · qwen3-0.6b cosine similarity

Same Mac. Semantic search ready.

Three sentences embedded locally. The two diner-related sentences cluster together; the cat sits alone.

"rainy 1950s American diner late at night"
↔ "neon-lit chrome counter and red vinyl booths"
cosine 0.510
"neon-lit chrome counter and red vinyl booths"
↔ "a cat sitting on a windowsill"
cosine 0.351
"rainy 1950s American diner late at night"
↔ "a cat sitting on a windowsill"
cosine 0.293
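For reference, the cosine scores above compare two embedding vectors by the angle between them. A minimal sketch of that calculation follows. It works on plain `[Double]` arrays and does not depend on how Mere returns embeddings, which is not shown here.

```swift
/// Cosine similarity of two equal-length vectors: dot(a, b) / (|a| * |b|).
/// Returns nil for mismatched lengths, empty input, or a zero vector.
func cosineSimilarity(_ a: [Double], _ b: [Double]) -> Double? {
    guard a.count == b.count, !a.isEmpty else { return nil }
    var dot = 0.0
    var normA = 0.0
    var normB = 0.0
    for (x, y) in zip(a, b) {
        dot += x * y
        normA += x * x
        normB += y * y
    }
    guard normA > 0, normB > 0 else { return nil }
    return dot / (normA.squareRoot() * normB.squareRoot())
}
```

Scores near 1 mean the vectors point the same way and scores near 0 mean they are unrelated, which is why the two diner sentences score higher with each other than either does with the cat.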
Every artifact above was generated by ~/mere/run/demo.sh · no cloud · no edits · the script is in the repo.