Running quantized LLMs locally | Boston

January 22, 2024 · Boston

llama.cpp: Local Quantized LLMs

Overview

Running smaller, quantized open-source models on your own computer is becoming popular, so I thought I would demo how I do that with llama.cpp.

Links

https://github.com/ggerganov/llama.cpp
LLM inference in C/C++ using ggml, supporting quantized models in the GGUF format and a range of compute backends.

Tech stack