Llama is a family of large language models (LLMs) developed by Meta and released as open-weight models under Meta's own community license. The family spans roughly 1B to 405B parameters: Llama 3.1 is offered as 8B, 70B, and 405B text models, and Llama 3.2 adds 1B and 3B text models along with 11B and 90B vision models that accept image input. The models are used for text generation, coding, image understanding, and long-context reasoning, and the smaller sizes are practical to deploy on local hardware.

llama.cpp is an open-source software library that performs inference on various large language models, such as Llama.[3] It is co-developed alongside the GGML project, a general-purpose tensor library.[4]
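
As a rough illustration of local inference, the sketch below builds the argument list for one text-completion run with llama.cpp's command-line tool. It assumes the tool is installed under the name `llama-cli` and uses its `-m` (model file), `-p` (prompt), and `-n` (number of tokens to generate) options. The helper name `build_llama_cli_command` and the model path `models/example.gguf` are hypothetical and appear only for illustration. The code only constructs and prints the command; it does not run the model.

```python
import shlex


def build_llama_cli_command(model_path, prompt, max_tokens=64, binary="llama-cli"):
    """Return the argv list for a single text-completion run.

    Assumption: the llama.cpp command-line tool is on PATH as `llama-cli`.
    """
    if max_tokens <= 0:
        raise ValueError("max_tokens must be positive")
    return [
        binary,
        "-m", model_path,       # path to a local model file (hypothetical here)
        "-p", prompt,           # prompt text to complete
        "-n", str(max_tokens),  # upper bound on generated tokens
    ]


if __name__ == "__main__":
    # Hypothetical model path, used only to show the resulting command line.
    cmd = build_llama_cli_command("models/example.gguf", "Hello", max_tokens=16)
    print(shlex.join(cmd))
```

The returned list can be passed to `subprocess.run` once a model file is actually available. Keeping the arguments as a list rather than a single string avoids shell-quoting problems when the prompt contains spaces or quotes.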

multi-modal on-device inference