Daily - April 29, 2024 - Aschen

Daily Shaarli

All links of one day in a single page.

April 29, 2024

vllm: A high-throughput and memory-efficient inference and serving engine for LLMs

vLLM is an inference server for LLMs.

Along with [Text Generation Inference](https://links.aschen.tech/shaare/Go1xSQ) (TGI) from Hugging Face, it is one of the go-to options for building your own infrastructure capable of serving LLMs.
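
To give an idea of what "serving" looks like from the client side, here is a minimal sketch that queries a vLLM instance through its OpenAI-compatible HTTP API. It assumes a server is already running locally on port 8000; the model name `my-model` and the helper names are placeholders for illustration, not taken from the linked project.

```python
import json
import urllib.request

# Assumption: a vLLM OpenAI-compatible server is already listening here.
BASE_URL = "http://localhost:8000/v1"
# Hypothetical model name; use whichever model the server was started with.
MODEL = "my-model"


def build_completion_request(prompt, model=MODEL, max_tokens=64,
                             temperature=0.0, base_url=BASE_URL):
    """Build (but do not send) a POST request for the /completions endpoint."""
    payload = {
        "model": model,
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": temperature,
    }
    return urllib.request.Request(
        url=f"{base_url}/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def complete(prompt, **kwargs):
    """Send the request and return the text of the first completion."""
    request = build_completion_request(prompt, **kwargs)
    with urllib.request.urlopen(request, timeout=60) as response:
        body = json.load(response)
    return body["choices"][0]["text"]


if __name__ == "__main__":
    # Requires a running server; fails with a connection error otherwise.
    print(complete("vLLM is"))
```

Because the request follows the OpenAI completions format, the same client code should also work against other servers that expose that format, with only `BASE_URL` and `MODEL` changed.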

c-ai text-ai inference