Project Introduction
OpenAvatarChat is an open-source, modular interactive digital-human chat project developed by HumanAIGC-Engineering. Its core goal is real-time digital-human conversation on a single PC, with multimodal interaction and flexible component replacement. The latest version, 0.5.0, introduces a separated front-end/back-end architecture.
Core Highlights
- Low-latency real-time chat: average response latency of about 2.2 seconds, covering RTC transmission, VAD detection, and full-pipeline computation
- Multimodal language model: supports text, audio, and video input, and adapts to multimodal large models such as MiniCPM-o
- Modular design: functional modules are decoupled, so components such as ASR, LLM, TTS, and digital-human rendering can be swapped freely
- Multi-avatar support: works with several digital-human types, including LiteAvatar (2D), LAM (photo-realistic 3D), and MuseTalk (2D)
- Lightweight deployment: cloud APIs can replace local large models, greatly reducing hardware requirements; digital-human inference runs on either CPU or GPU
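The modular design described above can be sketched as a minimal handler interface in which each pipeline stage is one swappable class. This is an illustrative assumption only; the class names, method signature, and payload type below are not the project's actual API:

```python
from abc import ABC, abstractmethod


class Handler(ABC):
    """Hypothetical base class for a swappable pipeline component."""

    @abstractmethod
    def process(self, data: bytes) -> bytes:
        ...


class SenseVoiceASR(Handler):
    def process(self, data: bytes) -> bytes:
        # Placeholder: a real handler would run speech recognition here.
        return b"recognized text"


class EdgeTTS(Handler):
    def process(self, data: bytes) -> bytes:
        # Placeholder: a real handler would synthesize speech here.
        return b"synthesized audio"


def run_pipeline(handlers: list[Handler], payload: bytes) -> bytes:
    # Each stage's output feeds the next; replacing a stage
    # (e.g. a different ASR engine) means swapping one class.
    for handler in handlers:
        payload = handler.process(payload)
    return payload
```

The point of the decoupling is that `run_pipeline([SenseVoiceASR(), EdgeTTS()], audio)` and a variant with any other ASR or TTS handler share the same driver code.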
Core Capabilities
- Full ASR + LLM + TTS pipeline, with two deployment modes: local inference or cloud API
- 6 preset configurations covering different hardware environments and digital-human types
- Integration with mainstream speech-processing tools such as SenseVoice, CosyVoice, and Edge TTS
- Dify Chatflow invocation for customizing the chat workflow
- Two deployment options, Docker containers or a local environment, on both Windows and Linux
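To illustrate how preset configurations might map available hardware to a local-inference or cloud-API deployment mode, here is a hedged sketch. The preset names, fields, and VRAM thresholds are hypothetical (loosely echoing the System Requirements section), not the project's actual configuration schema:

```python
# Hypothetical preset table; names and fields are illustrative only.
PRESETS = {
    "local_minicpm": {"llm": "MiniCPM-o", "mode": "local", "min_vram_gb": 20},
    "local_minicpm_int4": {"llm": "MiniCPM-o-int4", "mode": "local", "min_vram_gb": 10},
    "cloud_api": {"llm": "api-backed LLM", "mode": "api", "min_vram_gb": 0},
}


def choose_preset(vram_gb: float) -> str:
    """Pick the first preset whose VRAM requirement fits this machine."""
    for name, cfg in PRESETS.items():
        if cfg["min_vram_gb"] <= vram_gb:
            return name
    # A cloud API offloads the LLM entirely, so it always fits.
    return "cloud_api"
```

For example, a 24 GB GPU would select the unquantized local preset, a 12 GB GPU the int4 preset, and a machine with little or no VRAM would fall through to the cloud-API preset.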
System Requirements
- Python: ≥ 3.11.7, < 3.12
- Hardware: a CUDA-capable GPU (unquantized MiniCPM-o needs more than 20 GB of VRAM; the int4-quantized version needs less than 10 GB); digital-human inference supports CPU or GPU
- Environment: CUDA ≥ 12.4; uv is recommended for Python package management
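The Python version constraint above can be verified before installation with a small stand-alone check (this snippet is a sketch, not part of the project):

```python
import sys


def version_ok(v: tuple[int, int, int]) -> bool:
    """Check the stated constraint: >= 3.11.7 and < 3.12."""
    return v >= (3, 11, 7) and v[:2] < (3, 12)


if __name__ == "__main__":
    current = sys.version_info[:3]
    label = "OK" if version_ok(current) else "unsupported"
    print(f"Python {'.'.join(map(str, current))}: {label}")
```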
Online Experience
The project hosts online demos on ModelScope and HuggingFace, where you can switch between the LiteAvatar and LAM digital humans. The audio pipeline is built on SenseVoice + Qwen-VL + CosyVoice.