Stability AI, makers of the image generation AI Stable Diffusion, recently launched Stable Chat, a web-based chat interface for their open-access language model Stable Beluga. At the time of its release, Stable Beluga was the best-performing open large language model (LLM) on the Hugging Face leaderboard.
Stable Beluga is based on the LLaMA foundation model released by Meta. The model is fine-tuned using a synthetic dataset generated by GPT-4. The largest Stable Beluga model contains 70B parameters and outperforms ChatGPT on several benchmarks, including AGIEval, which is based on several common examinations such as the LSAT and SAT. To help evaluate Stable Beluga, Stability AI created the Stable Chat web interface, which lets users interact with the model and give feedback on its output. According to Stability AI:
As part of our efforts at Stability AI to build the world’s most trusted language models, we’ve set up a research-purpose-only website to test and improve our technology. We will continue to update new models as our research progresses rapidly. We ask that you please avoid using this site for real-world applications or commercial uses.
The Stable Beluga models were inspired by a paper published by Microsoft on Orca, a fine-tuned version of LLaMA. In the paper, Microsoft described a technique called explanation tuning. Like instruction tuning, which has been used on many LLMs recently, including ChatGPT and Vicuna, explanation tuning uses a dataset of example inputs and desired model outputs that are generated by a teacher. In the case of ChatGPT, the teachers are actual human users of the model. In contrast, for Orca and Stable Beluga, the explanation tuning dataset is generated by prompting GPT-4 to explain why it generated the output it did (“explain like I’m five”).
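The data-collection step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the pipeline used by Microsoft or Stability AI: the `TuningExample` record, the `build_explanation_dataset` helper, the wording of `EXPLAIN_SYSTEM_PROMPT`, and the `stub_teacher` function are all hypothetical, and the stub stands in for a real call to a teacher model such as GPT-4.

```python
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical system instruction asking the teacher to explain its answer.
EXPLAIN_SYSTEM_PROMPT = "Explain your answer like I'm five."


@dataclass
class TuningExample:
    """One training record: the instruction, the input, and the teacher's output."""
    system: str
    prompt: str
    response: str


def build_explanation_dataset(
    prompts: List[str],
    teacher: Callable[[str, str], str],
) -> List[TuningExample]:
    """Ask the teacher for an explained answer to each prompt and store the result."""
    return [
        TuningExample(
            system=EXPLAIN_SYSTEM_PROMPT,
            prompt=prompt,
            response=teacher(EXPLAIN_SYSTEM_PROMPT, prompt),
        )
        for prompt in prompts
    ]


def stub_teacher(system: str, prompt: str) -> str:
    """Placeholder for a teacher model; a real pipeline would query an LLM here."""
    return f"[stub explanation for: {prompt}]"


if __name__ == "__main__":
    for example in build_explanation_dataset(["Why is the sky blue?"], stub_teacher):
        print(example)
```

The student model would then be fine-tuned on these records, so that it learns to reproduce the teacher's explanations as well as its final answers.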
Stability AI created their own explanation tuning dataset of 600,000 examples, one-tenth the size of the Microsoft dataset. They then trained two versions of Stable Beluga: Stable Beluga 1, based on the 65B parameter original LLaMA model, and Stable Beluga 2, based on the 70B Llama 2 model. Both are released under a non-commercial license. Although the models achieved fourth and first place, respectively, on the leaderboard when they were released, the proliferation of LLaMA-based fine-tuned models has since pushed Stable Beluga 2 out of the top ten, and Stable Beluga 1 even lower.
The models were released under a non-commercial license to encourage researchers to help iterate and improve on the technology, according to Stability AI. However, the company noted that this requires resources that are “beyond the reach of everyday researchers,” and so decided to create the Stable Chat website. Users can create a free login or use a Google account to access the chat. The responses from the model can be up-voted, down-voted, or flagged; this user feedback will be used to help improve the model in the future.
Stability AI founder Emad Mostaque posted about the release on Twitter/X. One user replied that the model was “too cautious in giving factual information.” Mostaque urged the user to give that feedback via the web interface.
Stability AI also recently announced that their LLMs will be used at an AI red-teaming event at DEF CON 31. This event is sponsored by the White House and features models from “Anthropic, Google, Hugging Face, Microsoft, NVIDIA, OpenAI, and Stability AI.” The goal is to help identify risks and vulnerabilities in the models.