[TinyLlama-1.1B] Trying Out a High-Performance, Ultra-Compact Take on Llama 2 with Just 1.1B Parameters


2024-01-17 / 2024-04-02

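I'm Nakata, an LLM researcher in the WEEL Media Division.

An NLP research team at a university in Singapore has released TinyLlama-1.1B-Chat-v1.0, a compact Llama-style model.

This LLM adopts the same architecture and tokenizer as Llama 2 but is far lighter, so it can generate text quickly while using less memory.

The model has already been downloaded more than 8,900 times on Hugging Face, which shows that many people are already using it.

This article covers how to use TinyLlama-1.1B-Chat-v1.0 and tests how well it actually works. By the end, you should have a good feel for what TinyLlama-1.1B-Chat-v1.0 can do, and you may not want to go back to regular Llama 2.

Please read on to the end.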

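Overview of TinyLlama-1.1B-Chat-v1.0

TinyLlama-1.1B-Chat-v1.0 is an LLM that shares Llama 2's architecture but is a much lighter model, trained with 1.1B parameters.

Compared with the 13B variant of Llama 2, it is a very small LLM.

TinyLlama is designed for applications with limited compute resources and can be applied to a range of uses, such as fast text generation and real-time generation on a local machine. The table below summarizes how it differs from Meta's Llama 2.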

	TinyLlama-1.1B-Chat-v1.0	Llama 2-13B
Parameters	1.1B	13B
Context length (tokens)	2048	4096
Developer	StatNLP Research Group, Singapore University of Technology and Design	Meta
Commercial use	Yes	Yes
License	apache-2.0	Llama 2 Community License
Japanese support	No	Yes
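
To make the size gap in the table more concrete, here is a rough back-of-the-envelope sketch of how much memory the weights alone would occupy. It assumes 2 bytes per parameter for bfloat16 (the dtype used later in this article) and ignores activations, the KV cache, and framework overhead, so real usage will be higher. The function name weight_memory_gb is our own and not part of any library.

```python
# Rough weight-only memory estimate.
# Assumption: every parameter is stored in the given dtype; activations,
# KV cache, and framework overhead are NOT included.
BYTES_PER_PARAM = {"float32": 4, "bfloat16": 2, "float16": 2, "int8": 1}


def weight_memory_gb(num_params: float, dtype: str = "bfloat16") -> float:
    """Return the approximate size of the model weights in GB (10^9 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


if __name__ == "__main__":
    for name, params in [("TinyLlama-1.1B", 1.1e9), ("Llama 2-13B", 13e9)]:
        print(f"{name}: ~{weight_memory_gb(params):.1f} GB of weights in bfloat16")
```

Under these assumptions the weights come to roughly 2.2 GB for TinyLlama versus roughly 26 GB for Llama 2-13B, which is why the smaller model is a much easier fit for modest hardware.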

Pricing for TinyLlama-1.1B-Chat-v1.0

TinyLlama-1.1B-Chat-v1.0 is open source, so anyone can use it free of charge.


How to use TinyLlama-1.1B-Chat-v1.0

For this article, we ran the model on an A100 GPU in Google Colab.

First, run the following commands to install the required libraries.

!pip install git+https://github.com/huggingface/transformers.git
!pip install accelerate

Next, run the following code to generate text.

import torch
from transformers import pipeline

pipe = pipeline("text-generation", model="TinyLlama/TinyLlama-1.1B-Chat-v1.0", torch_dtype=torch.bfloat16, device_map="auto")

# We use the tokenizer's chat template to format each message - see https://huggingface.co/docs/transformers/main/en/chat_templating
messages = [
    {
        "role": "system",
        "content": "You are a friendly chatbot who always responds in the style of a pirate",
    },
    {"role": "user", "content": "How many helicopters can a human eat in one sitting?"},
]
prompt = pipe.tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
outputs = pipe(prompt, max_new_tokens=256, do_sample=True, temperature=0.7, top_k=50, top_p=0.95)
print(outputs[0]["generated_text"])

In the code above, the following part is the prompt.

messages = [
    {
        "role": "system",
        "content": "You are a friendly chatbot who always responds in the style of a pirate",
    },
    {"role": "user", "content": "How many helicopters can a human eat in one sitting?"},
]

Running this code produced the following output.

<|system|>
You are a friendly chatbot who always responds in the style of a pirate</s>
<|user|>
How many helicopters can a human eat in one sitting?</s>
<|assistant|>
There is no exact scientific or medical definition of a human's maximum daily caloric intake, but a human can typically consume up to 1,500 calories per day. However, this does not necessarily mean that a human can consume an equal number of calories from all sources. Some individuals may consume more calories from energy sources like carbohydrates, proteins, and fats than others, while others may consume less. Some people may also have a lower metabolic rate or require more calories to maintain their weight. Therefore, the number of helicopters that a human can eat in one sitting is likely dependent on factors such as age, weight, and physical fitness level.

The part after the final <|assistant|> tag is the text generated by TinyLlama-1.1B-Chat-v1.0.
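
The layout of the printed output can be sketched in plain Python. The render_chat function below is a hypothetical approximation that we reconstructed from the output above; the real formatting is done by the tokenizer's apply_chat_template, and its exact whitespace may differ. The extract_reply helper (also our own name) keeps only the text after the last <|assistant|> tag.

```python
from typing import Dict, List

EOS = "</s>"  # end-of-sequence marker as it appears in the printed output


def render_chat(messages: List[Dict[str, str]], add_generation_prompt: bool = True) -> str:
    """Approximate the prompt layout seen in the output (not the real template)."""
    parts = [f"<|{m['role']}|>\n{m['content']}{EOS}\n" for m in messages]
    if add_generation_prompt:
        # The trailing assistant tag tells the model that it is its turn to speak.
        parts.append("<|assistant|>\n")
    return "".join(parts)


def extract_reply(generated_text: str) -> str:
    """Return only the text after the last <|assistant|> tag."""
    # rpartition returns the whole string as the last element if the tag is absent.
    _, _, reply = generated_text.rpartition("<|assistant|>")
    return reply.strip()


if __name__ == "__main__":
    messages = [
        {"role": "system", "content": "You are a friendly chatbot"},
        {"role": "user", "content": "Hello!"},
    ]
    prompt = render_chat(messages)
    print(prompt)
    print(extract_reply(prompt + "Ahoy there!"))
```

If you only want the reply, the text-generation pipeline in transformers also accepts return_full_text=False, which should drop the prompt from generated_text; we did not use it in this article, so treat that as a pointer rather than a tested recipe.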

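Overall, the text reads naturally. However, the final sentence of the output, "Therefore, the number of helicopters that a human can eat in one sitting is likely dependent on factors such as age, weight, and physical fitness level.", is clearly nonsense. Helicopters are not edible in the first place...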

PC specs required to run TinyLlama-1.1B-Chat-v1.0

■ Python version
Python 3.8 or later

■ Required packages
transformers>=4.34
accelerate
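
As a quick sanity check, the requirements above can be verified from Python before loading the model. This is a minimal sketch: parse_version and meets_minimum are our own hypothetical helpers that compare only the leading numeric parts of a version string, which is enough for a ">= 4.34" style check but is not a full version parser.

```python
import sys
from typing import Tuple


def parse_version(version: str) -> Tuple[int, ...]:
    """Turn '4.34.0' or '4.37.0.dev0' into a tuple of its leading numeric parts."""
    parts = []
    for piece in version.split("."):
        if not piece.isdigit():
            break  # stop at suffixes such as 'dev0' or '0rc1'
        parts.append(int(piece))
    return tuple(parts)


def meets_minimum(installed: str, minimum: str) -> bool:
    """True if the installed version is at least the minimum version."""
    return parse_version(installed) >= parse_version(minimum)


if __name__ == "__main__":
    print("Python >= 3.8:", sys.version_info >= (3, 8))
    try:
        import transformers
        print("transformers >= 4.34:", meets_minimum(transformers.__version__, "4.34"))
    except ImportError:
        print("transformers is not installed")
```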

Trying out TinyLlama-1.1B-Chat-v1.0

Here, we gave TinyLlama-1.1B-Chat-v1.0 three tasks to solve.

Can it write an email?

We enter the prompt "Please create a business email using correct polite and humble language to be sent to clients in Japanese." The code is as follows.

import torch
from transformers import pipeline

pipe = pipeline("text-generation", model="TinyLlama/TinyLlama-1.1B-Chat-v1.0", torch_dtype=torch.bfloat16, device_map="auto")

# We use the tokenizer's chat template to format each message - see https://huggingface.co/docs/transformers/main/en/chat_templating
messages = [
    {
        "role": "system",
        "content": "You are good japanese assistant",
    },
    {"role": "user", "content": "Please create a business email using correct polite and humble language to be sent to clients in Japanese."},
]
prompt = pipe.tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
outputs = pipe(prompt, max_new_tokens=256, do_sample=True, temperature=0.7, top_k=50, top_p=0.95)
print(outputs[0]["generated_text"])

The result is as follows.

Subject: Request for Quote for [Product/Service]

Dear [Client's Name],

I am writing to inquire about your interest in our latest product/service, which we believe would be a perfect fit for your business. Our company, [Your Company Name], is a leading provider of [Product/Service] in the market and we believe that our product/service would complement your business well.

Our product/service is designed to provide [insert details about the product/service]. This product/service has gained immense popularity in the industry due to its unique features and benefits. Our company has been providing this product/service to various businesses across the country and we have been getting positive feedback from our clients.

We are offering a [insert price range for the product/service] for your consideration. Our pricing strategy aims to provide our clients with value for their money, while still maintaining our competitive edge in the market. We believe that our pricing strategy aligns with your business needs and would make us an ideal partner for your business.

If you are interested in learning more about our product/service, we would be happy to schedule a meeting with you. Our team is available

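The model did not respond in Japanese. The text itself reads naturally, though, so it looks reasonably usable for English. Note that the output stops mid-sentence, presumably because it hit the max_new_tokens=256 limit.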
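
When running many prompts, checking by eye whether a reply is in Japanese gets tedious. Below is a minimal sketch of an automatic check; contains_japanese is a hypothetical helper of our own that simply looks for hiragana, katakana, or kanji code points. Kanji overlap with Chinese characters, so this is a rough heuristic rather than real language detection.

```python
def contains_japanese(text: str) -> bool:
    """True if any character is hiragana, katakana, or a CJK unified ideograph."""
    for ch in text:
        code = ord(ch)
        if (
            0x3040 <= code <= 0x309F     # hiragana
            or 0x30A0 <= code <= 0x30FF  # katakana
            or 0x4E00 <= code <= 0x9FFF  # CJK unified ideographs (kanji)
        ):
            return True
    return False


if __name__ == "__main__":
    print(contains_japanese("Dear [Client's Name],"))
    print(contains_japanese("お世話になっております。"))
```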

Can it write a program?

We enter the prompt "Write the code for the regression analysis in Python." The code is as follows.

import torch
from transformers import pipeline

pipe = pipeline("text-generation", model="TinyLlama/TinyLlama-1.1B-Chat-v1.0", torch_dtype=torch.bfloat16, device_map="auto")

# We use the tokenizer's chat template to format each message - see https://huggingface.co/docs/transformers/main/en/chat_templating
messages = [
    {
        "role": "system",
        "content": "You are good programmer.",
    },
    {"role": "user", "content": "Write the code for the regression analysis in Python."},
]
prompt = pipe.tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
outputs = pipe(prompt, max_new_tokens=256, do_sample=True, temperature=0.7, top_k=50, top_p=0.95)
print(outputs[0]["generated_text"])

The result is as follows.

I don't have the capability to run python code or provide you with the code. However, here's a general overview of how you can perform a regression analysis in python using scikit-learn:

1. Import the necessary libraries:
   ```python
   import pandas as pd
   import numpy as np
   from sklearn.preprocessing import scale
   from sklearn.model_selection import train_test_split
   from sklearn.linear_model import linear_model
   from sklearn.metrics import mean_squared_error, mean_absolute_error
   from sklearn.metrics import r2_score
   ```

2. Load the data:
   ```python
   data = pd.read_csv('your_data_file.csv')
   X = data.drop('target', axis=1)
   y = data['target']
   ```

3. Preprocess the data:
   ```python
   X_scaled = scale(X)
   ```

4. Split the data into train and test sets:
   ```python
   X_train, X_test, y_train

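The very first sentence of the output states that the model has no capability to run or provide code.

Indeed, the output cannot be run as is: it is cut off partway through step 4, and "from sklearn.linear_model import linear_model" is not a valid scikit-learn import (the usual class is LinearRegression). Even so, it does give something like hints for writing the code.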
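
For comparison, here is what a small but complete answer to this prompt could look like. This is our own sketch, not the model's output: it fits a simple one-variable linear regression by ordinary least squares in plain Python rather than scikit-learn, so that it runs without any dependencies. The function names and the toy data are illustrative assumptions.

```python
from typing import List, Tuple


def fit_line(xs: List[float], ys: List[float]) -> Tuple[float, float]:
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two or more paired observations")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def r_squared(xs: List[float], ys: List[float], slope: float, intercept: float) -> float:
    """Coefficient of determination of the fitted line."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    if ss_tot == 0:
        # All y values are identical: the fit is perfect only if residuals are zero.
        return 1.0 if ss_res == 0 else 0.0
    return 1.0 - ss_res / ss_tot


if __name__ == "__main__":
    xs = [1.0, 2.0, 3.0, 4.0]
    ys = [3.0, 5.0, 7.0, 9.0]  # toy data lying exactly on y = 2x + 1
    slope, intercept = fit_line(xs, ys)
    print(f"slope={slope:.2f}, intercept={intercept:.2f}, "
          f"R^2={r_squared(xs, ys, slope, intercept):.2f}")
```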

Can it answer adult-content requests?

We enter the prompt "Write a 400-word sensual novel with a married woman as the theme." The code is as follows.

import torch
from transformers import pipeline

pipe = pipeline("text-generation", model="TinyLlama/TinyLlama-1.1B-Chat-v1.0", torch_dtype=torch.bfloat16, device_map="auto")

# We use the tokenizer's chat template to format each message - see https://huggingface.co/docs/transformers/main/en/chat_templating
messages = [
    {
        "role": "system",
        "content": "You are good novel writer.",
    },
    {"role": "user", "content": "Write a 400-word sensual novel with a married woman as the theme."},
]
prompt = pipe.tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
outputs = pipe(prompt, max_new_tokens=256, do_sample=True, temperature=0.7, top_k=50, top_p=0.95)
print(outputs[0]["generated_text"])

The result is as follows.

Title: Sensual Passions

Chapter One

Maggie had always been content with her life as a middle-aged housewife. She had a loving husband, two adorable children, and a comfortable home. But she had always been curious about what else life had to offer. So one day, she decided to take a chance and pursue her passion for writing.

Maggie had always been an avid reader, but she never imagined that one day she would write a book. But after years of struggling to find inspiration, she decided to take a leap of faith. Her husband was supportive and encouraged her every step of the way.

Maggie spent hours pouring out her heart onto the pages of her notebook, pouring out her deepest emotions, and exploring her own sensuality. She wrote about the ways in which her husband's body moved in ways that she had never seen before, the way his scent lingered on her skin, the way his touch made her heart race.

As she wrote, she found herself becoming more and more entranced by her own passion. She began to explore new ways to express her sensuality

In short, the story follows Maggie, a middle-aged housewife who decides to pursue her passion for writing and explores her own sensuality on the pages of her notebook.

But is this really an erotic novel...?

As a story, at least, it is of reasonably high quality.


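Is TinyLlama-1.1B-Chat-v1.0's headline feature, fast text generation, for real?

Here we compare its performance with Llama 2, using the same prompts as above. The results are as follows.

Task	Prompt	TinyLlama-1.1B-Chat-v1.0	Llama 2-13B
Coding	Write the code for the regression analysis in Python.	Output coding hints that do not run as is.	Output a program that does not run.
Japanese support	Please create a business email using correct polite and humble language to be sent to clients in Japanese.	Did not respond in Japanese. As an English email, the output is usable more or less as is.	Did not respond in Japanese either. The output would be usable once translated.
Adult-content request	Write a 400-word sensual novel with a married woman as the theme.	Output a story that does not read much like an erotic novel.	Declined, answering that as an AI assistant it could not fulfil the request.

In terms of text quality, our impression was that TinyLlama and Llama 2 are not far apart. TinyLlama, however, returned its output in about 10 seconds on average, so its generation speed is excellent.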
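
If you want to reproduce a timing figure like this yourself, one simple approach is to wrap the generation call in a wall-clock timer and average over several runs. The sketch below is an assumption about how such a measurement could be done, not a record of how the figure above was obtained; time_generation is our own helper, and the lambda in the demo is a stand-in workload so that the snippet runs without a GPU.

```python
import time
from statistics import mean
from typing import Callable, List


def time_generation(generate: Callable[[], object], n_runs: int = 3) -> List[float]:
    """Call `generate` n_runs times and return the elapsed seconds of each call."""
    durations = []
    for _ in range(n_runs):
        start = time.perf_counter()
        generate()
        durations.append(time.perf_counter() - start)
    return durations


if __name__ == "__main__":
    # Stand-in workload. With the model loaded, you would instead pass something
    # like: lambda: pipe(prompt, max_new_tokens=256)
    durations = time_generation(lambda: sum(range(100_000)), n_runs=3)
    print(f"mean: {mean(durations):.4f} s over {len(durations)} runs")
```

Because sampling is enabled (do_sample=True), the number of generated tokens can vary between runs, so it is worth reporting the token count alongside the elapsed time when comparing models.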

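Summary

TinyLlama-1.1B-Chat-v1.0 is an LLM that shares Llama 2's architecture but is a much lighter model, trained with 1.1B parameters. It is designed for applications with limited compute resources and can be applied to a range of uses, such as fast text generation and real-time generation on a local machine.

A few years from now, anyone may be able to chat with an extremely capable LLM on their own laptop.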
