【CodeLlama-70B】700億パラメーターコード生成AIをGPT-4と比較してみた

2024-02-152024-04-02

WEELメディア事業部LLMリサーチャーの中田です。

1月29日、FacebookやInstagramを運営するMetaが、テキストからコードを生成できる「CodeLlama」の700億パラメータ版を公開しました。

このモデルはこれまでのCodeLlamaよりも強力で、より高度なプログラミングが可能になるんです、、、！

参考：https://ai.meta.com/blog/code-llama-large-language-model-coding/

Xでのいいね数は、投稿から1日足らずで4,000を超えており、大注目のモデルであることが分かります。

Today we’re releasing Code Llama 70B: a new, more performant version of our LLM for code generation — available under the same license as previous Code Llama models.

Download the models ➡️ https://t.co/fa7Su5XWDC
• CodeLlama-70B
• CodeLlama-70B-Python
• CodeLlama-70B-Instruct pic.twitter.com/iZc8fapYEZ
— AI at Meta (@AIatMeta) January 29, 2024

この記事ではCodeLlama-70Bの使い方や、有効性の検証まで行います。本記事を熟読することで、CodeLlama-70Bの凄さを実感し、GPT-4VよりもCodeLlama-70Bを使いたくなるでしょう。

ぜひ、最後までご覧ください。

なお弊社では、生成AIツール開発についての無料相談を承っています。こちらからお気軽にご相談ください。
→無料相談で話を聞いてみる

CodeLlama-70Bの概要

1月29日、Meta社はCodeLlamaの700億パラメータ版である「CodeLlama-70B」を公開しました。具体的に、CodeLlama-70Bには以下の3つのモデルがあります。

CodeLlama-70B：コード生成に関して汎用的なモデル
CodeLlama-70B-Python：Python特化型モデル
CodeLlama-70B-Instruct：自然言語による指示の理解に特化したモデル

2023年8月にパラメータ数が70億、130億、340億のCodeLlamaが公開されていました。そしてこの度、それらを超える700億パラメータの「Llama 2-70B」モデルをもとに、プログラミングに特化したデータセットで追加学習をして「CodeLlama-70B」を構築しています。

700億パラメータもあるため、従来のモデルよりも、より強力で高い性能であることが期待されます。また、従来のモデルの学習データが5000億トークンなのに対し、CodeLlama-70Bでは1兆トークンで学習したとのこと。

「HumanEval」と「MBPP」というデータセットで、モデルの性能を比較した結果は以下の通り。

参考：https://gigazine.net/news/20240130-meta-code-llama-70b/

CodeLlama-70B-Instructが、GPT-4を超える性能を達成したほか、その他のモデルについても従来のCodeLlamaに比べて、性能が向上していることが分かります。

なお、従来のCodeLlamaのバージョンについて詳しく知りたい方は、下記の記事を合わせてご確認ください。
→【Code Llama】最強コード生成AIの使い方と実践を解説

CodeLlama-70Bのライセンス

月間アクティブユーザー数が7億人以下の場合は、誰でも無償で商用利用することが可能です。

Additional Commercial Terms. If, on the Llama 2 version release date, the monthly active users of the products or services made available by or for Licensee, or Licensee’s affiliates, is greater than 700 million monthly active users in the preceding calendar month, you must request a license from Meta, which Meta may grant to you in its sole discretion, and you are not authorized to exercise any of the rights under this Agreement unless or until Meta otherwise expressly grants you such rights.
引用：https://github.com/facebookresearch/llama/blob/main/LICENSE

利用用途	可否
商用利用	⭕️
改変	⭕️
配布	⭕️
特許使用	⭕️
私的使用	⭕️

参考：https://github.com/facebookresearch/llama/blob/main/LICENSE

CodeLlama-70Bの使い方

今回は「Prompting Guide for Code Llama」を参考に実行していきます。

まずは、以下のコードを実行して、ライブラリをインストールしましょう。

%%capture
!pip install openai
!pip install pandas
!pip install python-dotenv

次に、「together.ai」というサイトに移動し、TOGETHER_API_KEYを取得してください。サインアップを済ませ、下記の画像の画面に移動したら、APIキーをコピーしておきましょう。

参考：https://api.together.xyz/playground/chat/codellama/CodeLlama-70b-Instruct-hf

次に、以下のコードを実行してください。「TOGETHER_API_KEY」の部分には、先ほどのAPIキーを入れてください。

import openai
import os
import json
from dotenv import load_dotenv
load_dotenv()
 
TOGETHER_API_KEY = os.environ.get("TOGETHER_API_KEY")
 
client = openai.OpenAI(
    api_key="TOGETHER_API_KEY",
    base_url="https://api.together.xyz/v1",
)

def get_code_completion(messages, max_tokens=512, model="codellama/CodeLlama-70b-Instruct-hf"):
    chat_completion = client.chat.completions.create(
        messages=messages,
        model=model,
        max_tokens=max_tokens,
        stop=[
            "<step>"
        ],
        frequency_penalty=1,
        presence_penalty=1,
        top_p=0.7,
        n=10,
        temperature=0.7,
    )
 
    return chat_completion

そして、以下のコードを実行することで、Pythonコードを生成できます。

messages = [
      {
            "role": "system",
            "content": "You are an expert programmer that helps to write Python code based on the user request, with concise explanations. Don't be too verbose.",
      },
      {
            "role": "user",
            "content": "Write a python function to generate the nth fibonacci number.",
      }
]
 
chat_completion = get_code_completion(messages)
            
print(chat_completion.choices[0].message.content)

以下のプロンプトで、PythonコードをCodeLlama-70b-Instructに生成してもらいましょう。

Write a python function to generate the nth fibonacci number.

和訳：
n番目のフィボナシ数を生成できるPythonの関数を生成してください。

結果は以下の通りです。

```python
def generate_fibonacci_number(n):
    # Initialize the first two numbers of the sequence
    a, b = 0, 1

    # Generate the sequence until we reach the nth number
    for _ in range(n):
        # Calculate the next number in the sequence
        a, b = b, a + b

    # Return the nth number
    return a

# Test the function
print(generate_fibonacci_number(10))
```

This function uses a simple iterative approach to generate the nth Fibonacci number. 
It initializes the first two numbers of the sequence (0 and 1) and then iterates n times, 
generating the next number in the sequence by summing the previous two numbers. Finally, it returns the nth number.

しっかりと動くか、Google Colabで検証してみたところ、「10番目のフィボナッチ数は55」と正しく出力されました。

ちなみに、日本語対応しているかを確かめるために、日本語で以下のように入力してみました。

n番目のフィボナッチ数を生成するPythonの関数を書いてください。

すると、以下の様に日本語の解説付きで、正しくプログラムを出力してくれました。

はい、そうです。 ここに、n番目のフィボナッチ数を生成するPython関数を書くのに役立つコードの例を示します。

```
def fibonacci(n):
    if n <= 1:
        return n
    else:
        return fibonacci(n-1) + fibonacci(n-2)
```

このコードは、再帰的なアプローチを使用して、n番目のフィボナッチ数を生成します。 この関数は、nが1以下の場合はnを返し、それ以外の場合はn-1番目とn-2番目のフィボナッチ数の和を返します。

この関数を使用するには、nの値を指定して関数を呼び出します。 例えば、n = 5の場合、関数は5番目のフィボナッチ数を返します。

```
fibonacci(5)
```

このコードは、再帰的なアプローチを使用して、n番目のフィボナッチ数を生成します。 この関数は、nが1以下の場合はnを返し、それ以外の場合はn-1番目とn-2番目のフィボナッチ数の和を返します。

この関数を使用するには、nの値を指定して関数を呼び出します。 例えば、n = 5の場合、関数は5番目のフィボナッチ数を返します。

```
fibonacci(5)
```

このコードは、再帰的なアプローチを使用して、n番目のフィボナッチ数を生成します。 この関数は、nが1以下の場合はnを返し、それ以外の場合はn-1番目とn-2番目のフィボナッチ数の和を返します。

この関数を使用するには、nの値を指定して関数を呼び出します。 例えば、n = 5の場合、関数は5番目のフィボナッチ数を返します.

CodeLlama-70Bを動かすのに必要なPCのスペック

CodeLlama-70Bのモデルサイズは、GitHubページより「131GB」とされているので、ローカルでモデルを動かす場合は、相応のストレージ、GPUが必要になるでしょう。

70Bと言うことは、
実用的にVRAM上で処理するには
140GのVRAMが必要で、
H100（80G）×2台連結か、
RTX4090(24G)×5台
必要と言うことでしょうか？
— 田端厚賢（たばたあつよし）@AIで日本を面白く！ (@AI_tabatter) January 30, 2024

■Pythonのバージョン
Python 3.8以上

■使用ディスク量
0GB

■RAMの使用量
1.39 GB

なお、日本語版のCodeLlamaについて詳しく知りたい方は、下記の記事を合わせてご確認ください。
→【ELYZA japanese CodeLlama 7b】日本語版CodeLlamaの性能をGPT-4と徹底比較してみた

CodeLlama-70Bのプログラミング能力をGPT-4と比較してみた

ここでは、CodeLlama-70Bの性能を検証するために、CodeLlama-70B-InstructとGPT-4に、以下の3つのタスクを行わせて、結果を比較します。

「画像認識の深層学習モデル」の自動構築
競技プログラミングの実際の問題
コンピュータと対戦できる3目並べゲーム作成

「画像認識の深層学習モデル」の自動構築

まずは、CIFAR-10を用いて画像認識モデルを構築し、「どちらのLLMで構築したモデルの方が、精度が高いのか」を検証します。そのために、以下のプロンプトを用いました。

「Create a deep learning model of image recognition using the CIFAR-10 dataset, which is readily available in PyTorch. In doing so, please make the model as accurate as possible.」

和訳：
PyTorchで簡単に利用できるCIFAR-10データセットを使用して、画像認識のディープラーニングモデルを作ってください。その際、できるだけ精度の高いモデルを作ってください。

CodeLlama-70B-Instructの出力結果のコードは、以下の通りです。

import torch
import torchvision
import torchvision.transforms as transforms

transform = transforms.Compose(
    [transforms.ToTensor(),
     transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))])

trainset = torchvision.datasets.CIFAR10(root='./data', train=True,
                                        download=True, transform=transform)
trainloader = torch.utils.data.DataLoader(trainset, batch_size=4,
                                          shuffle=True, num_workers=2)

testset = torchvision.datasets.CIFAR10(root='./data', train=False,
                                       download=True, transform=transform)
testloader = torch.utils.data.DataLoader(testset, batch_size=4,
                                         shuffle=False, num_workers=2)

classes = ('plane', 'car', 'bird', 'cat',
           'deer', 'dog', 'frog', 'horse', 'ship', 'truck')

import torch.nn as nn
import torch.nn.functional as F

class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = nn.Conv2d(3, 6, 5)
        self.pool = nn.MaxPool2d(2, 2)
        self.conv2 = nn.Conv2d(6, 16, 5)
        self.fc1 = nn.Linear(16 * 5 * 5, 120)
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)

    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))
        x = self.pool(F.relu(self.conv2(x)))
        x = x.view(-1, 16 * 5 * 5)
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x

net = Net()

import torch.optim as optim

criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(net.parameters(), lr=0.001, momentum=0.9)

for epoch in range(2):  # loop over the dataset multiple times

    running_loss = 0.0
    for i, data in enumerate(trainloader, 0):
        # get the inputs; data is a list of [inputs, labels]
        inputs, labels = data

        # zero the parameter gradients
        optimizer.zero_grad()

        # forward + backward + optimize
        outputs = net(inputs)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()

        # print statistics
        running_loss += loss.item()
        if i % 2000 == 1999:    # print every 2000 mini-batches
            print('[%d, %5d] loss: %.3f' %
                  (epoch + 1, i + 1, running_loss / 2000))
            running_loss = 0.0

print('Finished Training')

correct = 0
total = 0
with torch.no_grad():
    for data in testloader:
        images, labels = data
        outputs = net(images)
        _, predicted = torch.max(outputs.data, 1)
        total += labels.size(0)
        correct += (predicted == labels).sum().item()

print('Accuracy of the network on the 10000 test images: %d %%' % (
    100 * correct / total))

GPT-4の出力結果のコードは、以下の通りです。

import torch
import torchvision
import torchvision.transforms as transforms
from torch.utils.data import DataLoader
import torch.nn as nn
import torch.optim as optim

transform = transforms.Compose(
    [transforms.ToTensor(),
     transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))])

trainset = torchvision.datasets.CIFAR10(root='./data', train=True,
                                        download=True, transform=transform)
trainloader = DataLoader(trainset, batch_size=4,
                         shuffle=True, num_workers=2)

testset = torchvision.datasets.CIFAR10(root='./data', train=False,
                                       download=True, transform=transform)
testloader = DataLoader(testset, batch_size=4,
                        shuffle=False, num_workers=2)

classes = ('plane', 'car', 'bird', 'cat', 'deer', 
           'dog', 'frog', 'horse', 'ship', 'truck')

class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = nn.Conv2d(3, 6, 5)
        self.pool = nn.MaxPool2d(2, 2)
        self.conv2 = nn.Conv2d(6, 16, 5)
        self.fc1 = nn.Linear(16 * 5 * 5, 120)
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)

    def forward(self, x):
        x = self.pool(nn.functional.relu(self.conv1(x)))
        x = self.pool(nn.functional.relu(self.conv2(x)))
        x = x.view(-1, 16 * 5 * 5)
        x = nn.functional.relu(self.fc1(x))
        x = nn.functional.relu(self.fc2(x))
        x = self.fc3(x)
        return x

net = Net()

criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(net.parameters(), lr=0.001, momentum=0.9)

# Step 5: Train the Model
def train_model():
    for epoch in range(2):  # loop over the dataset multiple times
        running_loss = 0.0
        for i, data in enumerate(trainloader, 0):
            # get the inputs; data is a list of [inputs, labels]
            inputs, labels = data

            # zero the parameter gradients
            optimizer.zero_grad()

            # forward + backward + optimize
            outputs = net(inputs)
            loss = criterion(outputs, labels)
            loss.backward()
            optimizer.step()

            # print statistics
            running_loss += loss.item()
            if i % 2000 == 1999:    # print every 2000 mini-batches
                print(f'[{epoch + 1}, {i + 1:5d}] loss: {running_loss / 2000:.3f}')
                running_loss = 0.0

    print('Finished Training')

# Due to environment constraints, we'll initiate training but not wait for it to complete.
train_model()

# Step 6: Evaluate the Model (outline)
def evaluate_model():
    correct = 0
    total = 0
    with torch.no_grad():
        for data in testloader:
            images, labels = data
            outputs = net(images)
            _, predicted = torch.max(outputs.data, 1)
            total += labels.size(0)
            correct += (predicted == labels).sum().item()

    print(f'Accuracy of the network on the 10000 test images: {100 * correct // total} %')

# We'll call the evaluation function to outline the process, but actual evaluation results will vary.
evaluate_model()

実際に上記2つのコードを動かして、テストデータでモデル評価したところ、以下のような結果になりました。

CodeLlama-70B-Instruct：54%
GPT-4：53%

以上の様に、僅差でCodeLlama-70B-Instructに軍配が上がりました。とはいえ、1%の差はほとんど誤差の範囲です。また、深層学習モデルの精度としては、両方ともかなり低い精度となっています。

競技プログラミングの実際の問題

続いて、以下の競技プログラミングの問題を、解いてもらいます。

##問題文

シカのAtCoDeerくんは、たかいたい(高い鯛)が大好きです。ところでこの文字列は空でない文字列A,B,Cを使ってABCAC(文字列を連結したもの)と書けます(A=”た”,B=”か”,C=”い”)。そこで、AtCoDeerくんは文字列Sに対して、このような分割が何通りあるか求めることにしました。

##制約

5≦∣S∣≦200000

S は英小文字のみからなる。

##入力

入力は以下の形式で標準入力から与えられる。
S
##部分点

この問題には部分点が設定されている。

∣S∣≦2000を満たすデータセットに正解した場合、部分点として20点が与えられる。

##出力

S=ABCACと書けるような非空文字列の組A,B,Cが何通りあるか出力せよ。
引用：https://atcoder.jp/contests/arc055/tasks/arc055_c

プロンプトは以下の通りです。

Output a Python function that can solve the following problem.

#Subject Text

AtCoDeer, a deer, loves tai tai (high sea bream). By the way, this string can be written as ABCAC (concatenated strings) 
using the non-empty strings A, B, and C (A="Ta", B="Ka", C="I"). 
So AtCoDeer-kun decided to find out how many such splits there are for the string S.

#Constraints

5≤∣S∣≤200000

S consists of only English lowercase letters.

#Input

Input is given from standard input in the following format.

S

#Subpoints

A correct answer to a data set that satisfies ∣S∣≤2000 will result in 20 partial points.

#Output

Output how many non-empty string pairs A, B, C there are such that S=ABCAC.

CodeLlama-70B-Instructの出力結果のコードは、以下の通りです。

def count_splits(S):
    # Initialize the count of splits
    count = 0
    
    # Iterate through all possible splits
    for i in range(1, len(S) - 2):
        for j in range(i + 1, len(S) - 1):
            # Check if the split satisfies the condition
            if S[:i] + S[j:] == S[i:j] + S[j:]:
                count += 1
    
    # Return the count of splits
    return count

GPT-4の出力結果のコードは、以下の通りです。

def count_splits(s):
    n = len(s)
    count = 0
    
    # Function to check if two substrings are equal using hashing to speed up comparison
    # This is a placeholder for a more complex hashing function needed for large strings
    def are_substrings_equal(i, j, length):
        return s[i:i+length] == s[j:j+length]
    
    # Iterate through the string to find splits
    # This is a simplified loop that needs optimization for large strings
    for i in range(1, n - 3):  # Potential end of A
        for j in range(i + 1, n - 1):  # Potential end of B
            if are_substrings_equal(0, n - (i + 1), i + 1):  # Check if A == C
                count += 1
                
    return count

# Example usage
s = "abcac"
print(count_splits(s))

「C – ABCAC」に書かれている入出力の例をもとに、結果を出力し、以下の表にまとめました。

入力例	出力例	CodeLlama-70B-Instructの出力結果	GPT-4の出力結果
takaitai	2	0	0
aaaaaaaaaa	6	4	27
abcabc	0	0	2

CodeLlamaとGPT-4どちらも、すべての入出力テストをパスできないという結果になりました。

そこで、1つ目の入出力例を出力例として与え、以下のプロンプトに書き換えて、各モデルに当てはめてみます。

Output a Python function that can solve the following problem "ABCAC".

# Subject Text

AtCoDeer, a deer, loves tai tai (high sea bream). By the way, this string can be written as ABCAC (concatenated strings) using the non-empty strings A, B, and C (A="Ta", B="Ka", C="I"). So AtCoDeer-kun decided to find out how many such splits there are for the string S.

Output how many non-empty string pairs A, B, C there are such that S=ABCAC.

# Constraints

5≤∣S∣≤200000

S consists of only English lowercase letters.

# Input format

Input is given from standard input in the following format.

S

# Input/output example
Input: S="takaitai"
Output: 2
Explanation: There are two ways: "A="ta",B="ka",C="i"" and "A="t",B="ak",C="ai".

CodeLlama-70B-Instructの出力結果のコードは、以下の通りです。

def count_splits(s):
    n = len(s)
    count = 0
    
    for i in range(1, n - 2):
        for j in range(i + 1, n - 1):
            a = s[:i]
            b = s[i:j]
            c = s[j:]
            
            if a + b + c == s:
                count += 1
    
    return count

print(count_splits("takaitai"))  # Output: 2

GPT-4の出力結果のコードは、以下の通りです。

def count_abcac_splits(s):
    n = len(s)
    count = 0

    # We only need to check for A and C's lengths up to half the length of S,
    # because A and C together can't be more than half of S (since there must be a B).
    for a_length in range(1, n // 2):  # A's length
        if s[:a_length] == s[-a_length:]:  # Check if A == C
            # Every valid length of A (and C) gives us a potential split,
            # since B can be any length (greater than zero) in between.
            count += 1

    return count

# Example usage
s = "takaitai"
print(count_abcac_splits(s))

結果は以下の通りです。

入力例	出力例	CodeLlama-70B-Instructの出力結果	GPT-4の出力結果
takaitai	2	15	0
aaaaaaaaaa	6	28	4
abcabc	0	6	0

やはり、どちらもすべての入出力テストをパスできないという結果になりました。

コンピュータと対戦できる3目並べゲーム作成

最後に、コンピュータと対戦できる3目並べのゲームが作成を試してみます。使用したプロンプトは、以下の通りです。

Create a game of 3-in-a-row in Python that can be played against the computer.

和訳：
コンピュータと対戦できる3目並べゲームをPythonで作ってください。

CodeLlama-70B-Instructの出力結果のコードは、以下の通りです。

import random

def display_board(board):
    print(board[7] + '|' + board[8] + '|' + board[9])
    print('-+-+-')
    print(board[4] + '|' + board[5] + '|' + board[6])
    print('-+-+-')
    print(board[1] + '|' + board[2] + '|' + board[3])

def player_input():
    marker = ''
    
    while not (marker == 'X' or marker == 'O'):
        marker = input('Player 1: Do you want to be X or O? ').upper()

    if marker == 'X':
        return ('X', 'O')
    else:
        return ('O', 'X')

def place_marker(board, marker, position):
    board[position] = marker

def win_check(board, mark):
    return ((board[7] == mark and board[8] == mark and board[9] == mark) or # across the top
    (board[4] == mark and board[5] == mark and board[6] == mark) or # across the middle
    (board[1] == mark and board[2] == mark and board[3] == mark) or # across the bottom
    (board[7] == mark and board[4] == mark and board[1] == mark) or # down the middle
    (board[8] == mark and board[5] == mark and board[2] == mark) or # down the middle
    (board[9] == mark and board[6] == mark and board[3] == mark) or # down the right side
    (board[7] == mark and board[5] == mark and board[3] == mark) or # diagonal
    (board[9] == mark and board[5] == mark and board[1] == mark)) # diagonal

def choose_first():
    if random.randint(0, 1) == 0:
        return 'Player 2'
    else:
        return 'Player 1'

def space_check(board, position):
    return board[position] == ' '

def full_board_check(board):
    for i in range(1,10):
        if space_check(board, i):
            return False
    return True

def player_choice(board):
    position = 0
    
    while position not in [1,2,3,4,5,6,7,8,9] or not space_check(board, position):
        position = int(input('Choose your next position: (1-9) '))
        
    return position

def replay():
    return input('Do you want to play again? Enter Yes or No: ').lower().startswith('y')

print('Welcome to Tic Tac Toe!')

while True:
    # Reset the board
    theBoard = [' '] * 10
    player1_marker, player2_marker = player_input()
    turn = choose_first()
    print(turn + ' will go first.')
    
    play_game = input('Are you ready to play? Enter Yes or No.')
    
    if play_game.lower()[0] == 'y':
        game_on = True
    else:
        game_on = False

    while game_on:
        if turn == 'Player 1':
            # Player1's turn.
            
            display_board(theBoard)
            position = player_choice(theBoard)
            place_marker(theBoard, player1_marker, position)

            if win_check(theBoard, player1_marker):
                display_board(theBoard)
                print('Congratulations! You have won the game!')
                game_on = False
            else:
                if full_board_check(theBoard):
                    display_board(theBoard)
                    print('The game is a draw!')
                    break
                else:
                    turn = 'Player 2'

        else:
            # Player2's turn.
            
            display_board(theBoard)
            position = player_choice(theBoard)
            place_marker(theBoard, player2_marker, position)

            if win_check(theBoard, player2_marker):
                display_board(theBoard)
                print('Player 2 has won!')
                game_on = False
            else:
                if full_board_check(theBoard):
                    display_board(theBoard)
                    print('The game is a draw!')
                    break
                else:
                    turn = 'Player 1'

    if not replay():
        break

print('Thanks for playing!')

上記のコードを実行して、ゲームしてみました。

3目並べのゲームが作れましたが、コンピュータと対戦できる仕様にはなっていません。

続いて、GPT-4の出力結果のコードは、以下の通りです。

import random

def draw_board(board):
    # This function prints out the board.
    print('   |   |')
    print(' ' + board[7] + ' | ' + board[8] + ' | ' + board[9])
    print('   |   |')
    print('-----------')
    print('   |   |')
    print(' ' + board[4] + ' | ' + board[5] + ' | ' + board[6])
    print('   |   |')
    print('-----------')
    print('   |   |')
    print(' ' + board[1] + ' | ' + board[2] + ' | ' + board[3])
    print('   |   |')

def input_player_letter():
    letter = ''
    while not (letter == 'X' or letter == 'O'):
        print('Do you want to be X or O?')
        letter = input().upper()
    if letter == 'X':
        return ['X', 'O']
    else:
        return ['O', 'X']

def who_goes_first():
    if random.randint(0, 1) == 0:
        return 'computer'
    else:
        return 'player'

def play_again():
    print('Do you want to play again? (yes or no)')
    return input().lower().startswith('y')

def make_move(board, letter, move):
    board[move] = letter

def is_winner(bo, le):
    return ((bo[7] == le and bo[8] == le and bo[9] == le) or
            (bo[4] == le and bo[5] == le and bo[6] == le) or
            (bo[1] == le and bo[2] == le and bo[3] == le) or
            (bo[7] == le and bo[4] == le and bo[1] == le) or
            (bo[8] == le and bo[5] == le and bo[2] == le) or
            (bo[9] == le and bo[6] == le and bo[3] == le) or
            (bo[7] == le and bo[5] == le and bo[3] == le) or
            (bo[9] == le and bo[5] == le and bo[1] == le))

def get_board_copy(board):
    dupe_board = []
    for i in board:
        dupe_board.append(i)
    return dupe_board

def is_space_free(board, move):
    return board[move] == ' '

def get_player_move(board):
    move = ' '
    while move not in '1 2 3 4 5 6 7 8 9'.split() or not is_space_free(board, int(move)):
        print('What is your next move? (1-9)')
        move = input()
    return int(move)

def choose_random_move_from_list(board, moves_list):
    possible_moves = [move for move in moves_list if is_space_free(board, move)]
    if possible_moves:
        return random.choice(possible_moves)
    else:
        return None

def get_computer_move(board, computer_letter):
    for i in range(1, 10):
        copy = get_board_copy(board)
        if is_space_free(copy, i):
            make_move(copy, computer_letter, i)
            if is_winner(copy, computer_letter):
                return i

    for i in range(1, 10):
        copy = get_board_copy(board)
        if is_space_free(copy, i):
            make_move(copy, player_letter, i)
            if is_winner(copy, player_letter):
                return i

    move = choose_random_move_from_list(board, [1, 3, 7, 9])
    if move != None:
        return move

    if is_space_free(board, 5):
        return 5

    return choose_random_move_from_list(board, [2, 4, 6, 8])

def is_board_full(board):
    for i in range(1, 10):
        if is_space_free(board, i):
            return False
    return True


print('Welcome to Tic Tac Toe!')

while True:
    the_board = [' '] * 10
    player_letter, computer_letter = input_player_letter()
    turn = who_goes_first()
    print('The ' + turn + ' will go first.')
    game_is_playing = True

    while game_is_playing:
        if turn == 'player':
            draw_board(the_board)
            move = get_player_move(the_board)
            make_move(the_board, player_letter, move)

            if is_winner(the_board, player_letter):
                draw_board(the_board)
                print('Hooray! You have won the game!')
                game_is_playing = False
            else:
                if is_board_full(the_board):
                    draw_board(the_board)
                    print('The game is a tie!')
                    break
                else:
                    turn = 'computer'

        else:
            move = get_computer_move(the_board, computer_letter)
            make_move(the_board, computer_letter, move)

            if is_winner(the_board, computer_letter):
                draw_board(the_board)
                print('The computer has beaten you! You lose.')
                game_is_playing = False
            else:
                if is_board_full(the_board):
                    draw_board(the_board)
                    print('The game is a tie!')
                    break
                else:
                    turn = 'player'

    if not play_again():
        break

上記のコードを実行して、ゲームしてみました。

こちらは3目並べとして成立しており、なおかつコンピュータとの対戦が実現しています。

タスク	CodeLlama-70B-Instruct	GPT-4
「画像認識の深層学習モデル」の自動構築	△ 精度54%のモデルができた	△ 精度53%のモデルができた
競技プログラミングの実際の問題	× 3問中1問程度しか正解できなかった	× 3問中1問程度しか正解できなかった
コンピュータと対戦できる3目並べゲーム作成	△ 3目並べができたが、コンピュータと対戦できる仕様にはなっていなかった	〇コンピュータと対戦できる3目並べができた

なお、プログラミングや数学に特化したLlama 2について詳しく知りたい方は、下記の記事を合わせてご確認ください。
→【LLaMA Pro 8B】脅威の性能！プログラミングや数学特化の改造版Llama 2の使い方〜実践まで

まとめ

1月29日、Meta社が公開した「CodeLlama-70B」は、従来のCodeLlamaシリーズよりも、高性能なモデルです。また、CodeLlama-70Bには、以下の3つのモデルがあります。

CodeLlama-70B：汎用モデル
CodeLlama-70B-Python：Python特化モデル
CodeLlama-70B-Instruct：Text-to-Codeに特化したモデル

また、比較検証では、以下の3つのタスクを実行しました。

「画像認識の深層学習モデル」の自動構築
競技プログラミングの実際の問題
コンピュータと対戦できる3目並べゲーム作成

深層学習モデルの構築では、精度が低かったですが、このモデルをベースとして、層やハイパーパラメータ等を自分で設定すれば、さらに精度の高いモデルを簡単に構築できるでしょう。

また、競技プログラミングの問題に関しては、問題文をそのまま入力するのではなく、より詳細に書く必要があるかと思います。加えて、入出力例を1つだけでなく、Few-shotでいくつか示すのも良いかもしれません。

ゲーム作成に関しては、GPT-4で作ったものの方が、クオリティが高いと感じました。

ちなみに今回のCodeLlama-70Bの評価に関して、Xユーザーによると、生成速度が速いとのこと！また、Perplexity Labsでは、ブラウザからCodeLlama-70bを試すことが可能だそう。

Meta社のCodeLlama-70B速度が素晴らしく速いです
実行環境がPerplexity Labsなので厳密に言うと、CodeLlama-70Bのみが速いというわけではないんですけど、
この精度のものをこの速度で手軽に利用できるのはめちゃくちゃいいですね
どれくらい速いかは下記動画をご覧ください！ pic.twitter.com/xouNIjeWLQ
— まさ/AI学習アプリ作ってます (@masatoshi9491) January 31, 2024

最後に

いかがだったでしょうか？

GPT-3.5 Turboの最新アップデートで、より高速かつ低コストでのAI活用が可能になりました。自社での導入・活用を検討する際に、最適なモデル選定や活用方法について、一緒に考えてみませんか？

弊社では

・マーケティングやエンジニアリングなどの専門知識を学習させたAI社員の開発
・要件定義・業務フロー作成を80%自動化できる自律型AIエージェントの開発
・生成AIとRPAを組み合わせた業務自動化ツールの開発
・社内人事業務を99%自動化できるAIツールの開発
・ハルシネーション対策AIツールの開発
・自社専用のAIチャットボットの開発

などの開発実績がございます。

まずは、「無料相談」にてご相談を承っておりますので、ご興味がある方はぜひご連絡ください。

➡︎生成AIを使った業務効率化、生成AIツールの開発について相談をしてみる。

生成AIを社内で活用していきたい方へ