Getting Started with promptfoo

2025/05/17 19:02

2025/05/18 09:51

Installation and basic assertions

You can get ready to use promptfoo as follows:

$ mkdir /path/to/project
$ cd /path/to/project
$ npm install -g promptfoo
$ promptfoo init

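The available assertions are listed in Assertions & metrics:
- contains: case-sensitive substring match
- icontains: case-insensitive substring match
- regex, starts-with, and the other usual string assertions are available
- Non-textual assertions such as latency are also supported
- llm-rubric lets an LLM judge the correctness of the output

The available providers are listed in LLM Providers.

As a sample, set up promptfooconfig.yaml (the file that promptfoo init generates) as follows: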

# yaml-language-server: $schema=https://promptfoo.dev/config-schema.json
description: 'Getting started'
prompts:
  - 'Convert this English to {{language}}: {{input}}'
  - 'Translate to {{language}}: {{input}}'

providers:
  - google:gemini-2.5-flash-preview-04-17

tests:
  - vars:
      language: French
      input: Hello world
    assert:
      - type: contains
        value: 'Bonjour le monde'
  - vars:
      language: Spanish
      input: Where is the library?
    assert:
      - type: icontains
        value: 'Dónde está la biblioteca'
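The difference between contains and icontains in the configuration above can be sketched in Python. This is only an illustration of the matching semantics described in the list; the function names are hypothetical, and promptfoo's own implementation may differ in detail.

```python
def contains(output: str, value: str) -> bool:
    """Case-sensitive substring check, in the spirit of `type: contains`."""
    return value in output


def icontains(output: str, value: str) -> bool:
    """Case-insensitive substring check, in the spirit of `type: icontains`."""
    return value.lower() in output.lower()


# A model answer that starts with a lowercase "d" (made-up sample output).
output = "¿dónde está la biblioteca?"

print(contains(output, "Dónde está la biblioteca"))   # case differs, so no match
print(icontains(output, "Dónde está la biblioteca"))  # case is ignored, so it matches
```

This is why the Spanish test uses icontains: the model may or may not capitalize the first word depending on how it punctuates the sentence.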

Run the evaluation with the promptfoo eval command. Adding the -o filename.html option also writes the results to an HTML file.

$ promptfoo eval

Running 4 concurrent evaluations with up to 4 threads...
Creating cache folder at /Users/maintainer/.promptfoo/cache.

...

Successes: 3
Failures: 1
Errors: 0
Pass Rate: 75.00%
Duration: 5s (concurrency: 4)
Eval tokens: 1,673 / Prompt tokens: 38 / Completion tokens: 94 / Cached tokens: 0 / Reasoning tokens: 0
Total tokens: 1,673 (eval: 1,673 + Grading: 0)
Done.
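The numbers in this summary can be reproduced with a little arithmetic. The sketch below assumes that every prompt is run against every provider and every test case, which matches the 4 evaluations reported for 2 prompts, 1 provider, and 2 tests; the exact way promptfoo counts errors in the pass rate is not shown in the log, so treating them as part of the denominator is an assumption.

```python
# Counts taken from the sample configuration.
prompts = 2
providers = 1
tests = 2

# Assumption: one evaluation per (prompt, provider, test) combination.
evaluations = prompts * providers * tests

# Counts taken from the log above.
successes, failures, errors = 3, 1, 0
pass_rate = successes / (successes + failures + errors) * 100

print(f"{evaluations} evaluations, pass rate {pass_rate:.2f}%")
```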

Run promptfoo view to inspect the evaluation results in the browser.

Assertions in Python

📚 see also: Python assertions | promptfoo

The Python code for an assertion can be defined inline, as shown below:

tests:
  - vars:
      language: Japanese
      input: Hello!
    assert:
      - type: contains
        value: こんにちは
  - vars:
      language: Japanese
      input: Where is the library?
    assert:
      - type: python
        value: '1'

The assertion code can of course also be moved to an external file.

Moving prompts and tests into external files

Simply write file://path/to/prompt:
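As a minimal sketch of the external-file form: the assertion file defines a get_assert function, and the test references it with type: python and a file:// value. The file name assert_japanese.py and the hiragana check itself are hypothetical, chosen only to fit the Japanese translation test above.

```python
# assert_japanese.py (hypothetical file name)
# Referenced from the config as:
#   - type: python
#     value: file://assert_japanese.py
from typing import Any, Dict, Union


def get_assert(output: str, context: Dict[str, Any]) -> Union[bool, float, Dict[str, Any]]:
    """Pass when the output contains at least one hiragana character."""
    has_hiragana = any("\u3041" <= ch <= "\u3096" for ch in output)
    return {
        "pass": has_hiragana,
        "score": 1.0 if has_hiragana else 0.0,
        "reason": "found hiragana" if has_hiragana else "no hiragana in output",
    }
```

Returning a dictionary with pass, score, and reason gives a readable failure message in the results; returning a plain bool would be enough when no explanation is needed.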

# yaml-language-server: $schema=https://promptfoo.dev/config-schema.json
description: 'Getting started'
prompts:
  - file://
  - 'Translate to {{language}}: {{input}}'

providers:
  - google:gemini-2.5-flash-preview-04-17

tests:
  - vars:
      language: French
      input: Hello world
    assert:
      - type: contains
        value: 'Bonjour le monde'
  - vars:
      language: Spanish
      input: Where is the library?
    assert:
      - type: icontains
        value: 'Dónde está la biblioteca'

Using a custom provider

A command to execute can be specified as a provider:

providers:
- 'exec: python hello.py'

With Python, hello.py receives the prompt, the provider options, and the test context (including the assertions) through sys.argv:

import json
import sys
import time

now = time.localtime()
print("Hello, World!")
print("Current time:", time.strftime("%H:%M:%S", now))
print(json.dumps(sys.argv, ensure_ascii=False))

When an evaluation runs with a custom provider like the one above, everything written to standard output is treated as that provider's response:

Hello, World!
Current time: 13:12:09
["hello.py", "Translate to Japanese: Hello!\n", "{\"config\":{\"basePath\":\"\"},\"env\":{}}", "{\"vars\":{\"language\":\"Japanese\",\"input\":\"Hello!\"},\"prompt\":{\"raw\":\"Translate to {{language}}: {{input}}\\n\",\"label\":\"prompt1.md: Translate to {{language}}: {{input}}\\n...\"},\"filters\":{},\"originalProvider\":{\"scriptPath\":\" python hello.py\",\"options\":{\"config\":{\"basePath\":\"\"},\"env\":{}},\"delay\":0,\"label\":\"\"},\"test\":{\"assert\":[{\"type\":\"contains\",\"value\":\"こんにちは\"}],\"options\":{},\"metadata\":{}}}"]
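Judging from the output above, the arguments arrive as the prompt, a JSON string of provider options, and a JSON string of test context, in that order. A minimal sketch of unpacking them follows; parse_exec_args is a hypothetical helper, the sample argv is abbreviated from the log, and the argument layout is inferred from this one run rather than from a specification.

```python
import json
from typing import Any, Dict, List, Tuple


def parse_exec_args(argv: List[str]) -> Tuple[str, Dict[str, Any], Dict[str, Any]]:
    """Split an exec provider's argv into (prompt, options, context)."""
    prompt = argv[1]                # rendered prompt text
    options = json.loads(argv[2])   # provider options as JSON
    context = json.loads(argv[3])   # vars, prompt metadata, test definition as JSON
    return prompt, options, context


# Abbreviated version of the argv shown in the log; a real provider would pass sys.argv.
sample_argv = [
    "hello.py",
    "Translate to Japanese: Hello!\n",
    '{"config":{"basePath":""},"env":{}}',
    '{"vars":{"language":"Japanese","input":"Hello!"}}',
]

prompt, options, context = parse_exec_args(sample_argv)
print(context["vars"]["language"])
```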

To keep logs separate from the provider response, specify a Python provider as follows:

providers:
  - id: 'file://hello.py'
    label: 'HelloPythonProvider'
    config:
      additionalOption: 123

A Python provider simply defines a call_api function; the value under the output key of the returned dictionary is treated as the provider response.

import time
from typing import Any, Dict


def call_api(prompt: str, options: Dict[str, Any], context: Dict[str, Any]):
    now = time.localtime()
    print("Hello, World!")
    print("Current time:", time.strftime("%H:%M:%S", now))
    return {
        "output": "こんにちは"
    }
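The config block in the provider definition and the test variables can also be read inside call_api. The sketch below assumes that options carries the config mapping from the YAML and that context carries vars, as the exec provider output earlier suggests; the format of the returned string is made up for illustration.

```python
from typing import Any, Dict


def call_api(prompt: str, options: Dict[str, Any], context: Dict[str, Any]) -> Dict[str, Any]:
    # Assumption: `config` from the provider definition is passed under options["config"].
    config = options.get("config", {})
    # Assumption: the test's variables are passed under context["vars"].
    variables = context.get("vars", {})

    additional = config.get("additionalOption")
    language = variables.get("language", "unknown")

    # Echo what was received so the wiring is visible in the results.
    return {"output": f"[{language}] option={additional} prompt={prompt.strip()}"}
```

Using .get with defaults keeps the provider from raising when a key is absent, which makes the function easy to call directly while developing it.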

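If you use a Python provider like the one above and want to see what it prints to standard output, run the evaluation as LOG_LEVEL=debug promptfoo eval.

The Python executable to use can be specified with PROMPTFOO_PYTHON. PYTHONPATH, by contrast, does not select the interpreter; it adds directories to the module search path.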

Evaluation options

📚 see also: Reference | promptfoo

Controlling concurrency

The degree of parallelism of an evaluation is controlled with evaluateOptions.maxConcurrency:

evaluateOptions:
  maxConcurrency: 2

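Running tests multiple times

When checking the quality of, say, a prompt run at a high temperature, you will want to run the assertions several times. This can be specified with the evaluateOptions.repeat option (evaluateOptions.delay, by contrast, only inserts a wait between API calls).

Common processing before and after tests

This is done by listing, under extensions, a file that contains the hook function to run: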
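To see why a single run can be misleading for a non-deterministic prompt, consider the chance that every repeated run passes. The sketch below assumes each run passes independently with the same probability, which is a simplification; the 0.9 figure is a made-up example, not a measurement.

```python
def all_pass_probability(single_run_pass_rate: float, repeat: int) -> float:
    """Probability that all `repeat` independent runs pass."""
    return single_run_pass_rate ** repeat


# A prompt that passes 90% of the time looks fine once, less so over 5 runs.
for repeat in (1, 3, 5):
    print(repeat, round(all_pass_probability(0.9, repeat), 3))
```

Under these assumptions a prompt that passes 90% of single runs passes all five repeated runs only about 59% of the time, so repetition surfaces flakiness that one run would usually hide.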

extensions:
  - file://extensions.py:extension_hook

The hook itself is implemented like this:

def extension_hook(hook_name, context):
    match hook_name:
        case 'beforeAll':
            print("Executing beforeAll hook")
        case 'afterAll':
            print("Executing afterAll hook")
        case 'beforeEach':
            print("Executing beforeEach hook")
        case 'afterEach':
            print("Executing afterEach hook")
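The match statement requires Python 3.10 or later. If the interpreter that promptfoo invokes is older, the same dispatch can be written without it. The following is a sketch with equivalent behavior for the four hook names above; describe_hook is a hypothetical helper introduced only to keep the message logic separate from printing.

```python
from typing import Any, Dict

# The hook names handled by the original example.
HOOK_NAMES = ("beforeAll", "afterAll", "beforeEach", "afterEach")


def describe_hook(hook_name: str) -> str:
    """Build the log message for a hook name."""
    if hook_name in HOOK_NAMES:
        return f"Executing {hook_name} hook"
    return f"Ignoring unknown hook: {hook_name}"


def extension_hook(hook_name: str, context: Dict[str, Any]) -> None:
    """Entry point referenced as file://extensions.py:extension_hook."""
    print(describe_hook(hook_name))
```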

Disabling sharing

📚 see also: Sharing | promptfoo

This is configured with the sharing option:

sharing: false

About

This is the personal blog of @gomi_ningen, who works as an engineer in the web industry.