Reputation: 1
import json

from langchain_openai import ChatOpenAI
from langchain.evaluation import load_evaluator

# Chat model served from a local OpenAI-compatible endpoint
llm = ChatOpenAI(base_url="http://localhost:1234/v1", api_key="")
evaluator = load_evaluator("labeled_pairwise_string", llm=llm)

# Shared test cases plus one prediction file per model
with open('test_cases.json', 'r') as file:
    test_cases = json.load(file)
with open('prediction_ollama.json', 'r') as file:
    predictions = json.load(file)
with open('prediction_gemma.json', 'r') as file:
    predictions_b = json.load(file)
with open('prediction_mistral.json', 'r') as file:
    predictions_c = json.load(file)  # third model; not used below, which is the problem

results = []
for i, test_case in enumerate(test_cases):
    result = evaluator.evaluate_string_pairs(
        input=test_case["input"],
        prediction=predictions[i]["prediction"],
        prediction_b=predictions_b[i]["prediction_b"],
        reference=test_case["reference"],  # the "labeled" evaluator expects a reference answer; assumes my test cases have one
    )
    results.append((f"\nTest Case {i+1}", result))

for test_name, result in results:
    print(test_name, "->", result)
I am currently trying to compare predictions from three different LLMs (the Ollama, Gemma, and Mistral outputs loaded above) to evaluate which one gives the best answer for my use case. Is there a way to compare more than two predictions? As I understand it, evaluator.evaluate_string_pairs can only compare two strings at a time, so I'm not sure how to bring the third set of predictions into the comparison.
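The only idea I have so far is a round-robin: run every pair of models through the pairwise evaluator on the same test case and count the wins. Here is a rough sketch of what I mean. It continues from the script above and assumes each prediction file keeps its own key ("prediction", "prediction_b", and "prediction_c" for the Mistral file, which is a guess on my part), plus a "reference" field in each test case.

from itertools import combinations
from collections import Counter

# Continues from the script above: evaluator, test_cases, and the three
# prediction lists are assumed to already be loaded.
models = {
    "ollama": (predictions, "prediction"),
    "gemma": (predictions_b, "prediction_b"),
    "mistral": (predictions_c, "prediction_c"),  # guessing this key name
}

wins = Counter()
for i, test_case in enumerate(test_cases):
    # Round-robin: evaluate every pair of models on the same test case
    for name_a, name_b in combinations(models, 2):
        preds_a, key_a = models[name_a]
        preds_b, key_b = models[name_b]
        result = evaluator.evaluate_string_pairs(
            input=test_case["input"],
            prediction=preds_a[i][key_a],
            prediction_b=preds_b[i][key_b],
            reference=test_case["reference"],  # assumes a reference answer per test case
        )
        # result["value"] is "A" or "B" (or None on a tie) for this pair
        if result.get("value") == "A":
            wins[name_a] += 1
        elif result.get("value") == "B":
            wins[name_b] += 1

print(wins.most_common())

That works in principle, but it needs three evaluator calls per test case (N*(N-1)/2 in general), so I suspect I'm missing a more direct way to compare all of the models at once. Any advice would be appreciated.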
Upvotes: 0
Views: 20