To evaluate an application in code, we need a way to run it. When using evaluate() (Python/TypeScript), we do this by passing in a target function argument: a function that takes a dataset Example's inputs and returns the application output as a dict. Within this function we can call our application however we'd like and format the output however we'd like. The key is that any evaluator functions we define must work with the output format our target function returns.
```python
from langsmith import Client

# 'inputs' will come from your dataset.
def dummy_target(inputs: dict) -> dict:
    return {"foo": 1, "bar": "two"}

# 'inputs' will come from your dataset.
# 'outputs' will come from your target function.
def evaluator_one(inputs: dict, outputs: dict) -> bool:
    return outputs["foo"] == 2

def evaluator_two(inputs: dict, outputs: dict) -> bool:
    return len(outputs["bar"]) < 3

client = Client()

results = client.evaluate(
    dummy_target,  # <-- target function
    data="your-dataset-name",
    evaluators=[evaluator_one, evaluator_two],
    # ... other parameters
)
```
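Evaluators can also compare the target's output against the reference outputs stored in your dataset. A minimal sketch, assuming your dataset examples and your target output both contain an `answer` key (a hypothetical schema; the `reference_outputs` argument name is supported in recent langsmith SDK versions):

```python
# 'outputs' will come from your target function.
# 'reference_outputs' will come from your dataset examples.
# Assumes an "answer" key on both sides (hypothetical schema).
def exact_match(outputs: dict, reference_outputs: dict) -> bool:
    return outputs["answer"] == reference_outputs["answer"]
```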
evaluate() will automatically trace your target function. This means that any traceable code you run within your target function will also be traced, as child runs of the target trace.
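For example, a helper decorated with `@traceable` will show up as a child run under each example's target trace. A minimal sketch, with a hypothetical `retrieve_docs` helper and a dataset whose inputs have a `question` key:

```python
from langsmith import traceable

# Hypothetical helper; any @traceable function called inside
# the target is traced as a child run of the target trace.
@traceable
def retrieve_docs(query: str) -> list:
    return [f"doc about {query}"]

def target(inputs: dict) -> dict:
    # Assumes your dataset inputs have a 'question' key.
    docs = retrieve_docs(inputs["question"])
    return {"docs": docs}
```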
If your target calls a model through the OpenAI SDK, you can wrap the client so every model call is traced automatically:

```python
from langsmith import wrappers
from openai import OpenAI

# Optionally wrap the OpenAI client to automatically
# trace all model calls.
oai_client = wrappers.wrap_openai(OpenAI())

def target(inputs: dict) -> dict:
    # This assumes your dataset has inputs with a 'messages' key.
    # You can update this to match your dataset schema.
    messages = inputs["messages"]
    response = oai_client.chat.completions.create(
        messages=messages,
        model="gpt-4o-mini",
    )
    return {"answer": response.choices[0].message.content}
```
If you're evaluating your own agent, call it inside the target function and return its output:

```python
from my_agent import agent

# This is the function you will evaluate.
def target(inputs: dict) -> dict:
    # This assumes your dataset has inputs with a 'messages' key.
    messages = inputs["messages"]
    # Replace `invoke` with whatever you use to call your agent.
    response = agent.invoke({"messages": messages})
    # This assumes your agent's output is already in the format
    # your evaluators expect.
    return response
```
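If your agent's raw output doesn't match what your evaluators expect, reshape it inside the target. A sketch, assuming a LangGraph-style agent whose result is a dict containing a `messages` list:

```python
def target(inputs: dict) -> dict:
    response = agent.invoke({"messages": inputs["messages"]})
    # Extract just the final assistant message so evaluators
    # can rely on a simple {"answer": str} output format.
    return {"answer": response["messages"][-1].content}
```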
If you have a LangGraph/LangChain agent that accepts the inputs defined in your dataset and returns the output format you want to use in your evaluators, you can pass that object in directly as the target:
```python
from my_agent import agent
from langsmith import Client

client = Client()
client.evaluate(agent, ...)
```
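Filled out, the call looks the same as the earlier one; the dataset name and evaluators below are placeholders carried over from the first sketch:

```python
results = client.evaluate(
    agent,  # <-- passed directly as the target
    data="your-dataset-name",
    evaluators=[evaluator_one, evaluator_two],
)
```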