Behind the Scenes: How LangChain Calculates OpenAI's Pricing
Includes insights on why LangChain is helpful for developers
tl;dr
This post covers how LangChain calculates pricing when you use OpenAI’s LLMs.
Two functions do the work: the first maintains the model cost mapping, and the second calculates the cost given a response from OpenAI’s API.
Some insights on why LangChain exists and how it is helpful for developers.
Today’s post is a dive into the source code of a popular open-source package called LangChain. It’s a framework for developing applications powered by language models.
We will cover how LangChain calculates the OpenAI cost given the response.
There is a get_openai_callback function in langchain.callbacks. This page in the LangChain docs shows how it can be used:
from langchain.llms import OpenAI
from langchain.callbacks import get_openai_callback

llm = OpenAI(model_name="text-davinci-002", n=2, best_of=2)
with get_openai_callback() as cb:
    result = llm("Tell me a joke")
    print(cb)
Tokens Used: 42
Prompt Tokens: 4
Completion Tokens: 38
Successful Requests: 1
Total Cost (USD): $0.00084
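As a quick sanity check: text-davinci-002 is priced at $0.02 per 1,000 tokens (we will see the full mapping below), and the callback counted 42 tokens in total, so 42 / 1000 * 0.02 = $0.00084, which matches the reported cost.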
Here the model name is text-davinci-002 (LangChain team, if you are reading this: the doc needs an update), but the same will work with gpt-3.5-turbo or gpt-4.
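For example, here is a minimal sketch of the same pattern with a chat model; it assumes ChatOpenAI and HumanMessage are available at these import paths and that an OPENAI_API_KEY is set in your environment:

# A minimal sketch, assuming these LangChain import paths and an
# OPENAI_API_KEY environment variable.
from langchain.chat_models import ChatOpenAI
from langchain.schema import HumanMessage
from langchain.callbacks import get_openai_callback

chat = ChatOpenAI(model_name="gpt-3.5-turbo")
with get_openai_callback() as cb:
    chat([HumanMessage(content="Tell me a joke")])
    print(cb)  # prints token counts and the total cost in USD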
I was more interested in the source code, so I started digging into the code base. The source code for this function lives in langchain/callbacks/openai_info.py. The first function defined in the file is get_openai_model_cost_per_1k_tokens; here is its source code:
def get_openai_model_cost_per_1k_tokens(
    model_name: str, is_completion: bool = False
) -> float:
    model_cost_mapping = {
        "gpt-4": 0.03,
        "gpt-4-0314": 0.03,
        "gpt-4-completion": 0.06,
        "gpt-4-0314-completion": 0.06,
        "gpt-4-32k": 0.06,
        "gpt-4-32k-0314": 0.06,
        "gpt-4-32k-completion": 0.12,
        "gpt-4-32k-0314-completion": 0.12,
        "gpt-3.5-turbo": 0.002,
        "gpt-3.5-turbo-0301": 0.002,
        "text-ada-001": 0.0004,
        "ada": 0.0004,
        "text-babbage-001": 0.0005,
        "babbage": 0.0005,
        "text-curie-001": 0.002,
        "curie": 0.002,
        "text-davinci-003": 0.02,
        "text-davinci-002": 0.02,
        "code-davinci-002": 0.02,
    }

    cost = model_cost_mapping.get(
        model_name.lower()
        + ("-completion" if is_completion and model_name.startswith("gpt-4") else ""),
        None,
    )
    if cost is None:
        raise ValueError(
            f"Unknown model: {model_name}. Please provide a valid OpenAI model name."
            "Known models are: " + ", ".join(model_cost_mapping.keys())
        )
    return cost
So this is the function that maintains the cost for every supported model. If OpenAI adds a new model (gpt-5, maybe? or gpt-4-turbo), this is the place where its pricing will be added.
Also, note that the function has an is_completion flag. For GPT-4 models, completion tokens are priced differently from prompt tokens, so the flag decides whether to look up the "-completion" price or the regular one.
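A quick illustration of what the flag changes (values taken straight from the mapping above):

get_openai_model_cost_per_1k_tokens("gpt-4")                              # 0.03 (prompt price)
get_openai_model_cost_per_1k_tokens("gpt-4", is_completion=True)          # 0.06 (looks up "gpt-4-completion")
get_openai_model_cost_per_1k_tokens("gpt-3.5-turbo", is_completion=True)  # 0.002 (no separate completion price)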
For simplicity, you can think of this function as maintaining a mapping from a model name to its cost per 1,000 tokens, which I will refer to as model_cost_mapping from here on:
model_cost_mapping = {
    "gpt-3.5-turbo": 0.002,
    "gpt-4": 0.03
}
Let’s look at the other code in this file. There is a class called OpenAICallbackHandler, and inside it, a function called on_llm_end. My guess was that this function gets called once the response from the LLM has been fully generated. Let’s verify this. I searched for references to on_llm_end and found that it is called from the generate function of the BaseLLM class defined in llms/base.py, right after the response from the LLM is generated.
Now, let’s dig into on_llm_end and see how it uses the model_cost_mapping:
def on_llm_end(self, response: LLMResult, **kwargs: Any) -> None:
    """Collect token usage."""
    if response.llm_output is not None:
        self.successful_requests += 1
        if "token_usage" in response.llm_output:
            token_usage = response.llm_output["token_usage"]
            if "model_name" in response.llm_output:
                completion_cost = get_openai_model_cost_per_1k_tokens(
                    response.llm_output["model_name"], is_completion=True
                ) * (token_usage.get("completion_tokens", 0) / 1000)
                prompt_cost = get_openai_model_cost_per_1k_tokens(
                    response.llm_output["model_name"]
                ) * (token_usage.get("prompt_tokens", 0) / 1000)
                self.total_cost += prompt_cost + completion_cost
            if "total_tokens" in token_usage:
                self.total_tokens += token_usage["total_tokens"]
            if "prompt_tokens" in token_usage:
                self.prompt_tokens += token_usage["prompt_tokens"]
            if "completion_tokens" in token_usage:
                self.completion_tokens += token_usage["completion_tokens"]
Let’s understand what the above function is doing. Before proceeding, let’s see what a sample response from the OpenAI API looks like:
>>> import openai
>>> openai.api_key = 'sk-<your key here>'
>>> response = openai.ChatCompletion.create(model="gpt-3.5-turbo", messages=[{"role":"user", "content": "hey, how are you?"}], temperature=0.7)
>>> print(response)
{
  "choices": [
    {
      "finish_reason": "stop",
      "index": 0,
      "message": {
        "content": "As an AI language model, I do not have emotions, but I'm functioning properly. How can I assist you today?",
        "role": "assistant"
      }
    }
  ],
  "created": 1685475109,
  "id": "chatcmpl-7LzMLNGeUTvJsi0eJYrdrZow0DRnU",
  "model": "gpt-3.5-turbo-0301",
  "object": "chat.completion",
  "usage": {
    "completion_tokens": 25,
    "prompt_tokens": 14,
    "total_tokens": 39
  }
}
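Note that the response object behaves like a dict, so you can read these fields directly:

print(response["model"])                       # gpt-3.5-turbo-0301
print(response["usage"]["prompt_tokens"])      # 14
print(response["usage"]["completion_tokens"])  # 25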
This is interesting. In the response received from OpenAI, the token details sit under a usage key, whereas in LangChain the code is looking for token_usage.
I did a little digging and found out that LangChain extracts the usage key from the response and stores it under a token_usage key in chat_models/openai.py. Here is the relevant code snippet:
# langchain/chat_models/openai.py

def _create_chat_result(self, response: Mapping[str, Any]) -> ChatResult:
    generations = []
    for res in response["choices"]:
        message = _convert_dict_to_message(res["message"])
        gen = ChatGeneration(message=message)
        generations.append(gen)
    llm_output = {"token_usage": response["usage"], "model_name": self.model_name}
    return ChatResult(generations=generations, llm_output=llm_output)
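For the sample response shown earlier, the llm_output that eventually reaches on_llm_end would therefore look roughly like this (model_name is whatever you passed in, not the versioned name returned by the API):

llm_output = {
    "token_usage": {"completion_tokens": 25, "prompt_tokens": 14, "total_tokens": 39},
    "model_name": "gpt-3.5-turbo",
}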
An interesting question here is why LangChain does this.
The answer is interoperability: by normalizing provider-specific responses into a common shape, LangChain can expose a unified interface across LLM providers. This is one of the main reasons why developers love LangChain.
Imagine Anthropic’s model starts performing better than GPT-4 one day; all you have to do is switch the provider in LangChain and everything continues to work as before. Without LangChain, you would have to change a lot of logic and input/output handling yourself, which means more work and more chances of introducing errors in a production environment.
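As a rough illustration (this sketch assumes ChatAnthropic is available in langchain.chat_models and that an Anthropic API key is configured), swapping providers is essentially a one-line change:

# A rough sketch of switching providers; ChatAnthropic and the required
# ANTHROPIC_API_KEY are assumptions on my side.
from langchain.chat_models import ChatOpenAI, ChatAnthropic
from langchain.schema import HumanMessage

llm = ChatOpenAI(model_name="gpt-4")
# llm = ChatAnthropic()  # swap providers; the calling code below stays the same

print(llm([HumanMessage(content="Tell me a joke")]).content)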
Coming back to the cost calculation: the on_llm_end function is very straightforward. It checks for the relevant keys in the response and then calculates the cost using model_cost_mapping.
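Plugging in the sample response from earlier: gpt-3.5-turbo is priced at $0.002 per 1,000 tokens for both prompt and completion, so the cost works out to 14 / 1000 * 0.002 + 25 / 1000 * 0.002 = $0.000078. You can verify this by driving the handler directly with a hand-crafted LLMResult; below is a minimal sketch, assuming LLMResult is importable from langchain.schema and the handler from langchain.callbacks.openai_info (import paths may differ across versions):

# A minimal sketch of feeding the handler by hand; the import paths are
# assumptions and may vary across LangChain versions.
from langchain.schema import LLMResult
from langchain.callbacks.openai_info import OpenAICallbackHandler

handler = OpenAICallbackHandler()
handler.on_llm_end(
    LLMResult(
        generations=[],
        llm_output={
            "token_usage": {"prompt_tokens": 14, "completion_tokens": 25, "total_tokens": 39},
            "model_name": "gpt-3.5-turbo",
        },
    )
)
print(handler.total_cost)  # 0.000078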
Based on the above code, I am sharing a standalone Python snippet. It will be helpful in case you just want to reuse the cost calculation logic on an OpenAI response at your end.
# credits: https://github.com/hwchase17/langchain/blob/master/langchain/callbacks/openai_info.py
import openai


def get_openai_model_cost_table(model_name='gpt-3.5-turbo', is_completion=False):
    model_cost_mapping = {
        "gpt-4": 0.03,
        "gpt-4-0314": 0.03,
        "gpt-4-completion": 0.06,
        "gpt-4-0314-completion": 0.06,
        "gpt-4-32k": 0.06,
        "gpt-4-32k-0314": 0.06,
        "gpt-4-32k-completion": 0.12,
        "gpt-4-32k-0314-completion": 0.12,
        "gpt-3.5-turbo": 0.002,
        "gpt-3.5-turbo-0301": 0.002,
        "text-ada-001": 0.0004,
        "ada": 0.0004,
        "text-babbage-001": 0.0005,
        "babbage": 0.0005,
        "text-curie-001": 0.002,
        "curie": 0.002,
        "text-davinci-003": 0.02,
        "text-davinci-002": 0.02,
        "code-davinci-002": 0.02,
    }

    cost = model_cost_mapping.get(
        model_name.lower()
        + ("-completion" if is_completion and model_name.startswith("gpt-4") else ""),
        None,
    )
    if cost is None:
        raise ValueError(
            f"Unknown model: {model_name}. Please provide a valid OpenAI model name. "
            "Known models are: " + ", ".join(model_cost_mapping.keys())
        )
    return cost


def get_openai_cost(response):
    """
    Pass openai response object and get total cost.
    """
    total_cost = 0
    if "usage" in response:
        # Completion tokens (priced separately for GPT-4 models)
        completion_cost = get_openai_model_cost_table(
            model_name=response["model"],
            is_completion=True,
        ) * (response["usage"].get("completion_tokens", 0) / 1000.0)
        # Prompt tokens
        prompt_cost = get_openai_model_cost_table(
            model_name=response["model"]
        ) * (response["usage"].get("prompt_tokens", 0) / 1000.0)
        total_cost = prompt_cost + completion_cost
    return total_cost


openai.api_key = 'sk-<your key here>'
response = openai.ChatCompletion.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": "hey, how are you?"}],
    temperature=0.7,
)
print("Total cost: ", get_openai_cost(response))
If you want to read any past issues, check them out here.
If you have any feedback or issues, reply here or DM me on Twitter.