Behind the Scenes: How LangChain Calculates OpenAI's Pricing
Includes insights on why LangChain is helpful for developers
tl;dr
This post covers how LangChain calculates pricing when you use OpenAI’s LLMs.
Two functions do the work: the first maintains the model cost mapping, and the second calculates the cost given a response from OpenAI’s API.
Some insights on why LangChain exists and how it is helpful for developers.
Today’s post is a dive into the source code of a popular open-source package called LangChain. It’s a framework for developing applications powered by language models.
We will cover how LangChain calculates the OpenAI cost given the response.
There is a get_openai_callback function in langchain.callbacks. This page in the LangChain docs shows how it can be used:
from langchain.llms import OpenAI
from langchain.callbacks import get_openai_callback

llm = OpenAI(model_name="text-davinci-002", n=2, best_of=2)
with get_openai_callback() as cb:
    result = llm("Tell me a joke")
    print(cb)
Tokens Used: 42
Prompt Tokens: 4
Completion Tokens: 38
Successful Requests: 1
Total Cost (USD): $0.00084
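As a quick sanity check: text-davinci-002 is priced at $0.02 per 1,000 tokens (we will see the full mapping below), and the callback counted 42 tokens in total, so 42 / 1000 * 0.02 = $0.00084, which matches the reported cost.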
Here the model name is text-davinci-002 (LangChain team, if you are reading this: the doc needs an update), but the same will work with gpt-3.5-turbo or gpt-4.
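For example, here is a minimal sketch of the same pattern with a chat model; it assumes ChatOpenAI and HumanMessage are available at these import paths and that an OPENAI_API_KEY is set in your environment:

# A minimal sketch, assuming these LangChain import paths and an
# OPENAI_API_KEY environment variable.
from langchain.chat_models import ChatOpenAI
from langchain.schema import HumanMessage
from langchain.callbacks import get_openai_callback

chat = ChatOpenAI(model_name="gpt-3.5-turbo")
with get_openai_callback() as cb:
    chat([HumanMessage(content="Tell me a joke")])
    print(cb)  # prints token counts and the total cost in USD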
I was more interested in the source code, so I started digging into the code base. The source code for this function lives in langchain/callbacks/openai_info.py. The first function defined in the file is get_openai_model_cost_per_1k_tokens; here is its source code:
def get_openai_model_cost_per_1k_tokens(
    model_name: str, is_completion: bool = False
) -> float:
    model_cost_mapping = {
        "gpt-4": 0.03,
        "gpt-4-0314": 0.03,
        "gpt-4-completion": 0.06,
        "gpt-4-0314-completion": 0.06,
        "gpt-4-32k": 0.06,
        "gpt-4-32k-0314": 0.06,
        "gpt-4-32k-completion": 0.12,
        "gpt-4-32k-0314-completion": 0.12,
        "gpt-3.5-turbo": 0.002,
        "gpt-3.5-turbo-0301": 0.002,
        "text-ada-001": 0.0004,
        "ada": 0.0004,
        "text-babbage-001": 0.0005,
        "babbage": 0.0005,
        "text-curie-001": 0.002,
        "curie": 0.002,
        "text-davinci-003": 0.02,
        "text-davinci-002": 0.02,
        "code-davinci-002": 0.02,
    }

    cost = model_cost_mapping.get(
        model_name.lower()
        + ("-completion" if is_completion and model_name.startswith("gpt-4") else ""),
        None,
    )
    if cost is None:
        raise ValueError(
            f"Unknown model: {model_name}. Please provide a valid OpenAI model name."
            "Known models are: " + ", ".join(model_cost_mapping.keys())
        )
    return cost
So this is the function that maintains the cost for every supported model. If OpenAI adds a new model (gpt-5, maybe? or gpt-4-turbo), this is the place where its pricing will be added.
Also, note that the function has an is_completion flag. For GPT-4 models, completion tokens are priced differently from prompt tokens, so the flag decides whether to look up the "-completion" price or the regular one.
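A quick illustration of what the flag changes (values taken straight from the mapping above):

get_openai_model_cost_per_1k_tokens("gpt-4")                              # 0.03 (prompt price)
get_openai_model_cost_per_1k_tokens("gpt-4", is_completion=True)          # 0.06 (looks up "gpt-4-completion")
get_openai_model_cost_per_1k_tokens("gpt-3.5-turbo", is_completion=True)  # 0.002 (no separate completion price)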
For simplicity, you can think of this function as maintaining a mapping from a model name to its cost per 1,000 tokens, which I will refer to as model_cost_mapping from here on:
model_cost_mapping = {
    "gpt-3.5-turbo": 0.002,
    "gpt-4": 0.03
}
Let’s look at the other code in this file. There is a class called OpenAICallbackHandler, and inside it, a function called on_llm_end. My guess was that this function gets called once the response from the LLM has been fully generated. Let’s verify this. I searched for references to on_llm_end and found that it is called from the generate function of the BaseLLM class defined in llms/base.py, right after the response from the LLM is generated.
Now, let’s dig into on_llm_end and see how it uses the model_cost_mapping:
def on_llm_end(self, response: LLMResult, **kwargs: Any) -> None:
    """Collect token usage."""
    if response.llm_output is not None:
        self.successful_requests += 1
        if "token_usage" in response.llm_output:
            token_usage = response.llm_output["token_usage"]
            if "model_name" in response.llm_output:
                completion_cost = get_openai_model_cost_per_1k_tokens(
                    response.llm_output["model_name"], is_completion=True
                ) * (token_usage.get("completion_tokens", 0) / 1000)
                prompt_cost = get_openai_model_cost_per_1k_tokens(
                    response.llm_output["model_name"]
                ) * (token_usage.get("prompt_tokens", 0) / 1000)
                self.total_cost += prompt_cost + completion_cost
            if "total_tokens" in token_usage:
                self.total_tokens += token_usage["total_tokens"]
            if "prompt_tokens" in token_usage:
                self.prompt_tokens += token_usage["prompt_tokens"]
            if "completion_tokens" in token_usage:
                self.completion_tokens += token_usage["completion_tokens"]
Let’s understand what the above function is doing. Before proceeding, let’s see what a sample response from the OpenAI API looks like:
>>> import openai
>>> openai.api_key = 'sk-<your key here>'
>>> response = openai.ChatCompletion.create(model="gpt-3.5-turbo", messages=[{"role":"user", "content": "hey, how are you?"}], temperature=0.7)
>>> print(response)
{
  "choices": [
    {
      "finish_reason": "stop",
      "index": 0,
      "message": {
        "content": "As an AI language model, I do not have emotions, but I'm functioning properly. How can I assist you today?",
        "role": "assistant"
      }
    }
  ],
  "created": 1685475109,
  "id": "chatcmpl-7LzMLNGeUTvJsi0eJYrdrZow0DRnU",
  "model": "gpt-3.5-turbo-0301",
  "object": "chat.completion",
  "usage": {
    "completion_tokens": 25,
    "prompt_tokens": 14,
    "total_tokens": 39
  }
}
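Note that the response object behaves like a dict, so you can read these fields directly:

print(response["model"])                       # gpt-3.5-turbo-0301
print(response["usage"]["prompt_tokens"])      # 14
print(response["usage"]["completion_tokens"])  # 25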
This is interesting. In the response received from OpenAI, the token details sit under a usage key, whereas in LangChain the code is looking for token_usage.
I did a little digging and found out that LangChain extracts the usage key from the response and stores it under a token_usage key in chat_models/openai.py. Here is the relevant code snippet:
# langchain/chat_models/openai.py

def _create_chat_result(self, response: Mapping[str, Any]) -> ChatResult:
    generations = []
    for res in response["choices"]:
        message = _convert_dict_to_message(res["message"])
        gen = ChatGeneration(message=message)
        generations.append(gen)
    llm_output = {"token_usage": response["usage"], "model_name": self.model_name}
    return ChatResult(generations=generations, llm_output=llm_output)
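For the sample response shown earlier, the llm_output that eventually reaches on_llm_end would therefore look roughly like this (model_name is whatever you passed in, not the versioned name returned by the API):

llm_output = {
    "token_usage": {"completion_tokens": 25, "prompt_tokens": 14, "total_tokens": 39},
    "model_name": "gpt-3.5-turbo",
}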
An interesting question here is why LangChain does this.
The answer is interoperability: by normalizing provider-specific responses into a common shape, LangChain can expose a unified interface across LLM providers. This is one of the main reasons why developers love LangChain.
Imagine Anthropic’s model starts performing better than GPT-4 one day; all you have to do is switch the provider in LangChain and everything continues to work as before. Without LangChain, you would have to change a lot of logic and input/output handling yourself, which means more work and more chances of introducing errors in a production environment.
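As a rough illustration (this sketch assumes ChatAnthropic is available in langchain.chat_models and that an Anthropic API key is configured), swapping providers is essentially a one-line change:

# A rough sketch of switching providers; ChatAnthropic and the required
# ANTHROPIC_API_KEY are assumptions on my side.
from langchain.chat_models import ChatOpenAI, ChatAnthropic
from langchain.schema import HumanMessage

llm = ChatOpenAI(model_name="gpt-4")
# llm = ChatAnthropic()  # swap providers; the calling code below stays the same

print(llm([HumanMessage(content="Tell me a joke")]).content)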
Coming back to the cost calculation: the on_llm_end function is very straightforward. It checks for the relevant keys in the response and then calculates the cost using model_cost_mapping.
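Plugging in the sample response from earlier: gpt-3.5-turbo is priced at $0.002 per 1,000 tokens for both prompt and completion, so the cost works out to 14 / 1000 * 0.002 + 25 / 1000 * 0.002 = $0.000078. You can verify this by driving the handler directly with a hand-crafted LLMResult; below is a minimal sketch, assuming LLMResult is importable from langchain.schema and the handler from langchain.callbacks.openai_info (import paths may differ across versions):

# A minimal sketch of feeding the handler by hand; the import paths are
# assumptions and may vary across LangChain versions.
from langchain.schema import LLMResult
from langchain.callbacks.openai_info import OpenAICallbackHandler

handler = OpenAICallbackHandler()
handler.on_llm_end(
    LLMResult(
        generations=[],
        llm_output={
            "token_usage": {"prompt_tokens": 14, "completion_tokens": 25, "total_tokens": 39},
            "model_name": "gpt-3.5-turbo",
        },
    )
)
print(handler.total_cost)  # 0.000078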
Based on the above code, I am sharing a standalone Python snippet. It will be helpful in case you just want to reuse the cost calculation logic on an OpenAI response at your end.
# credits: https://github.com/hwchase17/langchain/blob/master/langchain/callbacks/openai_info.py
import openai


def get_openai_model_cost_table(model_name='gpt-3.5-turbo', is_completion=False):
    model_cost_mapping = {
        "gpt-4": 0.03,
        "gpt-4-0314": 0.03,
        "gpt-4-completion": 0.06,
        "gpt-4-0314-completion": 0.06,
        "gpt-4-32k": 0.06,
        "gpt-4-32k-0314": 0.06,
        "gpt-4-32k-completion": 0.12,
        "gpt-4-32k-0314-completion": 0.12,
        "gpt-3.5-turbo": 0.002,
        "gpt-3.5-turbo-0301": 0.002,
        "text-ada-001": 0.0004,
        "ada": 0.0004,
        "text-babbage-001": 0.0005,
        "babbage": 0.0005,
        "text-curie-001": 0.002,
        "curie": 0.002,
        "text-davinci-003": 0.02,
        "text-davinci-002": 0.02,
        "code-davinci-002": 0.02,
    }

    cost = model_cost_mapping.get(
        model_name.lower()
        + ("-completion" if is_completion and model_name.startswith("gpt-4") else ""),
        None,
    )
    if cost is None:
        raise ValueError(
            f"Unknown model: {model_name}. Please provide a valid OpenAI model name. "
            "Known models are: " + ", ".join(model_cost_mapping.keys())
        )
    return cost


def get_openai_cost(response):
    """
    Pass openai response object and get total cost.
    """
    total_cost = 0
    if "usage" in response:
        # Completion tokens (priced separately for GPT-4 models)
        completion_cost = get_openai_model_cost_table(
            model_name=response["model"],
            is_completion=True,
        ) * (response["usage"].get("completion_tokens", 0) / 1000.0)
        # Prompt tokens
        prompt_cost = get_openai_model_cost_table(
            model_name=response["model"]
        ) * (response["usage"].get("prompt_tokens", 0) / 1000.0)
        total_cost = prompt_cost + completion_cost
    return total_cost


openai.api_key = 'sk-<your key here>'
response = openai.ChatCompletion.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": "hey, how are you?"}],
    temperature=0.7,
)
print("Total cost: ", get_openai_cost(response))
If you want to read any past issues, check them out here.
If you have any feedback or issues, reply here or DM me on Twitter.