
    Implement Streaming AI Responses

    Jack Herrington

    UPDATES: The course code has been updated to the newest version of the AI libraries and reflects the corresponding changes in the Vercel AI API.

    In our current application, we have a list of previous chats displayed on the homepage and a way to communicate with the AI.

    However, the AI responses can be quite lengthy, and it takes a while for them to be fully displayed. That's because we block on the entire response, waiting until it's complete before updating the UI, which is not ideal.

    Fortunately, the Vercel ai library that we installed earlier provides a way to implement streaming responses. This means that when you type in a question and receive a response, it comes in parts, similar to how ChatGPT works.

    Let's integrate this feature into our application.

    Setting up the API Endpoint

    To start, we need to create an API endpoint that the Vercel ai library can communicate with.

    Inside the api directory, create a new file at chat/route.ts.

    In this code, we import OpenAIStream and StreamingTextResponse from ai, then import and initialize an OpenAI instance with the API key, just like we did in the server action.

    Then we define a POST function to handle incoming requests. The function receives the request object, which is expected to carry a JSON body containing the list of messages; we extract the messages with request.json(). API endpoints can handle any type of HTTP request, including GET, PUT, DELETE, and so on. All you need to do is export a function named after the HTTP verb you want to handle.
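
    For example, here's a quick sketch of what additional verb handlers could look like in the same route file (these particular handlers are hypothetical, not part of our app):

    // Hypothetical: respond to GET /api/chat
    export async function GET(req: Request) {
      return Response.json({ status: "ok" });
    }
    
    // Hypothetical: respond to DELETE /api/chat
    export async function DELETE(req: Request) {
      return new Response(null, { status: 204 });
    }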

    Next, we call openai.chat.completions.create() to get the completions, but this time, we set the stream mode to true in the configuration object we pass in. This enables the streaming response.

    Finally, we use OpenAIStream and StreamingTextResponse from the Vercel ai library to stream the response back to the client.

    Here's the code all together:

    import OpenAI from "openai";
    import { OpenAIStream, StreamingTextResponse } from "ai";
    
    // Run on the Edge runtime, which is well suited to streaming responses
    export const runtime = "edge";
    
    const openai = new OpenAI({
      apiKey: process.env.OPENAI_API_KEY!,
    });
    
    export async function POST(req: Request) {
      // The JSON body carries the full list of chat messages
      const { messages } = await req.json();
    
      // Ask OpenAI for a completion, with streaming enabled
      const response = await openai.chat.completions.create({
        model: "gpt-3.5-turbo",
        stream: true,
        messages,
      });
    
      // Convert the OpenAI response into a stream and send it to the client
      const stream = OpenAIStream(response);
    
      return new StreamingTextResponse(stream);
    }
    

    This code makes it so that as soon as we receive some tokens, we start sending them back to the client without waiting for the entire response to be complete.
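
    If you want to see the raw stream for yourself, here's a minimal sketch that reads the chunks directly with fetch. It assumes the dev server is running on localhost:3000, and the exact shape of each chunk depends on the ai library version; the useChat hook we'll wire up next does all of this for us:

    const res = await fetch("http://localhost:3000/api/chat", {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({
        messages: [{ role: "user", content: "Hello!" }],
      }),
    });
    
    // Read the body chunk by chunk as tokens arrive
    const reader = res.body!.getReader();
    const decoder = new TextDecoder();
    for (;;) {
      const { value, done } = await reader.read();
      if (done) break;
      console.log(decoder.decode(value));
    }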

    Now that we have the API endpoint set up, let's update the Chat component to utilize the streaming response.

    Updating the Chat Component

    Open the Chat component from app/components/Chat.tsx.

    At the top of the file, there are a couple of imports we need to bring in: the useChat hook from ai/react and the Message type from ai. We'll import Message aliased as AIMessage, since it describes the message shape we'll be getting back from the AI:

    import type { Message as AIMessage } from "ai";
    import { useChat } from "ai/react";
    

    There's now quite a bit of stuff we can remove from the existing component.

    We no longer need the message-related state management, since we're replacing it with the useChat hook. We'll destructure several properties from the call to useChat, and coerce the initialMessages to an array of the AIMessage type:

    // inside the component, replace the useState hooks with:
    
    const { messages, input, handleInputChange, handleSubmit, isLoading } =
      useChat({
        initialMessages: initialMessages as unknown as AIMessage[],
      });
    

    Down in the Transcript component, we'll do the same coercion for the messages prop:

    <Transcript messages={messages as AIMessage[]} truncate={false} />
    

    Then we'll replace the existing div surrounding the existing Input with a new form. This form element will be managed by the useChat hook, so we'll pass the input and handleInputChange to the Input component and handleSubmit to the form:

    <form className="flex mt-3" onSubmit={handleSubmit}>
      <Input
        className="flex-grow text-xl"
        placeholder="Question"
        value={input}
        onChange={handleInputChange}
        autoFocus
      />
      <Button type="submit" className="ml-3 text-xl">
        Send
      </Button>
    </form>
    

    We also no longer need the onClick handler, so it can be removed.

    With these changes in place, we can check our work in the browser.

    Checking Our Work

    Over in the browser, refresh the page and ask a new question.

    The response should start streaming in as it is generated, rather than waiting for the entire response to be complete before displaying it:

    [Screenshot: streaming response]

    This works great, but we're no longer connected to the database. We need a way to update the chat in the database from the client-side.

    Reconnecting the Database

    In order to connect the useChat output to the database, we'll create a new server action called updateChat that will update the database every time there is a new message from the client.

    To start, create a new server action updateChat.ts inside the server-actions directory. This server action will be similar to what we did in getCompletion, where we either created a new chat or updated an existing chat based on the chatId.

    We'll import getServerSession from NextAuth, as well as our createChat and updateChat functions from the database (aliasing the latter to updateChatMessages so it doesn't collide with the server action's own name). Then we'll define an updateChat function that takes a chatId and messages as parameters. Inside the function, we check whether a chatId was provided. If it wasn't, we create a new chat for the current user, using the first message's content as the title. If it was, we update the existing chat with the new messages and return the same chatId.

    Here's the code all together:

    "use server";
    import { getServerSession } from "next-auth";
    
    import { createChat, updateChat as updateChatMessages } from "@/db";
    
    export const updateChat = async (
      chatId: number | null,
      messages: {
        role: "user" | "assistant";
        content: string;
      }[]
    ) => {
      const session = await getServerSession();
      if (!chatId) {
        return await createChat(
          session?.user?.email!,
          messages[0].content,
          messages
        );
      } else {
        await updateChatMessages(chatId, messages);
        return chatId;
      }
    };
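
    For context, the createChat and updateChat database helpers come from the earlier database lesson. Based on how they're called here, their shapes are roughly as follows (a sketch inferred from the call sites, not their exact definitions):

    // Assumed signatures, inferred from this server action's usage:
    declare function createChat(
      userEmail: string,
      title: string,
      messages: { role: "user" | "assistant"; content: string }[]
    ): Promise<number>; // returns the new chat's id
    
    declare function updateChat(
      chatId: number,
      messages: { role: "user" | "assistant"; content: string }[]
    ): Promise<void>;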
    

    With the new server action done, we need to add it to the Chat component.

    Updating the Chat Component with the New Server Action

    Back in the Chat component, we'll replace getCompletion with the new updateChat server action.

    First, import the server action:

    import { updateChat } from "@/app/server-actions/updateChat";
    

    We need to call updateChat every time there's a change in the messages, so we'll import the useEffect hook from React.
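
    If it isn't imported already, bring it in at the top of the file:

    import { useEffect } from "react";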

    Add the useEffect hook to the component, with a dependency array of isLoading and messages from useChat, as well as router. The messages represent what will be added to the database, and router is there because otherwise ESLint will get angry.

    Inside of the useEffect hook, we check if the chat is not loading and there are messages. If both conditions are met, we map over the messages to only include the role and content properties.

    Then, we call the updateChat server action with the current chatId and the simplified messages. If this was a brand-new chat (the chatId was null), we navigate to the newly created chat's page using router.push and refresh; otherwise, we hold on to the returned id.

    useEffect(() => {
      (async () => {
        // Wait until streaming has finished and we have messages to save
        if (!isLoading && messages.length) {
          // Strip each message down to just its role and content
          const simplifiedMessages = messages.map((message) => ({
            role: message.role as "user" | "assistant",
            content: message.content,
          }));
          const newChatId = await updateChat(chatId.current, simplifiedMessages);
          if (chatId.current === null) {
            // Brand-new chat: navigate to its page
            router.push(`/chats/${newChatId}`);
            router.refresh();
          } else {
            chatId.current = newChatId;
          }
        }
      })();
    }, [isLoading, messages, router]);
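
    One note: this snippet assumes chatId is a mutable ref holding the current chat's database id (or null for a brand-new chat), carried over from the earlier getCompletion version of the component. If your component doesn't have it yet, a minimal sketch would be:

    import { useRef } from "react";
    
    // Hypothetical: `id` would be an optional prop carrying an existing chat's
    // id; null means no chat has been saved to the database yet
    const chatId = useRef<number | null>(id ?? null);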
    

    With these changes, the chat messages will be updated in the database whenever they change.

    Deploying to Production

    Now that we have implemented streaming responses and database updates, it's time to push the changes to GitHub, which will trigger a deployment to Vercel.

    The streaming responses feature of Vercel's ai library makes for a better user experience, and the database updates ensure that the chat messages are stored as they come in.

    Transcript

    All right, check out the application that we have. We have previous chats showing up on the home page. We've got a way to communicate with the AI. But I've got to say, oftentimes the AI responses are really long, and it takes a while for them to come back.

    And right now, we're actually blocking on the whole response until that's come out. And then we give you your updated UI, which is not great, per se. So the Vercel AI library that we installed way, way back

    actually has a way to do a streaming response. And so when you type in a question and you get the response back, it actually comes in in parts and pieces, just like you might expect from ChatGPT. And it's really easy to use. So let me go and show you how to install it in our application. So the first thing that the Vercel AI library wants is an endpoint to talk to.

    Let me go ahead and close up a couple of these. And we're going to create a new API endpoint under chat. So it's going to be api/chat. And it's going to be an API endpoint, so it's going to be route.ts. And then we're going to paste in some code that's listed in your instructions.

    But I'll walk you through it. So the first thing we're going to do is bring in OpenAI. And then we're going to initialize that, just like we do in the server action. Right at the top there, we're going to initialize OpenAI and do exactly the same thing in our chat route. And then we're going to respond to POST.

    So the hook that we're going to bring in makes a POST to api/chat. So we need to have POST as the verb that we're going to handle. You can handle whichever verbs you want, or one or more of them. You can handle GET, POST, PUT, DELETE, HEAD.

    Those are all the verbs that you can support. You just need to export the function that matches that verb, so in this case, POST. That function gets the request object, req. Now, in this case, it's going to be a JSON request. It's going to have the list of messages. So we're going to crack that open using req.json().

    Then we're going to call OpenAI and get those completions back. But we're going to set the stream mode to true. So we're going to get a streaming response. And then we're going to use the OpenAIStream that comes from Vercel's ai library, as well as a StreamingTextResponse,

    to then stream that response back to the client, meaning that as soon as we get some tokens, we're going to start sending them back to the client. We're not going to wait for the whole thing to be done. So now let's try this out in the chat UI. So to use this, we're going to use the useChat hook from ai/react.

    And we also need to bring in the Message type. And we're just going to alias that to AIMessage, because that's the message type that we're going to get back from the AI. So that's more of a TypeScript thing. Now, down here, we're actually going to be able to get rid of a bunch of stuff. We're going to get rid of the messages state, because useChat is actually going to handle all that.

    I'm going to bring that in. So I'm going to replace that with this call to useChat. And I'm going to set the initialMessages to the messages array that we got back. We need to coerce that into that AIMessage type. Now, we don't need onClick. Just going to get rid of that in a moment.

    We need to do a little coercion here as well. Now, down here, instead of this div, we're going to replace this entirely with a form. And that form is going to have an input control. That input control is going to be managed by what's coming out of useChat. useChat has a bunch of cool stuff in here--

    input and handleInputChange. And we're going to send those on to that Input component as the value and the onChange. And then we're going to take that handleSubmit, again that we get back from useChat, and assign that to that form. So let's actually try this out. I think this might work in its current configuration.

    So go back over to localhost. Go to the home page. And I want something that's going to have a fairly complicated answer. [TYPING SOUNDS]

    Now you see. Now it's starting to stream in. How cool is that? It didn't come in in one big block. It actually came in by parts and pieces, which is exactly what we want from our ChatGPT-style UI. OK, let's go back into our app. And the problem now is that we are no longer connected to the database, right?

    We've got this output from this useChat system, which is great, but it's not actually going to a database anywhere. So what we need to do is we need to have a way where we can, from the client, tell the database to go and update the chat. So we're going to add a new server

    action called updateChat. So down in our server-actions directory, we'll create a new server action called updateChat. And all updateChat is going to do is take a chat ID, as well as messages, and then do pretty much exactly what we did over in getCompletion at the end there,

    where we look to see the chat ID. And then we either created a chat or updated a chat. Let's go take a look. We just either update or create a chat. And then we return the chat ID as the output here. Now let's go back over to our Chat component. And we'll bring that in instead of getCompletion. And then down here in the body of the component,

    we need to go and call that updateChat based on the output of messages. So when the messages change, we need to call updateChat with the new messages. So how do we do that? Well, we'll use a useEffect for that. We'll bring in useEffect. So in our useEffect, we're going to use a dependency array with isLoading, messages,

    and router. isLoading is a Boolean that we get from useChat. It is true if we are still streaming and false if we're not streaming. messages, again, we get that back from useChat. That's the list of messages. We want to go and call that updateChat with those messages. And then router, well, we're just putting router in there

    because ESLint gets angry at us if we don't have router in there. And then inside of that useEffect, we're going to see, OK, are we loading? If we're loading, we don't want to do anything. And if we aren't loading and we do have messages, then we're going to want to call updateChat with those messages. Of course, we need to simplify those messages down to just role and content.

    So we'll simplify those. And then we'll do the same thing that we did before with the router push if we get a chat ID. So let's hit Save. Go back to our home page. We'll ask a question. And now we get forwarded on to /chats/6,

    which is the chat ID, which means that we have updated the database. So now we've got streaming and we've also got database updating off the client. So cool. OK, time to put this into production. Let's go back over to our Visual Studio Code and push this to GitHub.

    And we'll check it out on Vercel. OK, it looks good. Let's try it out. Let's ask a long question. And we can see that we're getting the streaming response back.