Banter Bot: AI-Powered Language Learning Platform
Making an AI-Based Conversation Simulator with Appwrite, NextJS, and Google's Gemini AI
Table of contents
Video Demo
Introduction
These days, we can learn languages from many sources, such as school, YouTube videos, or language learning apps. Within these sources, there are even more unique language learning methods available to us. You may have experienced matching words in the language you're learning to objects, filling out missing words from a sentence, or even filling out the next part of a dialogue in a conversation.
There's nothing wrong with these methods, and they will definitely guide you toward your goal of learning a language. However, I believe the best way to learn to speak a language is to actually, actively speak that language.
You hear many stories of gamers becoming fluent in English simply by having gaming sessions and engaging in active conversations with foreigners who speak English. People also plan meet-ups with foreigners to learn each other's languages (conventions, field trips, etc.).
However, this method of engaging in real conversations in the language you're learning isn't available to everyone. I'd argue it's the most inaccessible method for most people. That's why I created "Banter Bot".
The Idea
To bridge this gap, I started brainstorming ideas for an application that would allow its users to have natural, active conversations in a requested language with counterparts who actually speak it, without the barriers.
Many other applications actually connect you to native speakers, and this does eliminate some problems, like location. However, the human element in that method can still bring up other issues: the other person's availability (time zones?), and the fact that people just starting to learn a language may experience social anxiety, or may simply be scared of talking to strangers.
So, how do I remove the human element while keeping its diversity, dynamism, and responsiveness? Language tests usually have you participate in conversations by finishing dialogue or filling in its gaps. But this is static. The conversation is predetermined. It doesn't respond to you specifically, no matter what you say.
Luckily, with the advancements in AI, the solution was clear. The idea was to let users have dynamic conversations with AI and let it respond like a real person. Basically simulating a real conversation with a native speaker.
Banter Bot
With the idea solidified and the solution clear, I started development. I came up with the name "Banter Bot". It describes the app and there's alliteration! I also came up with this simple logo, which I'm quite proud of:
Description
Banter Bot is a language learning platform where you can choose from an array of unique AI personalities and engage in active conversation with them about equally unique topics. These unique personalities keep the conversation from becoming repetitive, keep it dynamic, and engage users. Banter Bot also includes features that directly help you learn a particular language, most notably the feedback system.
On top of just having conversations, as you chat with a particular AI personality, you will get constant feedback on what you say. You will be notified of mistakes you make, shown the correct way you should have said something, and given general explanations of the grammar, syntax, and phrases you used.
Banter Bot will also allow you to have weekly reviews generated by AI. You will be able to reflect upon how you're doing currently and how you have improved in every language you are trying to learn.
Application Flow
In this section, I'll be showing you particular features of Banter Bot along with a picture and some description.
Home Page
Not much to say here, but I'm quite proud of the color palette I chose and the background. I also like the font on the logo.
Register/Login
Since I want users to be able to track their progress, I implemented a register and login function.
Personalities
Here's where you'll be reviewing all the AI personalities that you can interact with. There are only five at the time of writing this blog, but I plan on adding more personalities with even more unique traits. Maybe a talking dog? The possibilities are endless.
There are also plans to add user-created personalities, but I would have to figure out a way to filter out unsavory entries. For now, it's my job to fill out these personalities.
You'll see that each personality has a unique name, traits, and description.
Learning Track (Dashboard)
Here is where the magic starts: your learning track. Immediately, you'll see the conversations available to you, along with each one's status of "Not started", "In progress", or "Completed". We'll get to what this means later.
Choosing Language
Banter Bot also allows you to practice a variety of languages. At this time, there are only five, but realistically it could support any sufficiently widely used language. Your conversation progress is also unique to each language. Let's say you've completed all the conversations in English; you can start all over again in a different language like Spanish.
Starting a Conversation
After clicking on a new conversation that you haven't yet started, you'll be shown a dialog where you'll be given a choice of all the personalities you can have the conversation with. After clicking "Start Conversation", you'll be redirected to the actual conversation page.
Conversation
This is where the magic actually happens. This is where you'll be engaging in active conversation with the personality you have chosen. You have your header displaying general info about the particular conversation, like the personality name and the topic, the language it's in, and whether or not the conversation goal has been reached.
Now I'll talk about the conversation status from the dashboard page. Each conversation in each language can be not started, in progress, or completed. "Not started" means you haven't even chosen a personality to chat with. "In progress" means that you have chosen a personality and started the conversation, but have yet to reach the conversation goal.
What is the conversation goal? To help users keep track of their own progress and give a sense of accomplishment as well as direction, I have provided each conversation with a hidden goal that the user AND AI must reach. This way the user and the personality itself can stay on track.
You can still chat with the personality even after the conversation has been completed; the goal is simply there to discourage users from sticking to trivial exchanges like greetings.
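The status logic described above can be sketched as a small pure function. The type and field names here (`personalityId`, `isGoalReached`) are illustrative assumptions, not Banter Bot's actual code:

```typescript
// Hypothetical record of a user's progress in one conversation.
// Field names are assumptions for illustration, not the app's schema.
interface UserConversationState {
  personalityId: string | null; // null until a personality is chosen
  isGoalReached: boolean;       // set once the hidden goal is met
}

type ConversationStatus = "Not started" | "In progress" | "Completed";

// Derive the dashboard status: no record (or no personality chosen) means
// the conversation was never started; otherwise the goal flag decides.
function getConversationStatus(
  state?: UserConversationState
): ConversationStatus {
  if (!state || state.personalityId === null) return "Not started";
  return state.isGoalReached ? "Completed" : "In progress";
}
```

A mapping like this keeps the dashboard rendering trivial: each conversation card just calls the function on whatever record exists for the selected language.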
Feedback System
Here's the feedback system. Aside from the experience you gain just by holding a conversation in a language, you'll also get extra information through feedback.
If you make a mistake like a grammar or spelling error, the chat bubble where the mistake was found will get a flashing orange exclamation mark. You can click on this, and it will bring up a dialog.
This dialog will contain your original text, the correct way you should have typed the sentence, mistakes found, and general explanation about what you said.
The explanation part is extra important, because even with perfect grammar and spelling, some phrases might not be the correct thing to say in certain contexts.
The explanation section will also offer general info about what you said, like tense form, facts about phrases/idioms, and even fun facts.
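To make the dialog's contents concrete, here's a sketch of what receiving that feedback from the model might look like. The `MessageFeedback` fields and the fence-stripping parser are assumptions for illustration; the app's actual payload shape may differ:

```typescript
// Hypothetical shape of the feedback the model returns for one message.
interface MessageFeedback {
  originalText: string;  // what the user typed
  correctedText: string; // the suggested correction
  mistakes: string[];    // individual mistakes found
  explanation: string;   // grammar/context notes, fun facts, etc.
}

// Models are often asked to answer inside a fenced JSON block, so drop
// the fence lines before parsing. Returns null on invalid JSON.
function parseFeedback(raw: string): MessageFeedback | null {
  const fence = "`".repeat(3);
  const stripped = raw
    .split("\n")
    .filter((line) => !line.trim().startsWith(fence))
    .join("\n")
    .trim();
  try {
    return JSON.parse(stripped) as MessageFeedback;
  } catch {
    return null;
  }
}
```

Returning `null` instead of throwing lets the UI simply skip the exclamation mark when a response can't be parsed.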
Translation System
I've also added translation support for conversations that aren't in English. This gives users even more context instead of leaving them to blindly respond to messages they can't understand.
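A translation request to Gemini can be as simple as a one-off prompt. The builder below is a hypothetical sketch of such a prompt, not the project's actual wording:

```typescript
// Build a translation prompt for a message in the conversation's language.
// The wording and parameter names here are illustrative assumptions.
function buildTranslationPrompt(
  text: string,
  sourceLocale: string,
  targetLocale = "en"
): string {
  return [
    `Translate the following message from ${sourceLocale} to ${targetLocale}.`,
    "Reply with the translation only, no commentary.",
    `Message: "${text}"`,
  ].join("\n");
}
```

Sending this through the same server-side route as the conversation keeps the API key hidden and reuses the existing Gemini client.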
Profile
This isn't as crucial, but it's important nonetheless. It's a profile page. Users can update some information about themselves, such as their username, which the AI uses to refer to the correct person. Users can also set their default language here, so when they open the dashboard, they don't have to constantly change the selected language.
Review System
Finally, one of Banter Bot's finest features: the review system. Instead of scrolling back through the messages you've sent and their corresponding feedback, you can ask the AI to generate a review based on your progress and score you in areas like grammar, vocabulary, and spelling.
You can generate a review for a language each time you have 15 new messages sent in that particular language. And you can renew/update the review every week, so you can keep reflecting on your progress and keep improving.
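That gating rule (15 new messages, at most one review per week) can be captured in one small predicate. The function and parameter names are assumptions for illustration:

```typescript
const MIN_NEW_MESSAGES = 15;
const WEEK_MS = 7 * 24 * 60 * 60 * 1000;

// A review can be generated when enough new messages exist in the language
// and at least a week has passed since the last review (or none exists yet).
function canGenerateReview(
  newMessageCount: number,
  lastReviewAt: Date | null,
  now: Date = new Date()
): boolean {
  if (newMessageCount < MIN_NEW_MESSAGES) return false;
  if (lastReviewAt === null) return true;
  return now.getTime() - lastReviewAt.getTime() >= WEEK_MS;
}
```

Checking this on the server before calling Gemini also avoids burning generation quota on reviews that would be rejected anyway.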
Development
Tech Stack
Some important considerations needed to be made before choosing my tech stack. The first consideration is what AI tool I wanted to use. I immediately decided that I would use Gemini. It's supported by Google and I had just created an application using it.
I wanted this app to be as accessible as possible, so I chose to make a web app. For the frontend, I chose the reliable NextJS with TailwindCSS.
With Gemini as my AI tool, I needed a way to call it without exposing my API keys. Thankfully, I had already chosen NextJS for my frontend, because NextJS lets you run edge functions on the server. This is where I can hide my API keys.
As said before, I wanted users to keep track of their progress. This means authentication and database storage. The choice was immediately obvious. I chose Appwrite. It has both of what I needed and more if I need it in the future.
Appwrite also allowed me to hook into user registration so I can create a corresponding profile object for each new user (via Appwrite Functions). So each time a user registers, this ensures they have a profile to store their progress and other settings.
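Sketching that hook: an Appwrite Function subscribed to the `users.*.create` event can build a default profile document and write it with `databases.createDocument`. The payload builder below is a hypothetical illustration; the field names are assumptions, not the app's schema:

```typescript
// Hypothetical payload for the profile document created on registration.
interface ProfileDoc {
  userId: string;
  username: string;
  defaultLanguage: string;
}

// An Appwrite Function listening to the `users.*.create` event would read
// the new user's id and name from the event payload, build this document,
// and pass it to databases.createDocument(dbId, profileCollectionId,
// ID.unique(), payload) using a server-side API key.
function buildDefaultProfile(userId: string, name: string): ProfileDoc {
  return {
    userId,
    username: name || "New Learner", // fall back when no name was provided
    defaultLanguage: "en",           // assumed app-wide default
  };
}
```

Keeping the payload construction pure like this makes the event handler itself a thin wrapper that's easy to test without a live Appwrite instance.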
Use of AI
AI is used in a few places in Banter Bot and is an integral part of it. The most important uses are generating responses from the AI personalities and generating the Feedback System's feedback. AI also decides whether a conversation's goal has been reached.
Here's an example of using Gemini to simulate conversation:
// Get all the messages from a particular conversation
// in Appwrite
const resMessages = await databases.listDocuments(
  config.dbId,
  config.messageCollectionId,
  [Query.equal("userConversationId", userConversation.$id)]
);
const messages = resMessages.documents as Message[];
const sortedMessages = messages.sort(
  (a, b) =>
    new Date(a.$createdAt).getTime() - new Date(b.$createdAt).getTime()
);

// String the history together so the AI responds appropriately,
// with the start being the conversation prompt. This prompt contains
// the AI personality as well as the conversation topic.
const history: Content[] = [
  {
    role: "user",
    parts: [
      {
        text: `${userConversation.prompt}
user message: ${userMessage ? `"${userMessage.textContent}"` : "null"}
`,
      },
    ],
  },
  ...sortedMessages.map((m) => {
    return {
      role: m.senderType === SenderType.USER ? "user" : "model",
      parts: [
        {
          // JSON.stringify quotes and escapes the message text so the
          // history entry stays valid JSON.
          text: `\`\`\`json
{
  "message": ${JSON.stringify(m.textContent)},
  "isGoalReached": false
}
\`\`\``,
        },
      ],
    };
  }),
];

// Ask Gemini for a response
const chatSession = geminiModel.startChat({
  generationConfig,
  history,
});
const result = await chatSession.sendMessage(
  userMessage ? userMessage.textContent : "start the conversation"
);
Important Links
Here are some related links to the project:
Technical Side
This section covers the technical/code-related details of the project, so feel free to skip it. I want to cover things such as design patterns and library implementations, both to share how I built parts of this project and to open it up for improvements and suggestions.
NextJS
My go-to frontend framework. I usually pair it with TailwindCSS and Shadcn to build beautiful, cohesive UIs blazingly fast. Shadcn has a very simple installation tutorial for setting up NextJS with TailwindCSS and Shadcn here.
As well as powering the front of my app, NextJS also does all my AI generation. Using its edge functions, I can freely call Gemini without exposing my API key. To do this, create a file in your root NextJS folder called .env.local and fill it with variables like this:
NEXT_PUBLIC_VAR=some-value
GEMINI_API_KEY=your-gemini-key
To make sure an environment variable isn't publicly exposed, DO NOT start its name with NEXT_PUBLIC. Variables with this prefix are exposed to client-side pages, which you do not want for secrets.
After this, you can access the API key using process.env.GEMINI_API_KEY in any server pages or API routes.
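Since a missing key only surfaces at request time, a small guard can fail fast with a clear message. This helper is a hypothetical addition, not part of the project:

```typescript
// Read a server-side environment variable, throwing early if it's unset,
// so a missing key fails loudly instead of producing a cryptic API error.
function requireEnv(name: string): string {
  const value = process.env[name];
  if (!value) {
    throw new Error(`Missing required environment variable: ${name}`);
  }
  return value;
}

// Usage inside an API route or server component:
// const apiKey = requireEnv("GEMINI_API_KEY");
```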
Appwrite
I love Appwrite. It's got everything you need for a web app, so naturally, when I'm unsure of what I'll need in a project, I default to Appwrite. And it has a very generous free tier.
Here's how I usually set up an Appwrite project on NextJS:
I create a lib/appwrite.ts file and fill it with the following:
import { Client, Account, Databases, Functions } from "appwrite";

export const config = {
  projectId: String(process.env.NEXT_PUBLIC_APPWRITE_PROJECT_ID),
  dbId: String(process.env.NEXT_PUBLIC_DB_ID),
  functionID: String(process.env.NEXT_PUBLIC_FUNC_ID),
  someCollectionID: String(process.env.NEXT_PUBLIC_SOME_COLLECTION_ID),
};

export const client = new Client();
client.setEndpoint("https://cloud.appwrite.io/v1").setProject(config.projectId);

export const account = new Account(client);
export const databases = new Databases(client);
export const functions = new Functions(client);
On top of storing the Appwrite variables as environment variables, I also store them in a config object to make access easier. I then export what I need from Appwrite, in this case: Account, Databases, and Functions.
Static Data
As said before, Appwrite has a very generous free tier. However, I still don't want unnecessary requests. For example, I have preset conversation topics for users to choose from. They're unlikely to be updated live while someone is using the app, so I don't want to fetch those documents from Appwrite every time the user visits the relevant page.
So, I solved this using React Context near the root of the application. That way, no matter where the user goes in the app, it will use the same data fetched at the start. I use a generic context called DataContext so that I can add other data later.
export interface DataContextData {
  conversations: RemoteDataWithSetter<Conversation[]>;
}

export const DataContext = createContext<DataContextData>({
  conversations: {
    isLoading: false,
    data: [],
    setData: () => {},
  },
});

export const useData = (): DataContextData => {
  const context = useContext(DataContext);
  if (!context) {
    throw new Error(
      "useData must be used within a corresponding ContextProvider"
    );
  }
  return context;
};
Then I put the Context Provider near the root above all pages that might need it.
export const DataContextProvider: React.FC<{ children: ReactNode }> = ({
  children,
}) => {
  const [conversations, setConversations] = useState<
    RemoteData<Conversation[]>
  >(getDefaultRemoteData([]));
  const { settings } = useSettings();

  useEffect(() => {
    (async () => {
      setRemoteDataLoading(setConversations, true);

      const resConvos = await databases.listDocuments(
        config.dbId,
        config.conversationCollectionId
      );
      const resUserConvos = await databases.listDocuments(
        config.dbId,
        config.userConversationCollectionId,
        [Query.equal("language", settings.language.locale)]
      );

      const conversations = resConvos.documents as Conversation[];
      const userConversations = resUserConvos.documents as UserConversation[];

      setConversations((prev) => ({
        ...prev,
        data: conversations.map((convo) => ({
          ...convo,
          userConversation: userConversations.find(
            (uc) => uc.conversationId === convo.$id
          ),
        })),
      }));

      setRemoteDataLoading(setConversations, false);
    })();
  }, [settings.language]);

  return (
    <DataContext.Provider
      value={{
        conversations: getRemoteDataWithSetter<Conversation[]>(
          conversations,
          setConversations
        ),
      }}
    >
      {children}
    </DataContext.Provider>
  );
};
// NextJS layout
function AppLayout({
  children,
}: Readonly<{
  children: React.ReactNode;
}>) {
  return (
    <DataContextProvider>
      <div className="h-screen flex overflow-hidden">
        <Sidebar />
        <div className="grow h-full">
          <ScrollArea className="h-full overflow-y-auto">{children}</ScrollArea>
        </div>
      </div>
    </DataContextProvider>
  );
}
AI (Gemini)
As previously said, I can call Gemini without exposing my keys using NextJS' edge functions, and that's exactly what I did. To get started using edge functions with Gemini, define a route.ts file under the app directory and add something like this:
export const maxDuration = 50; // This function can run for a maximum of 50 seconds

export async function POST(request: NextRequest) {
  const apiKey = String(process.env.GEMINI_API_KEY);
  const genAI = new GoogleGenerativeAI(apiKey);
  const geminiModel = genAI.getGenerativeModel({
    model: "gemini-1.5-pro",
  });
  const generationConfig = {
    temperature: 2,
    topP: 0.95,
    topK: 64,
    maxOutputTokens: 8192,
    responseMimeType: "text/plain",
  };

  try {
    const history: Content[] = [
      {
        role: "user",
        parts: [
          {
            text: `Hello world!`,
          },
        ],
      },
    ];

    // Ask Gemini for a response
    const chatSession = geminiModel.startChat({
      generationConfig,
      history,
    });
    const result = await chatSession.sendMessage("Continue");

    return createSuccessResponse(result.response.text(), "Message(s) created");
  } catch (err) {
    console.log(err);
    let errorMsg = "Unknown Error";
    if (err instanceof GoogleGenerativeAIError)
      errorMsg = "Gemini encountered some error. Please try again.";
    else if (err instanceof Error) errorMsg = err.message;
    return createErrorResponse(errorMsg);
  }
}
A couple of things to note:

- I call a private environment variable using process.env to get the Gemini API key.
- Since Gemini might take a little while to generate content, I set maxDuration to allow for longer requests; by default, a request only gets a couple of seconds before returning a timeout error.
Conclusion
Thanks for reading. Hopefully you learned a thing or two. And if you have any suggestions, feel free to leave some. Peace!