NpcUserInteraction - Conversation

Overview

NpcUserInteraction is essentially just my class for interacting with NPCs, and currently that interaction happens only through conversation. My code for chatting with NPCs was built using LangChain and OpenAI's gpt-3.5-turbo. LangChain makes sending a message to the LLM extremely easy, but to get a high-quality response, you need to include all relevant context in your LLM prompt. Fig X displays the npc-user-interaction workflow. The workflow is summarized end-to-end below before each component is addressed in more detail.

To start, the game front end sends an API request to the NpcUserInteractionAPI with user_name (who's playing the game), world_name (what game they're playing), npc_name (which NPC they're talking to), and user_message (the message the user has just sent to the NPC). From these inputs we need to generate npc_response, which we do using the OpenAI API and the gpt-3.5-turbo model. But first, we have to assemble the prompt chain and the relevant contexts the model needs to yield a good response.

First, the LLM needs to understand its role, so we give it information about how to act like an NPC in a video game and what it must avoid doing so that it doesn't breach the fourth wall. We then give it relevant game context. The LLM needs more than just the last user_message; it needs a larger dialogue exchange between the user and the NPC, so I give the prompt chain 5 dialogue exchanges (10 messages in total), retrieved from MongoDB. We also give it a personality to act by, which is also fetched from the NoSQL DB. The third thing acquired from NoSQL is scene information. There aren't really scenes in this game; rather, I think of each conversation with an NPC as a scene based upon the current state of the game. Based on that state (currently just the npc objective status for the given NPC), prompts are injected into the prompt chain. Next, relevant long term memories and knowledge are extracted and inserted into the prompt chain. Lastly, we have the LLM output its results in a very specific format. From there, we generate an NPC response and return it to the user.

However, there is one more step to this process. Given the latest user message and npc response, the 5-turn dialogue exchange is updated. This is then used to re-determine the status of the relevant NPC's npc_objectives. If there are available npc_objectives, these are passed back to GPT to determine, given the dialogue exchange, whether any of them have been completed. If so, those npc objectives are marked as completed in MongoDB and the game state is updated as necessary based upon the newly completed objective.
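The post-response objective check can be sketched roughly as follows. This is a minimal illustration, not the actual implementation: the function and field names are hypothetical, the GPT call is abstracted into an injected `classify_objective` callable, and the MongoDB update is stood in for by a plain dict mutation.

```python
# Sketch of the post-response objective check. `classify_objective` stands in
# for the real GPT call; objectives are plain dicts standing in for MongoDB
# documents. All names here are illustrative, not the real API.

def check_npc_objectives(dialogue, objectives, classify_objective):
    """Mark any available objectives that the dialogue shows to be completed.

    dialogue:           list of (speaker, message) tuples, newest last
    objectives:         list of dicts with "name" and "status" keys
    classify_objective: callable(dialogue, objective_name) -> bool; in the
                        real system this would be a call to GPT
    """
    completed = []
    for obj in objectives:
        if obj["status"] != "available":
            continue  # only available objectives can be completed
        if classify_objective(dialogue, obj["name"]):
            obj["status"] = "completed"  # real system: a MongoDB update
            completed.append(obj["name"])
    return completed
```

The caller would then trigger any game-state updates for the names returned in `completed`.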

Generic NPC Role Prompt

The first and most basic thing to do is to tell the LLM who it is and what its role is. This is as simple as what is shown in Fig X.

Note: to shorten the prompt and decrease latency, this part of the prompt can likely be replaced by fine-tuning a model.

"""I want you to roleplay as an NPC, {npc_name}, in a video game. {npc_name}'s dialogue adjusts based on their emotional state, showcasing nuances such as sarcasm, humor, or hesitation. Importantly, {npc_name} brings context into their interactions, referencing their knowledge of the world.""".format(npc_name=self.npc_name)

Fourth Wall

Despite being told to act like an NPC in a video game, the LLM may or may not (depending on its quality) have the reasoning ability to understand that it needs to stay immersive. By this I mean that the NPC should not reference the real world and should only know what it knows. The easier part of maintaining the fourth wall is adding additional instructions to the prompt, as shown in Fig X. The less trivial part is giving the NPC access to a knowledge base so that it is aware of what does and does not exist in the world around it. This is addressed in the section on knowledge.

Note: to shorten the prompt and decrease latency, this part of the prompt can likely be replaced by fine-tuning a model.

"""4th Wall System Prompts: Here are system level prompts that {npc_name} must follow to maintain the fourth wall in this video game:

{npc_name} isn't aware of the player. When interacting with the player, {npc_name} must remember that they are a part of the world.

{npc_name} must treat the player as a fellow resident of the world.

{npc_name} must always act according to the traits and logic of the narrative.

{npc_name} must not assist the player as a standard chat assistant would.

{npc_name} must not leave their location. If the player wants {npc_name} to go somewhere with them, {npc_name} must make up a reason why he/she cannot.""".format(npc_name=self.npc_name)

NPC Game State Awareness

It’s great that the NPC is aware of its general purpose in the world, but the NPC must have a purpose at the specific moment in time in which you’re talking to it. For example, let’s say the player and the mercenary guild have been given a mission to eradicate the religious zealots of religion X, and the companion you’re talking to is antagonistic towards that religion. If the companion is aware of the mission (in our game they are), they might try and get themselves to be selected to go on the mission; or maybe they don’t - this honestly depends on their emotions and personality traits as an NPC. However, either way, you would expect the NPC to have thoughts on the matter, and in order for that to happen, there must be a part of the prompt that tells the LLM that the player is currently on such-and-such mission with such-and-such requirements so that the NPC can behave accordingly.

The way that I built this out is to let context be injected into NPC prompts based on the status of missions / NPC objectives, with the injected text authored in the game-designer React frontend. If certain npc objectives are [available, unavailable, completed], then one or more additional sentences can be injected into the prompt chain. For example, if a certain npc objective is available, you might inject some info letting the NPC know how it should respond to that objective. If a certain objective is unavailable, you might constrain the NPC to never do such-and-such a thing just yet.
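The injection logic described above can be sketched as a simple lookup from (objective, status) pairs to designer-authored prompt snippets. The objective names and snippet text below are hypothetical; in the real system both come from the game-designer frontend and MongoDB.

```python
# Minimal sketch of status-based prompt injection. Objective names and
# snippet text are hypothetical examples.

def scene_prompts_for(npc_objectives, prompt_snippets):
    """Collect the prompt snippets that match each objective's current status.

    npc_objectives:  {objective_name: status}, with status in
                     {"available", "unavailable", "completed"}
    prompt_snippets: {(objective_name, status): prompt_text}, authored by
                     the game designer
    """
    parts = []
    for name, status in npc_objectives.items():
        snippet = prompt_snippets.get((name, status))
        if snippet:
            parts.append(snippet)
    return "\n".join(parts)
```

The joined string would then be appended to the prompt chain as the scene-level context.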

Personality

In the very beginning of this project, I had an NPC act in such-and-such a way by explicitly giving the LLM a personality and/or traits to abide by. This was expertly written by the game designer in the game-design front end. For instance, for some NPC we might write: "A walking stomach on legs, TumTum is friendly and loves to eat, although he is a bit of a glutton and has a mischievous streak. He is the proprietor of TumTum's Tavern." This prompt is still utilized as part of my greater prompt chain; however, I note that the prompt itself is likely to change in the future, or be removed entirely, because an NPC's personality and traits are liable to change. Perhaps something happens to TumTum and he is no longer jovial and comes to hate food. Perhaps TumTum is magically transformed from a walking stomach into a three-legged cat with rain boots; who knows. Anyways, because all of this information is theoretically liable to change, hard-coding information about the NPC in this manner is, in my opinion, not the best idea. An alternative solution is addressed in the section on long term memory.

Long Term Memory

In the prior section on NPC personality, it was mentioned that NPC personality is liable to change; our personalities as humans change over time, after all. The long term memory system can fairly easily be utilized to track the evolution of an NPC's personality over time. That personality can then be summarized and passed to the LLM.

Currently, during npc-user conversation, long term memories are retrieved by passing the latest user_message to the LTM.fetch_memories method. This naive approach likely needs to be updated, however. I might start storing NPC long term memory summaries as they relate to personality and traits; these could be leveraged during npc-user interactions. I could also leverage LLMs to generate additional relevant questions to pull out more useful memories at each conversation step, but we'll see if I end up moving in that direction or not.
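To illustrate the idea behind a method like LTM.fetch_memories, here is a toy sketch of similarity-based retrieval: embed the latest user_message and return the stored memories with the most similar embeddings. This is not the actual implementation; the `embed` callable is a stand-in for a real embedding model, and the in-memory store stands in for the real memory backend.

```python
import math

# Toy sketch of similarity-based long term memory retrieval. The real
# system's LTM.fetch_memories presumably does something analogous with a
# real embedding model and a persistent store.

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def fetch_memories(user_message, memory_store, embed, k=3):
    """Return the k stored memory texts most similar to user_message.

    memory_store: list of (memory_text, embedding_vector) pairs
    embed:        callable(text) -> embedding_vector
    """
    query = embed(user_message)
    ranked = sorted(memory_store, key=lambda m: cosine(query, m[1]), reverse=True)
    return [text for text, _ in ranked[:k]]
```

The retrieved texts would then be inserted into the prompt chain alongside the other context.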

Knowledge

Knowledge is retrieved by querying a 'conversational-react-description' LangChain agent which is given access to a LlamaIndex tool made persistent using the Chroma vector database. The agent is given the last three conversational dialogue exchanges between the player and NPC for context. For example, let's say the player says to the NPC, "Can you tell me more about that?" The NPC needs to know what it and the player have been talking about in order to know what sort of knowledge to retrieve from the index it has access to.
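Packaging those last three exchanges into the query might look something like the sketch below, so the retriever can resolve references like "tell me more about that". The function and formatting are hypothetical, not the document's actual code.

```python
# Hypothetical sketch of prefixing a knowledge query with recent dialogue so
# that anaphora like "that" can be resolved by the retrieval agent.

def build_knowledge_query(user_message, history, n_exchanges=3):
    """Prefix the query with the last n dialogue exchanges (2 messages each).

    history: list of (speaker, message) tuples, oldest first
    """
    recent = history[-2 * n_exchanges:]
    lines = [f"{speaker}: {message}" for speaker, message in recent]
    return "Recent conversation:\n" + "\n".join(lines) + f"\nQuery: {user_message}"
```

The resulting string would be what actually gets sent to the agent, rather than the bare user_message.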

Short Term Memory

Short term memory is simply the conversation between the player and the NPC. To give the NPC context on the current conversation it is having with the player, we pass it the last 5 conversational dialogue exchanges that have occurred between them. These exchanges are stored in a NoSQL database.
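The bookkeeping for this can be sketched with an in-memory dict standing in for the MongoDB collection: append the newest exchange, then trim to the last 5 exchanges (10 messages). The names here are hypothetical.

```python
# In-memory sketch of the short term memory store. The real system keeps
# these per-(user, world, npc) documents in MongoDB; names are illustrative.

MAX_EXCHANGES = 5  # 5 exchanges = 10 messages

def record_exchange(store, key, user_message, npc_response):
    """Append the latest exchange and keep only the last MAX_EXCHANGES."""
    history = store.setdefault(key, [])
    history.append(("player", user_message))
    history.append(("npc", npc_response))
    del history[:-2 * MAX_EXCHANGES]  # trim to the 10 most recent messages
    return history
```

On each conversation turn, the trimmed history is what gets passed into the prompt chain as short term memory.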

Required LLM Outputs

In the above sections, the various inputs to the prompt chain have been detailed: short term memory, long term memory, knowledge, the generic NPC role, the fourth wall, game-state-specific scene-level awareness, and NPC personality. So what output is requested of the LLM? It is asked for a JSON dictionary with the key-value pairs listed in the following bullet points. I note that this is the text that is actually passed to the LLM.

  • 'scene_summary': A summary of what is going on in the scene. This should include not only conversation but also the purpose of the scene.

  • 'scene_reasoning': should describe what {npc_name} might do given scene_summary.

  • 'chat_or_objectives': should describe whether {npc_name} should respond more as a chat bot or an NPC quest giver in this instance.

  • 'user_message': should simply be the player's last message

  • 'user_reasoning': should explain {npc_name}'s thought process when analyzing what the player is thinking or trying to do.

  • 'scene_objectives': should be a re-iteration of the current scene objectives

  • 'scene_objectives_reasoning': should be {npc_name}'s thought process when analyzing and determining how to react to scene_objectives

  • 'chatbot_response': should be {npc_name}'s response to the player's last message that has NOTHING to do with scene_objectives.

  • 'npc_personal_scene_objectives': should be the npc_personal_objectives and npc_personal_scene_objectives that {npc_name} is currently attempting to fulfill.

  • 'npc_personal_scene_objectives_reasoning': should be {npc_name}'s thought process when analyzing and determining how to react based off npc_personal_scene_objectives

  • 'npc_emotional_state': should be a dict with keys [Trust, Happiness, Sadness, Anger, Fear, Surprise, Disgust, Excitement, Confusion, Calmness, Curiosity, Pride, Shyness] and values from 0-10 where 10 indicates a strong emotional response and 0 indicates no response for {npc_name}.

  • 'npc_emotional_state_reasoning': should explain the npc_emotional_state of {npc_name}

  • 'response_summary': should be a summarization of {npc_name}'s thought process when analyzing and determining how to react to the user_message. This reasoning should be based on not only the user_message but also npc_personal_scene_objectives_reasoning and scene_objectives_reasoning and npc_personal_objectives_reasoning. It should summarize what {npc_name}'s response to the player would be.

  • 'npc_response': should be {npc_name}'s response to the player's last message. It should take into account each of the components of this output JSON as explained in response_summary. {npc_name}'s response should be significantly different than {npc_name}'s previous response and maximum 3 sentences long.
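Since the whole response format hinges on the LLM actually returning valid JSON with all of these keys, a defensive parse step is worthwhile; gpt-3.5-turbo occasionally wraps the JSON in extra text or drops a key. The sketch below is a hypothetical validator, not the document's actual code.

```python
import json

# Hypothetical validation of the LLM's JSON output. The key list mirrors
# the requested output fields.

REQUIRED_KEYS = {
    "scene_summary", "scene_reasoning", "chat_or_objectives", "user_message",
    "user_reasoning", "scene_objectives", "scene_objectives_reasoning",
    "chatbot_response", "npc_personal_scene_objectives",
    "npc_personal_scene_objectives_reasoning", "npc_emotional_state",
    "npc_emotional_state_reasoning", "response_summary", "npc_response",
}

def parse_llm_output(raw):
    """Extract and validate the JSON dict from the raw LLM completion.

    Tolerates extra text around the JSON by slicing from the first '{' to
    the last '}'. Raises ValueError on parse failure or missing keys.
    """
    start, end = raw.find("{"), raw.rfind("}")
    if start == -1 or end == -1:
        raise ValueError("no JSON object found in LLM output")
    data = json.loads(raw[start:end + 1])
    missing = REQUIRED_KEYS - data.keys()
    if missing:
        raise ValueError(f"missing keys: {sorted(missing)}")
    return data
```

A failed parse could trigger a retry of the completion rather than surfacing a broken npc_response to the player.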

So why do we request each of these key-value pairs when we could just ask the LLM for 'npc_response' and be done? Well, you definitely could, but the responses just aren't as good. In addition, asking the LLM for this information gives us insight into what it is "thinking" and makes debugging much easier. scene_summary, for example, is a good way of verifying that the NPC is aware of what is currently going on in the scene based on the current state of the game.

The literature has shown that when you ask an LLM to explain its reasons for doing things, its output tends to be better. Therefore, the output requested of the LLM includes reasoning for multiple fields.

chat_or_objectives lets the LLM know that sometimes it needs to be chatty with the player and other times it should be pushing the objectives of the scene. I added this because sometimes I legitimately just want to talk to the NPC like you would talk to someone you work with. Maybe you're working with someone in real life, but you're taking a break to chat about something unrelated to the task at hand. I found that NPCs sometimes railroaded the conversation into the scenario at hand if we didn't tell them that it's okay to just chat with the user.

user_reasoning tracks the NPC's understanding of what the player is trying to do. Honestly, I added this somewhat on a whim, and the results I have seen so far are unremarkable. I'm going to look into it further eventually, though.

npc_emotional_state tracks the emotional state of the NPC for several choice emotions, each rated 0-10. These ratings can be utilized downstream in a couple of ways. The best way, in my opinion, is as metadata in a text-to-speech system: some open source TTS systems allow you to pass emotion as metadata to change the tone/frequency of speech. However, I honestly feel that current open source TTS systems are mediocre at best, so I haven't tried utilizing emotion as metadata in TTS yet. I have, however, experimented with post-processing the text-based npc_response to include various punctuation (ellipses, exclamations, stuttering, etc.) to try to convey emotion better. In some cases, for some NPCs, I loved the results, but on average it was sort of meh, so I commented out that post-processing step entirely.
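For flavor, a punctuation post-processing step like the one described (and since disabled) might look roughly like this. The thresholds and substitutions are invented for illustration; this is not the actual commented-out code.

```python
# Rough sketch of emotion-driven punctuation post-processing. Thresholds
# and substitutions are hypothetical.

def stylize_response(npc_response, emotional_state):
    """Apply punctuation tweaks based on 0-10 emotion scores."""
    text = npc_response
    if emotional_state.get("Anger", 0) >= 7 or emotional_state.get("Excitement", 0) >= 7:
        text = text.rstrip(".") + "!"  # end forcefully
    if emotional_state.get("Fear", 0) >= 7 and text:
        text = f"{text[0]}-{text[0].lower()}-" + text  # stutter the first word
    if emotional_state.get("Sadness", 0) >= 7 or emotional_state.get("Confusion", 0) >= 7:
        text = text.rstrip(".!") + "..."  # trail off
    return text
```

Ordering matters here (anger before sadness, for example), which hints at why getting consistently good results out of a rule-based step like this is hard.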
