How to Build a Blazing Fast AI Discord Bot with Python and Local LLMs (No API Keys Required)

Build a Blazing Fast AI Discord Bot | BitScript
Tutorial  ·  Python + Local LLMs

Build a Blazing Fast
AI Discord Bot
with Local LLMs

No API keys. No token costs. A fully asynchronous pipeline powered by Ollama and discord.py that keeps your bot lightning-fast and completely lag-free.

🕐 ~8 min read 🐍 Python 🤖 discord.py 🧠 Ollama 🔒 100% Private

If you have spent any time playing around with local Large Language Models, you already know how incredible it is to run AI on your own hardware with zero token costs and total privacy. But running an AI assistant on your desktop is just the beginning.

Why keep it to yourself when you can drop it directly into your Discord server to hang out, moderate, or brainstorm with your entire community? In this guide, we are building a fully functional, intelligent Discord bot using discord.py and Ollama.

The catch with most AI Discord bot tutorials? They use synchronous code that freezes the bot while the AI generates a response. If your bot stops responding to the Discord gateway for too long, it gets disconnected. We are going to fix that by building a fully asynchronous pipeline that keeps your bot fast, responsive, and completely lag-free.

Why Go Local

Why Local LLMs for Discord?

💸
Zero Cost

No monthly API fees, no matter how chaotic or active your server chat gets. Zero token costs, ever.

🔒
Total Data Control

Your community's chat data never leaves your machine or hosting environment. Complete privacy.

🎨
Modded Personalities

Pass custom system prompts to change how your bot behaves on the fly — no redeployments needed.

Architecture

How Data Flows

// asynchronous message pipeline
👤
User Message
🌐
Discord Gateway
🤖
bot.py Event Loop
Ollama AsyncClient
🧠
Local LLM
💬
Reply Sent

Getting Started

The Step-by-Step Setup

Before writing the bot script, we need to handle the infrastructure. Follow these steps precisely to get your tokens and local environment ready.

1
Prerequisite

Install Ollama and Pull Your Model

Download and install Ollama for your OS. Once it's running in your terminal, pull a highly efficient model like Llama 3 or Phi-3:

terminal
bash
ollama pull llama3
2
Discord Portal

Set Up Your Discord Developer Application

Go to the Discord Developer Portal, click New Application, and give it a name. Navigate to the Bot tab, click Reset Token, and save that token securely — you will need it in your script.

3
Crucial Config

Enable Gateway Intents

Scroll down on the Bot tab to Privileged Gateway Intents. Toggle Message Content Intent to ON so your bot can actually read the server prompts it needs to process.

4
Terminal Setup

Install Python Dependencies

Install the required asynchronous libraries. We use discord.py for the API connection and the official ollama async Python package:

terminal
bash
pip install discord.py ollama python-dotenv

The Blueprint

bot.py — Full Production Code

Create a file named bot.py and paste the following clean production code. This handles connections, filters out messages from other bots, tracks basic history, and invokes the AI engine without locking up your script execution loops.

bot.py
python
import os
import discord
import ollama
from discord.ext import commands
from dotenv import load_dotenv

# Load environmental variables
load_dotenv()
TOKEN      = os.getenv('DISCORD_TOKEN')
MODEL_NAME = 'llama3'  # swap for phi3 or mistral

# Configure Bot Setup
intents = discord.Intents.default()
intents.message_content = True
bot = commands.Bot(command_prefix="!", intents=intents)

@bot.event
async def on_ready():
    print(f'🤖 {bot.user.name} is online and connected to Discord!')
    await bot.change_presence(activity=discord.Game(name="with Local Brains"))

@bot.event
async def on_message(message):
    # Rule 1: Never reply to yourself or other bots
    if message.author.bot:
        return

    # Check if the bot is explicitly tagged or mentioned
    if bot.user.mentioned_in(message):
        # Clean the mention string out of the prompt text
        clean_prompt = message.content.replace(f'<@{bot.user.id}>', '').strip()
        if not clean_prompt:
            await message.channel.send(
                f"Hey {message.author.mention}! Tag me and ask something like: 'How do I optimize a Python script?'"
            )
            return

        # Trigger a typing indicator so users know the local engine is working
        async with message.channel.typing():
            try:
                # Use Ollama's native AsyncClient to prevent blocking the event loop
                async_client = ollama.AsyncClient()
                response = await async_client.generate(
                    model=MODEL_NAME,
                    prompt=clean_prompt,
                    system="You are an elite, helpful tech companion inside a Discord server. Keep answers concise and readable."
                )

                ai_reply = response.get('response', 'Something went wrong with my generation loop.')

                # Handle Discord's 2000-character message limit rule safely
                if len(ai_reply) > 2000:
                    chunks = [ai_reply[i:i+1900] for i in range(0, len(ai_reply), 1900)]
                    for chunk in chunks:
                        await message.reply(chunk)
                else:
                    await message.reply(ai_reply)

            except Exception as e:
                print(f"Error during AI processing: {e}")
                await message.reply("Sorry, my local cognitive cores ran into a processing glitch.")

    await bot.process_commands(message)

if __name__ == '__main__':
    bot.run(TOKEN)

Breaking Down the Magic

The secret sauce here is ollama.AsyncClient(). Here's why it matters:

If you use a standard requests block or standard sync client calls, your entire script stops dead in its tracks while your GPU or CPU crunches the math. To Discord, your bot looks dead — leading to a gateway timeout error and a disconnected bot.

By await-ing the async generation call, we hand control back to the discord.py event loop. While the AI figures out a complex response for one channel, the bot can simultaneously:

🔄
Process Other Events

Handle other incoming messages and commands in different channels without any delay.

📊
Track Statuses

Maintain real-time user presence and activity tracking across the entire server.

💓
Keep Heartbeat Alive

Maintain the Discord gateway connection intact — no disconnects, ever.

What's Next?

You can take this project a step further by implementing a SQLite or Firebase database cluster to log thread IDs, creating long-term multi-user memory contexts directly in your server channels. Bookmark BitScript for weekly deep-dives into edge computing, clean code structures, and private AI tools.

Let's Connect

Ran into setup roadblocks or want to share your custom bot configurations? Let's talk about it in the community.

BitScript  ·  Weekly deep-dives into edge computing, clean code structures, and private AI tools

Comments

Popular Posts

Welcome To BitScript

The Rise of Physical AI: Why Embodied Robotics Is the Next Trillion-Dollar Tech Frontier in 2026

The Local-First Revolution: Why Offline-First Architecture and Edge Computing Are Replacing Cloud Dependencies in 2026