Setting Up My Personal AI Writing Assistant: A Local LLM Journey
In an era where AI assistance has become increasingly valuable for content creation and editing, I recently embarked on setting up my own local AI infrastructure. By running language models directly on my hardware, I’ve created a personalized writing assistant and copy editor that’s both private and accessible throughout my home network.
Installation Deep Dive
The setup process involved several key components, each with its own considerations:
Ollama Setup
- First, I installed Ollama from their official website. The process was straightforward:
- Mac: A simple download and drag-to-Applications installation
- Windows: Running the provided installer, which handled everything including PATH setup
- Model Installation:
ollama pull llama2:7b
ollama pull phi
The download sizes are pretty substantial - around 4GB for Llama 2 and 2.7GB for Phi - so patience and storage are necessary. This is why I installed six total terabytes of storage in my PC!
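Once the pulls finish, it’s worth a quick sanity check that the models actually landed. ollama list shows what’s installed, and a one-shot prompt (the sample prompt here is just mine, nothing special) confirms inference works end to end:

ollama list
ollama run llama2:7b "Rewrite this sentence to be more concise: the setup process was one that took quite a long time to complete."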
For more information, see the Ollama docs.
OpenWebUI Integration
Installing Open WebUI required a few more steps:
docker pull ghcr.io/open-webui/open-webui:main
docker run -d -p 3000:8080 --name open-webui -v open-webui:/app/backend/data ghcr.io/open-webui/open-webui:main
After installation, connecting Open WebUI to Ollama just required pointing it to the local Ollama API endpoint (typically http://localhost:11434). For more information, check out the Open WebUI docs.
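One wrinkle worth flagging: because Open WebUI runs inside a container, localhost there refers to the container itself, not your machine. On Docker Desktop (Mac and Windows), the host is usually reachable as http://host.docker.internal:11434 instead. Either way, you can first confirm the Ollama API is up from the host side; /api/tags simply lists your installed models:

# From the host: should return JSON listing the models you pulled
curl http://localhost:11434/api/tags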
Model Performance Insights
After extensive testing, I’ve noticed some interesting performance characteristics:
Llama 2 7B
Strengths:
- Excellent general writing assistance
- Strong grammar correction capabilities
- Good at maintaining context in longer conversations
Limitations:
- Generation speed averages 15-20 tokens per second on my hardware
- Occasionally struggles with complex technical topics
- Memory context window can be limiting for very long documents (though the default can be raised; see the sketch below)
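That last point is partly configurable. Ollama defaults to a fairly small context window (2,048 tokens for many models), but you can build a variant with a larger one via a Modelfile. The name llama2-longctx and the 4096 value below are just illustrative, and memory use grows with the window:

# Modelfile
FROM llama2:7b
PARAMETER num_ctx 4096

# Build and run the custom variant
ollama create llama2-longctx -f Modelfile
ollama run llama2-longctx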
Microsoft Phi
Strengths:
- Surprisingly capable for its size
- Excellent at code-related tasks (quick example after this list)
- Faster inference speed compared to Llama2
Limitations:
- Sometimes less coherent on very long outputs
- More prone to hallucinations on specific technical topics
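For quick code-related questions, I often skip the web UI entirely and hit Phi from the terminal; a one-shot prompt prints the answer and exits (the prompt itself is just an example):

ollama run phi "Write a bash one-liner that counts lines across all .md files in the current directory."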
Hardware Requirements and Performance Notes
My setup runs on the following PC configuration:
- CPU: AMD Ryzen 7 9800X3D
- GPU: GeForce RTX 4070 Ti SUPER
- RAM: 32GB
- Storage: NVMe SSD with 2TB capacity
The same stack also runs on my MacBook Pro with the M2 Pro chip and 32GB of unified memory.
I’ve found this configuration handles both models comfortably, though having an NVMe drive and a dedicated GPU makes a noticeable difference in model loading times.
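If you want to verify the GPU is actually being used, recent Ollama builds include an ollama ps command that reports whether a loaded model is sitting on the GPU, the CPU, or split across both; on the PC side, nvidia-smi shows VRAM usage directly. (If your build predates ollama ps, nvidia-smi alone tells the story.)

# While a model is loaded: shows its size and CPU/GPU placement (recent Ollama builds)
ollama ps
# NVIDIA-side view of VRAM usage
nvidia-smi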
Challenges and Solutions
The journey wasn’t without its hurdles. Here are the main challenges I encountered and how I resolved them:
Memory Management
I haven’t had any memory issues running models in the 7-9B parameter range or below on this hardware.
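The one behavior worth knowing about is that Ollama keeps a model resident in memory for a few minutes after the last request (five by default, per its documentation) so follow-up prompts are fast. If you need the RAM or VRAM back immediately, a request with keep_alive set to 0 unloads the model:

# Unload llama2:7b right away instead of waiting for the keep-alive timeout
curl http://localhost:11434/api/generate -d '{"model": "llama2:7b", "keep_alive": 0}'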
Network Configuration Challenges
Setting up network access required addressing:
- Proper port forwarding on the local network
- Security considerations for local API access (see the sketch after this list)
- Docker network configuration for Open WebUI
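On the Ollama side, the server binds to 127.0.0.1 by default, so other devices can’t reach it until you tell it to listen on all interfaces. The snippets below follow Ollama’s documented OLLAMA_HOST setting; just remember that once it’s open, anything on your LAN can hit the API, so don’t forward the port past your router:

# macOS: have the Ollama app listen on all interfaces, then restart Ollama
launchctl setenv OLLAMA_HOST "0.0.0.0"

# Linux with systemd: add this line to the ollama service via
# "systemctl edit ollama.service", then restart the service
Environment="OLLAMA_HOST=0.0.0.0"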
Performance Optimization
Several tweaks were necessary to get optimal performance:
- Choosing an appropriately quantized model variant in Ollama (example after this list)
- Finding the right balance between context window size and performance
- Keeping frequently used models loaded between requests so repeat prompts respond immediately
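On the quantization point: in Ollama this isn’t a runtime knob; you choose it by pulling a specific tag from the model library. The exact tag names below are from the public Llama 2 listing as I remember it, so double-check them against the library before pulling:

# Heavier quantization: smaller download, faster, slightly lower quality
ollama pull llama2:7b-chat-q4_0
# Lighter quantization: larger and slower, closer to full precision
ollama pull llama2:7b-chat-q8_0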
Network Configuration
One of the most satisfying aspects of this setup was configuring my PC to act as a local AI server. By opening up access on my local network, I’ve created a personal AI infrastructure that’s accessible from any device in my house (the firewall details are sketched after the list below). This means I can:
- Draft blog posts from my tablet while relaxing on the couch
- Get quick edits done from my laptop in the home office
- Access my AI assistant from any device without relying on cloud services
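Because the docker run command above already publishes Open WebUI on port 3000, “opening up access” mostly amounted to letting that port through the PC’s firewall and then browsing to the PC’s LAN address from other devices (something like http://192.168.1.50:3000, substituting your machine’s actual IP). On Windows, a rule along these lines does it, though Docker Desktop sometimes adds one for you:

# Windows (run as Administrator): allow inbound connections to Open WebUI
netsh advfirewall firewall add rule name="Open WebUI" dir=in action=allow protocol=TCP localport=3000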
Benefits of Local LLMs
Running these models locally offers several advantages:
- Complete privacy for my writing and editing process
- No subscription fees or API costs
- Consistent access without internet dependency
- Full control over the model selection and configuration
Looking Forward
This setup has transformed my writing workflow, providing me with always-available AI assistance while maintaining privacy and control. As newer and more capable models become available for local deployment, I look forward to further enhancing this system.
For those interested in creating a similar setup, I encourage you to explore Ollama and Open WebUI. While the initial configuration requires some technical comfort, the resulting ability to run capable language models on your own hardware is well worth the effort.