Tasks
This page tracks all current and completed tasks for the Morpheum project. Tasks are organized chronologically with the most recent additions at the bottom.
Remove GitHub Pages Workflow Approval Requirement
Task: Remove GitHub Pages Workflow Approval Requirement
Overview
Objective: Fix the GitHub Pages workflow so that it doesn’t require constant manual approval and runs automatically.
Issue: The current GitHub Pages deployment workflow requires manual approval for every run, causing delays in documentation updates and creating a poor developer experience.
Problem Analysis
Root Cause
The workflow was using a protected github-pages environment that required manual approval for all deployments, even automated ones from trusted sources.
Symptoms
- Multiple workflow runs showing “run_attempt”: 2 (failed then manually rerun)
- Significant time gaps between workflow creation and execution
- Manual intervention required for every documentation update
Solution
Approach
Remove the protected environment reference while maintaining all necessary permissions and security measures.
Implementation
- Remove Environment Protection: Eliminate environment: github-pages from the deploy job
- Enhance Permissions: Add explicit permissions for proper execution
- Improve Conditions: Restrict deployment to main branch pushes only
- Maintain Security: Preserve all necessary deployment permissions
Files Modified
- .github/workflows/pages.yml: Updated workflow configuration
Verification
Success Criteria
- Workflow runs automatically without manual approval
- GitHub Pages deployment continues to function correctly
- Security permissions are maintained
- Only main branch pushes trigger deployment
Testing
The solution will be validated when:
- A push to main branch triggers the workflow automatically
- Documentation is deployed to GitHub Pages without manual intervention
- No approval prompts appear in the Actions tab
Technical Notes
Key Changes
# Removed environment protection that required approval
environment:
  name: github-pages # REMOVED
  url: $ # REMOVED
# Added explicit permissions for security
permissions:
  pages: write
  id-token: write
  contents: read
  actions: read
Benefits
- Improved Developer Experience: No more waiting for manual approval
- Faster Documentation Updates: Changes deploy immediately upon merge
- Reduced Maintenance Overhead: Less manual workflow management required
- Better Automation: Aligns with CI/CD best practices
Completion Status
Status: ✅ Completed
Date: 2025-01-28
Result: Successfully removed approval requirement while maintaining all security and functionality
Initial Project Setup for the Bot
- Task 1: Initial Project Setup for the Bot
- Create a new directory for the bot: src/morpheum-bot.
- Install necessary dependencies for a basic Matrix bot (e.g., matrix-bot-sdk) at the project root.
- Install TypeScript at the project root.
- Create a tsconfig.json at the project root if one doesn’t exist, or update the existing one to include the bot’s source files.
Basic Bot Implementation
- Create a src/morpheum-bot/index.ts file.
- Implement the basic bot structure to connect to a Matrix homeserver.
- Implement a simple !help command to verify the bot is working.
Fix Gauntlet check-sed-available Task Validation
- Analyze the validation inconsistency between the check-sed-available and add-jq tasks
- Update check-sed-available to use the same Nix environment validation pattern as add-jq
- Change validation command from "which sed" to "cd /project && nix develop -c which sed"
- Simplify validation logic to stdout.includes("/nix/store") for consistency
- Verify all tests continue to pass with no regressions
- Document the fix and process improvement in devlog
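The validation pattern described above can be sketched as follows. This is a minimal illustration, not the gauntlet's actual code: the helper names and the runCommand parameter are hypothetical, and only the command string and the stdout.includes("/nix/store") check come from the task description.

```typescript
// Validation command taken from the task description above.
const SED_VALIDATION_COMMAND = "cd /project && nix develop -c which sed";

// Hypothetical helper: the tool counts as available only when `which`
// resolves it to a path inside the Nix store, not to a system binary.
function isProvidedByNix(stdout: string): boolean {
  return stdout.includes("/nix/store");
}

// Hypothetical success condition. `runCommand` stands in for whatever the
// gauntlet uses to execute a command in the container and capture stdout.
async function checkSedAvailable(
  runCommand: (command: string) => Promise<string>,
): Promise<boolean> {
  const stdout = await runCommand(SED_VALIDATION_COMMAND);
  return isProvidedByNix(stdout);
}
```

Checking for the store path rather than for any non-empty output means a sed binary that happens to exist on the base image does not count as a pass.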
Gemini CLI Integration (Proof of Concept)
- Task 3: Gemini CLI Integration (Proof of Concept)
- Fork the Gemini CLI repository.
- Investigate how to invoke the Gemini CLI from the TypeScript bot.
- Implement a command (e.g., !gemini <prompt>) that passes the prompt to the Gemini CLI and returns the output to the Matrix room.
GitHub Integration in Gemini CLI
- Task 4: GitHub Integration in Gemini CLI
- Investigate how to add gh as a tool to the forked Gemini CLI.
- Implement the necessary changes in the forked Gemini CLI to use the gh tool.
- Test the integration by running gh commands through the !gemini command in the bot.
- Document the correct way to invoke the Gemini CLI to execute gh commands.
DEVLOG.md and TASKS.md management
- The bot should be able to read the legacy DEVLOG.md and TASKS.md files and create new files in the docs/_devlogs/ and docs/_tasks/ directories.
- Create commands to add entries to docs/_devlogs/ and to create new task files in docs/_tasks/.
DEVLOG.md and TASKS.md management
- Task 5: DEVLOG.md and TASKS.md management
- The bot should be able to read and write to the DEVLOG.md and TASKS.md files.
- Create commands to add entries to the DEVLOG.md and to update the status of tasks in TASKS.md.
Fix ‘Job’s done!’ Detection in Next Step Blocks (Issue #69)
- Understand the issue: “Job’s done!” only detected in shell output, should also be detected in next_step
- Explore codebase structure and locate relevant files
- Run existing tests to ensure stable baseline (136 tests passing)
- Add “Job’s done!” detection in next_step parsing logic
- Add test case to verify new functionality
- Verify all existing tests still pass (137 tests now passing)
- Manual verification of the fix
Issue: The system prompt instructs to state ‘Job’s done!’ in a <next_step> block to finish tasks, but the bot only checked for completion in shell command output.
Solution: Added 6 lines in bot.ts after next_step display to check for “Job’s done!” and trigger completion behavior.
Impact: Tasks can now complete via next_step blocks as documented, maintaining all existing shell output detection functionality.
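A minimal sketch of the combined check, assuming the model wraps its next step in <next_step> tags as the system prompt describes. The function names are hypothetical and this is not the six-line change made in bot.ts; it only illustrates detecting the phrase in either place.

```typescript
// Accept both the straight and the typographic apostrophe (an assumption,
// since rendered text and raw model output may differ).
const COMPLETION_PATTERN = /Job['’]s done!/;

// Return the trimmed contents of the first <next_step> block, or null.
function extractNextStep(response: string): string | null {
  const match = /<next_step>([\s\S]*?)<\/next_step>/.exec(response);
  return match?.[1]?.trim() ?? null;
}

// The task is complete when the phrase appears in the shell output
// (existing behavior) or inside the next_step block (new behavior).
function isTaskComplete(response: string, shellOutput: string): boolean {
  const nextStep = extractNextStep(response);
  return (
    COMPLETION_PATTERN.test(shellOutput) ||
    (nextStep !== null && COMPLETION_PATTERN.test(nextStep))
  );
}
```

Restricting the second check to the next_step block avoids ending the task early when the phrase is merely mentioned elsewhere in the response, for example inside a plan.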
Enforce DEVLOG.md and TASKS.md Updates
- Task 7: Enforce DEVLOG.md and TASKS.md Updates
- Implement a pre-commit hook that prevents commits if DEVLOG.md and TASKS.md are not staged.
- Use husky to manage the hook so it’s automatically installed for all contributors.
- Address Husky deprecation warning.
- Verify submodule pushes by checking the status within the submodule directory.
Reformat DEVLOG.md for Readability
- Task 8: Reformat DEVLOG.md for Readability
- Restructure the DEVLOG.md file to use a more organized format with horizontal rules and nested lists to improve scannability.
- Use git history to date old entries and link all markdown file references.
- Remove redundant “Request” line from entries.
Implement and Test Markdown to Matrix HTML Formatting
- Task 9: Implement and Test Markdown to Matrix HTML Formatting
- Create a new test suite for markdown formatting logic (src/morpheum-bot/format-markdown.test.ts).
- Write a test case for converting basic markdown (headings, bold, italics) to Matrix-compatible HTML.
- Write a test case for handling markdown code blocks (fenced and indented).
- Write a test case for converting markdown lists (ordered and unordered) to HTML.
- Implement the core formatMarkdown function that converts markdown text to the HTML format required by Matrix.
- Ensure all tests pass and the output is correctly formatted for Matrix messages.
Update Pre-commit Hook for Submodule Verification
- Task 11: Update Pre-commit Hook for Submodule Verification
- Modify the .husky/pre-commit hook to include a check that verifies the src/gemini-cli submodule is pushed to its remote.
- Task 12: Switch to Claude Code with a local LLM for development (manual plan)
  - Set up a Local LLM with an OpenAI-compatible API:
    - Install and run a local LLM provider like Ollama, vLLM, or llama-cpp-python.
    - Ensure it exposes an OpenAI-compatible API endpoint (e.g., http://localhost:11434/v1 for Ollama).
    - Download a model to use, for example mistral-small-24b.
  - Install claudecode:
    - Find and install the claudecode tool. This might be from a package manager or a code repository.
  - Install and Configure the Proxy:
    - Clone the proxy server from the GitHub repository mentioned in the Reddit post.
    - Install its dependencies.
    - Edit the proxy’s configuration (e.g., a server.py file) to point to your local LLM’s API endpoint.
  - Run the Proxy:
    - Start the proxy server. It will listen for incoming requests and forward them to your local LLM.
  - Configure claudecode to Use the Proxy:
    - Set the following environment variables in your shell to direct claudecode to the proxy:
Fix DEVLOG.md Entry Order for Qwen3-Code Investigation
- Task 13: Fix DEVLOG.md Entry Order for Qwen3-Code Investigation
- Move the entry for the Qwen3-Code investigation to the top of the changelog in DEVLOG.md.
- Ensure the entry is in the correct chronological order.
Investigate Qwen3-Code as a Bootstrapping Mechanism
- Task 13: Investigate Qwen3-Code as a Bootstrapping Mechanism
- Investigate the qwen3-code fork of the Gemini CLI.
- Determine if qwen3-code is a suitable replacement for claudecode.
- Document the findings and next steps.
Build a Larger, Tool-Capable Ollama Model
- Task 14: Build a Larger, Tool-Capable Ollama Model
- Investigate the process used to create the kirito1/qwen3-coder model.
- Apply this process to build a larger version of an Ollama model.
- Ensure the new model supports tool usage and has a larger context size.
- Test the new model for performance and accuracy.
- Fix web search tool configuration to enable proper web research.
Define and Build Local Tool-Capable Models
- Task 19: Define and Build Local Tool-Capable Models
- Create a Modelfile to make a base model (e.g., Qwen2) compatible with the Gemini CLI tool-use format.
- Create a Modelfile for the qwen3-coder model.
- Add ollama to the flake.nix development environment to ensure the tool is available.
Automate Model Building with a Generic Makefile
- Task 20: Automate Model Building with a Generic Makefile
- Establish a <model-name>.ollama convention for model definition files.
- Implement a Makefile that uses Ollama’s internal manifest files for dependency tracking.
- Use a generic pattern rule in the Makefile to automatically discover and build any *.ollama file.
Refine Local Model Prompts
- Task 21: Refine Local Model Prompts
- Update the prompt templates in morpheum-local.ollama and qwen3-coder-local.ollama to improve tool-use instructions.
- Add untracked local models to the repository.
Enhance Markdown Task List Rendering
- Task 22: Enhance Markdown Task List Rendering
- Update format-markdown.ts to correctly render GitHub-flavored markdown task lists.
- Add tests to format-markdown.test.ts to verify that checked and unchecked task list items are rendered correctly.
Fix Markdown Checkbox Rendering
- Task 23: Fix Markdown Checkbox Rendering
- Modify format-markdown.ts to use Unicode characters for checkboxes to prevent them from being stripped by the Matrix client’s HTML sanitizer.
- Update format-markdown.test.ts to reflect the new Unicode character output.
Suppress Bullets from Task Lists (Abandoned)
- Task 24: Suppress Bullets from Task Lists (Abandoned)
- Modify src/morpheum-bot/format-markdown.ts to suppress the bullets from task list items.
Investigate incorrect commit
- Task 27: Investigate incorrect commit
- AGENTS.md was checked in incorrectly.
- A change to the bot’s source was missed.
- Investigate what went wrong and document it.
Create GitHub Pages Site
- Task 28: Create GitHub Pages Site
- Create Jekyll-based GitHub Pages site in docs/ directory
- Design visual theme inspired by project logo
- Create comprehensive documentation pages (Getting Started, Architecture, Contributing, Vision, Agents)
- Create project status and roadmap pages
- Create design proposals section
- Set up GitHub Actions for automatic deployment
- Update AGENTS.md with site maintenance instructions
- Document implementation in DEVLOG.md
Fix gemini-cli Submodule Build and Crash
- Task 25: Fix gemini-cli Submodule Build and Crash
- Investigate and fix a crash in the gemini-cli submodule’s shellExecutionService.ts.
- Fix the gemini-cli submodule’s build.
Handle Matrix Rate-Limiting
- Task 26: Handle Matrix Rate-Limiting
- Implement a retry mechanism to handle M_LIMIT_EXCEEDED errors from the Matrix server.
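One way such a retry mechanism could look is sketched below. It is an illustration under stated assumptions, not the bot's implementation: the error shape mirrors the errcode and retry_after_ms fields of the Matrix rate-limit response body, while sendWithRetry, retryDelayMs, the attempt cap, and the backoff numbers are all hypothetical.

```typescript
// Fields of the Matrix rate-limit error body this sketch relies on.
// How a given SDK exposes them on thrown errors is an assumption.
interface RateLimitLikeError {
  errcode?: string;
  retry_after_ms?: number;
}

// Return how long to wait before retrying, or null if the error is not a
// rate-limit error and should be rethrown immediately.
function retryDelayMs(error: unknown, attempt: number): number | null {
  if (typeof error !== "object" || error === null) return null;
  const { errcode, retry_after_ms } = error as RateLimitLikeError;
  if (errcode !== "M_LIMIT_EXCEEDED") return null;
  // Prefer the server's hint; otherwise back off exponentially, capped at 30 s.
  if (typeof retry_after_ms === "number") return retry_after_ms;
  return Math.min(1000 * 2 ** attempt, 30000);
}

// Call `send` until it succeeds, a non-rate-limit error occurs, or
// `maxAttempts` is reached. `sleep` is injectable so tests can skip waiting.
async function sendWithRetry<T>(
  send: () => Promise<T>,
  maxAttempts = 5,
  sleep: (ms: number) => Promise<void> = (ms) =>
    new Promise<void>((resolve) => setTimeout(resolve, ms)),
): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await send();
    } catch (error) {
      const delay = retryDelayMs(error, attempt);
      if (delay === null || attempt + 1 >= maxAttempts) throw error;
      await sleep(delay);
    }
  }
}
```

Retrying per request only treats the symptom, which is presumably why the later tasks add a message queue with throttling.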
Implement Message Queue and Throttling
- Task 27: Implement Message Queue and Throttling
- Implement a message queue and throttling system to prevent rate-limiting errors.
Batch Messages in Queue
- Task 28: Batch Messages in Queue
- Modify the message queue to batch multiple messages into a single request.
Improve Pre-commit Hook
- Task 29: Improve Pre-commit Hook
- Add a check to the pre-commit hook to prevent commits with unstaged changes in src/morpheum-bot.
Improve run_shell_command Output
- Task 30: Improve run_shell_command Output
- Modify the bot to show the command and its output for run_shell_command.
Fix Message Queue Mixed-Type Concatenation
- Task 31: Fix Message Queue Mixed-Type Concatenation
- Fix a bug in the message queue where text and HTML messages were being improperly concatenated.
Replace Checkbox Input Tags with Unicode Characters
- Task 32: Replace Checkbox Input Tags with Unicode Characters
- Write a failing test case to assert that the HTML output contains Unicode checkboxes instead of <input> tags.
- Modify the formatMarkdown function to replace the <input> tags with Unicode characters.
- Ensure all tests pass.
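The replacement step can be sketched as a post-processing pass over the rendered HTML. This is a sketch under assumptions: the function name is hypothetical, the exact attributes the markdown renderer emits are not known here, and the two glyphs are chosen for illustration since the task only says "Unicode characters".

```typescript
// Assumed glyphs for checked and unchecked items.
const CHECKED_BOX = "☑";
const UNCHECKED_BOX = "☐";

// Replace every <input ... type="checkbox" ...> tag with a Unicode glyph so
// the result survives HTML sanitizers that strip <input> elements.
function replaceCheckboxInputs(html: string): string {
  return html.replace(/<input\b[^>]*\btype="checkbox"[^>]*>/gi, (tag) =>
    /\bchecked\b/i.test(tag) ? CHECKED_BOX : UNCHECKED_BOX,
  );
}
```

For example, a list item rendered as `<li><input checked="" disabled="" type="checkbox"> Done</li>` becomes `<li>☑ Done</li>`.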
Suppress Bullets from Task Lists (Abandoned)
- Task 33: Suppress Bullets from Task Lists (Abandoned)
- This task was abandoned because the Matrix client’s HTML sanitizer strips the style attribute, making it impossible to suppress the bullets using inline styles.
Add OpenAI API Compatibility
- Task 34: Add OpenAI API Compatibility
  - Subtask 1: Create Failing Test for OpenAI Integration
    - Create a new test file src/morpheum-bot/openai.test.ts.
    - Write a test that attempts to send a prompt to a mock OpenAI server and asserts that a valid response is received. This test should fail initially as the implementation won’t exist.
  - Subtask 2: Implement OpenAI API Client
    - Create a new file src/morpheum-bot/openai.ts.
    - Implement a function that takes a prompt and an OpenAI API key and sends a request to the OpenAI API.
    - This function should handle the response and return it in a structured format.
    - Create OpenAIClient class implementing LLMClient interface.
    - Support custom base URLs for OpenAI-compatible APIs.
  - Subtask 3: Integrate OpenAI Client into Bot
    - Enhanced src/morpheum-bot/bot.ts to support both OpenAI and Ollama APIs.
    - Added new commands: !openai, !ollama, !llm status, !llm switch.
    - Created comprehensive test suite covering all new functionality.
    - Added common LLMClient interface and factory pattern.
    - Updated SWEAgent to use generic LLMClient interface.
    - All tests pass for new integration functionality.
Fix missing message-queue files
- Task 28: Fix missing message-queue files
- Add src/morpheum-bot/message-queue.ts and src/morpheum-bot/message-queue.test.ts to the commit.
- Replace all instances of client.sendMessage with queueMessage in src/morpheum-bot/index.ts to use the new message queue.
Refine Ollama Model Prompts for TDD
- Task 29: Refine Ollama Model Prompts for TDD
- Update the SYSTEM prompt in gpt-oss-120b.ollama and gpt-oss-small.ollama to be more specific to a Test-Driven Development (TDD) approach.
- Reduce the num_ctx parameter in gpt-oss-120b.ollama to 65536.
- Add bun.lock and opencode.json to the repository.
Fix Message Queue Mixed-Type Concatenation
- Task 30: Fix Message Queue Mixed-Type Concatenation
- Fixed a bug in the message queue where text and HTML messages were being improperly concatenated.
- Modified the batching logic to group messages by both roomId and msgtype.
- Added a new test case to ensure that messages of different types are not batched together.
Refactor Message Queue Logic
- Task 31: Refactor Message Queue Logic
- Refactored the message queue to slow down message sending to at most 1 per second.
- Implemented new batching logic:
- Consecutive text messages are concatenated and sent as a single message.
- HTML messages are sent individually.
- The queue now only processes one “batch” (either a single HTML message or a group of text messages) per interval.
- Updated the unit tests to reflect the new logic and fixed a bug related to shared state between tests.
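The batching rules above can be sketched as a function that pulls one batch off the front of the queue per interval. The types, function names, newline separator, and the same-room restriction on text runs are assumptions made for this illustration; the actual message-queue.ts may differ.

```typescript
type MsgType = "text" | "html";

interface QueuedMessage {
  roomId: string;
  msgtype: MsgType;
  body: string;
}

// Remove and return the next batch: either a single HTML message, or a run
// of consecutive text messages for the same room joined into one message.
function takeNextBatch(queue: QueuedMessage[]): QueuedMessage | null {
  const first = queue.shift();
  if (first === undefined) return null;
  if (first.msgtype === "html") return first;

  const bodies = [first.body];
  while (queue.length > 0) {
    const next = queue[0];
    if (next === undefined || next.msgtype !== "text" || next.roomId !== first.roomId) break;
    bodies.push(next.body);
    queue.shift();
  }
  return { roomId: first.roomId, msgtype: "text", body: bodies.join("\n") };
}

// Process at most one batch per interval (1 second by default).
function startQueue(
  queue: QueuedMessage[],
  send: (message: QueuedMessage) => Promise<void>,
  intervalMs = 1000,
): ReturnType<typeof setInterval> {
  return setInterval(() => {
    const batch = takeNextBatch(queue);
    if (batch !== null) void send(batch);
  }, intervalMs);
}
```

Keeping the batch selection in a pure function separate from the timer makes it straightforward to unit test without shared state between tests.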
Task 35: Fix up errors made by local LLMs
- Task 35: Fix up errors made by local LLMs
- Revert CONTRIBUTING.md and ROADMAP.md hallucinations
- Commit work in progress on opencode.json and ollama models
Task 36: Switch gears to integrating directly with Ollama API
- Task 36: Switch gears to integrating directly with Ollama API
- Write a basic integration in src/ollama with an interactive test
- Create a design doc for a jail system, and an overview of Gemini’s architecture
Create the jail directory structure.
- Task 1: Create the jail directory structure.
- Create a new top-level directory named jail.
Implement jail/flake.nix
- Task 2: Implement jail/flake.nix
- Create a flake.nix file inside the jail directory.
- Copy the Nix code from JAIL_PROTOTYPE.md into this file (now implemented).
Create jail/start-vm.sh script
- Task 3: Create jail/start-vm.sh script
- Create a shell script that automates the colima start command with the specified port forwarding logic for multiple agent and monitoring ports.
Create jail/build.sh script
- Task 4: Create jail/build.sh script
- Create a shell script that runs nix build .#default (relative to the jail directory) and docker load < result to build the image and load it into the Docker daemon.
Create jail/run.sh script
- Task 5: Create jail/run.sh script
- Create a shell script that automates the docker run command.
- The script should accept arguments for the container name (e.g., jail-1) and the port numbers to map, making it easy to launch multiple, distinct jails.
Create jail/agent.ts client
- Task 6: Create jail/agent.ts client
- Create the TypeScript agent client as jail/agent.ts.
- Copy the TypeScript code from JAIL_PROTOTYPE.md into this file (now implemented).
Create jail/README.md
- Task 7: Create jail/README.md
- Create a README.md file inside the jail directory.
- Document how to use the new scripts (start-vm.sh, build.sh, run.sh, and agent.ts) to set up and interact with the jailed environment. This replaces the manual instructions in the original prototype document.
Improve Pre-commit Hook
- Task 37: Improve Pre-commit Hook
- Add a check to the pre-commit hook to prevent commits with unstaged changes.
- Add a check to the pre-commit hook to prevent commits with untracked files.
Ollama API Client
- Task 38: Ollama API Client
- Create a test file: src/morpheum-bot/ollamaClient.test.ts. Write a failing test that attempts to send a prompt to a mock Ollama API endpoint.
- Create the client module: src/morpheum-bot/ollamaClient.ts.
- Implement a function to send a system prompt and conversation history to a specified model via the Ollama API.
- Make the test pass.
Jailed Shell Client
- Task 39: Jailed Shell Client
- Create a test file: src/morpheum-bot/jailClient.test.ts. Write a failing test that attempts to send a command to a mock TCP server and receive a response.
- Create the client module: src/morpheum-bot/jailClient.ts.
- Reimplement the TCP socket logic from jail/agent.ts directly within this module, creating a clean programmatic interface.
- Make the test pass.
Response Parser Utility
- Task 40: Response Parser Utility
- Create a test file: src/morpheum-bot/responseParser.test.ts. Write failing tests for extracting bash commands from various markdown-formatted strings.
- Create the utility module: src/morpheum-bot/responseParser.ts.
- Implement a function to reliably parse fenced bash code blocks from the model’s text output.
- Make all tests pass.
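A parser along these lines might look like the sketch below. The function name and the choice to return every block (rather than only the first) are assumptions; the fence marker is built with repeat() purely so the example can sit inside a fenced block itself.

```typescript
// Three backticks, built programmatically to keep this example embeddable.
const FENCE = "`".repeat(3);

// Return the trimmed contents of every bash-tagged fenced block, in order.
function parseBashCommands(text: string): string[] {
  const pattern = new RegExp(
    FENCE + "bash[ \\t]*\\r?\\n([\\s\\S]*?)" + FENCE,
    "g",
  );
  const commands: string[] = [];
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(text)) !== null) {
    const body = match[1]?.trim();
    if (body) commands.push(body);
  }
  return commands;
}
```

Blocks tagged with another language, and blocks that are empty after trimming, are skipped, so the caller can treat an empty result as "the model issued no command".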
System Prompt Definition
- Task 41: System Prompt Definition
- Create a new file, src/morpheum-bot/prompts.ts, to store the core system prompt.
- Draft a system prompt inspired by mini-swe-agent, instructing the model to think step-by-step and use bash commands to solve software engineering tasks.
Core Agent Logic
- Task 42: Core Agent Logic
- Create a test file: src/morpheum-bot/sweAgent.test.ts. Write failing tests for the agent’s main loop, mocking the Ollama and Jail clients.
- Create the agent module: src/morpheum-bot/sweAgent.ts.
- Implement the main agent loop, which will manage the conversation history and orchestrate calls to the Ollama client, parser, and jail client.
Matrix Bot Integration
- Task 43: Matrix Bot Integration
- Modify src/morpheum-bot/index.ts to add a new command, !swe <task>.
- When triggered, this command will initialize and run the sweAgent loop with the provided task.
- The agent’s intermediate “thoughts,” commands, and tool outputs will be formatted and sent as messages to the Matrix room.
- Add a corresponding integration test for the !swe command.
Configuration
- Task 44: Configuration
- Integrate necessary settings (e.g., Ollama model name, API URL, default jail port) into the bot’s existing configuration system (using environment variables).
Deprecate Old Integration
- Task 45: Deprecate Old Integration
- Once the new !swe command is stable, remove the old Gemini CLI integration code and the !gemini command from src/morpheum-bot/index.ts.
- Remove any other now-unused files or dependencies related to the old implementation.
Fix Test Suite
- Task 46: Fix Test Suite
- Correct mock assertions in vitest.
- Install missing dependencies.
- Skip incomplete tests.
Bot Self-Sufficiency
- Task 47: Bot Self-Sufficiency
- Implement mention-based interaction for the bot.
- Add detailed logging for Ollama and Jail clients.
- Correct bugs related to user profile fetching.
Gauntlet Testing Framework
- Task 48: Gauntlet Testing Framework
- Create a gauntlet.ts script to automate the evaluation process.
- Implement a scoring system to rank models based on performance.
- Run the gauntlet on various models and document the results.
- Add a TODO item in TASKS.md for this task.
- Check in the new GAUNTLET.md file.
- Create a DEVLOG.md entry for this task.
- Follow the rules in AGENTS.md.
- Test the gauntlet script with a local model, getting it to pass.
- Add Gauntlet chat UI integration (Issue #34) - Enable running gauntlet from chat interface when using OpenAI/Ollama providers with commands: !gauntlet help, !gauntlet list, !gauntlet run --model <model> [--task <task>] [--verbose]
Remove gemini-cli Submodule
- Task 49: Remove gemini-cli Submodule
- Verify that there are no remaining code dependencies on the submodule.
- Update configuration files to remove references to the submodule.
- De-initialize and remove the submodule from the repository.
Implement Iterative Agent Loop
- Task 50: Implement Iterative Agent Loop
- Refactor the sweAgent to loop, feeding back command output to the LLM.
- The loop terminates when the LLM responds without a command.
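The loop described above can be sketched roughly as follows. Everything here is illustrative: the dependency interface, the message shape, and the maxIterations safety cap are assumptions added for the sketch and are not taken from sweAgent.ts.

```typescript
type Role = "user" | "assistant";

interface ChatMessage {
  role: Role;
  content: string;
}

// Hypothetical dependencies, injected so the loop can be tested with mocks.
interface AgentDeps {
  askModel: (history: ChatMessage[]) => Promise<string>;
  extractCommand: (reply: string) => string | null;
  runInJail: (command: string) => Promise<string>;
}

// Ask the model, run the command it proposes, feed the output back, repeat.
// Stops when the model replies without a command or the cap is reached.
async function runAgentLoop(
  task: string,
  deps: AgentDeps,
  maxIterations = 10,
): Promise<ChatMessage[]> {
  const history: ChatMessage[] = [{ role: "user", content: task }];
  for (let i = 0; i < maxIterations; i++) {
    const reply = await deps.askModel(history);
    history.push({ role: "assistant", content: reply });

    const command = deps.extractCommand(reply);
    if (command === null) break; // no command: the model is finished

    const output = await deps.runInJail(command);
    history.push({ role: "user", content: output });
  }
  return history;
}
```

The iteration cap is a defensive addition so that a model which never stops issuing commands cannot loop forever.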
Simplify and Improve System Prompt
- Task 51: Simplify and Improve System Prompt
- Distill the system prompt to be clearer, more concise, and plan-oriented.
Stabilize Jail Communication
- Task 52: Stabilize Jail Communication
- Fix socat configuration to reliably capture both stdout and stderr.
- Implement a robust readiness probe in the gauntlet to prevent race conditions.
Update Gauntlet for Nix Workflow
- Task 53: Update Gauntlet for Nix Workflow
- Modify gauntlet success conditions to check for tools within the nix develop environment.
Update Local Model
- Task 54: Update Local Model
- Update the morpheum-local model to use qwen.
Correct Documentation Inconsistencies
- Task 55: Correct Documentation Inconsistencies
- Analyzed all .md files for inconsistencies.
- Updated ROADMAP.md to reflect the completion of v0.1 and the current focus on v0.2.
- Updated CONTRIBUTING.md to describe the active Matrix-based workflow.
Apply PR Review Comments
- Task 56: Apply PR Review Comments
- Addressed feedback from PR #1 regarding package management preferences in documentation.
- Updated test script configuration for better compatibility.
- Enhanced bot status messages to include model information (PR #2 feedback).
- Ensured all changes maintain existing functionality while improving user experience.
Implement Streaming API Support
- Task 57: Implement Streaming API Support
- Extended LLMClient interface with sendStreaming() method for real-time feedback
- Implemented OpenAI streaming using Server-Sent Events (SSE) format
- Implemented Ollama streaming using JSONL format
- Added real-time progress indicators with emojis for enhanced user experience
- Maintained backward compatibility with existing send() method (2025-01-18)
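The two wire formats differ mainly in framing, which the sketch below illustrates with pure parsing helpers. The function names are hypothetical, and the payload fields are assumptions based on the public API formats: choices[0].delta.content for OpenAI chat-completion chunks and a response field per line for Ollama's generate endpoint. The project's sendStreaming() may be structured differently.

```typescript
// Split a buffer into complete lines plus the unfinished remainder, so a
// partial chunk can be kept until the rest of the line arrives.
function splitCompleteLines(buffer: string): { lines: string[]; rest: string } {
  const parts = buffer.split(/\r?\n/);
  const rest = parts.pop() ?? "";
  return { lines: parts, rest };
}

// OpenAI-style SSE: each event is a "data: <json>" line; "[DONE]" ends it.
function parseOpenAiSseLines(lines: string[]): string[] {
  const deltas: string[] = [];
  for (const line of lines) {
    if (!line.startsWith("data:")) continue;
    const payload = line.slice(5).trim();
    if (payload === "" || payload === "[DONE]") continue;
    const parsed = JSON.parse(payload) as {
      choices?: Array<{ delta?: { content?: string } }>;
    };
    const content = parsed.choices?.[0]?.delta?.content;
    if (content) deltas.push(content);
  }
  return deltas;
}

// Ollama-style JSONL: every non-empty line is a standalone JSON object.
function parseOllamaJsonLines(lines: string[]): string[] {
  const deltas: string[] = [];
  for (const line of lines) {
    if (line.trim() === "") continue;
    const parsed = JSON.parse(line) as { response?: string; done?: boolean };
    if (parsed.response) deltas.push(parsed.response);
  }
  return deltas;
}
```

In both cases the caller would append each network chunk to a buffer, call splitCompleteLines, parse the complete lines, and carry rest forward to the next chunk.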
Fix Jail Implementation Output Issues
- Task 58: Fix Jail Implementation Output Issues
- Resolved bash warnings from interactive shell attempting to control non-existent terminal
- Cleaned up command output by switching from interactive (bash -li) to non-interactive (bash -l) shells
- Added comprehensive tests to validate clean output behavior (2025-01-20)
Design GitHub Copilot Integration
- Task 59: Design GitHub Copilot Integration
- Created comprehensive design proposal for GitHub Copilot as third LLM provider
- Designed CopilotClient following existing LLMClient interface patterns
- Planned GitHub authentication and session management architecture
- Specified real-time status update mechanisms using polling and streaming
- Documented complete implementation plan with file-by-file changes
- Created COPILOT_PROPOSAL.md with technical specifications and rollout strategy (2025-01-27)
Enhance Bot User Feedback with Plan and Next Step Display
- Task 59: Enhance Bot User Feedback with Plan and Next Step Display
- Added parsePlanAndNextStep() function to extract structured thinking from LLM responses
- Implemented plan display with 📋 icon showing bot’s strategy on first iteration
- Implemented next step display with 🎯 icon showing bot’s immediate action plan
- Used existing sendMarkdownMessage() helper for proper HTML formatting in Matrix
- Added comprehensive test coverage with 6 new test cases for parsing functionality
- Enhanced user transparency by showing the bot’s thinking process in structured format
Ad Hoc: Add sed as Default Tool in Jail Environment
- Ad Hoc: Add sed as Default Tool in Jail Environment
- Added sed to the nixpkgs package list in jail/run.sh
- Created gauntlet test case to verify sed availability
- Verified no regressions in existing functionality
Ad Hoc: Implement Real-time Progress Feedback for Gauntlet Matrix Integration (Issue #55)
- Ad Hoc: Implement Real-time Progress Feedback for Gauntlet Matrix Integration (Issue #55)
- Enhanced gauntlet execution with optional progress callback parameter
- Implemented dynamic progress table with task status indicators (⏳ PENDING, ▶️ NEXT, ✅ PASS, ❌ FAIL)
- Added comprehensive real-time feedback messages throughout gauntlet execution
- Updated bot integration to provide progress callback for Matrix chat display
- Maintained complete backward compatibility with CLI usage
- Added comprehensive test coverage including progress callback verification
- All 125 tests pass with new functionality integrated
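As an illustration of the progress table, the sketch below renders task statuses with the four indicators listed above. The type names, the markdown table layout, and the function name are assumptions; only the status labels and icons come from the task description.

```typescript
type TaskStatus = "PENDING" | "NEXT" | "PASS" | "FAIL";

// Icons as listed in the task description.
const STATUS_ICON: Record<TaskStatus, string> = {
  PENDING: "⏳",
  NEXT: "▶️",
  PASS: "✅",
  FAIL: "❌",
};

interface TaskProgress {
  id: string;
  status: TaskStatus;
}

// Render the current state of all tasks as a markdown table (assumed layout).
function renderProgressTable(tasks: TaskProgress[]): string {
  const header = ["| Task | Status |", "| --- | --- |"];
  const rows = tasks.map(
    (task) => `| ${task.id} | ${STATUS_ICON[task.status]} ${task.status} |`,
  );
  return [...header, ...rows].join("\n");
}
```

A progress callback passed to the gauntlet could call a renderer like this after each task finishes and send the result to the chat room, while CLI runs simply omit the callback.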
Ad Hoc: Fix Build Artifacts Being Built in Source Tree
- Ad Hoc: Fix Build Artifacts Being Built in Source Tree
- Removed 66 build artifacts (*.js, *.d.ts, *.d.ts.map) from source tree
- Configured tsconfig.json to use outDir: ‘./build’ for all compilation output
- Updated .gitignore with comprehensive patterns to prevent future artifact commits
- Verified TypeScript compilation and tests work with new build directory configuration
Ad Hoc: Fix GitHub Copilot Assignment Verification Logic
- Ad Hoc: Fix GitHub Copilot Assignment Verification Logic
- Investigated false error in GitHub Copilot assignment verification causing unnecessary demo mode fallback
- Identified that verification logic was incorrectly throwing errors even when assignments were successful
- Modified verification to log warnings instead of throwing errors for timing/response structure variations
- Maintained proper error handling for actual assignment failures
- Validated fix with comprehensive test suite ensuring all functionality remains intact
Fix GitHub Copilot Task: refine-existing-codebase scoring validation order
- Fix refine-existing-codebase gauntlet task validation order
- Analyzed issue #97 where the task was failing due to incorrect execution order
- Identified root cause: validation code was creating initial server.js file AFTER bot execution, overwriting bot’s modifications
- Moved file creation from validation phase (successCondition) to setup phase (before bot execution)
- Added pre-task setup logic specifically for refine-existing-codebase task
- Preserved all existing validation logic (endpoint testing, JSON response validation)
- Verified fix with comprehensive testing - all tests pass
- Ensured minimal, surgical changes with no impact on other gauntlet tasks
Ad Hoc: Fix Deep Linking in Copilot Session Started Message (Issue #42)
- Ad Hoc: Fix Deep Linking in Copilot Session Started Message (Issue #42)
- Identified issue where ‘Copilot session started’ message used generic https://github.com/copilot/agents URL instead of deep linking to session details
- Modified formatStatusUpdate method to use issue-specific URLs when available but no PR exists yet
- Updated test expectations to verify deep linking to GitHub issue URL
- Maintained backward compatibility with existing URL fallback logic
- Verified fix with comprehensive test suite ensuring all functionality remains intact
Fix refine-existing-codebase gauntlet task setup infrastructure
- Fix refine-existing-codebase gauntlet task setup
- Analyzed issue #99 where setupContainer failed due to missing /project directory and flake.nix
- Identified that nix develop commands require a flake.nix file in the working directory
- Modified setupContainer to create /project directory using mkdir -p /project
- Added comprehensive flake.nix creation with all required tools (bun, jq, sed, python+requests, curl, which, hugo)
- Preserved existing server.js creation logic exactly as before
- Verified fix with comprehensive testing - all tests continue to pass
- Ensured minimal, surgical changes with no impact on other gauntlet tasks
- Made refine-existing-codebase task self-sufficient and no longer dependent on create-project-dir task
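The setup steps above might look roughly like the following. This is a sketch, not the actual setupContainer code: the helper names, the nixpkgs attribute names, the pinned channel, and the `x86_64-linux` system are all assumptions, and how the commands are executed inside the container is left out.

```typescript
// Assumption: nixpkgs attribute names for the tools listed in the task notes.
const REQUIRED_PACKAGES: string[] = [
  "pkgs.bun",
  "pkgs.jq",
  "pkgs.gnused",
  "(pkgs.python3.withPackages (ps: [ ps.requests ]))",
  "pkgs.curl",
  "pkgs.which",
  "pkgs.hugo",
];

// Hypothetical helper: renders a flake.nix exposing a default dev shell.
// Plain (non-template) strings are used so that ${system} stays literal Nix syntax.
function buildFlakeNix(packages: string[]): string {
  return [
    "{",
    '  inputs.nixpkgs.url = "github:NixOS/nixpkgs/nixos-unstable";',
    "  outputs = { self, nixpkgs }:",
    "    let",
    '      system = "x86_64-linux";',
    "      pkgs = nixpkgs.legacyPackages.${system};",
    "    in {",
    "      devShells.${system}.default = pkgs.mkShell {",
    "        packages = [",
    ...packages.map((pkg) => "          " + pkg),
    "        ];",
    "      };",
    "    };",
    "}",
    "",
  ].join("\n");
}

// Hypothetical helper: shell commands to run in order inside the container.
function buildSetupCommands(): string[] {
  return [
    "mkdir -p /project",
    // Quoted heredoc delimiter keeps the shell from expanding ${system}.
    "cat > /project/flake.nix <<'EOF'\n" + buildFlakeNix(REQUIRED_PACKAGES) + "EOF",
  ];
}
```

Creating the directory first matters because the flake is written into /project, and `nix develop` is then run from that directory.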
Ad Hoc: Fix Markdown Link Rendering in Copilot Streaming Messages (Issue #40)
- Ad Hoc: Fix Markdown Link Rendering in Copilot Streaming Messages (Issue #40)
- Identified root cause: Copilot streaming chunks with markdown links were sent as plain text instead of formatted HTML
- Added hasMarkdownLinks() helper function to detect markdown links in text chunks using regex pattern
- Modified Copilot streaming callback to route chunks with markdown to HTML formatting using existing sendMarkdownMessage() helper
- Created comprehensive test suite to verify markdown detection, HTML formatting, and end-to-end streaming behavior
- Ensured fix is surgical and targeted - only affects Copilot status messages with GitHub links, preserves all existing functionality
- All 106 tests passing, confirming no regressions introduced
- Follow-up: Refactored function naming based on user feedback
- Enhanced existing sendMarkdownMessage() function to automatically detect markdown content instead of creating new sendMessageSmart() function
- Avoided function naming changes to reduce cognitive overhead and merge conflict potential
- Generalized markdown detection to include links, code blocks, bold, italic, and headings
- Replaced all message sending calls to use enhanced smart detection while preserving existing function names
- All 110 tests continue to pass with comprehensive markdown support
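The generalized detection described above could be approximated as follows. This is a heuristic sketch: `hasMarkdown` and `chooseFormat` are hypothetical names, the regular expressions are illustrative rather than the project's actual patterns, and the italic rule in particular can produce false positives on text such as arithmetic expressions.

```typescript
// Illustrative patterns only; none use the "g" flag, so test() is stateless.
const MARKDOWN_PATTERNS: RegExp[] = [
  /\[[^\]]+\]\([^)\s]+\)/, // links: [text](url)
  /`{3}/, // fenced code blocks
  /\*\*[^*\n]+\*\*/, // bold
  /(^|[^*])\*[^*\s][^*\n]*\*/, // italic
  /^#{1,6}\s+\S/m, // headings at the start of a line
];

// Returns true when the chunk appears to contain markdown worth rendering as HTML.
function hasMarkdown(text: string): boolean {
  return MARKDOWN_PATTERNS.some((pattern) => pattern.test(text));
}

type OutgoingFormat = "html" | "text";

// Routing decision for a streamed chunk: formatted HTML or plain text.
function chooseFormat(chunk: string): OutgoingFormat {
  return hasMarkdown(chunk) ? "html" : "text";
}
```

Keeping the detection inside the sending helper means callers do not need to know which format a given chunk requires.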
Ad Hoc: Fix Gauntlet Command Markdown Formatting in Matrix (Issue #38)
- Ad Hoc: Fix Gauntlet Command Markdown Formatting in Matrix (Issue #38)
- Identified root cause: gauntlet help/list commands using sendMessage() instead of sendMarkdownMessage()
- Fixed gauntlet help command to use sendMarkdownMessage() for proper HTML formatting
- Fixed gauntlet list command to use sendMarkdownMessage() for proper HTML formatting
- Added comprehensive test coverage for gauntlet command markdown formatting
- Enhanced test mocks to handle gauntlet-specific content patterns
- Verified all 105 tests pass with no regressions
Refine !tasks Command for New Directory Structure
- Analyze current !tasks command implementation in bot.ts
- Create utility function to parse front matter from task files
- Create function to scan docs/_tasks/ directory for task files
- Create function to filter tasks by completion status
- Create function to assemble markdown from uncompleted tasks
- Update !tasks command handler to use new logic
- Test the refined !tasks command functionality
- Ensure markdown is properly converted to HTML and sent to chat
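The parsing, filtering, and assembly steps above can be sketched as pure functions. Everything here is an assumption for illustration: the function names, the simple `key: value` front matter format, and the `completed` status value. Directory scanning and the markdown-to-HTML conversion are left out, so the sketch takes file contents as input.

```typescript
interface ParsedTask {
  data: { [key: string]: string | undefined };
  body: string;
}

// Minimal front matter parser: handles flat "key: value" pairs between --- lines.
function parseFrontMatter(content: string): ParsedTask {
  const match = /^---\r?\n([\s\S]*?)\r?\n---\r?\n?([\s\S]*)$/.exec(content);
  if (!match) return { data: {}, body: content };
  const data: { [key: string]: string | undefined } = {};
  for (const line of match[1].split(/\r?\n/)) {
    const sep = line.indexOf(":");
    if (sep === -1) continue;
    const key = line.slice(0, sep).trim();
    // Strip one pair of surrounding quotes, if present.
    const value = line.slice(sep + 1).trim().replace(/^["']|["']$/g, "");
    if (key) data[key] = value;
  }
  return { data, body: match[2].trim() };
}

// Keeps tasks whose status is not "completed" (assumed value) and renders
// them as markdown sections ordered by the "order" field.
function assembleOpenTasks(fileContents: string[]): string {
  return fileContents
    .map(parseFrontMatter)
    .filter((task) => task.data.status !== "completed")
    .sort((a, b) => Number(a.data.order ?? 0) - Number(b.data.order ?? 0))
    .map((task) => `## ${task.data.title ?? "Untitled"}\n\n${task.body}`)
    .join("\n\n");
}
```

The resulting string can then be handed to the existing markdown sending helper for conversion to HTML.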
Restructure TASKS.md and DEVLOG.md to Eliminate Merge Conflicts
- Analyze current merge conflict issues with centralized TASKS.md and DEVLOG.md files
- Design directory-based structure for individual task and devlog entries
- Configure Jekyll collections for _tasks and _devlogs directories
- Create aggregate pages that display entries in proper chronological order
- Create sample entries to demonstrate the new structure
- Migrate remaining content from existing TASKS.md and DEVLOG.md files
- Update documentation and contributing guidelines
- Test the new system with multiple contributors
Implement Agent Self-Correction and Learning Mechanisms
- Investigate mechanisms for the agent to learn from its mistakes
- Design a feedback system that captures failed task summaries
- Implement context injection of previous failures for better future performance
- Develop a self-correction loop that allows agents to retry tasks with improved approaches
- Create metrics to measure learning effectiveness over time
- Test self-correction mechanisms with the gauntlet testing framework
Enhance Matrix Interface User Experience and Commands
- Implement more structured output formatting for better readability
- Improve error reporting with actionable suggestions
- Design more intuitive command syntax and help system
- Add command auto-completion or suggestion features
- Implement progress indicators for long-running operations
- Add GitHub Copilot progress tracking via iframe integration: embed GitHub’s native progress interface directly in the Matrix client to show real-time Copilot agent progress (thoughts, file analysis, and command outputs) instead of basic polling messages
- Create user-friendly onboarding flow for new Matrix room users
- Add support for rich message formatting (tables, code highlighting, etc.)
Design and Implement Multi-Agent Collaboration Framework
- Design architecture for multiple specialized AI agents working together
- Define agent specialization areas (e.g., code review, testing, documentation, deployment)
- Implement task delegation and coordination mechanisms
- Create communication protocols between agents
- Develop conflict resolution strategies for concurrent operations
- Design workload balancing and agent resource management
- Test multi-agent workflows on complex development tasks
- Create monitoring and observability for multi-agent operations
Systematic Gauntlet Testing and Model Performance Benchmarking
- Run comprehensive gauntlet tests against all available local models
- Test gauntlet against proprietary models (GPT-4, Gemini, etc.) for comparison
- Establish performance benchmarks and scoring metrics
- Analyze failure patterns across different model types and sizes
- Document common failure points and edge cases
- Create automated benchmark reporting and tracking system
- Use benchmark results to guide prompt engineering improvements
Iterative Prompt Engineering Based on Gauntlet Results
- Analyze gauntlet failure patterns to identify prompt improvement opportunities
- Refine system prompts in prompts.ts based on empirical evidence
- Implement A/B testing framework for prompt variations
- Test prompt improvements against benchmark tasks
- Document prompt engineering best practices and lessons learned
- Create automated prompt optimization pipeline
- Improve tool-use capabilities through targeted prompt engineering
Enhance Pre-commit Hook to Enforce Devlog and Task Entry Requirements
Objective
Fix the pre-commit hook to enforce that every commit includes both a devlog entry and a task entry, addressing the issue that PR 92 bypassed workflow requirements.
Requirements
- Clean up test artifacts: Remove test content from DEVLOG.md and test_file.txt from previous commits
- Enhance pre-commit hook: Add logic to require both devlog and task entries for every commit
- Smart detection: Allow documentation-only commits to proceed without devlog/task requirements
- Clear messaging: Provide actionable error messages when requirements are missing
- Maintain existing protections: Keep the current prevention of direct DEVLOG.md/TASKS.md editing
Implementation Details
File Cleanup
- ✅ Reverted DEVLOG.md to remove erroneous “test” line at line 57
- ✅ Removed test_file.txt that was accidentally committed
Pre-commit Hook Enhancement
- ✅ Added detection for devlog files in docs/_devlogs/
- ✅ Added detection for task files in docs/_tasks/
- ✅ Implemented smart logic to exempt documentation-only commits
- ✅ Enhanced error messaging with specific requirements and guidance
- ✅ Maintained existing legacy file protection
Logic Flow
- Check for unstaged changes and untracked files (existing)
- Prevent direct editing of DEVLOG.md and TASKS.md (existing)
- NEW: For non-documentation commits, require:
- At least one file in docs/_devlogs/
- At least one file in docs/_tasks/
- Provide clear error messages for missing requirements
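The decision logic in this flow can be expressed as a small pure function over the list of staged paths. This is a sketch in TypeScript for illustration only: the actual hook and its exact messages are not shown here, the names are hypothetical, and the unstaged/untracked check is omitted because it needs git. "Documentation-only" is assumed to mean that every staged file lives under docs/.

```typescript
interface HookResult {
  ok: boolean;
  errors: string[];
}

const LEGACY_FILES: string[] = ["DEVLOG.md", "TASKS.md"];

// Evaluates the staged file list against the rules in the logic flow.
function checkStagedFiles(staged: string[]): HookResult {
  const errors: string[] = [];

  // Existing protection: the legacy aggregate files must not be edited directly.
  const legacyEdits = staged.filter((file) => LEGACY_FILES.indexOf(file) !== -1);
  if (legacyEdits.length > 0) {
    errors.push(`Do not edit ${legacyEdits.join(", ")} directly; add entries under docs/_devlogs/ or docs/_tasks/.`);
  }

  // Assumption: a commit is documentation-only when every file is under docs/.
  const docsOnly = staged.every((file) => /^docs\//.test(file));
  if (!docsOnly) {
    if (!staged.some((file) => /^docs\/_devlogs\//.test(file))) {
      errors.push("Missing devlog entry: stage at least one file in docs/_devlogs/.");
    }
    if (!staged.some((file) => /^docs\/_tasks\//.test(file))) {
      errors.push("Missing task entry: stage at least one file in docs/_tasks/.");
    }
  }

  return { ok: errors.length === 0, errors };
}
```

Under this rule a change to a root-level file such as README.md is not documentation-only, so it still needs both entries.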
Testing Strategy
- Test that hook blocks commits missing devlog entries
- Test that hook blocks commits missing task entries
- Test that hook allows documentation-only commits
- Test that hook still prevents legacy file editing
- Verify error messages are clear and actionable
Success Criteria
- ✅ Pre-commit hook enforces devlog entry requirement
- ✅ Pre-commit hook enforces task entry requirement
- ✅ Documentation-only commits are allowed to proceed
- ✅ Clear error messages guide users on missing requirements
- ✅ Existing legacy file protections remain intact
Status: Completed ✅
The pre-commit hook has been successfully enhanced to enforce both devlog and task entry requirements for every commit, while maintaining flexibility for documentation-only changes.
UPDATES:
- ✅ Fixed documentation detection logic to correctly identify README.md as a core project file requiring devlog/task entries
- ✅ CRITICAL FIX: Resolved Husky configuration issue where hooks weren’t being called due to missing initialization and broken hook delegation
- ✅ FULLY VERIFIED: All scenarios tested and working correctly:
- Blocks commits without devlog entries
- Blocks commits without task entries
- Allows documentation-only commits (docs/ directory)
- Prevents direct DEVLOG.md/TASKS.md editing
- Provides clear error messages
Fix gauntlet task order: swap create-project-dir and add-jq positions
- Fix gauntlet task execution order
- Analyzed issue #105 requiring task order swap: create-project-dir should be 1st, add-jq should be 3rd
- Identified current problematic order: add-jq (1st), check-sed-available (2nd), create-project-dir (3rd)
- Implemented solution by reordering task objects in gauntlet tasks array
- New logical order: create-project-dir (1st), check-sed-available (2nd), add-jq (3rd)
- Added comprehensive tests to verify and maintain correct task ordering
- Verified all existing functionality preserved (220 tests pass)
- Confirmed logical dependency resolution: /project directory created before tasks that use it
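An ordering check of the kind described above might look like this. The task list is reduced to IDs and the helper name `runsBefore` is hypothetical; the real task objects carry more fields.

```typescript
interface GauntletTaskSummary {
  id: string;
}

// The corrected order from the task notes, reduced to task IDs.
const gauntletTasks: GauntletTaskSummary[] = [
  { id: "create-project-dir" },
  { id: "check-sed-available" },
  { id: "add-jq" },
];

// True when both tasks exist and `first` is scheduled ahead of `second`.
function runsBefore(tasks: GauntletTaskSummary[], first: string, second: string): boolean {
  const ids = tasks.map((task) => task.id);
  const firstIndex = ids.indexOf(first);
  const secondIndex = ids.indexOf(second);
  return firstIndex !== -1 && secondIndex !== -1 && firstIndex < secondIndex;
}
```

Asserting on relative order rather than absolute positions keeps the check valid if unrelated tasks are inserted later.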
Fix Gauntlet Provider Validation Logic
- Fix Gauntlet Provider Validation Logic
- Identified issue where gauntlet command was checking current provider instead of requested provider
- Analyzed that the early check in handleGauntletCommand was blocking valid gauntlet executions
- Removed incorrect check for this.currentLLMProvider === 'copilot' since gauntlet creates its own bot instance
- Verified that existing argument parsing already prevents copilot from being specified as --provider
- Updated tests to reflect corrected behavior: gauntlet can run regardless of current provider
- Added test coverage for edge cases: openai provider when current is copilot, and blocking explicit copilot requests
- Validated fix ensures gauntlet works with any valid provider (openai/ollama) regardless of bot’s current state
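The corrected behavior can be summarized in a small sketch. The function name and signature are hypothetical and do not reproduce handleGauntletCommand; the current provider is accepted as a parameter only to make explicit that it plays no part in the decision.

```typescript
// Providers the gauntlet accepts, per the task notes.
const GAUNTLET_PROVIDERS: string[] = ["openai", "ollama"];

// Validation depends only on the REQUESTED provider. The bot's current
// provider is deliberately ignored because the gauntlet creates its own
// bot instance.
function canRunGauntlet(requestedProvider: string, _currentProvider: string): boolean {
  return GAUNTLET_PROVIDERS.indexOf(requestedProvider) !== -1;
}
```

For example, `canRunGauntlet("openai", "copilot")` is allowed, while `canRunGauntlet("copilot", "openai")` is rejected.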
Contributing Tasks
To add a new task:
- Create a new file in docs/_tasks/ with the naming convention task-{number}-{short-description}.md
- Include front matter with title, order, and status fields
- Write the task description in markdown
- This page will automatically include your new task
For more information, see our contributing guide.