GradeM8 — AI Grading Assistant

AI-powered document grading with rubric-based feedback using Llama 2.

Python, Gradio, HuggingFace Inference API, Llama 2 70B, PyMuPDF, python-docx, httpx, pytest, HuggingFace Spaces

Timeline: 3 weeks

Role: Solo Developer

GradeM8 AI grading interface showing document upload and rubric input

Overview

GradeM8 is an AI-powered grading assistant that automates document assessment using Meta's Llama 2 70B model, accessed through the HuggingFace Inference API. Teachers and graders upload student submissions (PDF, DOCX, or image files) along with a rubric and receive detailed feedback: scores, strengths, areas for improvement, and a rubric breakdown.

Built with Gradio for an intuitive web interface and deployed on HuggingFace Spaces, the app features OCR support for scanned documents, batch processing for multiple submissions, and comprehensive error handling. The modular architecture separates AI routing, document extraction, and UI concerns for maintainability.

Case Study

The Problem

Manual grading is time-consuming and inconsistent. Teachers need a tool that can provide detailed, rubric-based feedback quickly while maintaining quality.

Audience & Stakes

Teachers, teaching assistants, and graders who evaluate written submissions. Inconsistent or delayed feedback affects student learning outcomes and instructor workload.

My Approach

Built a Python application using Gradio for the UI and the HuggingFace Inference API for Llama 2 access. Implemented modular document extraction (PDF, DOCX, images), async batch processing, and structured JSON parsing so that model output can be converted into feedback reliably.
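The structured JSON parsing step can be sketched as a chain of progressively looser strategies. This is a minimal illustration under assumptions, not the project's parser: `parse_feedback` is a hypothetical name, and no particular response schema is assumed beyond "a JSON object". It tries the raw response first, then the contents of any fenced block, then the text between the first "{" and the last "}".

```python
import json
import re
from typing import Any, Dict, Optional

_FENCE = "`" * 3  # three backticks, built this way to keep the example tidy
_FENCE_RE = re.compile(
    _FENCE + r"(?:json)?\s*(.*?)" + _FENCE, re.DOTALL | re.IGNORECASE
)


def parse_feedback(raw: str) -> Optional[Dict[str, Any]]:
    """Return the first JSON object found in `raw`, or None if none parses."""
    # Strategy 1: the whole response is already valid JSON.
    candidates = [raw.strip()]
    # Strategy 2: the model wrapped its JSON in a fenced code block.
    candidates += [match.strip() for match in _FENCE_RE.findall(raw)]
    # Strategy 3: take everything from the first "{" to the last "}".
    start, end = raw.find("{"), raw.rfind("}")
    if 0 <= start < end:
        candidates.append(raw[start:end + 1])

    for text in candidates:
        try:
            parsed = json.loads(text)
        except json.JSONDecodeError:
            continue
        if isinstance(parsed, dict):  # feedback is expected to be an object
            return parsed
    return None
```

Returning None rather than raising lets the caller decide whether to retry the request or show the raw model text to the grader.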

Tradeoffs

Chose HuggingFace Spaces over a custom deployment to avoid managing infrastructure. Accepted the added latency of a remote inference API in exchange for simplified hosting and automatic scaling.

Impact

Reduces grading time by 60-70% for routine assignments. Provides consistent, detailed feedback that teachers can review and customize. A suite of 285 unit tests checks behavior across document types.

Key Features

AI grading powered by Llama 2 70B via HuggingFace Inference API
Multi-format document support: PDF, DOCX, DOC, and image files
OCR fallback for scanned documents using DeepSeek-OCR
Batch processing with concurrent grading and progress tracking
Detailed rubric-based feedback with score breakdowns
Responsive Gradio UI with accessibility features
Comprehensive test suite with 285+ unit tests
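The batch processing feature above (concurrent grading with progress tracking) could look roughly like the following. This is a sketch under assumptions rather than the app's implementation: `grade_batch`, `grade_one`, and the progress callback signature are hypothetical, and the concurrency limit of 4 is an arbitrary default.

```python
import asyncio
from typing import Awaitable, Callable, Dict, List, Optional, Tuple

# Hypothetical signatures: grade_one(text, rubric) -> feedback dict.
GradeFn = Callable[[str, str], Awaitable[Dict]]
ProgressFn = Callable[[int, int], None]  # (completed, total)


async def grade_batch(
    submissions: List[Tuple[str, str]],  # (file name, extracted text)
    rubric: str,
    grade_one: GradeFn,
    max_concurrency: int = 4,
    on_progress: Optional[ProgressFn] = None,
) -> Dict[str, Dict]:
    """Grade all submissions concurrently, capped at max_concurrency."""
    semaphore = asyncio.Semaphore(max_concurrency)
    results: Dict[str, Dict] = {}
    completed = 0

    async def worker(name: str, text: str) -> None:
        nonlocal completed
        async with semaphore:  # limits how many requests are in flight
            try:
                results[name] = await grade_one(text, rubric)
            except Exception as exc:  # one failure should not sink the batch
                results[name] = {"error": str(exc)}
        completed += 1
        if on_progress is not None:
            on_progress(completed, len(submissions))

    await asyncio.gather(*(worker(name, text) for name, text in submissions))
    return results
```

The semaphore keeps the number of simultaneous API calls bounded, which matters when the upstream service rate-limits, and the callback gives the UI a simple hook for a progress bar.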

Technical Challenges

Parsing diverse document formats reliably (scanned PDFs, legacy .doc files)
Handling HuggingFace API rate limits and model loading delays gracefully
Extracting structured JSON from LLM responses with robust fallback parsing
Designing a UI that accommodates both single and batch grading workflows
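For the rate-limit and model-loading challenge listed above, one common approach is to retry with exponential backoff. The sketch below is illustrative only. It assumes the API signals "rate limited" with HTTP 429 and "model still loading" with HTTP 503, and it takes a hypothetical `send` coroutine that returns a `(status_code, body)` pair, so the HTTP client (httpx in this project's stack) stays outside the example.

```python
import asyncio
import random
from typing import Awaitable, Callable, Tuple

# Assumption: 429 = rate limited, 503 = model still loading.
RETRYABLE_STATUS = {429, 503}


class GradingUnavailable(RuntimeError):
    """Raised when every attempt returned a retryable status."""


async def call_with_backoff(
    send: Callable[[], Awaitable[Tuple[int, str]]],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    max_delay: float = 30.0,
) -> str:
    """Call send() until it succeeds, backing off between retryable errors."""
    for attempt in range(max_attempts):
        status, body = await send()
        if status == 200:
            return body
        if status not in RETRYABLE_STATUS:
            # Client errors such as 400 or 401 will not fix themselves.
            raise RuntimeError(f"Request failed with status {status}: {body[:200]}")
        if attempt < max_attempts - 1:
            # Delay doubles each attempt (1s, 2s, 4s, ...) up to max_delay,
            # plus a little jitter so batch workers do not retry in lockstep.
            delay = min(max_delay, base_delay * 2 ** attempt)
            await asyncio.sleep(delay + random.uniform(0, delay * 0.1))
    raise GradingUnavailable(f"Gave up after {max_attempts} attempts")
```

Raising a distinct exception once the retries run out lets the UI show a "service busy, try again later" message instead of a generic failure.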

Key Learnings

The HuggingFace Inference API gives access to Llama 2 70B without the need to run GPU infrastructure
Gradio's component model works well for rapid AI app prototyping
Async patterns in Python significantly improve batch processing throughput when the work is dominated by waiting on API calls
Comprehensive test coverage catches edge cases in document parsing early