My Career Agent

March 02, 2026 06:18pm

SCALING AI RESUME GENERATION WITH A MULTI-AGENT HYBRID PIPELINE

Introduction

In today's fast-paced job market, the "one-size-fits-all" resume is effectively dead. Job seekers now face the daunting task of tailoring every single application to specific job descriptions (JDs) and Applicant Tracking Systems (ATS). My Career Agent was born out of a desire to automate this process without sacrificing the quality and human touch required to land high-impact roles. This article explores the journey from a local-only AI experiment to a robust, cloud-hybrid multi-agent system.

The Project Summary

My Career Agent is an automated application bundle generator. Given a job posting URL, it executes a multi-stage pipeline: it scrapes the job details, researches the company via meta-search engines, anonymizes sensitive user data for privacy, and generates a precisely tailored resume and cover letter. The final output is synced to a Planka Kanban board for project management and a notification is sent via Signal.
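The multi-stage flow can be sketched as a simple orchestrator. Every function below is a hypothetical stand-in, not the project's actual API; the stubs only illustrate the shape of the data handed between stages.

```python
# Illustrative sketch of the pipeline described above. All names here
# are hypothetical stand-ins, not the project's real functions.

def scrape_job(url: str) -> dict:
    """Stage 1 stub: fetch and clean the job posting."""
    return {"url": url, "text": "<cleaned job description>"}

def research_company(posting: dict) -> dict:
    """Stage 2 stub: aggregate meta-search results into a dossier."""
    return {"dossier": f"notes on the company behind {posting['url']}"}

def generate_documents(posting: dict, dossier: dict) -> dict:
    """Stages 3-4 stub: produce the tailored application bundle."""
    return {"resume": "...", "cover_letter": "...", "sources": dossier}

def run_pipeline(job_url: str) -> dict:
    posting = scrape_job(job_url)          # scrape + strip noise
    dossier = research_company(posting)    # SearXNG company research
    bundle = generate_documents(posting, dossier)
    # Final step (elided): sync to Planka, notify via Signal
    return bundle
```

The point of the sketch is the hand-off discipline: each stage consumes only the previous stage's output, which is what makes individual stages swappable between local and cloud models.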

The Problem: The "Local AI Loop" and Resource Exhaustion

The initial version of the project relied heavily on local Large Language Models (LLMs) running via Ollama. While great for privacy, I encountered several "showstopper" challenges:

  • The Looping Issue: Smaller 1B and 3B models often entered infinite repetition loops when trying to extract data from noisy, messy job postings.
  • VRAM Hangs: Running heavy extraction tasks locally consumed 100% of GPU resources, causing the entire containerized suite to hang during long-context processing.
  • Rambling: Without strict guardrails, local models tended to "hallucinate" boilerplate text or include website menus as part of the job description.

The Solution: A Hybrid Multi-Agent Orchestration

To solve these issues, I pivoted from a local-only model to a Hybrid Cloud-First approach. I offloaded the "heavy lifting" (extraction and reasoning) to Google's Gemini and Gemma 3 models while maintaining local control over the scraping and search infrastructure.

The Pipeline Stages:

  1. Raw Extraction: Raw HTML is scraped and stripped of noise (scripts, styles, navs) using BeautifulSoup.
  2. JD Distillation (The Gemma Agent): Instead of verbatim copying, I use Gemma 3 27B to distill core responsibilities. This model's larger context window and better reasoning prevent the loops seen in 1B models.
  3. Company Intelligence (The Research Agent): SearXNG aggregates web results about the hiring company, which are then synthesized into a "Company Dossier."
  4. Document Generation (The Gemini Agent): The persona and JD are passed to Gemini 3 Flash to generate the final Markdown-formatted documents.
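Stage 1's noise stripping can be approximated with the standard library alone. The sketch below mimics what the article's BeautifulSoup step does (removing `script`, `style`, and `nav` subtrees before extraction) using `html.parser`; the class and function names are illustrative, not the project's code.

```python
from html.parser import HTMLParser

class NoiseStripper(HTMLParser):
    """Collects visible text while skipping script/style/nav subtrees.

    A stdlib approximation of the BeautifulSoup decompose() step the
    article describes; illustrative only.
    """
    NOISE = {"script", "style", "nav"}

    def __init__(self):
        super().__init__()
        self.depth = 0       # > 0 while inside a noise subtree
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.NOISE:
            self.depth += 1

    def handle_endtag(self, tag):
        if tag in self.NOISE and self.depth:
            self.depth -= 1

    def handle_data(self, data):
        # Keep only text that is outside every noise subtree
        if self.depth == 0 and data.strip():
            self.chunks.append(data.strip())

def strip_noise(html: str) -> str:
    parser = NoiseStripper()
    parser.feed(html)
    return "\n".join(parser.chunks)
```

Stripping menus and scripts before the model ever sees the page is what keeps website navigation out of the "job description" in the first place, reducing the rambling problem at its source.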

The Tech Stack

  • Backend: Python 3.11, Flask
  • AI Models: Gemini 3 Flash (Generation), Gemma 3 27B (Research/Extraction)
  • Web Scraping: BeautifulSoup4, Requests
  • Search API: SearXNG (Self-hosted meta-search)
  • Persistence: Docker, JSON-based User Configs
  • Integrations: Planka API (Kanban), Signal-CLI (Notifications)

Architecture Overview

At a high level, the system runs as a Dockerized Flask service that orchestrates the pipeline end to end: local components (the BeautifulSoup scraper, the self-hosted SearXNG instance, and the PII cleaner) handle data collection and privacy, while the cloud models (Gemma 3 27B and Gemini 3 Flash) handle reasoning and generation. Results flow out through the Planka and Signal integrations.

Code Highlight: The Token Tracking Client

A key feature of the new architecture is Token and Cost Tracking. Since the system uses various cloud models, I implemented a robust usage calculator with an estimation fallback for models that don't report metrics natively.

def _calculate_usage(self, response, is_gemma=False, default_model="unknown", prompt_text=None):
    usage = getattr(response, 'usage_metadata', None)
    is_est = False

    if usage:
        input_tokens = getattr(usage, 'prompt_token_count', 0)
        output_tokens = getattr(usage, 'candidates_token_count', 0)
    else:
        # Fallback to character-based estimation (~4 chars/token)
        input_tokens = len(prompt_text) // 4 if prompt_text else 0
        output_tokens = len(response.text) // 4
        is_est = True

    usage_stats = {
        "model": default_model,
        "input_tokens": input_tokens,
        "output_tokens": output_tokens,
        "estimated": is_est,
    }
    # Calculate cost per 1M tokens based on .env config
    # ...
    return usage_stats

Challenges & Learnings

The "JSON Mode" Hurdle

One major challenge was that the Gemma 3 27B API does not currently support formal "JSON Mode" or native system instructions in some regions. I bypassed this by implementing a Flexible JSON Parser that uses regex to find JSON structures within the AI's natural language response.
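A minimal sketch of that fallback idea: try strict parsing first, then regex-scan for an embedded object. The function name and pattern are illustrative, not the project's actual parser.

```python
import json
import re

def parse_flexible_json(text: str):
    """Extract the first JSON object embedded in free-form model output.

    Illustrative sketch of a 'flexible' parser: strict json.loads first,
    then a regex fallback for responses that wrap JSON in prose.
    """
    try:
        return json.loads(text)  # the easy case: the reply is pure JSON
    except json.JSONDecodeError:
        pass
    # Greedy match from the first '{' to the last '}', which tolerates
    # natural-language text before and after the JSON payload.
    match = re.search(r"\{.*\}", text, re.DOTALL)
    if match:
        return json.loads(match.group(0))
    raise ValueError("no JSON object found in model response")
```

The greedy match deliberately spans from the first brace to the last, so nested objects survive; it will still fail loudly (via `json.loads`) if the captured span isn't valid JSON, rather than silently returning garbage.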

Privacy vs. Intelligence

To send data to the cloud safely, I built a PII (Personally Identifiable Information) cleaner. Names, emails, and phone numbers are swapped with placeholders (e.g., [USER_NAME]) before hitting the Gemini API and are swapped back only at the final step on the user's local machine.
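The swap-out/swap-back round trip can be sketched with a simple placeholder map. This is a simplified illustration: the real cleaner presumably also matches emails and phone numbers by pattern, and the names below are hypothetical.

```python
def anonymize(text: str, pii: dict) -> str:
    """Swap known PII values for placeholders before any cloud call.

    `pii` maps placeholder -> real value, e.g. {"[USER_NAME]": "Jane Doe"}.
    Illustrative sketch only; not the project's actual cleaner.
    """
    for placeholder, value in pii.items():
        text = text.replace(value, placeholder)
    return text

def restore(text: str, pii: dict) -> str:
    """Swap placeholders back, locally, as the final step."""
    for placeholder, value in pii.items():
        text = text.replace(placeholder, value)
    return text
```

The key property is that the mapping never leaves the local machine: the cloud model only ever sees `[USER_NAME]`-style tokens, and the inverse substitution happens after generation is complete.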

Key Learnings

  • Token Estimation: Always build a fallback. If the API doesn't tell you what it cost, estimate it yourself based on characters.
  • Distillation over Extraction: Telling an AI to "Clean and Summarize" is much more stable than telling it to "Extract the exact text," which often triggers repetition loops.
  • Agent Specialization: Using a smart, larger model (Gemma 27B) for the research phase makes the final generation (Gemini) significantly more accurate.

Conclusion

My Career Agent demonstrates that the future of developer tools isn't just "Local AI" or "Cloud AI," but a coordinated orchestration of both. By using each model for its strengths - Gemma for reasoning and Gemini for high-speed, long-context generation - I've created a tool that produces job-winning documents in seconds for less than a cent per application.