- **Year:** 2024
- **Domain:** Fullstack
- **Access:** Open Source
- **Tags:** Python, CLI, Documentation, Markdown
- **Status:** Archived

Source2MD

A CLI tool that recursively converts entire source code folders into a single Markdown file with syntax highlighting, table of contents, and directory structure preservation.

# Source2MD

A CLI tool that converts an entire source code folder into a single, well-structured Markdown file, with automatic syntax highlighting and a generated table of contents.

## Features

| Feature | Description |
|---|---|
| Recursive Processing | Converts all files in subdirectories |
| Smart Filtering | Skips binary files and excluded directories |
| Syntax Highlighting | Auto-detects language via Pygments |
| Preserves Structure | Maintains directory hierarchy in output |
| Error Handling | Handles encoding issues gracefully |
| Table of Contents | Auto-generated with navigation links |
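The table does not say how encoding problems are handled. A minimal sketch of one common strategy, assuming most inputs are UTF-8: try a strict UTF-8 decode first and fall back to Latin-1, which maps every byte to a character and therefore cannot fail. The helper name `read_source` is hypothetical, not part of Source2MD.

```python
from pathlib import Path


def read_source(path: Path) -> tuple[str, str]:
    """Return (text, encoding_used) for a source file. Hypothetical helper."""
    raw = path.read_bytes()
    try:
        # utf-8-sig decodes plain UTF-8 and also strips a leading byte-order mark
        return raw.decode("utf-8-sig"), "utf-8"
    except UnicodeDecodeError:
        # Latin-1 assigns a character to all 256 byte values, so this never raises;
        # files in other legacy encodings may show wrong but harmless characters.
        return raw.decode("latin-1"), "latin-1"
```

Returning the encoding that was used lets the caller flag files that needed the fallback instead of silently mixing them in.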

## How It Works

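```python
from pathlib import Path


def process_directory(source_dir: str, output_file: str) -> None:
    root = Path(source_dir)

    # 1. Scan recursively; should_include() is defined under Filtering below
    files = sorted(
        p for p in root.rglob("*") if p.is_file() and should_include(str(p))
    )
    rel_paths = [p.relative_to(root).as_posix() for p in files]

    # 2. Generate the table of contents (generate_toc() is not shown on this page)
    markdown_content = [f"# {root.resolve().name}/\n\n"]
    markdown_content.append(generate_toc(rel_paths))

    # 3. Process each file
    for file, rel in zip(files, rel_paths):
        # 4. Detect the fence language (detect_language() is defined below)
        language = detect_language(str(file))

        # 5. Read the file; undecodable bytes become U+FFFD instead of raising
        code = file.read_text(encoding="utf-8", errors="replace")

        # 6. Add a section per file. The fence's language tag is what lets the
        #    Markdown renderer highlight the code, so the raw source goes in as-is.
        markdown_content.append(f"## {rel}\n\n")
        markdown_content.append(f"```{language}\n{code}\n```\n\n")

    # 7. Write output
    Path(output_file).write_text("".join(markdown_content), encoding="utf-8")
```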
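`generate_toc()` is called in the pipeline but not shown. A minimal sketch, assuming it receives paths relative to the source folder and that the links should work when the Markdown is viewed on GitHub. GitHub derives heading anchors by lowercasing the text, dropping punctuation other than hyphens and underscores, turning spaces into hyphens, and suffixing repeats with `-1`, `-2`, and so on; the code below approximates that rule. Under it, `src/main.py` gets the anchor `#srcmainpy`. The helper `github_anchor` is hypothetical.

```python
import re


def github_anchor(heading: str) -> str:
    """Approximate GitHub's heading-anchor rule (hypothetical helper)."""
    slug = heading.strip().lower()
    # Drop everything except word characters, hyphens, and spaces
    slug = re.sub(r"[^\w\- ]", "", slug)
    return slug.replace(" ", "-")


def generate_toc(rel_paths: list[str]) -> str:
    """Build a '## Table of Contents' list linking to one '## <path>' per file."""
    seen: dict[str, int] = {}
    lines = ["## Table of Contents"]
    for path in rel_paths:
        base = github_anchor(path)
        count = seen.get(base, 0)
        seen[base] = count + 1
        # Repeated anchors get -1, -2, ... appended, as GitHub does
        anchor = f"{base}-{count}" if count else base
        lines.append(f"- [{path}](#{anchor})")
    return "\n".join(lines) + "\n\n"
```

Other renderers (GitLab, MkDocs) slug headings differently, so the anchor function is the one piece to swap if the output is published elsewhere.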

## Language Detection

```python
import os


def detect_language(file_path: str) -> str:
    """Map a file extension to a fenced-code language tag ('text' if unknown)."""
    ext = os.path.splitext(file_path)[1].lower()

    language_map = {
        '.py': 'python',
        '.js': 'javascript',
        '.ts': 'typescript',
        '.jsx': 'jsx',
        '.tsx': 'tsx',
        '.html': 'html',
        '.css': 'css',
        '.json': 'json',
        '.md': 'markdown',
        '.sh': 'bash',
        '.sql': 'sql',
        '.go': 'go',
        '.rs': 'rust',
        '.java': 'java',
    }

    return language_map.get(ext, 'text')
```
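An extension map returns `'text'` for extensionless files such as `Dockerfile`, `Makefile`, or executable scripts. A minimal sketch of a fallback, assuming the caller passes the file's first line; the function name and both lookup tables are hypothetical and illustrative, not taken from Source2MD.

```python
import os

# Hypothetical lookup tables; entries are illustrative, not from Source2MD.
FILENAME_MAP = {"dockerfile": "dockerfile", "makefile": "makefile"}
SHEBANG_MAP = {
    "python": "python",
    "python3": "python",
    "bash": "bash",
    "sh": "bash",
    "node": "javascript",
}


def detect_language_fallback(file_path: str, first_line: str) -> str:
    """Guess a fence language for files the extension map cannot classify."""
    # Well-known extensionless file names
    name = os.path.basename(file_path).lower()
    if name in FILENAME_MAP:
        return FILENAME_MAP[name]

    # Shebang lines such as "#!/bin/bash" or "#!/usr/bin/env python3"
    if first_line.startswith("#!"):
        tokens = first_line[2:].split()
        if tokens:
            interpreter = os.path.basename(tokens[0])
            # With /usr/bin/env, the real interpreter is the next token
            if interpreter == "env" and len(tokens) > 1:
                interpreter = tokens[1]
            return SHEBANG_MAP.get(interpreter, "text")

    return "text"
```

A caller would try `detect_language()` first and read the first line only when it returns `'text'`, so the common case costs no extra I/O.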

## Filtering

```python
import os
from pathlib import Path

EXCLUDED_DIRS = {'.git', '__pycache__', 'node_modules', 'venv', '.venv'}
EXCLUDED_EXTENSIONS = {'.pyc', '.pyo', '.so', '.exe', '.dll', '.bin'}


def should_include(file_path: str) -> bool:
    parts = Path(file_path).parts

    # Skip files inside any excluded directory
    if any(excluded in parts for excluded in EXCLUDED_DIRS):
        return False

    # Skip binary files, identified by extension only
    ext = os.path.splitext(file_path)[1].lower()
    if ext in EXCLUDED_EXTENSIONS:
        return False

    return True
```
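If the scanner calls `should_include()` on every path it finds, the scan still descends into `node_modules` or `.git` before discarding their contents, and binary detection relies on the extension list alone. Also, because `parts` covers the whole path, a project that itself lives under a folder named `venv` would be excluded entirely. A minimal sketch of an alternative, assuming `os.walk` traversal: prune excluded directories before descending, and additionally treat a file as binary if its first 8 KiB contain a NUL byte (a common heuristic, not a guarantee). The names `looks_binary` and `iter_source_files` are hypothetical.

```python
import os
from pathlib import Path

# Same exclusion sets as above, repeated so this sketch runs on its own
EXCLUDED_DIRS = {".git", "__pycache__", "node_modules", "venv", ".venv"}
EXCLUDED_EXTENSIONS = {".pyc", ".pyo", ".so", ".exe", ".dll", ".bin"}


def looks_binary(path: Path, sample_size: int = 8192) -> bool:
    """Heuristic: treat a file as binary if its first bytes contain a NUL byte."""
    with path.open("rb") as fh:
        return b"\x00" in fh.read(sample_size)


def iter_source_files(root: str) -> list[Path]:
    """Walk root, pruning excluded directories before descending into them."""
    found: list[Path] = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Editing dirnames in place stops os.walk from visiting those subtrees;
        # sorting it keeps the traversal order deterministic.
        dirnames[:] = sorted(d for d in dirnames if d not in EXCLUDED_DIRS)
        for name in sorted(filenames):
            path = Path(dirpath) / name
            if path.suffix.lower() in EXCLUDED_EXTENSIONS:
                continue
            if looks_binary(path):
                continue
            found.append(path)
    return found
```

Only directory names below `root` are checked, so the location of the project folder itself no longer matters.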

## Usage

```bash
python source_to_markdown.py
```

Interactive prompts:

```
Enter source folder path: /path/to/project
Enter output markdown file (default: project_docs.md):
```
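The second prompt implies that an empty answer selects `project_docs.md`. A minimal sketch of that prompt logic, assuming plain `input()` prompts; `prompt_paths` and the quote stripping (Windows Explorer's "Copy as path" wraps paths in double quotes) are assumptions, not confirmed Source2MD behavior. Passing `ask` as a parameter lets the function be tested without a terminal.

```python
from typing import Callable

DEFAULT_OUTPUT = "project_docs.md"


def prompt_paths(ask: Callable[[str], str] = input) -> tuple[str, str]:
    """Ask for the source folder and output file, applying the default name."""
    # Strip whitespace, then any surrounding double quotes from a pasted path
    source = ask("Enter source folder path: ").strip().strip('"')
    output = ask(f"Enter output markdown file (default: {DEFAULT_OUTPUT}): ")
    output = output.strip().strip('"')
    # An empty answer at the second prompt falls back to the default file name
    return source, output or DEFAULT_OUTPUT
```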

## Output Format

````markdown
# project/

## Table of Contents
- [src/main.py](#src/mainpy)
- [src/utils.py](#src/utilspy)
- [tests/test.py](#tests/testpy)

## src/main.py

```python
import os

def main():
    print("Hello, World!")

if __name__ == "__main__":
    main()
```

## src/utils.py

```python
def helper():
    pass
```
````
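A converted file can itself contain fenced code, for example a README or any Markdown file with examples. Wrapping it in a fixed three-backtick fence lets its first inner fence close the wrapper early, and everything after that renders wrongly. CommonMark closes a backtick fence only with a run of at least as many backticks as opened it, so a fence one backtick longer than the longest run inside the file is always safe. A minimal sketch; `fence_for` is a hypothetical helper.

```python
import re


def fence_for(code: str, language: str = "") -> str:
    """Wrap code in a backtick fence longer than any backtick run inside it."""
    runs = re.findall(r"`+", code)
    longest = max((len(run) for run in runs), default=0)
    # CommonMark needs at least three backticks; one more than the longest
    # run inside the code means no inner line can close this fence.
    fence = "`" * max(3, longest + 1)
    return f"{fence}{language}\n{code}\n{fence}\n"
```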

## PyInstaller Package

Build a standalone executable:

```bash
pyinstaller --onefile --name "Source2MD" source_to_markdown.py
```

On Windows, the executable is written to `dist/Source2MD.exe`; on macOS and Linux it is `dist/Source2MD`.

## Use Cases

1. **Code documentation**: Generate single-file documentation for review
2. **Code sharing**: Share a whole project as one formatted file
3. **GitHub READMEs**: Embed the entire source in repository docs
4. **Code review**: Convert a project to Markdown for offline review
