# Source2MD
A professional CLI tool that converts entire source code folders into a single, well-structured Markdown file with automatic syntax highlighting and table of contents generation.
## Features
| Feature | Description |
|---|---|
| Recursive Processing | Converts all files in subdirectories |
| Smart Filtering | Skips binary files and excluded directories |
| Syntax Highlighting | Auto-detects language via Pygments |
| Preserves Structure | Maintains directory hierarchy in output |
| Error Handling | Handles encoding issues gracefully |
| Table of Contents | Auto-generated with navigation links |
## How It Works

```python
def process_directory(source_dir: str, output_file: str):
    # 1. Scan recursively
    files = collect_source_files(source_dir)

    # 2. Generate table of contents
    toc = generate_toc(files)

    # 3. Process each file
    markdown_content = [f"# {source_dir}\n\n"]
    markdown_content.append(toc)

    for file in files:
        # 4. Detect language
        language = detect_language(file)

        # 5. Read and highlight
        code = read_file(file)
        highlighted = highlight_code(code, language)

        # 6. Add to output
        markdown_content.append(f"## {relative_path(file)}\n")
        markdown_content.append(f"```{language}\n{highlighted}\n```\n")

    # 7. Write output
    write_output(output_file, markdown_content)
```

## Language Detection
```python
import os


def detect_language(file_path: str) -> str:
    # Map the lowercased file extension to a Markdown fence language
    ext = os.path.splitext(file_path)[1].lower()
    language_map = {
        '.py': 'python',
        '.js': 'javascript',
        '.ts': 'typescript',
        '.jsx': 'jsx',
        '.tsx': 'tsx',
        '.html': 'html',
        '.css': 'css',
        '.json': 'json',
        '.md': 'markdown',
        '.sh': 'bash',
        '.sql': 'sql',
        '.go': 'go',
        '.rs': 'rust',
        '.java': 'java',
    }
    # Unknown extensions fall back to plain text
    return language_map.get(ext, 'text')
```

## Filtering
```python
import os
from pathlib import Path

EXCLUDED_DIRS = {'.git', '__pycache__', 'node_modules', 'venv', '.venv'}
EXCLUDED_EXTENSIONS = {'.pyc', '.pyo', '.so', '.exe', '.dll', '.bin'}


def should_include(file_path: str) -> bool:
    parts = Path(file_path).parts

    # Skip excluded directories
    if any(excluded in parts for excluded in EXCLUDED_DIRS):
        return False

    # Skip binary files
    ext = os.path.splitext(file_path)[1].lower()
    if ext in EXCLUDED_EXTENSIONS:
        return False

    return True
```

## Usage
```bash
python source_to_markdown.py
```

Interactive prompts:

```
Enter source folder path: /path/to/project
Enter output markdown file (default: project_docs.md):
```

## Output Format
````markdown
# project/

## Table of Contents
- [src/main.py](#src/mainpy)
- [src/utils.py](#src/utilspy)
- [tests/test.py](#tests/testpy)

## src/main.py
```python
import os

def main():
    print("Hello, World!")

if __name__ == "__main__":
    main()
```

## src/utils.py
```python
def helper():
    pass
```
````
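
The table of contents in the sample above could be produced by something like the following sketch. It assumes `generate_toc` receives paths relative to the source folder, and it infers the anchor rule (the path with dots removed) purely from the sample; `toc_anchor` is a hypothetical helper name, and the real implementation may differ.

```python
def toc_anchor(rel_path: str) -> str:
    """Anchor as it appears in the sample: the path with dots removed (assumed rule)."""
    return rel_path.replace(".", "")


def generate_toc(rel_paths: list[str]) -> str:
    """Build the "Table of Contents" section from relative file paths."""
    lines = ["## Table of Contents"]
    for rel_path in rel_paths:
        lines.append(f"- [{rel_path}](#{toc_anchor(rel_path)})")
    # Trailing blank line separates the TOC from the first file section
    return "\n".join(lines) + "\n\n"


print(generate_toc(["src/main.py", "src/utils.py"]))
```

Whether these links actually resolve depends on how the Markdown renderer derives heading IDs, so the anchor rule may need to be adapted to the target platform.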
## PyInstaller Package
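Build a standalone executable:

```bash
pyinstaller --onefile --name "Source2MD" source_to_markdown.py
```

On Windows, the executable will be in `dist/Source2MD.exe`; on Linux and macOS, PyInstaller writes `dist/Source2MD` without the `.exe` extension.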
## Use Cases
1. **Code documentation**: Generate single-file documentation for review
2. **Code sharing**: Share formatted code snippets
3. **GitHub READMEs**: Embed entire source in repository docs
4. **Code review**: Convert project to Markdown for offline review

## Architecture Feedback
Spotted a potential optimization or antipattern? Let me know.