chunking-toolkit

Name: chunking-toolkit
Availability: InStock
Author: aiskillstore-team

v1.0.0 approved Text Processing ⬇ 39 ↑ 14/7d 1mo ago

USK v3 ✅ Verified ⚡ Auto-Convert

🤖 Agent install commands (curl / MCP / Claude Desktop)

▸ curl one-liner

curl -L -o chunking-toolkit.skill   "https://aiskillstore.io/v1/agent/skills/d7a00f2d-c8c2-48fb-88c9-a3ca6b42cb27/download?platform=ClaudeCode"

▸ MCP tool call (after registering Skill Store MCP)

{
  "tool": "download_skill",
  "arguments": {
    "skill_id": "d7a00f2d-c8c2-48fb-88c9-a3ca6b42cb27",
    "platform": "ClaudeCode"
  }
}

▸ Claude Desktop / Cursor MCP config (one-time)

{
  "mcpServers": {
    "skill-store": {
      "url": "https://aiskillstore.io/mcp/"
    }
  }
}

📖 Full agent API guide: /llms.txt · MCP server card

A text chunking toolkit with five actions: chunk, detect_boundaries, count_tokens, merge_chunks, and audit_chunk_quality. It has zero external dependencies. Six strategies are available, including Korean sentence boundary detection, token-aware chunking, and overlap management.

Tags: chunking, rag, text, nlp, korean, sentence, tokens, embedding, llm

Basic Info

Owner: aiskillstore-team
Category: Text Processing
Registered: 2026-05-07
Last Updated: 2026-05-07
Latest Version: 1.0.0
Packaged At: 2026-05-07
Vetting Status: approved
Downloads: 39
Checksum (SHA256): fae09db2527bd1da2370657ebc17f11c4e8c46d11a0f928c02ad057829a0037e
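The published SHA256 checksum can be used to confirm that a downloaded package arrived intact. The following is a minimal sketch, assuming the package was saved locally as chunking-toolkit.skill; the file name and the helper name `sha256_matches` are illustrative and not part of the store's tooling.

```python
import hashlib
from pathlib import Path

# Checksum copied from the Basic Info listing.
EXPECTED_SHA256 = "fae09db2527bd1da2370657ebc17f11c4e8c46d11a0f928c02ad057829a0037e"


def sha256_matches(path, expected_hex=EXPECTED_SHA256):
    """Return True if the file's SHA256 digest equals expected_hex."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as handle:
        # Read in blocks so a large package is not loaded into memory at once.
        for block in iter(lambda: handle.read(65536), b""):
            digest.update(block)
    return digest.hexdigest() == expected_hex.strip().lower()


if __name__ == "__main__":
    package = Path("chunking-toolkit.skill")  # hypothetical local file name
    if package.exists():
        print("checksum OK" if sha256_matches(package) else "checksum MISMATCH")
```

A mismatch usually means the download was truncated or that a different version or platform conversion was fetched than the one the listing describes.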

⚡ AGENT INFO USK v3

Capabilities

text_chunking, sentence_boundary_detection, korean_chunking, token_aware_chunking, overlap_management

Permissions

✗ network
✗ filesystem
✗ subprocess

Interface

type: cli
entry_point: main.py
runtime: python3
call_pattern: stdin_stdout
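The stdin_stdout call pattern suggests the skill can be driven as a child process. The sketch below assumes that main.py reads a single JSON request from stdin and writes a single JSON response to stdout, which matches the examples on this page but is not confirmed by the listing. The helper name `run_skill` and the default command are hypothetical.

```python
import json
import subprocess
import sys


def run_skill(request, command=("python3", "main.py"), timeout=30):
    """Send one JSON request on stdin and parse one JSON reply from stdout."""
    completed = subprocess.run(
        list(command),
        input=json.dumps(request),  # ASCII-safe: json.dumps escapes non-ASCII
        capture_output=True,
        text=True,
        encoding="utf-8",
        timeout=timeout,
        check=True,  # raise CalledProcessError on a non-zero exit code
    )
    return json.loads(completed.stdout)


if __name__ == "__main__":
    # Stand-in child process that echoes the action back, so the sketch runs
    # without the skill installed. Point `command` at the real main.py to use it.
    echo = (
        sys.executable,
        "-c",
        "import sys, json; r = json.load(sys.stdin); "
        "print(json.dumps({'action': r['action']}))",
    )
    print(run_skill({"action": "count_tokens", "text": "hello"}, command=echo))
```

Keeping the command configurable makes it straightforward to swap in whichever directory the package was unzipped to.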

Agent API

# Fetch the skill schema (lets an agent work out how to call the skill)
GET /v1/agent/skills/d7a00f2d-c8c2-48fb-88c9-a3ca6b42cb27/schema

# Download with automatic per-platform conversion
GET /v1/agent/skills/d7a00f2d-c8c2-48fb-88c9-a3ca6b42cb27/download?platform=OpenClaw
GET /v1/agent/skills/d7a00f2d-c8c2-48fb-88c9-a3ca6b42cb27/download?platform=ClaudeCode
GET /v1/agent/skills/d7a00f2d-c8c2-48fb-88c9-a3ca6b42cb27/download?platform=ClaudeCodeAgentSkill
GET /v1/agent/skills/d7a00f2d-c8c2-48fb-88c9-a3ca6b42cb27/download?platform=Cursor
GET /v1/agent/skills/d7a00f2d-c8c2-48fb-88c9-a3ca6b42cb27/download?platform=GeminiCLI
GET /v1/agent/skills/d7a00f2d-c8c2-48fb-88c9-a3ca6b42cb27/download?platform=CodexCLI
GET /v1/agent/skills/d7a00f2d-c8c2-48fb-88c9-a3ca6b42cb27/download?platform=CustomAgent
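Because the endpoints differ only in the platform query parameter, an agent could build them from a small table. This sketch only constructs URLs and performs no network requests; the host is taken from the curl one-liner earlier on this page, and the function names are illustrative.

```python
from urllib.parse import urlencode

BASE_URL = "https://aiskillstore.io"
SKILL_ID = "d7a00f2d-c8c2-48fb-88c9-a3ca6b42cb27"

# Platform values as they appear in the Agent API listing.
PLATFORMS = (
    "OpenClaw",
    "ClaudeCode",
    "ClaudeCodeAgentSkill",
    "Cursor",
    "GeminiCLI",
    "CodexCLI",
    "CustomAgent",
)


def schema_url(skill_id=SKILL_ID):
    """URL of the schema endpoint for a skill."""
    return f"{BASE_URL}/v1/agent/skills/{skill_id}/schema"


def download_url(platform, skill_id=SKILL_ID):
    """URL of the auto-converted download for one of the listed platforms."""
    if platform not in PLATFORMS:
        raise ValueError(f"unknown platform: {platform!r}")
    query = urlencode({"platform": platform})
    return f"{BASE_URL}/v1/agent/skills/{skill_id}/download?{query}"


if __name__ == "__main__":
    print(schema_url())
    for name in PLATFORMS:
        print(download_url(name))
```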

Installation

Compatible Platforms any

OpenClaw

1. Install the skill using openclaw_skill_manager.py.

python openclaw_skill_manager.py --install chunking-toolkit

2. Verify the installation.

python openclaw_skill_manager.py --list-installed

3. Install a specific version (optional).

python openclaw_skill_manager.py --install chunking-toolkit --version 1.0.0

Claude Code (slash command)

1. Download the skill package.

curl -L -o chunking-toolkit.skill "https://aiskillstore.io/v1/skills/d7a00f2d-c8c2-48fb-88c9-a3ca6b42cb27/download"

2. Unzip it into the Claude Code commands directory.

unzip chunking-toolkit.skill -d ~/.claude/commands/chunking-toolkit/

3. Use it as a slash command in Claude Code.

/chunking-toolkit

Claude Code (Agent Skill)

1. Download the Agent Skills package.

curl -L -O -J "https://aiskillstore.io/v1/agent/skills/d7a00f2d-c8c2-48fb-88c9-a3ca6b42cb27/download?platform=ClaudeCodeAgentSkill"

2. Unzip it into the Claude Code skills directory.

unzip chunking-toolkit-agent-skill-*.skill -d ~/.claude/skills/chunking-toolkit/

3. Restart Claude Code. The skill is auto-loaded at session start, so no slash command is needed.

Cursor

1. Download the Cursor-converted package.

curl -L -O -J "https://aiskillstore.io/v1/agent/skills/d7a00f2d-c8c2-48fb-88c9-a3ca6b42cb27/download?platform=Cursor"

2. Unzip it into a permanent location.

unzip chunking-toolkit-cursor-*.skill -d ~/.cursor/skills/chunking-toolkit/

3. Add the MCP server config (printed by the command below) to .cursor/mcp.json, then restart Cursor.

cat ~/.cursor/skills/chunking-toolkit/cursor_mcp_config.json

Gemini CLI

1. Download the Gemini CLI-converted package.

curl -L -O -J "https://aiskillstore.io/v1/agent/skills/d7a00f2d-c8c2-48fb-88c9-a3ca6b42cb27/download?platform=GeminiCLI"

2. Unzip it into a permanent location.

unzip chunking-toolkit-geminicli-*.skill -d ~/.gemini/skills/chunking-toolkit/

3. Add the MCP server config (printed by the command below) to ~/.gemini/settings.json, then restart Gemini CLI.

cat ~/.gemini/skills/chunking-toolkit/gemini_settings_snippet.json

Codex CLI

1. Download the Codex CLI-converted package.

curl -L -O -J "https://aiskillstore.io/v1/agent/skills/d7a00f2d-c8c2-48fb-88c9-a3ca6b42cb27/download?platform=CodexCLI"

2. Unzip it into a permanent location.

unzip chunking-toolkit-codexcli-*.skill -d ~/.codex/skills/chunking-toolkit/

3. Add the MCP server config (printed by the command below) to ~/.codex/config.toml, then restart Codex CLI.

cat ~/.codex/skills/chunking-toolkit/codex_config_snippet.toml

Custom Agent

1. Download the skill package via the REST API.

GET https://aiskillstore.io/v1/skills/d7a00f2d-c8c2-48fb-88c9-a3ca6b42cb27/download

2. Place it in your agent platform's skills directory.

cp chunking-toolkit.skill ./skills/

3. Fetch platform-specific details via the Install Guide API.

GET https://aiskillstore.io/v1/skills/d7a00f2d-c8c2-48fb-88c9-a3ca6b42cb27/install-guide?platform=CustomAgent

Security Vetting Report

Vetting Result APPROVED

Findings: ["Metadata warning: recommended field missing: 'requirements' (recommended by SKILL.md v2)", "Metadata warning: recommended field missing: 'changelog' (recommended by SKILL.md v2)"]

✅ No security risks found.

AI Review Stage

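Reviewer: gemini
Risk Level: 🟢 Low
Review Summary: A text chunking and related-utilities skill. It uses only the standard library with no external dependencies, and the declared permissions match the code, so it is considered safe.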

Reasoning

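1. **Permission consistency**: The `permissions` declared in the skill metadata (network: false, filesystem: false, subprocess: false) are highly restrictive and fully consistent with the provided `main.py` code snippet and the skill description ('Zero external dependencies -- standard library only'). There are no risk factors such as external communication, file system access, or subprocess execution.
2. **Malicious code**: The provided code snippet contains language detection and token estimation logic, which relates directly to the skill's core functionality. No code with malicious intent, such as data exfiltration, system destruction, or obfuscation, was found. The static analysis results support this: 'red_flags_found', 'obfuscation_warnings', and 'forbidden_exec_files_found' are all empty.
3. **Undeclared external communication**: `network: false` is declared, and the code snippet neither imports nor uses any module that attempts external communication (e.g. `requests`, `urllib`, `socket`). There is therefore no undeclared external communication.
4. **Unauthorized collection or transmission of user data**: The skill's purpose is text processing; it processes the input text and returns the result on `stdout`. Because it has no network permission, it cannot collect user data or transmit it externally without authorization.
5. **Code quality and fitness for purpose**: The code snippet faithfully implements the described functionality (language detection, token counting) and is written in clear, standard Python. The `input_schema`, `output_schema`, `capabilities`, and `examples` in the skill metadata also describe the skill's purpose and functionality clearly, so code quality and purpose are well aligned.

Overall, this skill carries very low security risk and is expected to operate safely within its declared functionality and permission scope.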

Version History

Version	USK v3	Vetting Status	Packaged At	Downloads	Changelog
v1.0.0	✓	approved	2026-05-07	⬇ 39	—

Examples 7

Representative input/output examples for this skill. Agents can use these to understand how to invoke the skill and what output to expect.

chunk_sentence_korean

Korean sentence boundary chunking

📥 Input

{
  "action": "chunk",
  "chunk_size": 100,
  "lang": "ko",
  "overlap": 10,
  "strategy": "sentence_boundary",
  "text": "인공지능 기술이 빠르게 발전하고 있습니다. 특히 대형 언어 모델은 많은 분야에서 혁신을 이끌고 있습니다. 그러나 비용과 지연 시간이 과제로 남아 있습니다."
}

📤 Output

{
  "action": "chunk",
  "chunks": [
    {
      "char_end": 55,
      "char_start": 0,
      "index": 0,
      "overlap_with_prev": 0,
      "text": "인공지능 기술이 빠르게 발전하고 있습니다. 특히 대형 언어 모델은 많은 분야에서 혁신을 이끌고 있습니다.",
      "token_count": 45
    },
    {
      "char_end": 79,
      "char_start": 56,
      "index": 1,
      "overlap_with_prev": 10,
      "text": "그러나 비용과 지연 시간이 과제로 남아 있습니다.",
      "token_count": 23
    }
  ],
  "meta": {
    "lang_detected": "ko",
    "strategy": "sentence_boundary",
    "total_chunks": 2
  }
}

chunk_fixed_size_english

Fixed-size chunking for English text

📥 Input

{
  "action": "chunk",
  "chunk_size": 50,
  "lang": "en",
  "overlap": 10,
  "strategy": "fixed_size",
  "text": "Artificial intelligence is transforming industries. Machine learning models can analyze data at scale. Natural language processing enables human-computer interaction."
}

📤 Output

{
  "action": "chunk",
  "chunks": [
    {
      "char_end": 100,
      "char_start": 0,
      "index": 0,
      "overlap_with_prev": 0,
      "text": "Artificial intelligence is transforming industries. Machine learning models can analyze data at scale.",
      "token_count": 19
    },
    {
      "char_end": 162,
      "char_start": 101,
      "index": 1,
      "overlap_with_prev": 10,
      "text": "Natural language processing enables human-computer interaction.",
      "token_count": 9
    }
  ],
  "meta": {
    "lang_detected": "en",
    "strategy": "fixed_size",
    "total_chunks": 2
  }
}

detect_boundaries_mixed

Detect sentence boundaries in mixed Korean-English text

📥 Input

{
  "action": "detect_boundaries",
  "lang": "auto",
  "text": "Hello world. 안녕하세요! This is a test. 좋은 하루 되세요."
}

📤 Output

{
  "action": "detect_boundaries",
  "boundaries": [
    {
      "confidence": "high",
      "position": 12,
      "type": "sentence_end"
    },
    {
      "confidence": "high",
      "position": 22,
      "type": "sentence_end"
    },
    {
      "confidence": "high",
      "position": 37,
      "type": "sentence_end"
    }
  ],
  "meta": {
    "lang_detected": "mixed",
    "total_boundaries": 3
  }
}

count_tokens_context_check

Count tokens and check whether the text fits the LLM context

📥 Input

{
  "action": "count_tokens",
  "lang": "en",
  "model_context": 4096,
  "text": "This is a sample document for token counting. It contains multiple sentences."
}

📤 Output

{
  "action": "count_tokens",
  "meta": {
    "size_unit": "tokens"
  },
  "token_info": {
    "context_window": 4096,
    "estimated_chunks": 1,
    "fits_in_context": true,
    "lang_detected": "en",
    "lang_multiplier": 1.0,
    "total_tokens": 16
  }
}
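The context-fit fields in this output follow from simple arithmetic once a token total is known. The sketch below assumes that `fits_in_context` means the total does not exceed the window and that `estimated_chunks` is a ceiling division. Both are inferences from the example, not documented behaviour, and the skill's own token estimator is not reproduced here.

```python
import math


def context_fit(total_tokens, model_context):
    """Derive the context-fit fields from a token total (assumed semantics)."""
    if model_context <= 0:
        raise ValueError("model_context must be positive")
    return {
        "context_window": model_context,
        # At least one chunk, even for empty input.
        "estimated_chunks": max(1, math.ceil(total_tokens / model_context)),
        "fits_in_context": total_tokens <= model_context,
        "total_tokens": total_tokens,
    }


if __name__ == "__main__":
    print(context_fit(16, 4096))
```

With the values from the example (16 tokens, a 4096-token window) this yields one estimated chunk and `fits_in_context` set to true, which agrees with the output shown.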

merge_chunks_basic

Merge small chunks into an optimal size

📥 Input

{
  "action": "merge_chunks",
  "chunks": [
    {
      "index": 0,
      "text": "Hello.",
      "token_count": 2
    },
    {
      "index": 1,
      "text": "World!",
      "token_count": 2
    },
    {
      "index": 2,
      "text": "How are you today?",
      "token_count": 5
    }
  ],
  "merge_max_size": 20
}

📤 Output

{
  "action": "merge_chunks",
  "chunks": [
    {
      "char_end": 31,
      "char_start": 0,
      "index": 0,
      "overlap_with_prev": 0,
      "text": "Hello. World! How are you today?",
      "token_count": 9
    }
  ],
  "meta": {
    "merged_count": 1,
    "original_count": 3
  }
}
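One way to picture the merge step is a greedy pass that folds each chunk into its predecessor while the combined token count stays within `merge_max_size`. This is a sketch under that assumption and not the skill's confirmed algorithm. It joins texts with a single space and sums the existing `token_count` values instead of re-tokenizing, and it omits the character offsets.

```python
def merge_chunks(chunks, merge_max_size):
    """Greedily merge adjacent chunks while the summed token count fits."""
    merged = []
    for chunk in chunks:
        last = merged[-1] if merged else None
        if last is not None and last["token_count"] + chunk["token_count"] <= merge_max_size:
            # Fold this chunk into the previous one.
            last["text"] = f'{last["text"]} {chunk["text"]}'
            last["token_count"] += chunk["token_count"]
        else:
            # Start a new output chunk, renumbering from zero.
            merged.append({
                "index": len(merged),
                "text": chunk["text"],
                "token_count": chunk["token_count"],
            })
    return merged


if __name__ == "__main__":
    sample = [
        {"index": 0, "text": "Hello.", "token_count": 2},
        {"index": 1, "text": "World!", "token_count": 2},
        {"index": 2, "text": "How are you today?", "token_count": 5},
    ]
    print(merge_chunks(sample, merge_max_size=20))
```

For the three input chunks in the example, this greedy pass produces a single chunk with the text "Hello. World! How are you today?" and a token count of 9.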

audit_chunk_quality

Audit chunk quality for oversized and undersized chunks

📥 Input

{
  "action": "audit_chunk_quality",
  "chunks": [
    {
      "index": 0,
      "text": "Hi.",
      "token_count": 2
    },
    {
      "index": 1,
      "text": "This is a very long chunk that contains a lot of information and exceeds the recommended size for embedding models which typically work best with chunks of 256-512 tokens.",
      "token_count": 800
    },
    {
      "index": 2,
      "text": "Normal chunk with moderate content suitable for embedding.",
      "token_count": 120
    }
  ],
  "model_context": 512
}

📤 Output

{
  "action": "audit_chunk_quality",
  "meta": {
    "model_context": 512
  },
  "quality_report": {
    "issues": [
      {
        "chunk_index": 0,
        "fix_hint": {
          "action": "merge_chunks",
          "field": "chunks[0]",
          "reference": "https://aiskillstore.io/skills/chunking-toolkit",
          "suggested_replacement": "Use merge_chunks action to combine with adjacent chunk"
        },
        "issue": "undersized",
        "message": "Chunk 0 is very small (2 tokens). Consider merging with adjacent chunk.",
        "token_count": 2
      },
      {
        "chunk_index": 1,
        "fix_hint": {
          "action": "re_chunk",
          "field": "chunks[1]",
          "reference": "https://aiskillstore.io/skills/chunking-toolkit",
          "suggested_replacement": "Re-chunk with chunk_size <= 512"
        },
        "issue": "oversized",
        "message": "Chunk 1 exceeds model context limit (800 > 512 tokens).",
        "token_count": 800
      }
    ],
    "score": 55,
    "stats": {
      "avg_tokens": 307.3,
      "max_tokens": 800,
      "min_tokens": 2,
      "oversized_count": 1,
      "undersized_count": 1
    },
    "total_chunks": 3
  }
}
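The `stats` block of the report can be recomputed from the input token counts. In this sketch a chunk counts as oversized when its token count exceeds `model_context`, as the report message implies. The undersized threshold `min_tokens=10` is a hypothetical value chosen for illustration, and the `score` formula is not documented on this page, so it is left out.

```python
def audit_stats(chunks, model_context, min_tokens=10):
    """Summarize chunk sizes; min_tokens is an assumed undersized threshold."""
    counts = [chunk["token_count"] for chunk in chunks]
    if not counts:
        raise ValueError("at least one chunk is required")
    return {
        "avg_tokens": round(sum(counts) / len(counts), 1),
        "max_tokens": max(counts),
        "min_tokens": min(counts),
        "oversized_count": sum(1 for n in counts if n > model_context),
        "undersized_count": sum(1 for n in counts if n < min_tokens),
    }


if __name__ == "__main__":
    sample = [
        {"index": 0, "token_count": 2},
        {"index": 1, "token_count": 800},
        {"index": 2, "token_count": 120},
    ]
    print(audit_stats(sample, model_context=512))
```

Applied to the token counts in the example (2, 800, 120), the average is 922 / 3, which rounds to 307.3, with one oversized and one undersized chunk.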

chunk_sliding_window

Sliding window chunking for dense retrieval

📥 Input

{
  "action": "chunk",
  "chunk_size": 30,
  "lang": "en",
  "overlap": 15,
  "strategy": "sliding_window",
  "text": "The quick brown fox jumps over the lazy dog. Pack my box with five dozen liquor jugs."
}

📤 Output

{
  "action": "chunk",
  "chunks": [
    {
      "char_end": 43,
      "char_start": 0,
      "index": 0,
      "overlap_with_prev": 0,
      "text": "The quick brown fox jumps over the lazy dog.",
      "token_count": 9
    },
    {
      "char_end": 86,
      "char_start": 24,
      "index": 1,
      "overlap_with_prev": 15,
      "text": "jumps over the lazy dog. Pack my box with five dozen liquor jugs.",
      "token_count": 13
    }
  ],
  "meta": {
    "strategy": "sliding_window",
    "total_chunks": 2
  }
}
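The general idea behind a sliding window is that each window starts `size - overlap` units after the previous one, so consecutive windows share `overlap` units. The sketch below works on whitespace-separated words purely for illustration. The skill's own size unit and boundary handling are not specified here, so this is not expected to reproduce the offsets in the example.

```python
def sliding_window(words, size, overlap):
    """Split a word list into windows of `size` that share `overlap` words."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("require size > 0 and 0 <= overlap < size")
    step = size - overlap
    windows = []
    for start in range(0, len(words), step):
        windows.append(words[start:start + size])
        # Stop once a window has reached the end of the input.
        if start + size >= len(words):
            break
    return windows


if __name__ == "__main__":
    text = "The quick brown fox jumps over the lazy dog."
    for window in sliding_window(text.split(), size=5, overlap=2):
        print(" ".join(window))
```

A larger overlap improves the chance that a retrieval query lands fully inside one window, at the cost of more chunks to embed and store.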

All examples are also available via the agent API: /v1/agent/skills/d7a00f2d-c8c2-48fb-88c9-a3ca6b42cb27/schema
