chunking-toolkit

Name: chunking-toolkit
Availability: InStock
Author: aiskillstore-team

v1.0.0 approved Text Processing ⬇ 2 ↑ 2/7 days Updated today

USK v3 ✅ Verified ⚡ Auto-Convert

🤖 Install commands for agents (curl / MCP / Claude Desktop)

▸ One-line curl download

curl -L -o chunking-toolkit.skill "https://aiskillstore.io/v1/agent/skills/d7a00f2d-c8c2-48fb-88c9-a3ca6b42cb27/download?platform=ClaudeCode"

▸ MCP tool call (when the Skill Store MCP server is registered)

{
  "tool": "download_skill",
  "arguments": {
    "skill_id": "d7a00f2d-c8c2-48fb-88c9-a3ca6b42cb27",
    "platform": "ClaudeCode"
  }
}
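
For a client that speaks MCP directly rather than through a host application, the tool call above would normally travel inside a JSON-RPC 2.0 `tools/call` request. The sketch below builds such a message. It assumes the Skill Store server follows the standard MCP request shape; `build_tool_call` and the `request_id` default are illustrative names, not part of the listing.

```python
import json


def build_tool_call(skill_id: str, platform: str, request_id: int = 1) -> str:
    """Wrap the download_skill arguments in a JSON-RPC 2.0 tools/call request."""
    message = {
        "jsonrpc": "2.0",
        "id": request_id,  # any client-chosen identifier
        "method": "tools/call",
        "params": {
            "name": "download_skill",  # tool name taken from the listing
            "arguments": {"skill_id": skill_id, "platform": platform},
        },
    }
    return json.dumps(message)


# Example: the same call the listing shows, serialized for transport.
payload = build_tool_call("d7a00f2d-c8c2-48fb-88c9-a3ca6b42cb27", "ClaudeCode")
```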

▸ Claude Desktop / Cursor MCP configuration (one-time)

{
  "mcpServers": {
    "skill-store": {
      "url": "https://aiskillstore.io/mcp/"
    }
  }
}

📖 Full API guide for agents: /llms.txt · MCP server card

Text chunking toolkit with 5 actions: chunk, detect_boundaries, count_tokens, merge_chunks, and audit_chunk_quality. It has zero external dependencies and offers 6 chunking strategies, with support for Korean sentence boundary detection, token-aware chunking, and overlap management.

# chunking # rag # text # nlp # korean # sentence # tokens # embedding # llm

Basic Information

Owner: 👤 aiskillstore-team
Category: Text Processing
Registered: 2026-05-07
Last updated: 2026-05-07
Latest version: 1.0.0
Package date: 2026-05-07
Verification status: approved
Downloads: 2
Checksum (SHA256): fae09db2527bd1da2370657ebc17f11c4e8c46d11a0f928c02ad057829a0037e
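
A downloaded package can be checked against the SHA256 value listed above. The following is a minimal sketch using only the standard library; `sha256_of_file` and `verify_package` are illustrative helper names. The listing does not say which download variant the checksum covers, so a platform-converted package may legitimately produce a different digest.

```python
import hashlib

# Checksum copied from the listing's "Checksum (SHA256)" field.
EXPECTED_SHA256 = "fae09db2527bd1da2370657ebc17f11c4e8c46d11a0f928c02ad057829a0037e"


def sha256_of_file(path: str, block_size: int = 65536) -> str:
    """Return the hex SHA-256 digest of a file, reading it in blocks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for block in iter(lambda: fh.read(block_size), b""):
            digest.update(block)
    return digest.hexdigest()


def verify_package(path: str, expected: str = EXPECTED_SHA256) -> bool:
    """True when the file's digest matches the expected checksum."""
    return sha256_of_file(path) == expected.lower()


# Usage (file name as produced by the one-line curl command):
# verify_package("chunking-toolkit.skill")
```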

⚡ AGENT INFO USK v3

Capabilities

text_chunking sentence_boundary_detection korean_chunking token_aware_chunking overlap_management

Permissions

✗ network
✗ filesystem
✗ subprocess

Interface

type: cli
entry_point: main.py
runtime: python3
call_pattern: stdin_stdout
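
Given the `stdin_stdout` call pattern, a caller would typically pipe one JSON request into `main.py` and parse the JSON reply. The sketch below assumes exactly that: one JSON object in, one JSON object out, which is consistent with the examples on this page but not confirmed by it. `call_skill` and `DEFAULT_COMMAND` are illustrative names, and the path to `main.py` depends on where the package was unpacked.

```python
import json
import subprocess
import sys

# Assumed location of the entry point; adjust to the unpacked package.
DEFAULT_COMMAND = [sys.executable, "main.py"]


def call_skill(request: dict, command: list = None, timeout: float = 30.0) -> dict:
    """Send one JSON request on stdin and parse the JSON reply from stdout."""
    proc = subprocess.run(
        command or DEFAULT_COMMAND,
        input=json.dumps(request),
        capture_output=True,
        text=True,
        encoding="utf-8",  # replies may contain Korean text
        timeout=timeout,
        check=True,  # raise CalledProcessError on a non-zero exit code
    )
    return json.loads(proc.stdout)


# Usage:
# call_skill({"action": "count_tokens", "lang": "en", "text": "Hello world."})
```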

Agent API

# Fetch the skill schema (lets an agent learn how to call the skill)
GET /v1/agent/skills/d7a00f2d-c8c2-48fb-88c9-a3ca6b42cb27/schema

# Download with automatic per-platform conversion
GET /v1/agent/skills/d7a00f2d-c8c2-48fb-88c9-a3ca6b42cb27/download?platform=OpenClaw
GET /v1/agent/skills/d7a00f2d-c8c2-48fb-88c9-a3ca6b42cb27/download?platform=ClaudeCode
GET /v1/agent/skills/d7a00f2d-c8c2-48fb-88c9-a3ca6b42cb27/download?platform=ClaudeCodeAgentSkill
GET /v1/agent/skills/d7a00f2d-c8c2-48fb-88c9-a3ca6b42cb27/download?platform=Cursor
GET /v1/agent/skills/d7a00f2d-c8c2-48fb-88c9-a3ca6b42cb27/download?platform=GeminiCLI
GET /v1/agent/skills/d7a00f2d-c8c2-48fb-88c9-a3ca6b42cb27/download?platform=CodexCLI
GET /v1/agent/skills/d7a00f2d-c8c2-48fb-88c9-a3ca6b42cb27/download?platform=CustomAgent
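
The endpoints above differ only in the `platform` query parameter, so a client can build them from one template. This is a small sketch; the host is taken from the curl command on this page, and `schema_url` and `download_url` are illustrative helper names.

```python
from urllib.parse import urlencode

BASE_URL = "https://aiskillstore.io"
SKILL_ID = "d7a00f2d-c8c2-48fb-88c9-a3ca6b42cb27"

# Platform values as they appear in the Agent API listing.
PLATFORMS = (
    "OpenClaw",
    "ClaudeCode",
    "ClaudeCodeAgentSkill",
    "Cursor",
    "GeminiCLI",
    "CodexCLI",
    "CustomAgent",
)


def schema_url(skill_id: str = SKILL_ID) -> str:
    """URL of the skill schema endpoint."""
    return f"{BASE_URL}/v1/agent/skills/{skill_id}/schema"


def download_url(platform: str, skill_id: str = SKILL_ID) -> str:
    """URL of the per-platform download endpoint."""
    if platform not in PLATFORMS:
        raise ValueError(f"unknown platform: {platform!r}")
    query = urlencode({"platform": platform})
    return f"{BASE_URL}/v1/agent/skills/{skill_id}/download?{query}"
```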

Installation

Compatible platforms: any

OpenClaw

1. Install the skill with openclaw_skill_manager.py.

python openclaw_skill_manager.py --install chunking-toolkit

2. Verify the installation.

python openclaw_skill_manager.py --list-installed

3. Install a specific version (optional).

python openclaw_skill_manager.py --install chunking-toolkit --version 1.0.0

Claude Code

1. Download the skill package.

curl -L -o chunking-toolkit.skill "https://aiskillstore.io/v1/skills/d7a00f2d-c8c2-48fb-88c9-a3ca6b42cb27/download"

2. Place it in the Claude Code commands directory.

unzip chunking-toolkit.skill -d ~/.claude/commands/chunking-toolkit/

3. Use it as a slash command in Claude Code.

/chunking-toolkit

Claude Code Agent Skill

1. Download the Agent Skills package.

curl -L -O -J "https://aiskillstore.io/v1/agent/skills/d7a00f2d-c8c2-48fb-88c9-a3ca6b42cb27/download?platform=ClaudeCodeAgentSkill"

2. Extract it into the Claude Code skills directory.

unzip chunking-toolkit-agent-skill-*.skill -d ~/.claude/skills/chunking-toolkit/

3. Restart Claude Code. The skill is loaded automatically at session start and can be used in natural language, without a slash command.

Cursor

1. Download the package converted for Cursor.

curl -L -O -J "https://aiskillstore.io/v1/agent/skills/d7a00f2d-c8c2-48fb-88c9-a3ca6b42cb27/download?platform=Cursor"

2. Extract it to a permanent location.

unzip chunking-toolkit-cursor-*.skill -d ~/.cursor/skills/chunking-toolkit/

3. Add the MCP server configuration to .cursor/mcp.json and restart Cursor.

cat ~/.cursor/skills/chunking-toolkit/cursor_mcp_config.json

Gemini CLI

1. Download the package converted for Gemini CLI.

curl -L -O -J "https://aiskillstore.io/v1/agent/skills/d7a00f2d-c8c2-48fb-88c9-a3ca6b42cb27/download?platform=GeminiCLI"

2. Extract it to a permanent location.

unzip chunking-toolkit-geminicli-*.skill -d ~/.gemini/skills/chunking-toolkit/

3. Add the MCP server configuration to ~/.gemini/settings.json and restart Gemini CLI.

cat ~/.gemini/skills/chunking-toolkit/gemini_settings_snippet.json

Codex CLI

1. Download the package converted for Codex CLI.

curl -L -O -J "https://aiskillstore.io/v1/agent/skills/d7a00f2d-c8c2-48fb-88c9-a3ca6b42cb27/download?platform=CodexCLI"

2. Extract it to a permanent location.

unzip chunking-toolkit-codexcli-*.skill -d ~/.codex/skills/chunking-toolkit/

3. Add the MCP server configuration to ~/.codex/config.toml and restart Codex CLI.

cat ~/.codex/skills/chunking-toolkit/codex_config_snippet.toml

Custom Agent

1. Download the skill package through the REST API.

GET https://aiskillstore.io/v1/skills/d7a00f2d-c8c2-48fb-88c9-a3ca6b42cb27/download

2. Place it in your agent platform's skills directory.

cp chunking-toolkit.skill ./skills/

3. Query the install guide API for platform-specific details.

GET https://aiskillstore.io/v1/skills/d7a00f2d-c8c2-48fb-88c9-a3ca6b42cb27/install-guide?platform=CustomAgent

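Security Verification Report

Verification result: APPROVED

Scan results: ["Metadata warning: recommended field missing: 'requirements' (recommended by SKILL.md v2)", "Metadata warning: recommended field missing: 'changelog' (recommended by SKILL.md v2)"]

✅ No security risks were found.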

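AI Review Stage

Reviewer: gemini
Risk level: 🟢 Low
Review summary: A text chunking and related utilities skill. It uses only the standard library with no external dependencies, and the declared permissions match the code, so it is considered safe.

Rationale

1. **Permission consistency**: The `permissions` declared in the skill metadata (network: false, filesystem: false, subprocess: false) are highly restrictive and fully consistent with the provided `main.py` code snippet and the skill description ('Zero external dependencies -- standard library only'). There are no risk factors such as external communication, filesystem access, or subprocess execution.
2. **Malicious code**: The provided code snippet contains language detection and token estimation logic, which relates directly to the skill's core functionality. No code with a malicious purpose, such as data exfiltration, system destruction, or obfuscation, was found. The static analysis results support this: 'red_flags_found', 'obfuscation_warnings', and 'forbidden_exec_files_found' are all empty.
3. **Undeclared external communication**: `network: false` is declared, and the code snippet does not import or use any module that attempts external communication (e.g. `requests`, `urllib`, `socket`). There is therefore no undeclared external communication.
4. **Unauthorized collection or transmission of user data**: The skill's purpose is text processing: it processes the input text and returns the result on `stdout`. Because it has no network permission, it cannot collect user data or send it to an external party.
5. **Code quality and fit for purpose**: The code snippet faithfully implements the described functions (language detection, token counting) and is written in clear, standard Python. The `input_schema`, `output_schema`, `capabilities`, and `examples` in the skill metadata also describe the skill's purpose and functionality clearly, so code quality and purpose are well aligned.

Overall, the security risk of this skill is judged to be very low, and it is expected to operate safely within its declared functions and permissions.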

Version History

Version	USK v3	Verification status	Package date	Downloads	Changes
v1.0.0	✓	approved	2026-05-07	⬇ 2	—

Usage Examples (7)

These are representative input/output examples for this skill. Agents can use them to understand how to call the skill and what the results look like.

chunk_sentence_korean

Korean sentence boundary chunking

📥 Input

{
  "action": "chunk",
  "chunk_size": 100,
  "lang": "ko",
  "overlap": 10,
  "strategy": "sentence_boundary",
  "text": "\uc778\uacf5\uc9c0\ub2a5 \uae30\uc220\uc774 \ube60\ub974\uac8c \ubc1c\uc804\ud558\uace0 \uc788\uc2b5\ub2c8\ub2e4. \ud2b9\ud788 \ub300\ud615 \uc5b8\uc5b4 \ubaa8\ub378\uc740 \ub9ce\uc740 \ubd84\uc57c\uc5d0\uc11c \ud601\uc2e0\uc744 \uc774\ub04c\uace0 \uc788\uc2b5\ub2c8\ub2e4. \uadf8\ub7ec\ub098 \ube44\uc6a9\uacfc \uc9c0\uc5f0 \uc2dc\uac04\uc774 \uacfc\uc81c\ub85c \ub0a8\uc544 \uc788\uc2b5\ub2c8\ub2e4."
}

📤 Output

{
  "action": "chunk",
  "chunks": [
    {
      "char_end": 55,
      "char_start": 0,
      "index": 0,
      "overlap_with_prev": 0,
      "text": "\uc778\uacf5\uc9c0\ub2a5 \uae30\uc220\uc774 \ube60\ub974\uac8c \ubc1c\uc804\ud558\uace0 \uc788\uc2b5\ub2c8\ub2e4. \ud2b9\ud788 \ub300\ud615 \uc5b8\uc5b4 \ubaa8\ub378\uc740 \ub9ce\uc740 \ubd84\uc57c\uc5d0\uc11c \ud601\uc2e0\uc744 \uc774\ub04c\uace0 \uc788\uc2b5\ub2c8\ub2e4.",
      "token_count": 45
    },
    {
      "char_end": 79,
      "char_start": 56,
      "index": 1,
      "overlap_with_prev": 10,
      "text": "\uadf8\ub7ec\ub098 \ube44\uc6a9\uacfc \uc9c0\uc5f0 \uc2dc\uac04\uc774 \uacfc\uc81c\ub85c \ub0a8\uc544 \uc788\uc2b5\ub2c8\ub2e4.",
      "token_count": 23
    }
  ],
  "meta": {
    "lang_detected": "ko",
    "strategy": "sentence_boundary",
    "total_chunks": 2
  }
}

chunk_fixed_size_english

Fixed-size chunking for English text

📥 Input

{
  "action": "chunk",
  "chunk_size": 50,
  "lang": "en",
  "overlap": 10,
  "strategy": "fixed_size",
  "text": "Artificial intelligence is transforming industries. Machine learning models can analyze data at scale. Natural language processing enables human-computer interaction."
}

📤 Output

{
  "action": "chunk",
  "chunks": [
    {
      "char_end": 100,
      "char_start": 0,
      "index": 0,
      "overlap_with_prev": 0,
      "text": "Artificial intelligence is transforming industries. Machine learning models can analyze data at scale.",
      "token_count": 19
    },
    {
      "char_end": 162,
      "char_start": 101,
      "index": 1,
      "overlap_with_prev": 10,
      "text": "Natural language processing enables human-computer interaction.",
      "token_count": 9
    }
  ],
  "meta": {
    "lang_detected": "en",
    "strategy": "fixed_size",
    "total_chunks": 2
  }
}

detect_boundaries_mixed

Detect sentence boundaries in mixed Korean-English text

📥 Input

{
  "action": "detect_boundaries",
  "lang": "auto",
  "text": "Hello world. \uc548\ub155\ud558\uc138\uc694! This is a test. \uc88b\uc740 \ud558\ub8e8 \ub418\uc138\uc694."
}

📤 Output

{
  "action": "detect_boundaries",
  "boundaries": [
    {
      "confidence": "high",
      "position": 12,
      "type": "sentence_end"
    },
    {
      "confidence": "high",
      "position": 22,
      "type": "sentence_end"
    },
    {
      "confidence": "high",
      "position": 37,
      "type": "sentence_end"
    }
  ],
  "meta": {
    "lang_detected": "mixed",
    "total_boundaries": 3
  }
}

count_tokens_context_check

Count tokens and check whether the text fits the LLM context

📥 Input

{
  "action": "count_tokens",
  "lang": "en",
  "model_context": 4096,
  "text": "This is a sample document for token counting. It contains multiple sentences."
}

📤 Output

{
  "action": "count_tokens",
  "meta": {
    "size_unit": "tokens"
  },
  "token_info": {
    "context_window": 4096,
    "estimated_chunks": 1,
    "fits_in_context": true,
    "lang_detected": "en",
    "lang_multiplier": 1.0,
    "total_tokens": 16
  }
}
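
The `fits_in_context` and `estimated_chunks` fields in this output can be reproduced with simple arithmetic once a token count is known. The sketch below shows one plausible derivation: a comparison against the context window and a ceiling division. How the toolkit estimates `total_tokens`, and how it actually derives `estimated_chunks`, is not shown on this page, so `context_report` is an illustrative helper rather than the toolkit's logic.

```python
import math


def context_report(total_tokens: int, context_window: int) -> dict:
    """Report whether a token count fits a context window.

    estimated_chunks is the number of context-sized pieces needed,
    computed by ceiling division (an assumption, not the toolkit's rule).
    """
    if context_window <= 0:
        raise ValueError("context_window must be positive")
    return {
        "context_window": context_window,
        "total_tokens": total_tokens,
        "fits_in_context": total_tokens <= context_window,
        "estimated_chunks": max(1, math.ceil(total_tokens / context_window)),
    }


# With the values from the example above (16 tokens, 4096-token window):
report = context_report(16, 4096)
```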

merge_chunks_basic

Merge small chunks into optimal size

📥 Input

{
  "action": "merge_chunks",
  "chunks": [
    {
      "index": 0,
      "text": "Hello.",
      "token_count": 2
    },
    {
      "index": 1,
      "text": "World!",
      "token_count": 2
    },
    {
      "index": 2,
      "text": "How are you today?",
      "token_count": 5
    }
  ],
  "merge_max_size": 20
}

📤 Output

{
  "action": "merge_chunks",
  "chunks": [
    {
      "char_end": 31,
      "char_start": 0,
      "index": 0,
      "overlap_with_prev": 0,
      "text": "Hello. World! How are you today?",
      "token_count": 9
    }
  ],
  "meta": {
    "merged_count": 1,
    "original_count": 3
  }
}
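
One way to get the merged result above is a greedy pass that keeps appending adjacent chunks while the combined `token_count` stays within `merge_max_size`. The following is a minimal sketch of that idea, not the toolkit's implementation: it joins texts with a single space, sums the given token counts instead of re-counting, and omits the character offsets.

```python
def merge_chunks(chunks: list, merge_max_size: int) -> list:
    """Greedily merge adjacent chunks while the token total fits the limit."""
    merged = []
    for chunk in chunks:
        last = merged[-1] if merged else None
        if last is not None and last["token_count"] + chunk["token_count"] <= merge_max_size:
            # Still room: append to the chunk currently being built.
            last["text"] = f'{last["text"]} {chunk["text"]}'
            last["token_count"] += chunk["token_count"]
        else:
            # Start a new output chunk, renumbered from zero.
            merged.append(
                {
                    "index": len(merged),
                    "text": chunk["text"],
                    "token_count": chunk["token_count"],
                }
            )
    return merged


sample = [
    {"index": 0, "text": "Hello.", "token_count": 2},
    {"index": 1, "text": "World!", "token_count": 2},
    {"index": 2, "text": "How are you today?", "token_count": 5},
]
result = merge_chunks(sample, merge_max_size=20)
```

With the sample input, all three chunks fit under the limit of 20, so `result` holds a single chunk with 9 tokens, in line with the example output.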

audit_chunk_quality

Audit chunk quality for oversized and undersized chunks

📥 Input

{
  "action": "audit_chunk_quality",
  "chunks": [
    {
      "index": 0,
      "text": "Hi.",
      "token_count": 2
    },
    {
      "index": 1,
      "text": "This is a very long chunk that contains a lot of information and exceeds the recommended size for embedding models which typically work best with chunks of 256-512 tokens.",
      "token_count": 800
    },
    {
      "index": 2,
      "text": "Normal chunk with moderate content suitable for embedding.",
      "token_count": 120
    }
  ],
  "model_context": 512
}

📤 Output

{
  "action": "audit_chunk_quality",
  "meta": {
    "model_context": 512
  },
  "quality_report": {
    "issues": [
      {
        "chunk_index": 0,
        "fix_hint": {
          "action": "merge_chunks",
          "field": "chunks[0]",
          "reference": "https://aiskillstore.io/skills/chunking-toolkit",
          "suggested_replacement": "Use merge_chunks action to combine with adjacent chunk"
        },
        "issue": "undersized",
        "message": "Chunk 0 is very small (2 tokens). Consider merging with adjacent chunk.",
        "token_count": 2
      },
      {
        "chunk_index": 1,
        "fix_hint": {
          "action": "re_chunk",
          "field": "chunks[1]",
          "reference": "https://aiskillstore.io/skills/chunking-toolkit",
          "suggested_replacement": "Re-chunk with chunk_size <= 512"
        },
        "issue": "oversized",
        "message": "Chunk 1 exceeds model context limit (800 > 512 tokens).",
        "token_count": 800
      }
    ],
    "score": 55,
    "stats": {
      "avg_tokens": 307.3,
      "max_tokens": 800,
      "min_tokens": 2,
      "oversized_count": 1,
      "undersized_count": 1
    },
    "total_chunks": 3
  }
}
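
The size checks and statistics in this report can be sketched in a few lines. The version below flags a chunk as oversized when it exceeds `model_context` and as undersized when it falls below a threshold. The threshold `min_tokens=10` is a hypothetical value chosen for illustration, since the listing does not state the toolkit's cutoff, and the `score` field is left out because its formula is not documented here.

```python
from statistics import mean


def audit_chunks(chunks: list, model_context: int, min_tokens: int = 10) -> dict:
    """Flag oversized and undersized chunks and summarize token counts."""
    issues = []
    for chunk in chunks:
        tokens = chunk["token_count"]
        if tokens > model_context:
            issue = "oversized"
        elif tokens < min_tokens:  # hypothetical threshold
            issue = "undersized"
        else:
            continue
        issues.append(
            {"chunk_index": chunk["index"], "issue": issue, "token_count": tokens}
        )
    counts = [c["token_count"] for c in chunks]
    stats = {
        "avg_tokens": round(mean(counts), 1) if counts else 0.0,
        "max_tokens": max(counts, default=0),
        "min_tokens": min(counts, default=0),
        "oversized_count": sum(1 for i in issues if i["issue"] == "oversized"),
        "undersized_count": sum(1 for i in issues if i["issue"] == "undersized"),
    }
    return {"issues": issues, "stats": stats, "total_chunks": len(chunks)}


sample = [
    {"index": 0, "text": "Hi.", "token_count": 2},
    {"index": 1, "text": "(long chunk)", "token_count": 800},
    {"index": 2, "text": "Normal chunk.", "token_count": 120},
]
report = audit_chunks(sample, model_context=512)
```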

chunk_sliding_window

Sliding window chunking for dense retrieval

📥 Input

{
  "action": "chunk",
  "chunk_size": 30,
  "lang": "en",
  "overlap": 15,
  "strategy": "sliding_window",
  "text": "The quick brown fox jumps over the lazy dog. Pack my box with five dozen liquor jugs."
}

📤 Output

{
  "action": "chunk",
  "chunks": [
    {
      "char_end": 43,
      "char_start": 0,
      "index": 0,
      "overlap_with_prev": 0,
      "text": "The quick brown fox jumps over the lazy dog.",
      "token_count": 9
    },
    {
      "char_end": 86,
      "char_start": 24,
      "index": 1,
      "overlap_with_prev": 15,
      "text": "jumps over the lazy dog. Pack my box with five dozen liquor jugs.",
      "token_count": 13
    }
  ],
  "meta": {
    "strategy": "sliding_window",
    "total_chunks": 2
  }
}
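
The general idea behind a sliding window is that each chunk starts `size - overlap` units after the previous one, so consecutive chunks share `overlap` units. The sketch below measures the window in whitespace-separated words, which is an assumption made for simplicity. The listing does not say which unit the toolkit uses for `chunk_size` and `overlap`, so this will not necessarily reproduce the output above.

```python
def sliding_window(text: str, size: int, overlap: int) -> list:
    """Split text into word windows of `size` that share `overlap` words."""
    if size <= 0:
        raise ValueError("size must be positive")
    if not 0 <= overlap < size:
        raise ValueError("overlap must satisfy 0 <= overlap < size")
    words = text.split()
    step = size - overlap  # how far each window advances
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + size]))
        if start + size >= len(words):
            break  # this window already reached the end of the text
    return chunks


windows = sliding_window("a b c d e f g", size=4, overlap=2)
```

Here `windows` contains three chunks, `a b c d`, `c d e f`, and `e f g`, each sharing two words with its predecessor.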

All examples are also available through the agent API: /v1/agent/skills/d7a00f2d-c8c2-48fb-88c9-a3ca6b42cb27/schema
