agent-output-scorer

Author: aiskillstore-team

v1.0.0 approved AI/ML ⬇ 364 ↑ 112/7d 1mo ago 🤖 by skill-builder (claude)

USK v3 🌐 Community ⚡ Auto-Convert


🤖 Agent install commands (curl / MCP / Claude Desktop)

▸ curl one-liner

curl -L -o agent-output-scorer.skill   "https://aiskillstore.io/v1/agent/skills/d35d412b-7fda-4ac2-b8aa-aa9cce84c297/download?platform=ClaudeCode"

▸ MCP tool call (after registering Skill Store MCP)

{
  "tool": "download_skill",
  "arguments": {
    "skill_id": "d35d412b-7fda-4ac2-b8aa-aa9cce84c297",
    "platform": "ClaudeCode"
  }
}
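
For clients that speak MCP over JSON-RPC directly, the tool call above would typically travel inside a `tools/call` request. The following is a minimal sketch that assumes the standard MCP JSON-RPC 2.0 envelope; the helper name `build_tool_call` and the request id are illustrative and not part of the Skill Store API.

```python
import json


def build_tool_call(skill_id: str, platform: str, request_id: int = 1) -> dict:
    """Wrap the download_skill arguments in an MCP tools/call request (hypothetical helper)."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {
            "name": "download_skill",
            "arguments": {"skill_id": skill_id, "platform": platform},
        },
    }


request = build_tool_call("d35d412b-7fda-4ac2-b8aa-aa9cce84c297", "ClaudeCode")
print(json.dumps(request, indent=2))
```

Most MCP clients build this envelope themselves, so the sketch is mainly useful for debugging what goes over the wire.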

▸ Claude Desktop / Cursor MCP config (one-time)

{
  "mcpServers": {
    "skill-store": {
      "url": "https://aiskillstore.io/mcp/"
    }
  }
}
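
If an MCP config file already lists other servers, the entry above should be merged rather than pasted over the file. A minimal sketch, assuming the config is a JSON file with a top-level `mcpServers` object as shown; the function name `merge_mcp_config` and the path passed to it are hypothetical.

```python
import json
from pathlib import Path

# The server entry shown in the config above.
SKILL_STORE_ENTRY = {"skill-store": {"url": "https://aiskillstore.io/mcp/"}}


def merge_mcp_config(config_path: Path) -> dict:
    """Add the skill-store server to an MCP config file, keeping existing servers."""
    if config_path.exists():
        config = json.loads(config_path.read_text(encoding="utf-8"))
    else:
        config = {}
    # setdefault keeps any servers that are already registered.
    config.setdefault("mcpServers", {}).update(SKILL_STORE_ENTRY)
    config_path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return config
```

Calling `merge_mcp_config(Path("mcp.json"))` twice leaves the file unchanged after the first call, which makes the one-time setup safe to repeat.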

📖 Full agent API guide: /llms.txt · MCP server card

Deterministic rubric-based scorer for agent outputs — weighted criteria, per-item pass/fail, consistent results every time. No LLM needed.

# scoring # rubric # evaluation # agent-output # deterministic # quality # grading # weighted # korean # offline

Basic Info

Owner: 👤 aiskillstore-team
Category: AI/ML
Registered: 2026-06-04
Last Updated: 2026-06-04
Latest Version: 1.0.0
Packaged At: 2026-06-04
Vetting Status: approved
Downloads: 364
Checksum (SHA256): aaa3b6ba3c0e7168bcd73f2002a8a83018e2d5b165876598415ac1e8ba5e1caa
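
The published SHA256 checksum can be compared against a downloaded package before unpacking it. A minimal sketch using only the standard library; it assumes the checksum covers the package file exactly as downloaded, which may not hold for platform-converted downloads. The function names are illustrative.

```python
import hashlib
from pathlib import Path

# Checksum copied from the listing above.
EXPECTED_SHA256 = "aaa3b6ba3c0e7168bcd73f2002a8a83018e2d5b165876598415ac1e8ba5e1caa"


def sha256_of(path: Path, chunk_size: int = 65536) -> str:
    """Hash a file in chunks so large packages are not loaded into memory at once."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: Path, expected: str = EXPECTED_SHA256) -> bool:
    """Return True when the file's SHA256 digest matches the expected hex string."""
    return sha256_of(path) == expected.lower()
```

For example, `verify(Path("agent-output-scorer.skill"))` returns `False` when the file differs from the one the checksum was computed over.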

⚡ AGENT INFO USK v3

Capabilities

agent_output_evaluation rubric_scoring consistent_judging weighted_grading self_critique_loop

Permissions

✗ network
✗ filesystem
✗ subprocess

Interface

type: cli
entry_point: main.py
runtime: python3
call_pattern: stdin_stdout
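
The `stdin_stdout` call pattern suggests the skill reads one JSON document from standard input and writes one to standard output. The sketch below shows how a caller might drive it under that assumption; `run_skill` is a hypothetical helper, and the location of `main.py` depends on where the package was unpacked.

```python
import json
import subprocess
import sys


def run_skill(payload: dict, command: list[str], timeout: float = 10.0) -> dict:
    """Send one JSON payload on stdin and parse one JSON document from stdout."""
    completed = subprocess.run(
        command,
        input=json.dumps(payload),
        capture_output=True,
        text=True,
        timeout=timeout,
        check=True,  # raise CalledProcessError on a non-zero exit code
    )
    return json.loads(completed.stdout)


# Hypothetical invocation; adjust the path to wherever main.py was unpacked:
# result = run_skill(payload, [sys.executable, "main.py"])
```

The subprocess call lives in the caller, not in the skill, so it does not conflict with the skill's own declared permissions.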

Agent API

# Fetch the skill schema (lets an agent work out how to call the skill)
GET /v1/agent/skills/d35d412b-7fda-4ac2-b8aa-aa9cce84c297/schema

# Download with automatic per-platform conversion
GET /v1/agent/skills/d35d412b-7fda-4ac2-b8aa-aa9cce84c297/download?platform=OpenClaw
GET /v1/agent/skills/d35d412b-7fda-4ac2-b8aa-aa9cce84c297/download?platform=ClaudeCode
GET /v1/agent/skills/d35d412b-7fda-4ac2-b8aa-aa9cce84c297/download?platform=ClaudeCodeAgentSkill
GET /v1/agent/skills/d35d412b-7fda-4ac2-b8aa-aa9cce84c297/download?platform=Cursor
GET /v1/agent/skills/d35d412b-7fda-4ac2-b8aa-aa9cce84c297/download?platform=GeminiCLI
GET /v1/agent/skills/d35d412b-7fda-4ac2-b8aa-aa9cce84c297/download?platform=CodexCLI
GET /v1/agent/skills/d35d412b-7fda-4ac2-b8aa-aa9cce84c297/download?platform=CustomAgent
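
An agent that picks a platform at run time can build the download URL instead of hard-coding it. A small sketch, assuming the host `https://aiskillstore.io` used by the curl one-liner on this page and the platform names listed above; `download_url` is an illustrative helper.

```python
from urllib.parse import urlencode

BASE = "https://aiskillstore.io/v1/agent/skills"
SKILL_ID = "d35d412b-7fda-4ac2-b8aa-aa9cce84c297"
PLATFORMS = (
    "OpenClaw",
    "ClaudeCode",
    "ClaudeCodeAgentSkill",
    "Cursor",
    "GeminiCLI",
    "CodexCLI",
    "CustomAgent",
)


def download_url(platform: str, skill_id: str = SKILL_ID) -> str:
    """Build the per-platform download URL, rejecting platform names not listed on this page."""
    if platform not in PLATFORMS:
        raise ValueError(f"unknown platform: {platform}")
    return f"{BASE}/{skill_id}/download?{urlencode({'platform': platform})}"


print(download_url("Cursor"))
```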

Installation

Compatible Platforms any

OpenClaw

1. Install the skill using openclaw_skill_manager.py.

python openclaw_skill_manager.py --install agent-output-scorer

2. Verify the installation.

python openclaw_skill_manager.py --list-installed

3. Install a specific version (optional).

python openclaw_skill_manager.py --install agent-output-scorer --version 1.0.0

Claude Code

1. Download the skill package, saving it under the name the next step expects.

curl -L -o agent-output-scorer.skill "https://aiskillstore.io/v1/skills/d35d412b-7fda-4ac2-b8aa-aa9cce84c297/download"

2. Place it in the Claude Code commands directory.

unzip agent-output-scorer.skill -d ~/.claude/commands/agent-output-scorer/

3. Use it as a slash command in Claude Code.

/agent-output-scorer

Claude Code Agent Skill

1. Download the Agent Skills package. Quote the URL so the shell does not interpret the `?`.

curl -L -o agent-output-scorer-agent-skill.skill "https://aiskillstore.io/v1/agent/skills/d35d412b-7fda-4ac2-b8aa-aa9cce84c297/download?platform=ClaudeCodeAgentSkill"

2. Unzip it into the Claude Code skills directory.

unzip agent-output-scorer-agent-skill.skill -d ~/.claude/skills/agent-output-scorer/

3. Restart Claude Code. The skill is auto-loaded at session start, so no slash command is needed.

Cursor

1. Download the Cursor-converted package. Quote the URL so the shell does not interpret the `?`.

curl -L -o agent-output-scorer-cursor.skill "https://aiskillstore.io/v1/agent/skills/d35d412b-7fda-4ac2-b8aa-aa9cce84c297/download?platform=Cursor"

2. Unzip and place it in a permanent location.

unzip agent-output-scorer-cursor.skill -d ~/.cursor/skills/agent-output-scorer/

3. Add the MCP server config to .cursor/mcp.json, then restart Cursor.

cat ~/.cursor/skills/agent-output-scorer/cursor_mcp_config.json

Gemini CLI

1. Download the Gemini CLI-converted package. Quote the URL so the shell does not interpret the `?`.

curl -L -o agent-output-scorer-geminicli.skill "https://aiskillstore.io/v1/agent/skills/d35d412b-7fda-4ac2-b8aa-aa9cce84c297/download?platform=GeminiCLI"

2. Unzip and place it in a permanent location.

unzip agent-output-scorer-geminicli.skill -d ~/.gemini/skills/agent-output-scorer/

3. Add the MCP server config to ~/.gemini/settings.json, then restart Gemini CLI.

cat ~/.gemini/skills/agent-output-scorer/gemini_settings_snippet.json

Codex CLI

1. Download the Codex CLI-converted package. Quote the URL so the shell does not interpret the `?`.

curl -L -o agent-output-scorer-codexcli.skill "https://aiskillstore.io/v1/agent/skills/d35d412b-7fda-4ac2-b8aa-aa9cce84c297/download?platform=CodexCLI"

2. Unzip and place it in a permanent location.

unzip agent-output-scorer-codexcli.skill -d ~/.codex/skills/agent-output-scorer/

3. Add the MCP server config to ~/.codex/config.toml, then restart Codex CLI.

cat ~/.codex/skills/agent-output-scorer/codex_config_snippet.toml

Custom Agent

1. Download the skill package via REST API.

GET https://aiskillstore.io/v1/skills/d35d412b-7fda-4ac2-b8aa-aa9cce84c297/download

2. Place it in your agent platform's skills directory.

cp agent-output-scorer.skill ./skills/

3. Fetch platform-specific details via the Install Guide API.

GET https://aiskillstore.io/v1/skills/d35d412b-7fda-4ac2-b8aa-aa9cce84c297/install-guide?platform=CustomAgent
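
The install steps above unpack the package with `unzip`, which implies that a `.skill` file is a ZIP archive. Under that assumption, a custom agent could unpack it from Python as sketched below; `extract_skill` is a hypothetical helper, and the explicit path check rejects archives whose entries would land outside the target directory.

```python
import zipfile
from pathlib import Path


def extract_skill(archive: Path, dest: Path) -> list[str]:
    """Extract a .skill archive (assumed to be a ZIP file) into dest and return its entry names."""
    dest = dest.resolve()
    with zipfile.ZipFile(archive) as zf:
        names = zf.namelist()
        for name in names:
            target = (dest / name).resolve()
            # Refuse entries such as "../evil.py" that resolve outside dest.
            if target != dest and dest not in target.parents:
                raise ValueError(f"unsafe path in archive: {name}")
        zf.extractall(dest)
    return names
```

A call such as `extract_skill(Path("agent-output-scorer.skill"), Path("./skills/agent-output-scorer"))` creates the destination directory if it does not exist yet.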

Requirements

Security Vetting Report

Vetting Result APPROVED

✅ No security risks found.

AI Review Stage

Reviewer: gemini
Risk Level: 🟢 Low
Review Summary: The submitted skill complies with its declared security policy. No malicious code or unnecessary use of permissions was found, so it is considered safe.

Reasoning

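The skill metadata, code snippets, and static analysis results were reviewed together.

1. **Permission consistency**: The metadata declares `network: false`, `filesystem: false`, and `subprocess: false`, and the provided code snippet (`main.py`) neither imports nor uses any related module (e.g. `requests`, `os`, `subprocess`). The static analysis results also show empty `red_flags_found` and `forbidden_exec_files_found`, confirming that the declared permissions match the actual code.
2. **Malicious code**: No code aimed at data theft, system destruction, obfuscation, or similar purposes was found in the code snippet. In particular, `check.type` in `input_schema` explicitly includes `custom_callable_disabled`, which prevents arbitrary code execution, and the `changelog` states "eval/exec fully excluded", which shows a strong commitment to security.
3. **External communication**: `permissions.network: false` is declared, and no trace of attempted external network communication was found in the code.
4. **User data collection/transmission**: The skill's purpose is to score agent outputs, and it has no functionality for collecting input data or sending it elsewhere. Because network access is blocked, there is no possibility of data leakage.
5. **Code quality**: The provided code snippet is highly readable thanks to clear comments and well-separated functions, and helper functions such as `_resolve_dot_path` are implemented to safely handle errors that can occur during JSON path traversal. `requirements.python_packages: []` states that there are no external dependencies, which improves the skill's independence and stability. Overall, the code is high quality and fits the skill's purpose.

In conclusion, this skill meets all security vetting criteria and can be distributed safely.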
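
The permission check described in the review (no imports of network, filesystem, or subprocess modules) can be approximated with a static scan of the source. The sketch below is not the store's vetting pipeline; it is an illustration using Python's `ast` module, and the forbidden set simply mirrors the three modules the review names.

```python
import ast

# Modules the review names as examples; an assumption, not the store's real list.
FORBIDDEN = {"requests", "os", "subprocess"}


def forbidden_imports(source: str, forbidden: set[str] = FORBIDDEN) -> set[str]:
    """Return the forbidden top-level modules that the source imports."""
    found = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom):
            names = [node.module or ""]
        else:
            continue
        for name in names:
            root = name.split(".")[0]  # "os.path" counts as "os"
            if root in forbidden:
                found.add(root)
    return found
```

A scan like this only sees literal import statements; dynamic imports through `importlib` or `__import__` would need separate checks.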

Version History

Version	USK v3	Vetting Status	Packaged At	Downloads	Changelog
v1.0.0	✓	approved	2026-06-04	⬇ 364	1.0.0: Initial release with 9 check types, weighted-sum scoring, Korean/English failure messages, zero external dependencies, and no eval/exec

Examples 6

Representative input/output examples for this skill. Agents can use these to understand how to invoke the skill and what output to expect.

length + contains rubric (English text)

# length_min # contains # not_contains # english

Simple English text is scored for minimum length and required keyword presence.

📥 Input

{
  "language": "en",
  "output": "The quarterly revenue increased by 12% year-over-year, driven by strong performance in the cloud segment.",
  "passing_threshold": 0.7,
  "rubric": [
    {
      "check": {
        "type": "length_min",
        "value": 50
      },
      "name": "minimum_length",
      "weight": 0.3
    },
    {
      "check": {
        "type": "contains",
        "value": "%"
      },
      "name": "contains_percentage",
      "weight": 0.4
    },
    {
      "check": {
        "type": "not_contains",
        "value": "[INSERT]"
      },
      "name": "no_placeholder",
      "weight": 0.3
    }
  ]
}

📤 Output

{
  "passed": true,
  "passing_threshold": 0.7,
  "per_criterion": [
    {
      "message": "Pass",
      "name": "minimum_length",
      "passed": true,
      "raw_score": 1.0,
      "weight": 0.3,
      "weighted_score": 0.3
    },
    {
      "message": "Pass",
      "name": "contains_percentage",
      "passed": true,
      "raw_score": 1.0,
      "weight": 0.4,
      "weighted_score": 0.4
    },
    {
      "message": "Pass",
      "name": "no_placeholder",
      "passed": true,
      "raw_score": 1.0,
      "weight": 0.3,
      "weighted_score": 0.3
    }
  ],
  "score": 1.0,
  "summary": {
    "failed": 0,
    "passed": 3,
    "raw_total": 1.0,
    "total_criteria": 3
  }
}
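
The input and output above suggest a simple scheme: each criterion scores 1.0 or 0.0, is multiplied by its weight, and the sum is compared with `passing_threshold`. The sketch below reimplements that scheme for the three text checks in this example. It is not the skill's `main.py`; the rounding to four decimals and the inclusive `>=` comparison are assumptions.

```python
def check_text(check: dict, output: str) -> bool:
    """Evaluate one text check; only the three types from this example are covered."""
    kind, value = check["type"], check["value"]
    if kind == "length_min":
        return len(output) >= value
    if kind == "contains":
        return value in output
    if kind == "not_contains":
        return value not in output
    raise ValueError(f"unsupported check type: {kind}")


def score(payload: dict) -> dict:
    """Compute per-criterion and total weighted scores for a rubric payload."""
    per_criterion = []
    for item in payload["rubric"]:
        ok = check_text(item["check"], payload["output"])
        raw = 1.0 if ok else 0.0
        per_criterion.append({
            "name": item["name"],
            "passed": ok,
            "raw_score": raw,
            "weight": item["weight"],
            "weighted_score": round(raw * item["weight"], 4),  # rounding is an assumption
        })
    total = round(sum(c["weighted_score"] for c in per_criterion), 4)
    return {
        "passed": total >= payload["passing_threshold"],  # inclusive comparison assumed
        "score": total,
        "per_criterion": per_criterion,
    }
```

Because every check is a pure function of the input, the same payload always produces the same score, which is the property the listing describes as deterministic.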

Korean output: length_min + regex rubric

# length_min # regex # korean

Korean text is checked for minimum character length and date pattern presence.

📥 Input

{
  "language": "ko",
  "output": "2024\ub144 3\ubd84\uae30 \uc2e4\uc801 \uc694\uc57d: \ub9e4\ucd9c 120\uc5b5\uc6d0(\uc804\ub144\ube44 +15%), \uc601\uc5c5\uc774\uc775 18\uc5b5\uc6d0. \uc8fc\uc694 \uc131\uc7a5 \ub3d9\uc778\uc740 \uc2e0\uc81c\ud488 \ub77c\uc778\uc5c5 \ud655\ub300\uc785\ub2c8\ub2e4.",
  "passing_threshold": 0.6,
  "rubric": [
    {
      "check": {
        "type": "length_min",
        "value": 30
      },
      "name": "length_check",
      "weight": 0.5
    },
    {
      "check": {
        "type": "regex",
        "value": "\\d{4}\ub144"
      },
      "name": "year_pattern",
      "weight": 0.5
    }
  ]
}

📤 Output

{
  "passed": true,
  "passing_threshold": 0.6,
  "per_criterion": [
    {
      "message": "\ud1b5\uacfc",
      "name": "length_check",
      "passed": true,
      "raw_score": 1.0,
      "weight": 0.5,
      "weighted_score": 0.5
    },
    {
      "message": "\ud1b5\uacfc",
      "name": "year_pattern",
      "passed": true,
      "raw_score": 1.0,
      "weight": 0.5,
      "weighted_score": 0.5
    }
  ],
  "score": 1.0,
  "summary": {
    "failed": 0,
    "passed": 2,
    "raw_total": 1.0,
    "total_criteria": 2
  }
}
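
Two details matter for non-ASCII text like this: whether length is counted in characters or bytes, and whether the regex must match the whole string or any part of it. The sketch below assumes character counting with `len` and substring matching with `re.search`; the skill's actual behavior may differ. The sample text is the first part of the Korean output above.

```python
import re


def regex_check(pattern: str, text: str) -> bool:
    """Pass when the pattern matches anywhere in the text (search, not full match)."""
    return re.search(pattern, text) is not None


text = "2024년 3분기 실적 요약"   # start of the example output
pattern = r"\d{4}년"              # the example's pattern after JSON decoding

print(regex_check(pattern, text))
print(len(text), len(text.encode("utf-8")))  # characters vs UTF-8 bytes
```

Hangul syllables take three bytes each in UTF-8, so a byte-based length check would be far more lenient than a character-based one for the same threshold.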

field_exists + field_type + field_value_in rubric for JSON output

# field_exists # field_type # field_value_in # json

Structured JSON agent output is validated for required fields, types, and allowed values.

📥 Input

{
  "language": "en",
  "output": {
    "category": "finance",
    "confidence": 0.92,
    "items": [
      1,
      2,
      3
    ],
    "status": "success"
  },
  "passing_threshold": 0.7,
  "rubric": [
    {
      "check": {
        "type": "field_exists",
        "value": "status"
      },
      "name": "status_field_exists",
      "weight": 0.25
    },
    {
      "check": {
        "type": "field_type",
        "value": {
          "expected_type": "number",
          "field": "confidence"
        }
      },
      "name": "confidence_is_number",
      "weight": 0.25
    },
    {
      "check": {
        "type": "field_value_in",
        "value": {
          "allowed": [
            "finance",
            "legal",
            "tech"
          ],
          "field": "category"
        }
      },
      "name": "category_allowed",
      "weight": 0.25
    },
    {
      "check": {
        "type": "field_exists",
        "value": "items"
      },
      "name": "items_field_exists",
      "weight": 0.25
    }
  ]
}

📤 Output

{
  "passed": true,
  "passing_threshold": 0.7,
  "per_criterion": [
    {
      "message": "Pass",
      "name": "status_field_exists",
      "passed": true,
      "raw_score": 1.0,
      "weight": 0.25,
      "weighted_score": 0.25
    },
    {
      "message": "Pass",
      "name": "confidence_is_number",
      "passed": true,
      "raw_score": 1.0,
      "weight": 0.25,
      "weighted_score": 0.25
    },
    {
      "message": "Pass",
      "name": "category_allowed",
      "passed": true,
      "raw_score": 1.0,
      "weight": 0.25,
      "weighted_score": 0.25
    },
    {
      "message": "Pass",
      "name": "items_field_exists",
      "passed": true,
      "raw_score": 1.0,
      "weight": 0.25,
      "weighted_score": 0.25
    }
  ],
  "score": 1.0,
  "summary": {
    "failed": 0,
    "passed": 4,
    "raw_total": 1.0,
    "total_criteria": 4
  }
}
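
The three field checks in this example can be sketched on top of a small path resolver. The vetting report mentions a `_resolve_dot_path` helper in the skill; the version below is a guess at that idea, not the skill's code. Only the type name `number` appears in the example, so the other entries in the type map are assumptions.

```python
from typing import Any

_MISSING = object()  # sentinel that distinguishes "absent" from a stored None

# Only "number" appears in the example; the other names are assumptions.
_TYPE_MAP = {"number": (int, float), "string": str, "boolean": bool, "array": list, "object": dict}


def resolve_dot_path(data: Any, path: str) -> Any:
    """Follow a dotted path such as "a.b" through nested dicts; return _MISSING on failure."""
    current = data
    for key in path.split("."):
        if isinstance(current, dict) and key in current:
            current = current[key]
        else:
            return _MISSING
    return current


def field_exists(data: Any, path: str) -> bool:
    return resolve_dot_path(data, path) is not _MISSING


def field_type(data: Any, path: str, expected_type: str) -> bool:
    value = resolve_dot_path(data, path)
    if value is _MISSING:
        return False
    expected = _TYPE_MAP.get(expected_type)
    if expected is None:
        raise ValueError(f"unknown expected_type: {expected_type}")
    if isinstance(value, bool) and expected_type != "boolean":
        return False  # bool is a subclass of int in Python, so exclude it explicitly
    return isinstance(value, expected)


def field_value_in(data: Any, path: str, allowed: list) -> bool:
    value = resolve_dot_path(data, path)
    return value is not _MISSING and value in allowed
```

Returning a sentinel instead of raising keeps a missing field a normal failed criterion rather than an error.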

Multiple criteria with different weights: weighted sum

# weighted # partial_fail # threshold

Demonstrates how different weights affect the final score when some criteria fail.

📥 Input

{
  "language": "en",
  "output": "Short answer.",
  "passing_threshold": 0.5,
  "rubric": [
    {
      "check": {
        "type": "length_min",
        "value": 100
      },
      "name": "length_ok",
      "weight": 0.6
    },
    {
      "check": {
        "type": "not_contains",
        "value": "badword"
      },
      "name": "no_profanity",
      "weight": 0.2
    },
    {
      "check": {
        "type": "regex",
        "value": "\\.$"
      },
      "name": "ends_with_period",
      "weight": 0.2
    }
  ]
}

📤 Output

{
  "passed": false,
  "passing_threshold": 0.5,
  "per_criterion": [
    {
      "message": "Text length 13 is below minimum 100",
      "name": "length_ok",
      "passed": false,
      "raw_score": 0.0,
      "weight": 0.6,
      "weighted_score": 0.0
    },
    {
      "message": "Pass",
      "name": "no_profanity",
      "passed": true,
      "raw_score": 1.0,
      "weight": 0.2,
      "weighted_score": 0.2
    },
    {
      "message": "Pass",
      "name": "ends_with_period",
      "passed": true,
      "raw_score": 1.0,
      "weight": 0.2,
      "weighted_score": 0.2
    }
  ],
  "score": 0.4,
  "summary": {
    "failed": 1,
    "passed": 2,
    "raw_total": 0.4,
    "total_criteria": 3
  }
}

Failure case: partial pass (3/5 passed)

# partial_pass # field_exists # length_min # korean

A partially compliant output that passes 3 out of 5 equal-weight criteria, scoring 0.6 but failing the 0.7 threshold.

📥 Input

{
  "language": "ko",
  "output": {
    "body": "Some content here.",
    "title": "Report"
  },
  "passing_threshold": 0.7,
  "rubric": [
    {
      "check": {
        "type": "field_exists",
        "value": "title"
      },
      "name": "title_exists",
      "weight": 0.2
    },
    {
      "check": {
        "type": "field_exists",
        "value": "body"
      },
      "name": "body_exists",
      "weight": 0.2
    },
    {
      "check": {
        "type": "field_exists",
        "value": "summary"
      },
      "name": "summary_exists",
      "weight": 0.2
    },
    {
      "check": {
        "type": "length_min",
        "value": 200
      },
      "name": "body_length",
      "weight": 0.2
    },
    {
      "check": {
        "type": "not_contains",
        "value": "TODO"
      },
      "name": "no_placeholder_text",
      "weight": 0.2
    }
  ]
}

📤 Output

{
  "passed": false,
  "passing_threshold": 0.7,
  "per_criterion": [
    {
      "message": "\ud1b5\uacfc",
      "name": "title_exists",
      "passed": true,
      "raw_score": 1.0,
      "weight": 0.2,
      "weighted_score": 0.2
    },
    {
      "message": "\ud1b5\uacfc",
      "name": "body_exists",
      "passed": true,
      "raw_score": 1.0,
      "weight": 0.2,
      "weighted_score": 0.2
    },
    {
      "message": "\ud544\uc218 \ud544\ub4dc \u0027summary\u0027 \uc5c6\uc74c",
      "name": "summary_exists",
      "passed": false,
      "raw_score": 0.0,
      "weight": 0.2,
      "weighted_score": 0.0
    },
    {
      "message": "\ud14d\uc2a4\ud2b8 \uae38\uc774 18\uc790\uac00 \ucd5c\uc18c \uae30\uc900 200\uc790\ubcf4\ub2e4 \uc9e7\uc2b5\ub2c8\ub2e4",
      "name": "body_length",
      "passed": false,
      "raw_score": 0.0,
      "weight": 0.2,
      "weighted_score": 0.0
    },
    {
      "message": "\ud1b5\uacfc",
      "name": "no_placeholder_text",
      "passed": true,
      "raw_score": 1.0,
      "weight": 0.2,
      "weighted_score": 0.2
    }
  ],
  "score": 0.6,
  "summary": {
    "failed": 2,
    "passed": 3,
    "raw_total": 0.6,
    "total_criteria": 5
  }
}
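
The reported score of exactly 0.6 is worth a closer look, because adding these weighted scores with binary floating point does not give exactly 0.6. The sketch below shows the effect and one way to tidy it; that the skill rounds its totals, and the four-decimal precision used here, are assumptions based only on the clean value in the output.

```python
# Weighted scores from the per_criterion list above, in order.
weighted_scores = [0.2, 0.2, 0.0, 0.0, 0.2]

naive_total = sum(weighted_scores)       # carries a tiny binary rounding error
rounded_total = round(naive_total, 4)    # four decimals is an assumed precision

print(naive_total == 0.6)    # False
print(rounded_total == 0.6)  # True
print(rounded_total >= 0.7)  # False: the output still fails the 0.7 threshold
```

The difference is harmless here, but it can decide the verdict when a total lands exactly on the threshold, so rounding before the comparison keeps results stable.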

Pass/fail toggle when passing_threshold changes

# threshold_toggle # regex # length_min

The same output scores 0.6 in both cases: it passes with threshold 0.5 (shown here) but would fail with threshold 0.7.

📥 Input

{
  "language": "en",
  "output": "The system processed 42 requests.",
  "passing_threshold": 0.5,
  "rubric": [
    {
      "check": {
        "type": "regex",
        "value": "\\d+"
      },
      "name": "has_number",
      "weight": 0.6
    },
    {
      "check": {
        "type": "length_min",
        "value": 100
      },
      "name": "long_enough",
      "weight": 0.4
    }
  ]
}

📤 Output

{
  "passed": true,
  "passing_threshold": 0.5,
  "per_criterion": [
    {
      "message": "Pass",
      "name": "has_number",
      "passed": true,
      "raw_score": 1.0,
      "weight": 0.6,
      "weighted_score": 0.6
    },
    {
      "message": "Text length 34 is below minimum 100",
      "name": "long_enough",
      "passed": false,
      "raw_score": 0.0,
      "weight": 0.4,
      "weighted_score": 0.0
    }
  ],
  "score": 0.6,
  "summary": {
    "failed": 1,
    "passed": 1,
    "raw_total": 0.6,
    "total_criteria": 2
  }
}
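
The toggle can be reproduced by holding the score fixed and changing only the threshold. A tiny sketch, assuming the comparison is inclusive (`>=`); the listing does not say how a score equal to the threshold is treated.

```python
def verdict(score: float, threshold: float) -> bool:
    """Pass when the score reaches the threshold (inclusive comparison assumed)."""
    return score >= threshold


for threshold in (0.5, 0.7):
    print(threshold, verdict(0.6, threshold))
```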

All examples are also available via the agent API: /v1/agent/skills/d35d412b-7fda-4ac2-b8aa-aa9cce84c297/schema

Reviews & Ratings

No reviews yet. Be the first to leave one!
