Ceph Storage Monitoring Troubleshooting Steps

Ceph storage monitoring:

By default, Ceph monitors are added over SSH. To monitor a server's Ceph storage, SSH access to that server must therefore be working.

Supported Ceph storage versions: >= v0.66

Command used to collect all performance data: ceph -s -f json-pretty
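To reproduce the data collection by hand, the same command can be run over SSH from another machine. The following Python sketch shows one way to do that; it is not the product's own collector. The user name, host, port and timeout are hypothetical values chosen for illustration, and it assumes an `ssh` client is installed and key-based login is already set up.

```python
import subprocess

# The status command named in this article.
CEPH_STATUS_CMD = ["ceph", "-s", "-f", "json-pretty"]


def build_ssh_command(user, host, port=22):
    """Return the argv list that runs the Ceph status command over SSH.

    `user`, `host` and `port` are placeholders supplied by the caller.
    """
    return ["ssh", "-p", str(port), f"{user}@{host}"] + CEPH_STATUS_CMD


def run_ceph_status(user, host, port=22, timeout=30):
    """Run the command remotely and return (exit_code, stdout, stderr).

    Raises subprocess.TimeoutExpired if the remote side does not answer
    within `timeout` seconds (an assumed value, not a product default).
    """
    result = subprocess.run(
        build_ssh_command(user, host, port),
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    return result.returncode, result.stdout, result.stderr


# Example call (hypothetical account and address):
#   code, out, err = run_ceph_status("monitor", "10.21.12.27")
#   print(code, out[:200], err)
```

A non-zero exit code or text on stderr usually points to an SSH or permission problem rather than to Ceph itself, so it is worth checking those two values before looking at the JSON.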

For any data collection or authentication issues:

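First, have the customer run ServerSSHTroubleshoot.sh/bat to check whether the SSH connection works. If SSH is working, run the following command on the remote machine and check its output. The output should be in the format shown in the sample; otherwise the UI shows the error message "Performance data not collected because the Ceph status command returned no output".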

ceph -s -f json-pretty

Sample output:

{ "health": { "health": { "health_services": [
                { "mons": [
                        { "name": "deis-cc4a9dd1-ed85-4958-9fcd-51444b43cfb3.novalocal",
                          "kb_total": 39514092,
                          "kb_used": 11348408,
                          "kb_avail": 27876392,
                          "avail_percent": 70,
                          "last_updated": "2014-11-15 21:06:50.920246",
                          "store_stats": { "bytes_total": 28439194,
                              "bytes_sst": 0,
                              "bytes_log": 1592115,
                              "bytes_misc": 26847079,
                              "last_updated": "0.000000"},
                          "health": "HEALTH_OK"},
                        { "name": "deis-a4a81c52-ac05-44d1-880c-5949c4777ba3.novalocal",
                          "kb_total": 39514092,
                          "kb_used": 12296928,
                          "kb_avail": 26675680,
                          "avail_percent": 67,
                          "last_updated": "2014-11-15 21:06:51.557871",
                          "store_stats": { "bytes_total": 28970803,
                              "bytes_sst": 0,
                              "bytes_log": 2381808,
                              "bytes_misc": 26588995,
                              "last_updated": "0.000000"},
                          "health": "HEALTH_OK"},
                        { "name": "deis-0b8f4cec-a9f3-4ce8-9b1b-b2a9a17ff2dd.novalocal",
                          "kb_total": 39514092,
                          "kb_used": 12030260,
                          "kb_avail": 27192460,
                          "avail_percent": 68,
                          "last_updated": "2014-11-15 21:06:44.210473",
                          "store_stats": { "bytes_total": 29148495,
                              "bytes_sst": 0,
                              "bytes_log": 2295888,
                              "bytes_misc": 26852607,
                              "last_updated": "0.000000"},
                          "health": "HEALTH_OK"}]}]},
      "summary": [
            { "severity": "HEALTH_WARN",
              "summary": "1536 pgs degraded"},
            { "severity": "HEALTH_WARN",
              "summary": "1536 pgs stuck degraded"},
            { "severity": "HEALTH_WARN",
              "summary": "1536 pgs stuck unclean"},
            { "severity": "HEALTH_WARN",
              "summary": "1536 pgs stuck undersized"},
            { "severity": "HEALTH_WARN",
              "summary": "1536 pgs undersized"},
            { "severity": "HEALTH_WARN",
              "summary": "recovery 634\/1896 objects degraded (33.439%)"}],
      "timechecks": { "epoch": 252,
          "round": 4,
          "round_status": "finished",
          "mons": [
                { "name": "deis-cc4a9dd1-ed85-4958-9fcd-51444b43cfb3.novalocal",
                  "skew": "0.000000",
                  "latency": "0.000000",
                  "health": "HEALTH_OK"},
                { "name": "deis-a4a81c52-ac05-44d1-880c-5949c4777ba3.novalocal",
                  "skew": "0.000812",
                  "latency": "0.273973",
                  "health": "HEALTH_OK"},
                { "name": "deis-0b8f4cec-a9f3-4ce8-9b1b-b2a9a17ff2dd.novalocal",
                  "skew": "0.000000",
                  "latency": "1.771649",
                  "health": "HEALTH_OK"}]},
      "overall_status": "HEALTH_WARN",
      "detail": []},
  "fsid": "7500c071-e3ee-4c5b-abb4-ddbd02759a46",
  "election_epoch": 252,
  "quorum": [
        0,
        1,
        2],
  "quorum_names": [
        "deis-cc4a9dd1-ed85-4958-9fcd-51444b43cfb3.novalocal",
        "deis-a4a81c52-ac05-44d1-880c-5949c4777ba3.novalocal",
        "deis-0b8f4cec-a9f3-4ce8-9b1b-b2a9a17ff2dd.novalocal"],
  "monmap": { "epoch": 3,
      "fsid": "7500c071-e3ee-4c5b-abb4-ddbd02759a46",
      "modified": "2014-11-14 13:42:35.275754",
      "created": "2014-11-14 13:41:52.048963",
      "mons": [
            { "rank": 0,
              "name": "deis-cc4a9dd1-ed85-4958-9fcd-51444b43cfb3.novalocal",
              "addr": "10.21.12.27:6789\/0"},
            { "rank": 1,
              "name": "deis-a4a81c52-ac05-44d1-880c-5949c4777ba3.novalocal",
              "addr": "10.21.12.28:6789\/0"},
            { "rank": 2,
              "name": "deis-0b8f4cec-a9f3-4ce8-9b1b-b2a9a17ff2dd.novalocal",
              "addr": "10.21.12.29:6789\/0"}]},
  "osdmap": { "osdmap": { "epoch": 88,
          "num_osds": 5,
          "num_up_osds": 2,
          "num_in_osds": 2,
          "full": false,
          "nearfull": false}},
  "pgmap": { "pgs_by_state": [
            { "state_name": "active+undersized+degraded",
              "count": 1536}],
      "version": 11577,
      "num_pgs": 1536,
      "data_bytes": 773294495,
      "bytes_used": 23943340032,
      "bytes_avail": 56388460544,
      "bytes_total": 80924860416,
      "degraded_objects": 634,
      "degraded_total": 1896,
      "degraded_ratio": "33.439",
      "read_bytes_sec": 24849,
      "write_bytes_sec": 2935,
      "op_per_sec": 8},
  "mdsmap": { "epoch": 50,
      "up": 1,
      "in": 1,
      "max": 1,
      "by_rank": [
            { "rank": 0,
              "name": "deis-cc4a9dd1-ed85-4958-9fcd-51444b43cfb3.novalocal",
              "status": "up:active"}],
      "up:standby": 2}}
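
The check described above (is there any output, and does it have the expected shape?) can be sketched in Python as follows. The list of expected keys is an assumption taken from the sample output, not an official schema, and newer Ceph releases may lay out the JSON differently. The function names are illustrative only.

```python
import json

# Top-level keys seen in the sample output above (assumed, not an official schema).
EXPECTED_KEYS = ("health", "fsid", "quorum", "monmap", "osdmap", "pgmap")


def check_status_output(raw):
    """Return (status_dict_or_None, list_of_problems) for the raw command output."""
    if not raw or not raw.strip():
        # Corresponds to the "returned no output" error shown in the UI.
        return None, ["command returned no output"]
    try:
        status = json.loads(raw)
    except json.JSONDecodeError as exc:
        return None, [f"output is not valid JSON: {exc}"]
    if not isinstance(status, dict):
        return None, ["top-level JSON value is not an object"]
    return status, [f"missing key: {k}" for k in EXPECTED_KEYS if k not in status]


def summarize(status):
    """Pull a few headline values out of a parsed status dictionary."""
    health = status.get("health", {})
    osdmap = status.get("osdmap", {}).get("osdmap", {})  # nested twice in the sample
    pgmap = status.get("pgmap", {})
    total = pgmap.get("bytes_total") or 0
    used = pgmap.get("bytes_used") or 0
    return {
        "overall_status": health.get("overall_status"),
        "num_osds": osdmap.get("num_osds"),
        "num_up_osds": osdmap.get("num_up_osds"),
        "num_pgs": pgmap.get("num_pgs"),
        "used_percent": round(100.0 * used / total, 1) if total else None,
    }
```

Saving the command output to a file and passing its contents to `check_status_output` gives a quick answer on whether the problem is empty output, malformed JSON, or a missing section. If the problem list comes back empty, `summarize` shows the values a monitor would most likely report.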
