choerodon-cluster-agent reports an error

  • Choerodon platform version: 0.17.0

  • Runtime environment: self-hosted

  • Problem description:

E0704 09:16:54.883528       6 client.go:90] dial error ws://devops.xxx/agent/?version=0.17.0&clusterId=3&token=fd15446f-3512-416e-aa21-e5fd6e3b8111&key=cluster:3: websocket: bad handshake
E0704 09:16:59.993084       6 client.go:90] dial error ws://devops.xxx/agent/?version=0.17.0&clusterId=3&token=fd15446f-3512-416e-aa21-e5fd6e3b8111&key=cluster:3: websocket: bad handshake
W0704 09:17:01.787505       6 reflector.go:341] github.com/choerodon/choerodon-cluster-agent/vendor/k8s.io/client-go/informers/factory.go:86: watch of *v1.Event ended with: The resourceVersion for the provided watch is too old.

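Please post a screenshot of the errors in the devops-service log.

Your screenshot upload keeps saying the image is too large... it's actually only a few dozen KB.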

2019-07-04 09:45:58.655  INFO 7 --- [nio-8060-exec-5] i.c.w.session.AgentSessionListener       : agent session close 28be2057a01540cebe021ce429929a46
the count of executor :2
2019-07-04 09:45:58.655 ERROR 7 --- [nio-8060-exec-5] io.choerodon.websocket.SocketSender      : session28be2057a01540cebe021ce429929a46 disconnected when send msg
2019-07-04 09:45:58.656 ERROR 7 --- [nio-8060-exec-5] i.c.websocket.process.ProcessManager     : dispatch error
2019-07-04 09:45:58.657 ERROR 7 --- [nio-8060-exec-5] i.c.websocket.process.ProcessManager     : process msg error

java.lang.NullPointerException: null
	at io.choerodon.devops.app.service.impl.DeployMsgHandlerServiceImpl.commandNotSend(DeployMsgHandlerServiceImpl.java:738)
	at io.choerodon.devops.app.service.impl.DeployMsgHandlerServiceImpl$$FastClassBySpringCGLIB$$b2587643.invoke(<generated>)
	at org.springframework.cglib.proxy.MethodProxy.invoke(MethodProxy.java:204)
	at org.springframework.aop.framework.CglibAopProxy$DynamicAdvisedInterceptor.intercept(CglibAopProxy.java:684)
	at io.choerodon.devops.app.service.impl.DeployMsgHandlerServiceImpl$$EnhancerBySpringCGLIB$$6a0ba67c.commandNotSend(<generated>)
	at io.choerodon.devops.api.eventhandler.SocketMessageHandler.process(SocketMessageHandler.java:133)
	at io.choerodon.websocket.process.ProcessManager.processInter(ProcessManager.java:88)
	at io.choerodon.websocket.process.ProcessManager.process(ProcessManager.java:78)
	at io.choerodon.websocket.listener.SimpleMsgListener.onMsg(SimpleMsgListener.java:18)
	at io.choerodon.websocket.listener.MsgListenerDecorator.onMsg(MsgListenerDecorator.java:14)
	at io.choerodon.websocket.listener.AgentCommandListener.onMsg(AgentCommandListener.java:16)
	at io.choerodon.websocket.helper.CommandSender.sendMsg(CommandSender.java:21)
	at io.choerodon.devops.domain.service.impl.DeployServiceImpl.initCluster(DeployServiceImpl.java:205)
	at io.choerodon.devops.api.eventhandler.AgentInitConfig$AgentInitListener.onConnected(AgentInitConfig.java:53)
	at io.choerodon.websocket.session.AgentSessionManager.onAgentCreated(AgentSessionManager.java:14)
	at io.choerodon.websocket.session.AgentSessionListener.onConnected(AgentSessionListener.java:34)
	at io.choerodon.websocket.websocket.SockHandlerDelegate.onSessionCreated(SockHandlerDelegate.java:23)
	at io.choerodon.websocket.websocket.SocketHandler.afterConnectionEstablished(SocketHandler.java:34)
	at org.springframework.web.socket.handler.WebSocketHandlerDecorator.afterConnectionEstablished(WebSocketHandlerDecorator.java:70)
	at org.springframework.web.socket.handler.LoggingWebSocketHandlerDecorator.afterConnectionEstablished(LoggingWebSocketHandlerDecorator.java:48)
	at org.springframework.web.socket.handler.ExceptionWebSocketHandlerDecorator.afterConnectionEstablished(ExceptionWebSocketHandlerDecorator.java:48)
	at org.springframework.web.socket.adapter.standard.StandardWebSocketHandlerAdapter.onOpen(StandardWebSocketHandlerAdapter.java:103)
	at org.apache.tomcat.websocket.server.WsHttpUpgradeHandler.init(WsHttpUpgradeHandler.java:133)
	at org.apache.coyote.AbstractProtocol$ConnectionHandler.process(AbstractProtocol.java:852)
	at org.apache.tomcat.util.net.NioEndpoint$SocketProcessor.doRun(NioEndpoint.java:1498)
	at org.apache.tomcat.util.net.SocketProcessorBase.run(SocketProcessorBase.java:49)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run(TaskThread.java:61)
	at java.lang.Thread.run(Thread.java:813)

2019-07-04 09:46:02.135 DEBUG 7 --- [l-1 housekeeper] com.zaxxer.hikari.pool.HikariPool        : HikariPool-1 - Pool stats (total=15, active=0, idle=15, waiting=0)
2019-07-04 09:46:10.114  INFO 7 --- [nio-8060-exec-7] i.c.websocket.websocket.SocketHandler    : receive node_sync msg of inter:inter,
2019-07-04 09:46:10.457  INFO 7 --- [nio-8060-exec-9] i.

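Check whether the devops-service log contains this error:

already have a agent in this env

Those keywords don't appear anywhere. I also deleted the agent pod so it would be recreated, and that didn't help either.
The agent reports the error as soon as it starts.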

[root@k8s-06 ~]# kubectl logs -f -n choerodon choerodon-cluster-agent-xszyy-5b6f58b5c8-tlpx6
I0704 10:09:50.506976       6 agent.go:123] KubeClient init success.
I0704 10:09:50.507549       6 agent.go:125] Starting connect to tiller...
I0704 10:09:50.508086       6 agent.go:127] Tiller connect success
I0704 10:09:50.508099       6 agent.go:287] check k8s role binding...
I0704 10:09:50.534585       6 agent.go:293]  k8s role binding succeed.
I0704 10:09:50.534743       6 client.go:79] Started agent
I0704 10:09:50.535200       6 agent.go:208] kubectl /usr/bin/kubectl
I0704 10:09:51.294857       6 sync.go:178] kubectl apply -f - , took 759.617372ms, err: <nil>, output: customresourcedefinition.apiextensions.k8s.io/c7nhelmreleases.choerodon.io unchanged
E0704 10:09:55.679796       6 client.go:90] dial error ws://devops.xxxx/agent/?version=0.17.0&clusterId=3&token=fd15446f-35xxxe-aa21-e5fd6e3b8111&key=cluster:3: websocket: bad handshake

Try accessing this address: https://devops.xxx.xxx/v2/api-docs, where the host is the devops-service domain.
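The probe address can be derived from the URL in the agent's `dial error` line. This is only a sketch: the helper name `probe_url` is hypothetical, and it assumes the agent's `ws://` host is the same host that serves `/v2/api-docs`, as the reply above suggests. The token in the sample URL is replaced with a placeholder.

```python
from urllib.parse import urlsplit, urlunsplit

# Map WebSocket schemes to the HTTP scheme served on the same port.
_SCHEMES = {"ws": "http", "wss": "https"}


def probe_url(agent_ws_url: str, path: str = "/v2/api-docs") -> str:
    """Build an HTTP(S) URL on the agent's host for a plain curl test.

    The query string (version, clusterId, token, key) is dropped on
    purpose: the api-docs endpoint is requested without parameters.
    """
    parts = urlsplit(agent_ws_url)
    scheme = _SCHEMES[parts.scheme]  # KeyError if the URL is not ws/wss
    return urlunsplit((scheme, parts.netloc, path, "", ""))


if __name__ == "__main__":
    url = "ws://devops.xxx/agent/?version=0.17.0&clusterId=3&token=REDACTED&key=cluster:3"
    print(probe_url(url))
```

Because the agent dials `ws://`, the matching test is over plain HTTP on port 80, not HTTPS.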

Access it from where? My other clusters all connect fine.

From the problem cluster.

[root@k8s-06 ~]# curl  https://devops.fr-inc.cn/v2/api-docs -k
<html>
<head><title>404 Not Found</title></head>
<body>
<center><h1>404 Not Found</h1></center>
<hr><center>nginx/1.15.6</center>
</body>
</html>
[root@k8s-06 ~]# curl  https://devops.xxx.xx/v2/api-docs -k
<html>
<head><title>404 Not Found</title></head>
<body>
<center><h1>404 Not Found</h1></center>
<hr><center>nginx/1.15.6</center>
</body>
</html>
[root@k8s-06 ~]#

Check whether the endpoints of devops-service are bound to the correct devops-service pod IP.
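One way to make that check mechanical is to compare the two JSON documents that `kubectl get endpoints devops-service -n <namespace> -o json` and `kubectl get pods -n <namespace> -l <selector> -o json` return. The sketch below assumes you have already loaded both into dictionaries; the namespace, the label selector, the function names, and the sample IPs are placeholders, not values from this thread.

```python
def endpoint_ips(endpoints: dict) -> set:
    """Collect the ready address IPs from an Endpoints object."""
    ips = set()
    for subset in endpoints.get("subsets") or []:
        # Pods that are not ready are listed under "notReadyAddresses"
        # and are deliberately ignored here.
        for address in subset.get("addresses") or []:
            ips.add(address["ip"])
    return ips


def pod_ips(pod_list: dict) -> set:
    """Collect status.podIP from a pod list."""
    ips = set()
    for pod in pod_list.get("items") or []:
        ip = (pod.get("status") or {}).get("podIP")
        if ip:
            ips.add(ip)
    return ips


def compare_ips(endpoints: dict, pod_list: dict) -> dict:
    """Report pod IPs missing from the endpoints and endpoint IPs with no pod."""
    ep, pods = endpoint_ips(endpoints), pod_ips(pod_list)
    return {
        "missing_from_endpoints": sorted(pods - ep),
        "stale_in_endpoints": sorted(ep - pods),
    }


if __name__ == "__main__":
    # Hypothetical sample data, shaped like kubectl's JSON output.
    sample_endpoints = {"subsets": [{"addresses": [{"ip": "10.0.0.5"}]}]}
    sample_pods = {"items": [{"status": {"podIP": "10.0.0.5"}}]}
    print(compare_ips(sample_endpoints, sample_pods))
```

Two empty lists mean the Service is bound to exactly the running pods, which points the investigation away from the Service and toward the network path.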


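That's definitely fine; otherwise my other clusters wouldn't be able to connect either. What happened is that I added a new cluster: deploying the agent worked at first, and later it could no longer connect.

So only the new cluster's agent can't connect, and the other clusters are fine?

Yes.

Hello, is there another load balancer such as nginx in front of the ingress-controller of the cluster where c7n runs, one that is sending the traffic somewhere else?

If HTTPS is not configured, run the following command instead:

curl http://devops.xxx.xxx/v2/api-docs, where the host is the devops-service domain

Run on the faulty cluster: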
[root@k8s-06 ~]# curl http://devops.xx.xx/v2/api-docs

body{background-color:#FFFFFF} TestPage184

[root@k8s-06 ~]# curl http://devops.xxx.xx/v2/api-docs -I
HTTP/1.1 403 Forbidden
Server: Beaver
Cache-Control: no-cache
Content-Type: text/html
Content-Length: 597
Connection: close

[root@k8s-06 ~]#

Accessed from a healthy cluster:
[root@k8s-deploy01 ~]# curl http://devops…xx/v2/api-docs -I
HTTP/1.1 200
Server: nginx/1.15.6
Date: Thu, 04 Jul 2019 03:34:47 GMT
Content-Type: application/json;charset=UTF-8
Connection: keep-alive
Vary: Accept-Encoding

[root@k8s-deploy01 ~]#
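The two `curl -I` results differ in more than the status code: the faulty cluster is answered by `Server: Beaver`, the healthy one by `Server: nginx/1.15.6`. A small sketch of that comparison follows; the function names are hypothetical and the sample text is copied from the two transcripts above. A differing `Server` header is only a hint that something in front of the ingress answered the request, not proof of it.

```python
def parse_head(raw: str):
    """Parse `curl -I` output into (status_code, lowercased header dict)."""
    lines = [line.strip() for line in raw.strip().splitlines() if line.strip()]
    status = int(lines[0].split(None, 2)[1])  # "HTTP/1.1 403 Forbidden" -> 403
    headers = {}
    for line in lines[1:]:
        name, sep, value = line.partition(":")
        if sep:
            headers[name.strip().lower()] = value.strip()
    return status, headers


def answered_by_different_server(faulty_raw: str, healthy_raw: str) -> bool:
    """True when the two responses carry different Server headers."""
    _, faulty = parse_head(faulty_raw)
    _, healthy = parse_head(healthy_raw)
    return faulty.get("server") != healthy.get("server")


FAULTY = """HTTP/1.1 403 Forbidden
Server: Beaver
Cache-Control: no-cache
Content-Type: text/html
Content-Length: 597
Connection: close
"""

HEALTHY = """HTTP/1.1 200
Server: nginx/1.15.6
Date: Thu, 04 Jul 2019 03:34:47 GMT
Content-Type: application/json;charset=UTF-8
Connection: keep-alive
Vary: Accept-Encoding
"""

if __name__ == "__main__":
    print(parse_head(FAULTY)[0], parse_head(HEALTHY)[0])
    print(answered_by_different_server(FAULTY, HEALTHY))
```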


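One more thing: the new cluster reaches c7n through NAT. Could some parameters have been lost along the way? Why would it return 403...

Requesting this address doesn't need any other parameters. I'd suggest checking whether the request is being routed somewhere else; try ping or traceroute.

DNS resolution is correct and routing should be fine too. The agent could be added successfully at first; the next day the cluster lost its connection.

How about adding the externalIPs field to the devops service and having the new cluster connect to that external IP, to see what happens?

Found the problem: it's an ICP filing (备案) issue. The domain c7n uses is for internal use and has not been filed. The earlier k8s clusters are all internal, so they had no problem, but the newly added k8s cluster is external, and because the domain isn't filed, port 80 was blocked.

Thanks for the feedback. We'll make a note of this case on our side as well :smiley: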