kubeadm-ansible installation fails

  • Choerodon platform version: 0.16.0

  • Steps that triggered the problem:

export ANSIBLE_HOST_KEY_CHECKING=False
# run from the kubeadm-ansible/ directory
ansible-playbook -i inventory/hosts -e @inventory/vars cluster.yml -K 
  • Documentation:
    http://choerodon.io/zh/docs/installation-configuration/steps/kubernetes/

  • Environment information (e.g. node details):

  • Error log:
    RUNNING HANDLER [etcd : wait for etcd up] ************************************************************************************************************************
    Friday 24 May 2019 16:42:43 +0800 (0:00:00.276) 0:01:16.327 ************
    FAILED - RETRYING: wait for etcd up (10 retries left).
    FAILED - RETRYING: wait for etcd up (9 retries left).
    FAILED - RETRYING: wait for etcd up (8 retries left).
    FAILED - RETRYING: wait for etcd up (7 retries left).
    FAILED - RETRYING: wait for etcd up (6 retries left).
    FAILED - RETRYING: wait for etcd up (5 retries left).
    FAILED - RETRYING: wait for etcd up (4 retries left).
    FAILED - RETRYING: wait for etcd up (3 retries left).
    FAILED - RETRYING: wait for etcd up (2 retries left).
    FAILED - RETRYING: wait for etcd up (1 retries left).
    fatal: [master]: FAILED! => {"attempts": 10, "changed": false, "content": "", "msg": "Status code was -1 and not [200]: Request failed: <urlopen error [Errno 111] Connection refused>", "redirected": false, "status": -1, "url": "https://119.80.161.93:2379/health"}
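
The handler polls https://119.80.161.93:2379/health and never gets a 200, meaning etcd never came up on the master. A quick manual check looks roughly like this (a sketch: it assumes etcd runs as a systemd unit named etcd on the host, as the playbook normally sets up; if yours runs it in a container, use docker ps / docker logs instead):

    # is anything listening on the etcd client port?
    ss -tlnp | grep 2379
    # query the health endpoint directly; -k skips TLS verification
    curl -k https://119.80.161.93:2379/health
    # inspect the etcd service itself
    systemctl status etcd
    journalctl -u etcd --no-pager | tail -n 50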

If I re-run the following without resetting first:

ansible-playbook -i inventory/hosts -e @inventory/vars cluster.yml -K 

it fails with a different error:
TASK [master : kubeadm | Initialize first master] ****************************************************************************************************************
Friday 24 May 2019 16:48:43 +0800 (0:00:01.237) 0:01:21.119 ************

It hangs here for a long time, about half an hour, and then fails with the following error:
fatal: [master]: FAILED! => {"changed": true, "cmd": ["kubeadm", "init", "--config=/etc/kubernetes/kubeadm-config.yaml", "--ignore-preflight-errors=all"], "delta": "0:30:21.078473", "end": "2019-05-24 17:19:05.039635", "failed_when_result": true, "msg": "non-zero return code", "rc": 1, "start": "2019-05-24 16:48:43.961162", "stderr": "\t[WARNING SystemVerification]: docker version is greater than the most recently validated version. Docker version: 18.03.1-ce. Max validated version: 17.03\nFlag --insecure-bind-address has been deprecated, This flag will be removed in a future version.\nFlag --insecure-port has been deprecated, This flag will be removed in a future version.\nFlag --admission-control has been deprecated, Use --enable-admission-plugins or --disable-admission-plugins instead. Will be removed in a future version.\n\t[WARNING ExternalEtcdVersion]: Get https://119.80.161.93:2379/version: dial tcp 119.80.161.93:2379: getsockopt: connection refused\ncouldn't initialize a Kubernetes cluster", "stderr_lines": ["\t[WARNING SystemVerification]: docker version is greater than the most recently validated version. Docker version: 18.03.1-ce. Max validated version: 17.03", "Flag --insecure-bind-address has been deprecated, This flag will be removed in a future version.", "Flag --insecure-port has been deprecated, This flag will be removed in a future version.", "Flag --admission-control has been deprecated, Use --enable-admission-plugins or --disable-admission-plugins instead. Will be removed in a future version.", "\t[WARNING ExternalEtcdVersion]: Get https://119.80.161.93:2379/version: dial tcp 119.80.161.93:2379: getsockopt: connection refused", "couldn't initialize a Kubernetes cluster"], "stdout": "[init] Using Kubernetes version: v1.10.12\n[init] Using Authorization modes: [Node RBAC]\n[preflight] Running pre-flight checks.\n[certificates] Using the existing ca certificate and key.\n[certificates] Using the existing apiserver certificate and key.\n[certificates] Using the existing apiserver-kubelet-client certificate and key.\n[certificates] Using the existing sa key.\n[certificates] Using the existing front-proxy-ca certificate and key.\n[certificates] Using the existing front-proxy-client certificate and key.\n[certificates] Valid certificates and keys now exist in \"/etc/kubernetes/pki\"\n[kubeconfig] Wrote KubeConfig file to disk: \"/etc/kubernetes/admin.conf\"\n[kubeconfig] Wrote KubeConfig file to disk: \"/etc/kubernetes/kubelet.conf\"\n[kubeconfig] Wrote KubeConfig file to disk: \"/etc/kubernetes/controller-manager.conf\"\n[kubeconfig] Wrote KubeConfig file to disk: \"/etc/kubernetes/scheduler.conf\"\n[controlplane] Wrote Static Pod manifest for component kube-apiserver to \"/etc/kubernetes/manifests/kube-apiserver.yaml\"\n[controlplane] Wrote Static Pod manifest for component kube-controller-manager to \"/etc/kubernetes/manifests/kube-controller-manager.yaml\"\n[controlplane] Wrote Static Pod manifest for component kube-scheduler to \"/etc/kubernetes/manifests/kube-scheduler.yaml\"\n[init] Waiting for the kubelet to boot up the control plane as Static Pods from directory \"/etc/kubernetes/manifests\".\n[init] This might take a minute or longer if the control plane images have to be pulled.\n\nUnfortunately, an error has occurred:\n\ttimed out waiting for the condition\n\nThis error is likely caused by:\n\t- The kubelet is not running\n\t- The kubelet is unhealthy due to a misconfiguration of the node in some way (required cgroups disabled)\n\t- Either there is no internet connection, or imagePullPolicy is set to \"Never\",\n\t so the kubelet cannot pull or find the following control plane images:\n\t\t- registry.cn-hangzhou.aliyuncs.com/choerodon-tools/kube-apiserver-amd64:v1.10.12\n\t\t- registry.cn-hangzhou.aliyuncs.com/choerodon-tools/kube-controller-manager-amd64:v1.10.12\n\t\t- registry.cn-hangzhou.aliyuncs.com/choerodon-tools/kube-scheduler-amd64:v1.10.12\n\t\t- registry.cn-hangzhou.aliyuncs.com/choerodon-tools/etcd-amd64:3.1.12 (only if no external etcd endpoints are configured)\n\nIf you are on a systemd-powered system, you can try to troubleshoot the error with the following commands:\n\t- 'systemctl status kubelet'\n\t- 'journalctl -xeu kubelet'", "stdout_lines": ["[init] Using Kubernetes version: v1.10.12", "[init] Using Authorization modes: [Node RBAC]", "[preflight] Running pre-flight checks.", "[certificates] Using the existing ca certificate and key.", "[certificates] Using the existing apiserver certificate and key.", "[certificates] Using the existing apiserver-kubelet-client certificate and key.", "[certificates] Using the existing sa key.", "[certificates] Using the existing front-proxy-ca certificate and key.", "[certificates] Using the existing front-proxy-client certificate and key.", "[certificates] Valid certificates and keys now exist in \"/etc/kubernetes/pki\"", "[kubeconfig] Wrote KubeConfig file to disk: \"/etc/kubernetes/admin.conf\"", "[kubeconfig] Wrote KubeConfig file to disk: \"/etc/kubernetes/kubelet.conf\"", "[kubeconfig] Wrote KubeConfig file to disk: \"/etc/kubernetes/controller-manager.conf\"", "[kubeconfig] Wrote KubeConfig file to disk: \"/etc/kubernetes/scheduler.conf\"", "[controlplane] Wrote Static Pod manifest for component kube-apiserver to \"/etc/kubernetes/manifests/kube-apiserver.yaml\"", "[controlplane] Wrote Static Pod manifest for component kube-controller-manager to \"/etc/kubernetes/manifests/kube-controller-manager.yaml\"", "[controlplane] Wrote Static Pod manifest for component kube-scheduler to \"/etc/kubernetes/manifests/kube-scheduler.yaml\"", "[init] Waiting for the kubelet to boot up the control plane as Static Pods from directory \"/etc/kubernetes/manifests\".", "[init] This might take a minute or longer if the control plane images have to be pulled.", "", "Unfortunately, an error has occurred:", "\ttimed out waiting for the condition", "", "This error is likely caused by:", "\t- The kubelet is not running", "\t- The kubelet is unhealthy due to a misconfiguration of the node in some way (required cgroups disabled)", "\t- Either there is no internet connection, or imagePullPolicy is set to \"Never\",", "\t so the kubelet cannot pull or find the following control plane images:", "\t\t- registry.cn-hangzhou.aliyuncs.com/choerodon-tools/kube-apiserver-amd64:v1.10.12", "\t\t- registry.cn-hangzhou.aliyuncs.com/choerodon-tools/kube-controller-manager-amd64:v1.10.12", "\t\t- registry.cn-hangzhou.aliyuncs.com/choerodon-tools/kube-scheduler-amd64:v1.10.12", "\t\t- registry.cn-hangzhou.aliyuncs.com/choerodon-tools/etcd-amd64:3.1.12 (only if no external etcd endpoints are configured)", "", "If you are on a systemd-powered system, you can try to troubleshoot the error with the following commands:", "\t- 'systemctl status kubelet'", "\t- 'journalctl -xeu kubelet'"]}
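
As the kubeadm output above suggests, the next step is to look at the kubelet and to verify the control-plane images can actually be pulled. A minimal sketch (the image name is copied from the log above; docker is in use per the SystemVerification warning):

    # check why the kubelet is not bringing up the static pods
    systemctl status kubelet
    journalctl -xeu kubelet | tail -n 100
    # verify the node can pull the control-plane images at all
    docker pull registry.cn-hangzhou.aliyuncs.com/choerodon-tools/kube-apiserver-amd64:v1.10.12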

Running the command again, still without running reset.yml:

ansible-playbook -i inventory/hosts -e @inventory/vars cluster.yml

produces new errors:
TASK [master : kubeadm | delete old kube-dns service] ************************************************************************************************************
Friday 24 May 2019 17:23:13 +0800 (0:00:00.786) 0:00:48.212 ************
fatal: [master]: FAILED! => {"changed": true, "cmd": ["kubectl", "delete", "svc", "kube-dns", "-n", "kube-system"], "delta": "0:00:00.087298", "end": "2019-05-24 17:23:13.832321", "msg": "non-zero return code", "rc": 1, "start": "2019-05-24 17:23:13.745023", "stderr": "error: the server doesn't have a resource type \"svc\"", "stderr_lines": ["error: the server doesn't have a resource type \"svc\""], "stdout": "", "stdout_lines": []}
...ignoring

TASK [master : kubeadm | create kube-dns service] ****************************************************************************************************************
Friday 24 May 2019 17:23:13 +0800 (0:00:00.349) 0:00:48.561 ************
fatal: [master]: FAILED! => {"changed": true, "cmd": ["kubectl", "apply", "-f", "/etc/kubernetes/kube-dns-service.yml"], "delta": "0:00:00.070228", "end": "2019-05-24 17:23:14.171394", "msg": "non-zero return code", "rc": 1, "start": "2019-05-24 17:23:14.101166", "stderr": "The connection to the server localhost:8080 was refused - did you specify the right host or port?", "stderr_lines": ["The connection to the server localhost:8080 was refused - did you specify the right host or port?"], "stdout": "", "stdout_lines": []}
...ignoring

TASK [master : Update kube-proxy command args] *******************************************************************************************************************
Friday 24 May 2019 17:23:14 +0800 (0:00:00.343) 0:00:48.904 ************
fatal: [master]: FAILED! => {"changed": true, "cmd": "kubectl -n kube-system get ds -l 'k8s-app=kube-proxy' -o json | jq '.items[0].spec.template.spec.containers[0].command += [\"--masquerade-all\"]' | kubectl apply -f - && kubectl delete pods -n kube-system -l 'k8s-app=kube-proxy'", "delta": "0:00:00.079949", "end": "2019-05-24 17:23:14.516168", "msg": "non-zero return code", "rc": 1, "start": "2019-05-24 17:23:14.436219", "stderr": "The connection to the server localhost:8080 was refused - did you specify the right host or port?\nerror: the server doesn't have a resource type \"ds\"", "stderr_lines": ["The connection to the server localhost:8080 was refused - did you specify the right host or port?", "error: the server doesn't have a resource type \"ds\""], "stdout": "", "stdout_lines": []}
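
These last failures are follow-on damage: kubectl falls back to localhost:8080 when it has no kubeconfig, and since the earlier kubeadm init never finished, nothing is serving there. Rather than patching individual tasks, it is usually cleaner to tear the half-initialized cluster down with the playbook's reset.yml and run cluster.yml again (a sketch, assuming reset.yml takes the same inventory arguments as cluster.yml):

    ansible-playbook -i inventory/hosts -e @inventory/vars reset.yml -K
    ansible-playbook -i inventory/hosts -e @inventory/vars cluster.yml -K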

  • Cause analysis:

    Describe how you analyzed the problem, so that we can pinpoint it more accurately

  • Questions:

    List any questions you have about encountering and resolving this problem

Hi, please check whether the servers you are using are healthy. We just followed the documentation to install k8s on both Alibaba Cloud and a local virtual machine and ran into no problems at all. Error messages at the ...ignoring level can simply be ignored.

Thanks, that one is solved. Now, after running:
helm init
--history-max=3
--tiller-image=gcr.azk8s.cn/kubernetes-helm/tiller:v2.13.1
--stable-repo-url=https://mirror.azure.cn/kubernetes/charts/
--service-account=helm-tiller

helm install c7n/nfs-client-provisioner
--set rbac.create=true
--set persistence.enabled=true
--set storageClass.name=nfs-provisioner
--set persistence.nfsServer=119.80.161.93
--set persistence.nfsPath=/u01/prod
--version 0.1.0
--name nfs-client-provisioner
--namespace kube-system

it fails with the following error:

Error: release nfs-client-provisioner failed: namespaces "kube-system" is forbidden: User "system:serviceaccount:kube-system:default" cannot get namespaces in the namespace "kube-system"

Hi, it looks like the service account was not bound when helm was installed. Do not delete the trailing \ at the end of each line of the install command; a k8s cluster built per the Choerodon docs has RBAC authentication enabled by default, so do not drop the --service-account flag.
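
For reference, binding the service account by hand looks roughly like this (a sketch: the helm-tiller name and the cluster-admin binding follow the Choerodon installation docs; adjust if yours differ):

    # create the service account that tiller will run as
    kubectl create serviceaccount helm-tiller -n kube-system
    # grant it cluster-admin so tiller can manage resources cluster-wide
    kubectl create clusterrolebinding helm-tiller \
        --clusterrole=cluster-admin \
        --serviceaccount=kube-system:helm-tiller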

Solution

  • Run the following command:
    helm init --upgrade \
        --history-max=3 \
        --tiller-image=gcr.azk8s.cn/kubernetes-helm/tiller:v2.13.1 \
        --stable-repo-url=https://mirror.azure.cn/kubernetes/charts/ \
        --service-account=helm-tiller
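
Once it finishes, you can confirm Tiller restarted with the right identity before retrying the helm install (a sketch; the label selector assumes the standard tiller-deploy deployment labels):

    kubectl -n kube-system get pods -l app=helm,name=tiller
    helm version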
    

PS: for unrelated problems, please open a new topic.

OK, thanks.