The very first step, deploying k8s, already failed, so I am recording the deployment errors and how I handled them here.

  • Choerodon platform version: 0.18

  • Step where the problem occurred:

Kubernetes cluster deployment: ansible-playbook -i inventory/hosts -e @inventory/vars cluster.yml -K

  • Documentation URL:
    http://choerodon.io/zh/docs/installation-configuration/steps/kubernetes/
  • Environment information (e.g. node information):
    ------------------hzkj_zcy_master1------------------------------
    ip 192.168.99.123
    centos7.5
    docker 18.09.0
    docker-compose 1.23.1
    12c/17000MB 600GB
    ------------------hzkj_zcy_node1------------------------------
    ip 192.168.55.25
    centos7.5
    docker 18.09.0
    docker-compose 1.23.1
    12c/17000MB 150GB
    ------------------hzkj_zcy_node2------------------------------
    ip 192.168.55.35
    centos7.5
    docker 18.09.0
    docker-compose 1.23.1
    12c/17000MB 150GB
    ------------------hzkj_zcy_node3------------------------------
    ip 192.168.55.145
    centos7.5
    docker 18.09.0
    docker-compose 1.23.1
    12c/17000MB 150GB
    ------------------hzkj_zcy_node4------------------------------
    ip 192.168.55.65
    centos7.5
    docker 18.09.0
    docker-compose 1.23.1
    12c/17000MB 150GB



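On all nodes:
0. Start docker on boot
systemctl enable docker
systemctl restart docker
1. Disable the firewall (in production, keep the firewall on and open only the required ports)
systemctl stop firewalld
systemctl disable firewalld
2. Disable the swap partition
sudo swapoff -a
# To disable swap permanently, open the file below and comment out the swap line (required)
sudo vi /etc/fstab
3. Synchronize the server time zone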
yum install ntp ntpdate -y
timedatectl status
timedatectl list-timezones | grep Shanghai
timedatectl set-timezone Asia/Hong_Kong
timedatectl set-ntp yes
date
4. Disable SELinux
setenforce 0
sed -i 's/^SELINUX=enforcing$/SELINUX=permissive/' /etc/selinux/config
5. Edit hosts (optional) and hostname
vi /etc/hosts
vi /etc/hostname
6. Adjust the IP forwarding settings
vi /etc/sysctl.conf    # add the three lines below
net.ipv4.ip_forward = 1
net.bridge.bridge-nf-call-iptables = 1
net.bridge.bridge-nf-call-ip6tables = 1
sysctl -p              # apply the settings
7. Reboot
reboot
or
init 6
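
Step 2 above edits /etc/fstab by hand. As a rough sketch only, the same change could be scripted as follows. The function name disable_swap_entries is made up for illustration, and the pattern assumes swap entries contain the word "swap" as a whitespace-separated field; check the resulting file before rebooting.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the Choerodon scripts): comment out every
# active swap entry in an fstab-style file. A backup copy "<file>.bak" is kept.
disable_swap_entries() {
  local fstab="$1"
  # Skip lines that are already comments; on the rest, prefix lines that
  # contain "swap" as a whitespace-delimited field with "#".
  sed -i.bak -E '/^[[:space:]]*#/!s/^(.*[[:space:]]swap[[:space:]].*)$/#\1/' "$fstab"
}
```

Usage would be `sudo swapoff -a` followed by calling `disable_swap_entries /etc/fstab` as root.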

  • Error log:
    TASK [base/prepare : Download cfssl] *******************************************************************************************************************************************************************************************************
    Monday 05 August 2019 17:42:01 +0800 (0:00:00.563) 0:00:22.201 *********
    An exception occurred during task execution. To see the full traceback, use -vvv. The error was: SSLError: ('The read operation timed out',)
    fatal: [hzkj_zcy_node2]: FAILED! => {"changed": false, "msg": "failed to create temporary content file: ('The read operation timed out',)"}
    An exception occurred during task execution. To see the full traceback, use -vvv. The error was: SSLError: ('The read operation timed out',)
    fatal: [hzkj_zcy_node1]: FAILED! => {"changed": false, "msg": "failed to create temporary content file: ('The read operation timed out',)"}
    An exception occurred during task execution. To see the full traceback, use -vvv. The error was: SSLError: ('The read operation timed out',)
    fatal: [hzkj_zcy_master1]: FAILED! => {"changed": false, "msg": "failed to create temporary content file: ('The read operation timed out',)"}

NO MORE HOSTS LEFT *************************************************************************************************************************************************************************************************************************
to retry, use: --limit @/root/kubeadm-ansible/cluster.retry

PLAY RECAP *********************************************************************************************************************************************************************************************************************************
hzkj_zcy_master1 : ok=33 changed=11 unreachable=0 failed=1
hzkj_zcy_node1 : ok=32 changed=11 unreachable=0 failed=1
hzkj_zcy_node2 : ok=32 changed=11 unreachable=0 failed=1
hzkj_zcy_node3 : ok=31 changed=11 unreachable=0 failed=0
hzkj_zcy_node4 : ok=31 changed=11 unreachable=0 failed=0

Monday 05 August 2019 17:42:44 +0800 (0:00:43.167) 0:01:05.369 *********

base/prepare : Download cfssl ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ 43.17s
base/prepare : iptables accept all traffic from other node -------------------------------------------------------------------------------------------------------------------------------------------------------------------------- 2.00s
base/prepare : iptables output all traffic from other node -------------------------------------------------------------------------------------------------------------------------------------------------------------------------- 1.86s
base/prepare : Create kubernetes directories ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- 1.37s
base/prepare : Assign inventory name to hostnames ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- 1.24s
base/prepare : Ensure sysctl config ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- 0.96s
base/prepare : Persist br_netfilter module ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ 0.94s
base/prepare : Hosts | populate inventory into hosts file --------------------------------------------------------------------------------------------------------------------------------------------------------------------------- 0.86s
base/prepare : Create cni directories ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- 0.84s
base/prepare : set timezone to Asia/ShangHai ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- 0.75s
base/prepare : Hosts | localhost ipv4 in hosts file --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- 0.61s
base/prepare : sysctl set net.ipv4.ip_forward=1 ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- 0.59s
base/prepare : Hosts | localhost ipv6 in hosts file --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- 0.57s
base/prepare : Ensure Yum repository ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ 0.56s
base/prepare : Check presence of fastestmirror.conf --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- 0.56s
base/prepare : iptables forward all traffic from other node ------------------------------------------------------------------------------------------------------------------------------------------------------------------------- 0.55s
base/prepare : Verify if br_netfilter module exists --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- 0.54s
base/prepare : iptables output all traffic from kube pod subnet --------------------------------------------------------------------------------------------------------------------------------------------------------------------- 0.53s

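  • Cause analysis:

    Describe how you analyzed the problem so that we can locate it more accurately
    Every node already had docker and docker-compose installed beforehand. Could that be the cause? The other possibility is a network problem: is access to the official download site too slow?

  • Questions:

    Describe any questions you have about encountering and resolving this problem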

1. Error at the cfssl step


2. Error at the yumutil step
https://shipengliang.com/software-exp/delta-rpms-disabled-because-usr-bin-applydeltarpm-not-installed-解决办法.html
3. Error at the docker step
Uninstall docker:
(1) List the installed docker packages:
yum list installed | grep docker
(2) Remove the packages:
yum remove docker-ce.x86_64 docker-ce-cli.x86_64 -y
(3) Delete images, containers, and other data:
rm -rf /var/lib/docker
4. Error at the "sysctl set net" step
Comment out the line net.netfilter.nf_conntrack_max=1000000 in the script,
then re-run sysctl -p /etc/sysctl.d/95-k8s-sysctl.conf
5. etcd : Copy etcdctl binary from docker container hung for a long time
Pull the image manually on the affected node:
docker pull registry.cn-hangzhou.aliyuncs.com/choerodon-tools/etcd:v3.3.6

Stuck at "master : kubeadm | Initialize first master"
... It reports that the hostname does not conform to DNS naming rules.
Sigh, I have to start all over again.
I might as well redeploy on clean CentOS installs.
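
For reference, Kubernetes node names must be valid DNS-1123 names: lowercase letters, digits, "-" and ".", starting and ending with a letter or digit. The underscores in names such as hzkj_zcy_master1 break that rule. Below is a minimal sketch for checking names before running the playbook; the function names are hypothetical, and the check ignores the 63-character per-label limit.

```shell
#!/usr/bin/env bash
# Hypothetical pre-check, not part of kubeadm-ansible.
# Returns 0 when the argument looks like a valid DNS-1123 name.
is_dns1123_name() {
  local name="$1"
  local label='[a-z0-9]([a-z0-9-]*[a-z0-9])?'
  local re="^${label}(\\.${label})*\$"
  [ "${#name}" -le 253 ] && [[ "$name" =~ $re ]]
}

# Suggest a compliant name by lowercasing and turning "_" into "-".
suggest_name() {
  printf '%s\n' "$1" | tr 'A-Z_' 'a-z-'
}

for host in hzkj_zcy_master1 hzkj_zcy_node1; do
  if ! is_dns1123_name "$host"; then
    echo "$host is not DNS-compliant; consider $(suggest_name "$host")"
  fi
done
```

If a host is renamed, the inventory file and /etc/hostname on that node presumably need the same new name.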

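Hello, you do need to use servers that do not have docker installed; the script installs it automatically when the cluster installation runs.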

http://choerodon.io/zh/docs/installation-configuration/steps/nfs/

  • Install nfs-client-provisioner
  • Verify the installation

Name:         write-pod
Namespace:    default
Node:
Labels:
Annotations:  kubectl.kubernetes.io/last-applied-configuration={"apiVersion":"v1","kind":"Pod","metadata":{"annotations":{},"name":"write-pod","namespace":"default"},"spec":{"containers":[{"args":["-c","touch /mnt/…
Status:       Pending
IP:
Containers:
  write-pod:
    Image:      busybox
    Port:
    Host Port:
    Command:
      /bin/sh
    Args:
      -c
      touch /mnt/SUCCESS && exit 0 || exit 1
    Environment:
    Mounts:
      /mnt from nfs-pvc (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-gc6b5 (ro)
Conditions:
  Type           Status
  PodScheduled   False
Volumes:
  nfs-pvc:
    Type:       PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:  myclaim
    ReadOnly:   false
  default-token-gc6b5:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  default-token-gc6b5
    Optional:    false
QoS Class:       BestEffort
Node-Selectors:
Tolerations:
Events:
  Type     Reason            Age              From               Message
  Warning  FailedScheduling  8s (x2 over 8s)  default-scheduler  persistentvolumeclaim "myclaim" not found
  Warning  FailedScheduling  1s (x4 over 7s)  default-scheduler  pod has unbound PersistentVolumeClaims (repeated 5 times)

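What is causing this error?

Next I ran into a problem with NFS.

Could you look at the problem below and tell me whether my configuration is wrong?

Never mind, after waiting a while it worked. However, every deployment reports an image pull error,
and it succeeds on the second retry.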
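
Since the pulls succeed on a second attempt, one workaround is to pre-pull the images on each node with a small retry wrapper. This is only a sketch: the retry function is hypothetical, and the attempt count and delay are arbitrary values to tune for your network.

```shell
#!/usr/bin/env bash
# Hypothetical helper: retry MAX_ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Re-runs the command until it succeeds or MAX_ATTEMPTS is reached.
retry() {
  local max="$1" delay="$2" attempt=1
  shift 2
  until "$@"; do
    if [ "$attempt" -ge "$max" ]; then
      echo "giving up after $attempt attempts: $*" >&2
      return 1
    fi
    echo "attempt $attempt failed, retrying in ${delay}s: $*" >&2
    attempt=$((attempt + 1))
    sleep "$delay"
  done
}
```

For example, `retry 3 10 docker pull registry.cn-hangzhou.aliyuncs.com/choerodon-tools/etcd:v3.3.6` would try the pull up to three times, ten seconds apart.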

2019/08/07 10:27:12 [INFO] job asgard-service-init-db haven't finished yet. please wait patiently
2019/08/07 10:27:22 [INFO] job asgard-service-init-db haven't finished yet. please wait patiently
2019/08/07 10:27:32 [INFO] job asgard-service-init-db haven't finished yet. please wait patiently
2019/08/07 10:27:42 [INFO] job asgard-service-init-db haven't finished yet. please wait patiently
2019/08/07 10:27:52 [INFO] job asgard-service-init-db haven't finished yet. please wait patiently
2019/08/07 10:28:02 [INFO] job asgard-service-init-db haven't finished yet. please wait patiently
2019/08/07 10:28:12 [INFO] job asgard-service-init-db haven't finished yet. please wait patiently
2019/08/07 10:28:22 [INFO] job asgard-service-init-db haven't finished yet. please wait patiently
2019/08/07 10:28:32 [INFO] job asgard-service-init-db haven't finished yet. please wait patiently
2019/08/07 10:28:42 [INFO] job asgard-service-init-db haven't finished yet. please wait patiently
2019/08/07 10:28:52 [INFO] job asgard-service-init-db haven't finished yet. please wait patiently
2019/08/07 10:29:02 [INFO] job asgard-service-init-db haven't finished yet. please wait patiently
2019/08/07 10:29:12 [INFO] job asgard-service-init-db haven't finished yet. please wait patiently
2019/08/07 10:29:22 [INFO] job asgard-service-init-db haven't finished yet. please wait patiently
2019/08/07 10:29:32 [INFO] job asgard-service-init-db haven't finished yet. please wait patiently
2019/08/07 10:29:42 [INFO] job asgard-service-init-db haven't finished yet. please wait patiently
2019/08/07 10:29:52 [INFO] job asgard-service-init-db haven't finished yet. please wait patiently
2019/08/07 10:30:02 [INFO] job asgard-service-init-db haven't finished yet. please wait patiently
2019/08/07 10:30:12 [Error] install failed
Error: rpc error: code = Unknown desc = watch closed before UntilWithoutRetry timeout
Usage:
  c7nctl install [flags]

Flags:
  -c, --config-file string     User Config file to read from, User define config by this file
      --debug                  enable debug output
  -h, --help                   help for install
      --no-timeout             disable install job timeout
      --prefix string          add prefix to all helm release
  -r, --resource-file string   Resource file to read from, It provide which app should be installed
      --skip-input             use default username and password to avoid user input
      --version string         specify a version

Global Flags:
      --config string    config file (default is $HOME/.c7n.yaml)
  -o, --orgCode string   org code
  -p, --proCode string   pro code

rpc error: code = Unknown desc = watch closed before UntilWithoutRetry timeout

Does this mean that the job service failed to install?

Did you not copy this content in full?

kind: Pod
apiVersion: v1
metadata:
  name: write-pod
spec:
  containers:
  - name: write-pod
    image: busybox
    command:
      - "/bin/sh"
    args:
      - "-c"
      - "touch /mnt/SUCCESS && exit 0 || exit 1"
    volumeMounts:
      - name: nfs-pvc
        mountPath: "/mnt"
  restartPolicy: "Never"
  volumes:
    - name: nfs-pvc
      persistentVolumeClaim:
        claimName: myclaim
---
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: myclaim
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: nfs-provisioner
  resources:
    requests:
      storage: 1Mi

Please check whether this is a server network problem.

Please check whether the database started normally.

kubectl get po -n c7n-system

[root@master1 ~]# kubectl get po -n c7n-system
NAME                                         READY   STATUS             RESTARTS   AGE
asgard-service-init-db-p725q                 0/1     PodInitializing    0          13m
c7n-slaver-259xw                             1/1     Running            0          15h
c7n-slaver-4bj8v                             1/1     Running            0          15h
c7n-slaver-7z6jr                             1/1     Running            0          15h
c7n-slaver-fqfxd                             1/1     Running            0          15h
c7n-slaver-g69cp                             1/1     Running            0          15h
chartmuseum-chartmuseum-85b9b8887f-ttqls     1/1     Running            0          1h
gitlab-547895fbd6-ftn48                      1/1     Running            0          1h
harbor-harbor-adminserver-74dc5c7f48-w6j7d   0/1     CrashLoopBackOff   15         1h
harbor-harbor-clair-975cc57bf-phdjr          0/1     ImagePullBackOff   0          1h
harbor-harbor-core-6744f5d6c8-js7n7          0/1     ImagePullBackOff   0          1h
harbor-harbor-database-0                     0/1     Init:ErrImagePull  0          1h
harbor-harbor-jobservice-7d7774bf48-8cmnl    0/1     CrashLoopBackOff   24         1h
harbor-harbor-portal-9c48d9887-5t8cd         1/1     Running            0          1h
harbor-harbor-redis-0                        0/1     ImagePullBackOff   0          1h
harbor-harbor-registry-774d95fbd6-ng8q2      0/2     ErrImagePull       0          1h
minio-577cbb8f89-gb6jq                       1/1     Running            0          1h
mysql-557cbfc5d7-9qxhr                       1/1     Running            0          15h
postgresql-postgresql-0                      1/1     Running            0          15h
redis-55567658d4-l7dpd                       1/1     Running            0          15h
register-server-778866f56-5tzd6              1/1     Running            0          1h

The images are still being pulled.
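
To see at a glance which pods are still waiting, the pod listing can be filtered down to the ones that are not fully ready. A minimal sketch follows; not_ready_pods is a hypothetical helper name, and it assumes the default column layout of `kubectl get po` (NAME, READY, STATUS, ...).

```shell
#!/usr/bin/env bash
# Hypothetical filter: reads `kubectl get po` output on stdin and prints
# "NAME STATUS" for every pod whose READY count is below its total.
# Pods in the Completed state (finished jobs) are skipped.
not_ready_pods() {
  awk 'NR > 1 && $3 != "Completed" {
         split($2, ready, "/")
         if (ready[1] != ready[2]) print $1, $3
       }'
}
```

It could be used as `kubectl get po -n c7n-system | not_ready_pods`, re-running it until the list is empty.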

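Following the tutorial, I finally finished the deployment.

The platform does not seem to be running correctly: the harbor administrator cannot log in, and the database has no harbor_user table :worried:

This is the current state of the cluster.

The api.*.com domain looks like this; I cannot tell whether that is normal or not.

The notify.*.com domain looks like this.

devops.*.com just keeps showing a loading spinner.

The browser console (F12) reports:
Failed to load http://gateway.choerodon.com.cn/iam/v1/system/setting: No 'Access-Control-Allow-Origin' header is present on the requested resource. Origin 'http://devops.*.com' is therefore not allowed access. The response had HTTP status code 404.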