
Setting Up the Milvus Vector Database on AWS EKS

Posted by gathin on 2023-08-24 17:57:58 · TAG: #milvus

Deployment reference

AWS official setup guide
Prerequisites: have helm3, eksctl, and kubectl installed, and create the Kafka environment as described in the documentation. With that in place, we can start deploying with eksctl.
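As a quick sanity check before starting (a minimal sketch; adjust to your environment), confirm the prerequisite CLIs are installed and your AWS credentials resolve to the expected account:

```bash
# Verify the prerequisite tooling is on the PATH.
eksctl version
kubectl version --client
helm version --short

# Confirm the AWS CLI is authenticated against the right account.
aws sts get-caller-identity
```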

Part 1: Create the EKS cluster

Use eksctl to create a cluster. Here I use private-only (VPC-internal) access. The eks_cluster.yaml file is as follows:

apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig

metadata:
  name: pro-milvus-aidb
  region: ap-northeast-1
  version: "1.29"

iam:
  withOIDC: true

  serviceAccounts:
    - metadata:
        name: pro-milvus-awslb-controller
        namespace: kube-system
      wellKnownPolicies:
        awsLoadBalancerController: true
    - metadata:
        name: promilvus-s3-access-sa
        # if no namespace is set, "default" will be used;
        # the namespace will be created if it doesn't exist already
        namespace: milvus
        labels: {aws-usage: "milvus"}
      attachPolicyARNs:
        - "arn:aws:iam::aws:policy/AmazonS3FullAccess"

# Use an existing VPC to create the EKS cluster.
# If you don't configure vpc subnets, eksctl will automatically create a brand new VPC.
vpc:
  id: vpc-8aedf1e8
  cidr: "172.31.0.0/16"
  subnets:
    private:
      ap-northeast-1a: { id: subnet-0c094f26c4a2db4c9 }
      ap-northeast-1c: { id: subnet-0333a623447d4ad4c }
      ap-northeast-1d: { id: subnet-07c4d7fe754476640 }
  controlPlaneSubnetIDs: [ subnet-0c094f26c4a2db4c9,subnet-0333a623447d4ad4c,subnet-07c4d7fe754476640 ]
  controlPlaneSecurityGroupIDs: [sg-0f2daed6a9609c4b7]
  clusterEndpoints:
    privateAccess: true
    publicAccess: false
    #publicAccessCIDRs: ["0.0.0.0/0"]

managedNodeGroups:
  - name: ng-1-milvus
    labels: { role: milvus }
    instanceType: m6i.2xlarge
    desiredCapacity: 3
    privateNetworking: true

addons:
  - name: vpc-cni # no version is specified so it deploys the default version
    attachPolicyARNs:
      - arn:aws:iam::aws:policy/AmazonEKS_CNI_Policy
  - name: coredns
    version: latest # auto discovers the latest available
  - name: kube-proxy
    version: latest
  - name: aws-ebs-csi-driver
    wellKnownPolicies:      # add IAM and service account
      ebsCSIController: true

Run the deployment:

eksctl create cluster -f eks_cluster.yaml

## Optional: update your kubeconfig file
aws eks update-kubeconfig --region <region-code> --name <cluster-name>
# Once the cluster is created, you can list its nodes with:
kubectl get nodes -A -o wide

## Create an ebs-sc StorageClass with GP3 as the storage type, and set it as the default StorageClass. Milvus uses etcd as its meta storage and relies on this StorageClass to create and manage PVCs.
cat <<EOF | kubectl apply -f -
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: ebs-sc
  annotations:
    storageclass.kubernetes.io/is-default-class: "true"
provisioner: ebs.csi.aws.com
volumeBindingMode: WaitForFirstConsumer
parameters:
  type: gp3
EOF

# Then set the original gp2 StorageClass to non-default:
kubectl patch storageclass gp2 -p '{"metadata": {"annotations":{"storageclass.kubernetes.io/is-default-class":"false"}}}'
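To confirm the switch took effect, list the StorageClasses; ebs-sc should now carry the (default) marker and gp2 should not (the output shape is indicative):

```bash
kubectl get storageclass
# Expected: "ebs-sc (default)   ebs.csi.aws.com ..." and a plain "gp2   kubernetes.io/aws-ebs ..."
```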

# Add the eks-charts repository and update it
helm repo add eks https://aws.github.io/eks-charts
helm repo update

# Install the AWS Load Balancer Controller. Replace cluster-name with your cluster's name. The ServiceAccount was already created along with the EKS cluster (pro-milvus-awslb-controller in eks_cluster.yaml above), so we reuse it here rather than create a new one.
helm install aws-load-balancer-controller eks/aws-load-balancer-controller \
  -n kube-system \
  --set clusterName=<cluster-name> \
  --set serviceAccount.create=false \
  --set serviceAccount.name=pro-milvus-awslb-controller

# Check whether the controller installed successfully.
kubectl get deployment -n kube-system aws-load-balancer-controller

# Output like the following means success:
NAME                           READY   UP-TO-DATE   AVAILABLE   AGE
aws-load-balancer-controller   2/2     2            2           12m
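If the deployment does not become ready, the controller logs usually explain why (for example, missing IAM permissions on the service account); a quick way to look:

```bash
kubectl logs -n kube-system deployment/aws-load-balancer-controller --tail=50
```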

Part 2: Deploy the Milvus database

  1. First, add the Milvus Helm repository and update it.
    helm repo add milvus https://milvus-io.github.io/milvus-helm/
    helm repo update

  2. Configure S3 and Kafka in a milvus_cluster.yaml file, then create Milvus. You can change the release name (demo) to your own.
    ```yaml
    #
    # Service account
    # - this service account is used for external S3 access
    #
    serviceAccount:
      create: false
      name: promilvus-s3-access-sa

    #
    # Disable the in-cluster MinIO
    #
    minio:
      enabled: false

    #
    # External S3
    # - these configs are only used when externalS3.enabled is true
    #
    externalS3:
      enabled: true
      host: "s3.ap-northeast-1.amazonaws.com"
      port: "443"
      useSSL: true
      bucketName: ""              # your S3 bucket name
      rootPath: "pro-milvus-db/"  # prefix for Milvus data inside the bucket
      useIAM: true
      cloudProvider: "aws"
      iamEndpoint: ""

    #
    # Kafka config: disable the in-cluster Pulsar
    #
    pulsar:
      enabled: false

    #
    # External Kafka
    # - these configs are only used when externalKafka.enabled is true
    #
    externalKafka:
      enabled: true
      brokerList: ""              # your MSK broker list
      securityProtocol: SASL_SSL
      sasl:
        mechanisms: SCRAM-SHA-512
        username: "kafka-msk-pro"
        password: "yourpassword"
    ```
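Because useIAM is true, Milvus reaches S3 through the promilvus-s3-access-sa service account that eksctl created earlier. A quick check that the IRSA role annotation is in place (the role name will differ per account):

```bash
# The annotation eks.amazonaws.com/role-arn should point at an IAM role
# that carries the AmazonS3FullAccess policy attached by eksctl.
kubectl describe sa promilvus-s3-access-sa -n milvus
```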

Run the install:

```bash
## "demo" is the release name; you can rename it
helm install demo milvus/milvus -n milvus -f milvus_cluster.yaml
# Check the status of the pods with the following command.
kubectl get pods -n milvus
```

# All pods in Running status means the creation succeeded.
NAME                                      READY   STATUS    RESTARTS   AGE
demo-etcd-0                               1/1     Running   0          114s
demo-etcd-1                               1/1     Running   0          114s
demo-etcd-2                               1/1     Running   0          114s
demo-milvus-datacoord-548bf76868-b6vzb    1/1     Running   0          115s
demo-milvus-datanode-5fc794dd8b-z8l2x     1/1     Running   0          115s
demo-milvus-indexcoord-c9455db7d-sx22q    1/1     Running   0          115s
demo-milvus-indexnode-58bd66bbb7-f5xbp    1/1     Running   0          114s
demo-milvus-proxy-664c68c7b4-x6jqn        1/1     Running   0          114s
demo-milvus-querycoord-679bcf7497-7xg4v   1/1     Running   0          115s
demo-milvus-querynode-64f94b6f97-wl5v4    1/1     Running   0          114s
demo-milvus-rootcoord-5f9b687b57-d22s6    1/1     Running   0          115s 

# Get the Milvus access endpoint.
kubectl get svc -n milvus
# Example output below. demo-milvus is the Milvus service endpoint, where 19530 is the database access port and 9091 is the metrics port. The default Service type is ClusterIP, which is only reachable from inside the EKS cluster.
NAME                     TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)              AGE
demo-etcd                ClusterIP   172.20.103.138   <none>        2379/TCP,2380/TCP    6m46s
demo-etcd-headless       ClusterIP   None             <none>        2379/TCP,2380/TCP    6m46s
demo-milvus              ClusterIP   172.20.219.33    <none>        19530/TCP,9091/TCP   6m46s
demo-milvus-datacoord    ClusterIP   172.20.214.106   <none>        13333/TCP,9091/TCP   6m46s
demo-milvus-datanode     ClusterIP   None             <none>        9091/TCP             6m46s
demo-milvus-indexcoord   ClusterIP   172.20.106.51    <none>        31000/TCP,9091/TCP   6m46s
demo-milvus-indexnode    ClusterIP   None             <none>        9091/TCP             6m46s
demo-milvus-querycoord   ClusterIP   172.20.136.213   <none>        19531/TCP,9091/TCP   6m46s
demo-milvus-querynode    ClusterIP   None             <none>        9091/TCP             6m46s
demo-milvus-rootcoord    ClusterIP   172.20.173.98    <none>        53100/TCP,9091/TCP   6m46s
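Before exposing the service outside the cluster, you can sanity-check it from a workstation with cluster access via a temporary port-forward (stop it with Ctrl-C):

```bash
# Tunnel local port 19530 to the in-cluster demo-milvus service.
kubectl port-forward service/demo-milvus 19530:19530 -n milvus
```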
  3. Expose the Milvus service outside the EKS cluster.
    Helm supports updating the configuration of an existing release with the helm upgrade command, and that is how we reconfigure Milvus here. Create a milvus_service.yaml configuration file with the following content:
    ```yaml
    service:
      type: LoadBalancer
      port: 19530
      annotations:
        service.beta.kubernetes.io/aws-load-balancer-type: nlb
        service.beta.kubernetes.io/aws-load-balancer-name: milvus-service
        service.beta.kubernetes.io/aws-load-balancer-scheme: internal
        service.beta.kubernetes.io/aws-load-balancer-internal: "true"  ## internal (VPC-only) access
        service.beta.kubernetes.io/aws-load-balancer-nlb-target-type: ip
    ```

    Then apply the configuration file with Helm:
    ```bash
    helm upgrade demo milvus/milvus -n milvus --reuse-values -f milvus_service.yaml
    ```

    Run the following command to check whether the LoadBalancer was created:
    ```bash
    kubectl get svc -n milvus
    ```

The output is as follows:

NAME                     TYPE           CLUSTER-IP       EXTERNAL-IP                                   PORT(S)                          AGE
demo-etcd                ClusterIP      172.20.103.138   <none>                                        2379/TCP,2380/TCP                62m
demo-etcd-headless       ClusterIP      None             <none>                                        2379/TCP,2380/TCP                62m
demo-milvus              LoadBalancer   172.20.219.33    milvus-nlb-xxxx.elb.us-west-2.amazonaws.com   19530:31201/TCP,9091:31088/TCP   62m
demo-milvus-datacoord    ClusterIP      172.20.214.106   <none>                                        13333/TCP,9091/TCP               62m
demo-milvus-datanode     ClusterIP      None             <none>                                        9091/TCP                         62m
demo-milvus-indexcoord   ClusterIP      172.20.106.51    <none>                                        31000/TCP,9091/TCP               62m
demo-milvus-indexnode    ClusterIP      None             <none>                                        9091/TCP                         62m
demo-milvus-querycoord   ClusterIP      172.20.136.213   <none>                                        19531/TCP,9091/TCP               62m
demo-milvus-querynode    ClusterIP      None             <none>                                        9091/TCP                         62m
demo-milvus-rootcoord    ClusterIP      172.20.173.98    <none>                                        53100/TCP,9091/TCP               62m
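Once the NLB is provisioned, you can verify reachability from inside the VPC; Milvus serves a health probe on the metrics port (a minimal check, substituting the EXTERNAL-IP hostname from the output above):

```bash
# A healthy deployment responds with OK on the metrics port.
curl http://milvus-nlb-xxxx.elb.us-west-2.amazonaws.com:9091/healthz
```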


4. Install the Attu visual management tool
The milvus_attu.yaml file is as follows:
```yaml
attu:
  enabled: true
  name: attu
  ingress:
    enabled: true
    annotations:
      kubernetes.io/ingress.class: alb # Annotation: set ALB ingress type
      alb.ingress.kubernetes.io/scheme: internet-facing #Places the load balancer on public subnets
      alb.ingress.kubernetes.io/target-type: ip #The Pod IPs should be used as the target IPs (rather than the node IPs)
      alb.ingress.kubernetes.io/group.name: attu # Groups multiple Ingress resources
    hosts:
      -
```

Then apply the configuration file with Helm:

helm upgrade demo milvus/milvus -n milvus --reuse-values -f milvus_attu.yaml
# Run the following command again to check:
kubectl get ingress -n milvus

## Output like the following shows the ALB. If you cannot access it, check the load balancer scheme against your subnets: Attu is exposed as internet-facing here, so if the ALB landed on private subnets you will need to manually adjust its subnet mapping.
NAME               CLASS    HOSTS   ADDRESS                                     PORTS   AGE
demo-milvus-attu   <none>   *       k8s-attu-xxxx.us-west-2.elb.amazonaws.com   80      27s
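To grab the Attu address for your browser (a small convenience, assuming the ingress name from the output above):

```bash
kubectl get ingress demo-milvus-attu -n milvus \
  -o jsonpath='{.status.loadBalancer.ingress[0].hostname}'
```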

5. Tune the CPU and replica configuration in milvus_ha.yaml

## The following configuration enables multi-replica deployment of these components. Note that multi-replica
# deployment of the Root/Query/Data/Index coordinator components requires turning on the activeStandby option. Multi-replica mode improves cluster availability.
rootCoordinator:
  replicas: 2
  activeStandby:
    enabled: true  # Enable active-standby when you set multiple replicas for root coordinator
  resources:
    limits:
      cpu: 1
      memory: 2Gi
indexCoordinator:
  replicas: 2
  activeStandby:
    enabled: true  # Enable active-standby when you set multiple replicas for index coordinator
  resources:
    limits:
      cpu: "0.5"
      memory: 0.5Gi
queryCoordinator:
  replicas: 2
  activeStandby:
    enabled: true  # Enable active-standby when you set multiple replicas for query coordinator
  resources:
    limits:
      cpu: "0.5"
      memory: 0.5Gi
dataCoordinator:
  replicas: 2
  activeStandby:
    enabled: true  # Enable active-standby when you set multiple replicas for data coordinator
  resources:
    limits:
      cpu: "0.5"
      memory: 0.5Gi
proxy:
  replicas: 2
  resources:
    limits:
      cpu: 1
      memory: 4Gi
queryNode:
  replicas: 1
  resources:
    limits:
      cpu: 1
      memory: 4Gi
dataNode:
  replicas: 1
  resources:
    limits:
      cpu: 1
      memory: 4Gi
indexNode:
  replicas: 1
  resources:
    limits:
      cpu: 4
      memory: 8Gi

Apply the update:

helm upgrade demo milvus/milvus -n milvus --reuse-values -f milvus_ha.yaml
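After the upgrade, each coordinator deployment should report two replicas (one active, one standby); a quick check:

```bash
# The *coord deployments should show 2/2 once the standby replicas are up.
kubectl get deployment -n milvus
```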

Part 3: Integrate Prometheus monitoring into the cluster

I downloaded the corresponding code from the git project and ran the following commands:

git clone https://github.com/prometheus-operator/kube-prometheus.git
cd kube-prometheus
kubectl apply --server-side -f manifests/setup
kubectl wait \
        --for condition=Established \
        --all CustomResourceDefinition \
        --namespace=monitoring
kubectl apply -f manifests/
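Before wiring Milvus in, make sure the monitoring stack itself came up (all pods in the monitoring namespace should reach Running):

```bash
kubectl get pods -n monitoring
```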

## Enable the ServiceMonitor so that Prometheus scrapes Milvus metrics (the release in this walkthrough is named demo)
helm upgrade demo milvus/milvus -n milvus --set metrics.serviceMonitor.enabled=true --reuse-values
## The command above alone does not let Prometheus see resources in the milvus namespace; you also need to grant it access:
kubectl patch clusterrole prometheus-k8s --type=json -p='[{"op": "add", "path": "/rules/-", "value": {"apiGroups": [""], "resources": ["pods", "services", "endpoints"], "verbs": ["get", "watch", "list"]}}]'
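With metrics flowing, you can reach the bundled Grafana from a workstation (kube-prometheus ships a grafana Service on port 3000; the default login is admin/admin, which you should change):

```bash
kubectl -n monitoring port-forward svc/grafana 3000:3000
# Then open http://localhost:3000 in a browser.
```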