Deploying the Milvus Vector Database on AWS EKS
Deployment reference: the official AWS setup guide.
Prerequisites: helm3, eksctl, and kubectl are installed, and the Kafka environment has been created per that guide. With those in place, start the eksctl deployment.
I. Create the EKS cluster
Use eksctl to create the cluster. I use private-only endpoint access here; the eks_cluster.yaml file is as follows:
```yaml
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig

metadata:
  name: pro-milvus-aidb
  region: ap-northeast-1
  version: "1.29"

iam:
  withOIDC: true
  serviceAccounts:
    - metadata:
        name: pro-milvus-awslb-controller
        namespace: kube-system
      wellKnownPolicies:
        awsLoadBalancerController: true
    - metadata:
        name: promilvus-s3-access-sa
        # if no namespace is set, "default" will be used;
        # the namespace will be created if it doesn't exist already
        namespace: milvus
        labels: {aws-usage: "milvus"}
      attachPolicyARNs:
        - "arn:aws:iam::aws:policy/AmazonS3FullAccess"

# Use an existing VPC to create the EKS cluster.
# If you don't configure vpc subnets, eksctl will automatically create a brand-new VPC
vpc:
  id: vpc-8aedf1e8
  cidr: "172.31.0.0/16"
  subnets:
    private:
      ap-northeast-1a: { id: subnet-0c094f26c4a2db4c9 }
      ap-northeast-1c: { id: subnet-0333a623447d4ad4c }
      ap-northeast-1d: { id: subnet-07c4d7fe754476640 }
  controlPlaneSubnetIDs: [subnet-0c094f26c4a2db4c9, subnet-0333a623447d4ad4c, subnet-07c4d7fe754476640]
  controlPlaneSecurityGroupIDs: [sg-0f2daed6a9609c4b7]
  clusterEndpoints:
    privateAccess: true
    publicAccess: false
  #publicAccessCIDRs: ["0.0.0.0/0"]

managedNodeGroups:
  - name: ng-1-milvus
    labels: { role: milvus }
    instanceType: m6i.2xlarge
    desiredCapacity: 3
    privateNetworking: true

addons:
  - name: vpc-cni # no version is specified, so the default version is deployed
    attachPolicyARNs:
      - arn:aws:iam::aws:policy/AmazonEKS_CNI_Policy
  - name: coredns
    version: latest # auto-discovers the latest available version
  - name: kube-proxy
    version: latest
  - name: aws-ebs-csi-driver
    wellKnownPolicies: # add the IAM role via a service account
      ebsCSIController: true
```
Run the deployment:
```bash
eksctl create cluster -f eks_cluster.yaml

## Optional: update your kubeconfig
aws eks update-kubeconfig --region <region-code> --name <cluster-name>

# Once the cluster is created, list its nodes:
kubectl get nodes -A -o wide
```
Create an ebs-sc StorageClass backed by GP3 and set it as the default StorageClass. Milvus uses etcd as its meta storage, and etcd relies on this StorageClass to create and manage its PVCs.
```bash
cat <<EOF | kubectl apply -f -
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: ebs-sc
  annotations:
    storageclass.kubernetes.io/is-default-class: "true"
provisioner: ebs.csi.aws.com
volumeBindingMode: WaitForFirstConsumer
parameters:
  type: gp3
EOF
```
Then mark the original gp2 StorageClass as non-default:
```bash
kubectl patch storageclass gp2 -p '{"metadata": {"annotations":{"storageclass.kubernetes.io/is-default-class":"false"}}}'
```
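To confirm the switch took effect, list the StorageClasses; ebs-sc should now be the one marked as default:
```bash
# ebs-sc should be flagged "(default)"; gp2 should no longer be
kubectl get storageclass
```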
Add the eks-charts repository and update:
```bash
helm repo add eks https://aws.github.io/eks-charts
helm repo update
```
Install the AWS Load Balancer Controller, replacing cluster-name with your cluster's name. The ServiceAccount (named pro-milvus-awslb-controller in eks_cluster.yaml above) was already created along with the EKS cluster, so we reuse it instead of creating a new one:
```bash
helm install aws-load-balancer-controller eks/aws-load-balancer-controller \
  -n kube-system \
  --set clusterName=<cluster-name> \
  --set serviceAccount.create=false \
  --set serviceAccount.name=pro-milvus-awslb-controller
```
Check that the controller installed successfully:
```bash
kubectl get deployment -n kube-system aws-load-balancer-controller
# Output like the following indicates success:
NAME                           READY   UP-TO-DATE   AVAILABLE   AGE
aws-load-balancer-controller   2/2     2            2           12m
```
II. Deploy the Milvus database
1. First, add the Milvus Helm repository and update it:
```bash
helm repo add milvus https://milvus-io.github.io/milvus-helm/
helm repo update
```
2. Create a milvus_cluster.yaml file configuring external S3 and Kafka, then install Milvus. You can change the release name (demo here) to your own.
```yaml
#
# Service account
# - this service account is used for external S3 access
#
serviceAccount:
  create: false
  name: promilvus-s3-access-sa

#
# Disable in-cluster MinIO
#
minio:
  enabled: false

#
# External S3
# - these configs are only used when externalS3.enabled is true
#
externalS3:
  enabled: true
  host: "s3.ap-northeast-1.amazonaws.com"
  port: "443"
  useSSL: true
  bucketName: "<your-bucket-name>"
  rootPath: "pro-milvus-db/"
  useIAM: true
  cloudProvider: "aws"
  iamEndpoint: ""

#
# Kafka config
# Disable in-cluster Pulsar
#
pulsar:
  enabled: false

#
# External Kafka
# - these configs are only used when externalKafka.enabled is true
#
externalKafka:
  enabled: true
  brokerList: "<your-msk-broker-list>"
  securityProtocol: SASL_SSL
  sasl:
    mechanisms: SCRAM-SHA-512
    username: "kafka-msk-pro"
    password: "yourpassword"
```
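Because useIAM relies on the IRSA ServiceAccount created together with the cluster, it is worth a quick sanity check that the role annotation is present before installing; eksctl adds an eks.amazonaws.com/role-arn annotation to the account:
```bash
# The Annotations field should include eks.amazonaws.com/role-arn pointing at the IAM role
kubectl describe sa promilvus-s3-access-sa -n milvus
```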
Run the install:
```bash
## "demo" is the release name; you can choose your own
helm install demo milvus/milvus -n milvus -f milvus_cluster.yaml

# Check the pod status:
kubectl get pods -n milvus
# All pods in Running status means the install succeeded.
NAME                                      READY   STATUS    RESTARTS   AGE
demo-etcd-0                               1/1     Running   0          114s
demo-etcd-1                               1/1     Running   0          114s
demo-etcd-2                               1/1     Running   0          114s
demo-milvus-datacoord-548bf76868-b6vzb    1/1     Running   0          115s
demo-milvus-datanode-5fc794dd8b-z8l2x     1/1     Running   0          115s
demo-milvus-indexcoord-c9455db7d-sx22q    1/1     Running   0          115s
demo-milvus-indexnode-58bd66bbb7-f5xbp    1/1     Running   0          114s
demo-milvus-proxy-664c68c7b4-x6jqn        1/1     Running   0          114s
demo-milvus-querycoord-679bcf7497-7xg4v   1/1     Running   0          115s
demo-milvus-querynode-64f94b6f97-wl5v4    1/1     Running   0          114s
demo-milvus-rootcoord-5f9b687b57-d22s6    1/1     Running   0          115s

# Get the Milvus service endpoint:
kubectl get svc -n milvus
# Example output below. demo-milvus is the Milvus service endpoint: port 19530 serves
# database traffic and port 9091 serves metrics. The default Service type is ClusterIP,
# which is reachable only from inside the EKS cluster.
NAME                     TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)              AGE
demo-etcd                ClusterIP   172.20.103.138   <none>        2379/TCP,2380/TCP    6m46s
demo-etcd-headless       ClusterIP   None             <none>        2379/TCP,2380/TCP    6m46s
demo-milvus              ClusterIP   172.20.219.33    <none>        19530/TCP,9091/TCP   6m46s
demo-milvus-datacoord    ClusterIP   172.20.214.106   <none>        13333/TCP,9091/TCP   6m46s
demo-milvus-datanode     ClusterIP   None             <none>        9091/TCP             6m46s
demo-milvus-indexcoord   ClusterIP   172.20.106.51    <none>        31000/TCP,9091/TCP   6m46s
demo-milvus-indexnode    ClusterIP   None             <none>        9091/TCP             6m46s
demo-milvus-querycoord   ClusterIP   172.20.136.213   <none>        19531/TCP,9091/TCP   6m46s
demo-milvus-querynode    ClusterIP   None             <none>        9091/TCP             6m46s
demo-milvus-rootcoord    ClusterIP   172.20.173.98    <none>        53100/TCP,9091/TCP   6m46s
```
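Before exposing the service, you can sanity-check it from your workstation through a port-forward. A minimal sketch, assuming the release is named demo and that Milvus serves its health endpoint on the metrics port 9091 (as in current Milvus versions):
```bash
# Forward the Milvus ports to localhost (runs in the background)
kubectl port-forward svc/demo-milvus -n milvus 19530:19530 9091:9091 &

# The health endpoint should answer with 200 OK once all components are ready
curl -i http://localhost:9091/healthz
```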
3. Expose the Milvus service outside the EKS cluster
Helm supports updating a release's configuration after installation with helm upgrade, and we use that mechanism here. Create a milvus_service.yaml file with the following content:
```yaml
service:
  type: LoadBalancer
  port: 19530
  annotations:
    service.beta.kubernetes.io/aws-load-balancer-type: nlb
    service.beta.kubernetes.io/aws-load-balancer-name: milvus-service
    service.beta.kubernetes.io/aws-load-balancer-scheme: internal
    service.beta.kubernetes.io/aws-load-balancer-internal: "true" # keep the NLB internal
    service.beta.kubernetes.io/aws-load-balancer-nlb-target-type: ip
```
Then apply the new configuration with Helm and check whether the LoadBalancer was created:
```bash
helm upgrade demo milvus/milvus -n milvus --reuse-values -f milvus_service.yaml
kubectl get svc -n milvus
# Output:
NAME                     TYPE           CLUSTER-IP       EXTERNAL-IP                                   PORT(S)                          AGE
demo-etcd                ClusterIP      172.20.103.138   <none>                                        2379/TCP,2380/TCP                62m
demo-etcd-headless       ClusterIP      None             <none>                                        2379/TCP,2380/TCP                62m
demo-milvus              LoadBalancer   172.20.219.33    milvus-nlb-xxxx.elb.us-west-2.amazonaws.com   19530:31201/TCP,9091:31088/TCP   62m
demo-milvus-datacoord    ClusterIP      172.20.214.106   <none>                                        13333/TCP,9091/TCP               62m
demo-milvus-datanode     ClusterIP      None             <none>                                        9091/TCP                         62m
demo-milvus-indexcoord   ClusterIP      172.20.106.51    <none>                                        31000/TCP,9091/TCP               62m
demo-milvus-indexnode    ClusterIP      None             <none>                                        9091/TCP                         62m
demo-milvus-querycoord   ClusterIP      172.20.136.213   <none>                                        19531/TCP,9091/TCP               62m
demo-milvus-querynode    ClusterIP      None             <none>                                        9091/TCP                         62m
demo-milvus-rootcoord    ClusterIP      172.20.173.98    <none>                                        53100/TCP,9091/TCP               62m
```
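Because the NLB is internal, test it from a machine inside the VPC (a bastion host, for example). A minimal sketch; substitute the DNS name from your EXTERNAL-IP column:
```bash
# Placeholder endpoint; replace with your NLB DNS name
MILVUS_ENDPOINT=milvus-nlb-xxxx.elb.us-west-2.amazonaws.com

# Port 19530 (gRPC) should accept TCP connections
nc -zv "$MILVUS_ENDPOINT" 19530

# The metrics port is also forwarded by the NLB per the output above
curl -i "http://$MILVUS_ENDPOINT:9091/healthz"
```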
4. Install Attu, the visual management tool
The milvus_attu.yaml file is as follows:
```yaml
attu:
  enabled: true
  name: attu
  ingress:
    enabled: true
    annotations:
      kubernetes.io/ingress.class: alb # set the ALB ingress type
      alb.ingress.kubernetes.io/scheme: internet-facing # place the load balancer on public subnets
      alb.ingress.kubernetes.io/target-type: ip # use Pod IPs (rather than node IPs) as the targets
      alb.ingress.kubernetes.io/group.name: attu # group multiple Ingress resources
    hosts:
      - "<your-attu-host>"
```
Then apply the configuration with Helm:
```bash
helm upgrade demo milvus/milvus -n milvus --reuse-values -f milvus_attu.yaml

# Check the Ingress again:
kubectl get ingress -n milvus
# Output like the following shows the ALB. If you cannot reach it, check the load balancer's
# scheme: Attu is meant to be reachable from the internet here, so the ALB must be
# internet-facing; adjust the internal/public mapping manually if it came up internal.
NAME               CLASS    HOSTS   ADDRESS                                     PORTS   AGE
demo-milvus-attu   <none>   *       k8s-attu-xxxx.us-west-2.elb.amazonaws.com   80      27s
```
5. Tune CPU and replica counts with milvus_ha.yaml
The following configuration enables multi-replica deployment for these components. Note that running multiple replicas of the Root, Query, Data, and Index coordinators requires the activeStandby option to be turned on. Multi-replica mode improves cluster availability.
```yaml
rootCoordinator:
  replicas: 2
  activeStandby:
    enabled: true # enable active-standby when you set multiple replicas for the root coordinator
  resources:
    limits:
      cpu: 1
      memory: 2Gi
indexCoordinator:
  replicas: 2
  activeStandby:
    enabled: true # enable active-standby when you set multiple replicas for the index coordinator
  resources:
    limits:
      cpu: "0.5"
      memory: 0.5Gi
queryCoordinator:
  replicas: 2
  activeStandby:
    enabled: true # enable active-standby when you set multiple replicas for the query coordinator
  resources:
    limits:
      cpu: "0.5"
      memory: 0.5Gi
dataCoordinator:
  replicas: 2
  activeStandby:
    enabled: true # enable active-standby when you set multiple replicas for the data coordinator
  resources:
    limits:
      cpu: "0.5"
      memory: 0.5Gi
proxy:
  replicas: 2
  resources:
    limits:
      cpu: 1
      memory: 4Gi
queryNode:
  replicas: 1
  resources:
    limits:
      cpu: 1
      memory: 4Gi
dataNode:
  replicas: 1
  resources:
    limits:
      cpu: 1
      memory: 4Gi
indexNode:
  replicas: 1
  resources:
    limits:
      cpu: 4
      memory: 8Gi
```
Apply the update:
```bash
helm upgrade demo milvus/milvus -n milvus --reuse-values -f milvus_ha.yaml
```
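After the upgrade, you can verify that the coordinators actually scaled out; the READY column of each coordinator Deployment should report 2/2 once the standby pods are up:
```bash
# rootcoord, datacoord, querycoord, and indexcoord deployments should show 2/2
kubectl get deploy -n milvus
```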
III. Deploy Prometheus monitoring into the cluster
I cloned the corresponding code from the kube-prometheus project on GitHub and ran the following commands:
```bash
git clone https://github.com/prometheus-operator/kube-prometheus.git
cd kube-prometheus
kubectl apply --server-side -f manifests/setup
kubectl wait \
  --for condition=Established \
  --all CustomResourceDefinition \
  --namespace=monitoring
kubectl apply -f manifests/

## Enable the ServiceMonitor on the Milvus release
helm upgrade demo milvus/milvus -n milvus --set metrics.serviceMonitor.enabled=true --reuse-values

## The steps above alone cannot monitor anything in the milvus namespace; you must also
## extend the prometheus-k8s ClusterRole as follows:
kubectl patch clusterrole prometheus-k8s --type=json -p='[{"op": "add", "path": "/rules/-", "value": {"apiGroups": [""], "resources": ["pods", "services", "endpoints"], "verbs": ["get", "watch", "list"]}}]'
```
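To confirm Prometheus now scrapes Milvus, port-forward the Prometheus service and inspect the active targets. A minimal sketch, assuming the default kube-prometheus service name prometheus-k8s:
```bash
# Expose the Prometheus UI/API locally
kubectl -n monitoring port-forward svc/prometheus-k8s 9090:9090 &

# The active targets should now include endpoints from the milvus namespace
curl -s 'http://localhost:9090/api/v1/targets?state=active' | grep -c milvus
```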