Reina Wang

Reputation: 67

Can't Load Collection after Milvus-Standalone Restart on Docker-Compose with AWS S3 Storage

I'm running Milvus-standalone v2.2.11 via Docker Compose on Windows 10 and have configured AWS S3 storage for it:

minio:
  address: s3.us-east-2.amazonaws.com
  port: 80
  accessKeyID: <>
  secretAccessKey: <>
  useSSL: false 
  bucketName: <>

After building and running it, I create a collection and load it successfully. However, after the Docker container is restarted and I try to load the collection again, the load hangs.
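For context, the create/load path is just the standard pymilvus flow, roughly like this (the collection name comes from the logs below; the host and port are placeholders for my local setup):

from pymilvus import connections, Collection

# Placeholder connection parameters for the Milvus standalone container.
connections.connect(alias="default", host="localhost", port="19530")

# "clip_index" is the collection name that appears in the logs below.
collection = Collection("clip_index")

# Works fine on the first run; after the container restart this call hangs.
collection.load()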

Here are the logs:

2023-07-14 14:41:44 [2023/07/14 11:41:44.295 +00:00] [INFO] [querynode/impl.go:342] ["watchDmChannels start "] [collectionID=442848867353100457] [nodeID=519] [channels="[by-dev-rootcoord-dml_4_442848867353100457v0]"] [timeInQueue=30.841µs]
2023-07-14 14:41:44 [2023/07/14 11:41:44.295 +00:00] [INFO] [querynode/watch_dm_channels_task.go:84] ["Starting WatchDmChannels ..."] [collectionID=442848867353100457] [vChannels="[by-dev-rootcoord-dml_4_442848867353100457v0]"] [replicaID=442848906787160065] [loadType=LoadCollection] [collectionName=clip_index] [metricType=IP]
2023-07-14 14:41:44 [2023/07/14 11:41:44.295 +00:00] [INFO] [querynode/shard_cluster_service.go:81] ["successfully add shard cluster"] [collectionID=442848867353100457] [replica=442848906787160065] [vchan=by-dev-rootcoord-dml_4_442848867353100457v0]
2023-07-14 14:41:44 [2023/07/14 11:41:44.295 +00:00] [INFO] [querynode/watch_dm_channels_task.go:243] ["loading growing segments in WatchDmChannels..."] [collectionID=442848867353100457] [unFlushedSegmentIDs="[442848867353300479]"]
2023-07-14 14:41:44 [2023/07/14 11:41:44.295 +00:00] [INFO] [querynode/segment_loader.go:125] ["segmentLoader start loading..."] [collectionID=442848867353100457] [segmentType=Growing] [segmentNum=1] [msgID=358]
2023-07-14 14:41:44 [2023/07/14 11:41:44.298 +00:00] [INFO] [querynode/segment_loader.go:890] ["predict memory and disk usage while loading (in MiB)"] [collectionID=442848867353100457] [concurrency=1] [memLoadingUsage=360] [memUsageAfterLoad=360] [diskUsageAfterLoad=0] [currentUsedMem=360] [currentAvailableFreeMemory=10885] [currentTotalMemory=11245]
2023-07-14 14:41:44 [2023/07/14 11:41:44.298 +00:00] [INFO] [querynode/segment.go:240] ["create segment"] [collectionID=442848867353100457] [partitionID=442848867353100458] [segmentID=442848867353300479] [segmentType=Growing] [vchannel=by-dev-rootcoord-dml_4_442848867353100457v0]
2023-07-14 14:41:44 [2023/07/14 11:41:44.298 +00:00] [INFO] [querynode/segment_loader.go:220] ["start to load segments in parallel"] [collectionID=442848867353100457] [segmentType=Growing] [segmentNum=1] [concurrencyLevel=1]
2023-07-14 14:41:44 [2023/07/14 11:41:44.298 +00:00] [INFO] [querynode/segment_loader.go:273] ["start loading segment data into memory"] [collectionID=442848867353100457] [partitionID=442848867353100458] [segmentID=442848867353300479] [segmentType=Growing]
2023-07-14 14:41:44 [2023/07/14 11:41:44.581 +00:00] [WARN] [querynode/cgo_helper.go:56] ["LoadFieldData failed, C Runtime Exception: [UnexpectedError] Error:GetObjectSize[errcode:400, exception:, errmessage:No response body.]\n"]
2023-07-14 14:41:44 [2023/07/14 11:41:44.585 +00:00] [INFO] [gc/gc_tuner.go:84] ["GC Tune done"] ["previous GOGC"=200] ["heapuse "=55] ["total memory"=360] ["next GC"=137] ["new GOGC"=200] [gc-pause=60.835µs] [gc-pause-end=1689334904583782139]
2023-07-14 14:41:44 [2023/07/14 11:41:44.586 +00:00] [ERROR] [querynode/segment_loader.go:205] ["load segment failed when load data into memory"] [collectionID=442848867353100457] [segmentType=Growing] [partitionID=442848867353100458] [segmentID=442848867353300479] [error="[UnexpectedError] Error:GetObjectSize[errcode:400, exception:, errmessage:No response body.]"] [stack="github.com/milvus-io/milvus/internal/querynode.(*segmentLoader).LoadSegment.func3\n\t/go/src/github.com/milvus-io/milvus/internal/querynode/segment_loader.go:205\ngithub.com/milvus-io/milvus/internal/util/funcutil.ProcessFuncParallel.func3\n\t/go/src/github.com/milvus-io/milvus/internal/util/funcutil/parallel.go:83"]
2023-07-14 14:41:44 [2023/07/14 11:41:44.586 +00:00] [ERROR] [funcutil/parallel.go:85] [loadSegmentFunc] [error="[UnexpectedError] Error:GetObjectSize[errcode:400, exception:, errmessage:No response body.]"] [idx=0] [stack="github.com/milvus-io/milvus/internal/util/funcutil.ProcessFuncParallel.func3\n\t/go/src/github.com/milvus-io/milvus/internal/util/funcutil/parallel.go:85"]
2023-07-14 14:41:44 [2023/07/14 11:41:44.586 +00:00] [DEBUG] [funcutil/parallel.go:51] [loadSegmentFunc] [total=1] ["time cost"=288.075209ms]
2023-07-14 14:41:44 [2023/07/14 11:41:44.586 +00:00] [INFO] [querynode/segment.go:289] ["delete segment from memory"] [collectionID=442848867353100457] [partitionID=442848867353100458] [segmentID=442848867353300479] [segmentType=Growing]
2023-07-14 14:41:44 [2023/07/14 11:41:44.590 +00:00] [WARN] [querynode/watch_dm_channels_task.go:249] ["failed to load segment"] [collection=442848867353100457] [error="[UnexpectedError] Error:GetObjectSize[errcode:400, exception:, errmessage:No response body.]"]
2023-07-14 14:41:44 [2023/07/14 11:41:44.590 +00:00] [INFO] [querynode/shard_cluster.go:185] ["Close shard cluster"] [collectionID=442848867353100457] [channel=by-dev-rootcoord-dml_4_442848867353100457v0] [replicaID=442848906787160065]
2023-07-14 14:41:44 [2023/07/14 11:41:44.590 +00:00] [INFO] [gc/gc_tuner.go:84] ["GC Tune done"] ["previous GOGC"=200] ["heapuse "=55] ["total memory"=360] ["next GC"=136] ["new GOGC"=200] [gc-pause=43.943µs] [gc-pause-end=1689334904588677729]
2023-07-14 14:41:44 [2023/07/14 11:41:44.590 +00:00] [INFO] [querynode/shard_cluster.go:394] ["Shard Cluster update state"] [collectionID=442848867353100457] [channel=by-dev-rootcoord-dml_4_442848867353100457v0] [replicaID=442848906787160065] ["old state"=2] ["new state"=2] [caller=github.com/milvus-io/milvus/internal/querynode.(*ShardCluster).Close.func1]
2023-07-14 14:41:44 [2023/07/14 11:41:44.590 +00:00] [INFO] [querynode/collection.go:153] ["remove vChannel from collection"] [collectionID=442848867353100457] [channel=by-dev-rootcoord-dml_4_442848867353100457v0]
2023-07-14 14:41:44 [2023/07/14 11:41:44.590 +00:00] [WARN] [querynode/impl.go:360] ["failed to subscribe channel"] [collectionID=442848867353100457] [nodeID=519] [channels="[by-dev-rootcoord-dml_4_442848867353100457v0]"] [error="failed to load growing segments, err: [UnexpectedError] Error:GetObjectSize[errcode:400, exception:, errmessage:No response body.]"]
2023-07-14 14:41:44 [2023/07/14 11:41:44.591 +00:00] [WARN] [retry/retry.go:44] ["retry func failed"] ["retry time"=0] [error="failed to load growing segments, err: [UnexpectedError] Error:GetObjectSize[errcode:400, exception:, errmessage:No response body.]"]
2023-07-14 14:41:44 [2023/07/14 11:41:44.591 +00:00] [WARN] [task/executor.go:445] ["failed to subscribe DmChannel"] [taskID=358] [collectionID=442848867353100457] [channel=by-dev-rootcoord-dml_4_442848867353100457v0] [node=519] [source=1] [reason="failed to load growing segments, err: [UnexpectedError] Error:GetObjectSize[errcode:400, exception:, errmessage:No response body.]"]
2023-07-14 14:41:44 [2023/07/14 11:41:44.591 +00:00] [INFO] [task/executor.go:209] ["execute action done, remove it"] [taskID=358] [step=0] [error="failed to subscribe DmChannel[RpcFailed]"]

I've set the following permissions on the bucket:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": "*",
            "Action": "s3:*",
            "Resource": [
                "arn:aws:s3:::{bucket-name}",
                "arn:aws:s3:::{bucket-name}/*"
            ]
        }
    ]
}

Despite this, the issue persists. I also see "insert logs" and "stats logs" folders in the S3 bucket. Is this expected? Should some index objects be written to this bucket as well?

Could someone please help?

Upvotes: 4

Views: 191

Answers (1)

rachel song

Reputation: 54

From the log you provided, it looks like the error was thrown while calling the S3 HeadObject() interface to get the size of a remote object. Error code 400 indicates that the remote S3 server is not accessible. I would also suggest you try changing the AWS region to us-east-2 and see if that helps, since the log shows that HeadObject is not working, and this might be related to the curl library used on Windows.
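As a sanity check outside of Milvus, you could verify that HeadObject works against the same bucket from the machine running Docker, for example with boto3. The bucket name, credentials, and object key below are placeholders; use the same values as in your milvus.yaml and pick any key that actually exists in the bucket:

import boto3

# Placeholder credentials and bucket; fill in the same values configured for Milvus.
s3 = boto3.client(
    "s3",
    region_name="us-east-2",
    aws_access_key_id="<accessKeyID>",
    aws_secret_access_key="<secretAccessKey>",
)

# HeadObject is the call that fails in the Milvus log; if this also fails here,
# the problem is with S3 access from the host rather than with Milvus itself.
resp = s3.head_object(Bucket="<bucket-name>", Key="<some-existing-object-key>")
print(resp["ContentLength"])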

Upvotes: 0
