flarno11
flarno11

Reputation: 143

App fails to connect to MongoDB Enterprise with "Authentication failed" and Ops Manager stuck in "AdjustUsers"

The credentials of a newly deployed app are rejected by MongoDB with "Authentication failed". The MongoDB Ops Manager is stuck on "AdjustUsers" for a few hours already.


Verified by:

cf service-connector 8080 opsmanager.service.consul:8080

Open Browser at http://localhost:8080 and login using the mongodb service key obtained on the portal:

"ops_manager_url": "http://opsmanager.service.consul:8080",
"ops_manager_user": "xxx",
"ops_manager_password": "xxx",

I've recently restored around 20GB of data, using mongorestore:

cf service-connector 27020 xxx-0.service.consul:33602
mongorestore --gzip --port 27020 --username xxx --password xxx --db rs_xxx /backup/mongodump/ > mongorestore.log 2>&1 &

Other, already existing, apps don't have the problem. Only an app which I deployed using Blue-Green deployment fails to connect:

cf push $APP-new 
cf map-route $APP-new $DOMAIN --hostname $APP
cf unmap-route $APP $DOMAIN --hostname $APP || true
cf unmap-route $APP-new $DOMAIN --hostname $APP-new
cf delete -f   $APP
cf rename      $APP-new $APP

In my apps I don't specify a WriteConcern, so I assume it is only primary ("w: 1")

Upvotes: 3

Views: 1156

Answers (2)

flarno11
flarno11

Reputation: 143

One way to notice that the db binding was not successful is by testing the newly deployed app before un-mapping and deleting the old one (health check on application level):

cf push $APP-new 
# only old app active

# test new app
curl --fail --silent --output /dev/null https://$APP-new.$DOMAIN/status

cf map-route $APP-new $DOMAIN --hostname $APP
# both apps active
cf apps
cf routes

cf unmap-route $APP $DOMAIN --hostname $APP || true
# only new app active

cf unmap-route $APP-new $DOMAIN --hostname $APP-new
cf delete -f   $APP
cf rename      $APP-new $APP

cf apps

This example assumes that the app name is the same as the hostname.

Upvotes: 1

Sybil
Sybil

Reputation: 2592

With every cf bind-service or cf unbind-service the Servicebroker (a Cloud Foundry component) creates new random generated credential (see it with cf env $APP). With our MongoEnterprise service offering, Service Broker connects to Ops Manager API and deploys new User to mongod replicaset.

enter image description here

When Ops Manager deploys user (createUser) he uses WriteConcern "majority", which means at least one secondary needs to acknowledge the write.

COMMAND  [conn53768] command rs_$DBNAME.$cmd command: createUser { createUser: "$USERNAME", pwd: "xxx", digestPassword: false, roles: [ { role: "readWrite", db: "rs_$DBNAME" } ], writeConcern: { w: "majority" } } keyUpdates:0 writeConflicts:0 numYields:0 reslen:138 locks:{ Global: { acquireCount: { r: 2, w: 2 } }, Database: { acquireCount: { W: 2 } }, Collection: { acquireCount: { w: 2 } } } protocol:op_query 9811ms

Due to the mongorestore, which was performed a couple of hours before this problem occurred, there is a possibility that the createUser command was timing out due to lack of acknowledgement from the secondaries that were busy building indexes in foreground and background. Index build (default mongorestore options) are first done on primary and when done will be in progress in secondaries. This explains the lag rise on secondaries during createUser.

Index builds can take very long time, depending on the index and data size. We heard a lot of customer complains about index build take too long.

Here are logs from index build. In Ops Manager customer can see streamed mongodb.log (opsmanager.service.consul) with a little delay from all replica set members.

...
2017-02-15T10:40:06.199+0000 I INDEX    [repl writer worker 10] build index done.  scanned 1108917 total records. 672 secs
2017-02-15T10:50:08.553+0000 I INDEX    [repl writer worker 4] build index done.  scanned 1108917 total records. 602 secs
2017-02-15T11:01:13.888+0000 I INDEX    [repl writer worker 7] build index done.  scanned 1108917 total records. 665 secs
...
2017-02-15T15:01:37.405+0000 I INDEX    [repl index builder 176] build index done.  scanned 1109531 total records. 659 secs
2017-02-15T15:01:37.406+0000 I INDEX    [repl index builder 170] build index done.  scanned 1109531 total records. 659 secs
2017-02-15T15:16:20.139+0000 I INDEX    [repl index builder 170] build index done.  scanned 1109699 total records. 882 secs

Those are the errors from Automation Agent (Ops Manager speaks via HTTP with Autoamtion Agent on the managed VM, the Automation Agent then speaks with mongod with native protocol)

[2017/02/15 13:33:36.297] [.error] [cm/mongoctl/authctl.go:updateUser26Style:697] [101] <rs_$DB_NAME> [13:33:36.297] Error updating 2.6-style user = AuthUser($USER@rs_$DB_NAME,roles=AuthRoles(1:{"Rolename":"readWrite","Db":"rs_$DB_NAME"}),interned=true,identity=2) in db = rs_$DB_NAME : <rs_$DB_NAME> [13:33:36.297] Timed out (timeout=-1ns) trying to runCommandWithTimeout(dbName=rs_$DB_NAME, cmd=[{"Name":"updateUser","Value":"$USER"},{"Name":"pwd","Value":"$PASSWORD"},{"Name":"digestPassword","Value":false},{"Name":"roles","Value":[{"db":"rs_$DB_NAME","role":"readWrite"}]}])
[2017/02/15 13:33:36.297] [.error] [cm/mongoctl/authctl.go:UpsertUser:516] [101] <rs_$DB_NAME> [13:33:36.297] Error upserting 2.6-style user = AuthUser($USER@rs_$DB_NAME,roles=AuthRoles(1:{"Rolename":"readWrite","Db":"rs_$DB_NAME"}),interned=true,identity=2) : <rs_$DB_NAME> [13:33:36.297] Error updating 2.6-style user = AuthUser($USER@rs_$DB_NAME,roles=AuthRoles(1:{"Rolename":"readWrite","Db":"rs_$DB_NAME"}),interned=true,identity=2) in db = rs_$DB_NAME : <rs_$DB_NAME> [13:33:36.297] Timed out (timeout=-1ns) trying to runCommandWithTimeout(dbName=rs_$DB_NAME, cmd=[{"Name":"updateUser","Value":"$USER"},{"Name":"pwd","Value":"$PASSWORD"},{"Name":"digestPassword","Value":false},{"Name":"roles","Value":[{"db":"rs_$DB_NAME","role":"readWrite"}]}])
[2017/02/15 13:33:36.297] [.error] [cm/action/adjustusers.go:adjustUsers:52] [101] <rs_$DB_NAME> [13:33:36.297] Error upserting user = AuthUser($USER@rs_$DB_NAME,roles=AuthRoles(1:{"Rolename":"readWrite","Db":"rs_$DB_NAME"}),interned=true,identity=2). Trying to proceed though. : <rs_$DB_NAME> [13:33:36.297] Error upserting 2.6-style user = AuthUser($USER@rs_$DB_NAME,roles=AuthRoles(1:{"Rolename":"readWrite","Db":"rs_$DB_NAME"}),interned=true,identity=2) : <rs_$DB_NAME> [13:33:36.297] Error updating 2.6-style user = AuthUser($USER@rs_$DB_NAME,roles=AuthRoles(1:{"Rolename":"readWrite","Db":"rs_$DB_NAME"}),interned=true,identity=2) in db = rs_$DB_NAME : <rs_$DB_NAME> [13:33:36.297] Timed out (timeout=-1ns) trying to runCommandWithTimeout(dbName=rs_$DB_NAME, cmd=[{"Name":"updateUser","Value":"$USER"},{"Name":"pwd","Value":"$PASSWORD"},{"Name":"digestPassword","Value":false},{"Name":"roles","Value":[{"db":"rs_$DB_NAME","role":"readWrite"}]}])
[2017/02/15 13:33:36.381] [.error] [cm/executor/executor.go:ExecutePlan:184] <rs_$DB_NAME> [13:33:36.381] Postcondition failed for step AdjustUsers because
[The value of 'currentState.UsersRight' = false, but it should be true]. Outcome=3

Summary: everything with WriteConcern primary is working fine, but WriteConcern majority times out, du tue blocking index builds.

Upvotes: 2

Related Questions