Alexandru

Why can't I connect to this Service Fabric cluster?

I'm getting the following error when connecting to a remote Service Fabric cluster that runs on-premises (not on Azure) on a network-connected virtual machine, using the Connect-ServiceFabricCluster PowerShell cmdlet:

WARNING: Failed to contact Naming Service. Attempting to contact Failover Manager Service...
WARNING: Failed to contact Failover Manager Service, Attempting to contact FMM...
False
WARNING: A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond 192.168.1.102:19000
Connect-ServiceFabricCluster : No cluster endpoint is reachable, please check if there is connectivity/firewall/DNS issue.
At Install.ps1:3 char:1
+ Connect-ServiceFabricCluster -ConnectionEndpoint "FABRICTESTSRV:19000" -WindowsCred ...
+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    + CategoryInfo          : InvalidOperation: (:) [Connect-ServiceFabricCluster], FabricException
    + FullyQualifiedErrorId : TestClusterConnectionErrorId,Microsoft.ServiceFabric.Powershell.ConnectCluster

The command is:

Connect-ServiceFabricCluster -ConnectionEndpoint "FABRICTESTSRV:19000" -WindowsCredential:$True
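
The error text points at a connectivity, firewall, or DNS problem. One way to narrow that down before touching any Service Fabric configuration is a plain TCP connect to the client connection endpoint from the client machine. The sketch below is a hypothetical helper (`probe_endpoint` is not part of any Service Fabric tooling). It only shows whether something accepts TCP connections on that host and port, not whether a healthy cluster is behind it.

```python
import socket


def probe_endpoint(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout`.

    Tests raw TCP reachability only; it does not speak the Service Fabric
    client protocol.
    """
    try:
        # create_connection resolves the name and tries each resolved address.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name-resolution failures
        # (socket.timeout and socket.gaierror are both OSError subclasses).
        return False
```

Run `probe_endpoint("FABRICTESTSRV", 19000)` and `probe_endpoint("192.168.1.102", 19000)` from the client. Then run `probe_endpoint("localhost", 19000)` on the server itself. Together these results separate a name-resolution problem from a binding or firewall problem.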

Why isn't it working?

Here is what I have tried:

Note: This is not an Azure-hosted virtual machine. It is simply a network-connected virtual machine running Service Fabric Core on vanilla, fully up-to-date Windows 8.1 x64.

Edit: The output of Get-ServiceFabricClusterManifest is as follows:

<ClusterManifest xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" Name="ComputerName-Local-Cluster" Version="1.0" xmlns="http://schemas.microsoft.com/2011/01/fabric">
  <NodeTypes>
    <NodeType Name="NodeType0">
      <Endpoints>
        <ClientConnectionEndpoint Port="19000" />
        <LeaseDriverEndpoint Port="19001" />
        <ClusterConnectionEndpoint Port="19002" />
        <HttpGatewayEndpoint Port="19080" Protocol="http" />
        <HttpApplicationGatewayEndpoint Port="19081" Protocol="http" />
        <ServiceConnectionEndpoint Port="19006" />
        <ApplicationEndpoints StartPort="30001" EndPort="31000" />
      </Endpoints>
    </NodeType>
    <NodeType Name="NodeType1">
      <Endpoints>
        <ClientConnectionEndpoint Port="19010" />
        <LeaseDriverEndpoint Port="19011" />
        <ClusterConnectionEndpoint Port="19012" />
        <HttpGatewayEndpoint Port="19082" Protocol="http" />
        <HttpApplicationGatewayEndpoint Port="19083" Protocol="http" />
        <ServiceConnectionEndpoint Port="19016" />
        <ApplicationEndpoints StartPort="31001" EndPort="32000" />
      </Endpoints>
    </NodeType>
    <NodeType Name="NodeType2">
      <Endpoints>
        <ClientConnectionEndpoint Port="19020" />
        <LeaseDriverEndpoint Port="19021" />
        <ClusterConnectionEndpoint Port="19022" />
        <HttpGatewayEndpoint Port="19084" Protocol="http" />
        <HttpApplicationGatewayEndpoint Port="19085" Protocol="http" />
        <ServiceConnectionEndpoint Port="19026" />
        <ApplicationEndpoints StartPort="32001" EndPort="33000" />
      </Endpoints>
    </NodeType>
    <NodeType Name="NodeType3">
      <Endpoints>
        <ClientConnectionEndpoint Port="19030" />
        <LeaseDriverEndpoint Port="19031" />
        <ClusterConnectionEndpoint Port="19032" />
        <HttpGatewayEndpoint Port="19086" Protocol="http" />
        <HttpApplicationGatewayEndpoint Port="19087" Protocol="http" />
        <ServiceConnectionEndpoint Port="19036" />
        <ApplicationEndpoints StartPort="33001" EndPort="34000" />
      </Endpoints>
    </NodeType>
    <NodeType Name="NodeType4">
      <Endpoints>
        <ClientConnectionEndpoint Port="19040" />
        <LeaseDriverEndpoint Port="19041" />
        <ClusterConnectionEndpoint Port="19042" />
        <HttpGatewayEndpoint Port="19088" Protocol="http" />
        <HttpApplicationGatewayEndpoint Port="19089" Protocol="http" />
        <ServiceConnectionEndpoint Port="19046" />
        <ApplicationEndpoints StartPort="34001" EndPort="35000" />
      </Endpoints>
    </NodeType>
  </NodeTypes>
  <Infrastructure>
    <WindowsServer IsScaleMin="true">
      <NodeList>
        <Node NodeName="_Node_0" IPAddressOrFQDN="localhost" IsSeedNode="true" NodeTypeRef="NodeType0" FaultDomain="fd:/0" UpgradeDomain="0" />
        <Node NodeName="_Node_1" IPAddressOrFQDN="localhost" IsSeedNode="true" NodeTypeRef="NodeType1" FaultDomain="fd:/1" UpgradeDomain="1" />
        <Node NodeName="_Node_2" IPAddressOrFQDN="localhost" IsSeedNode="true" NodeTypeRef="NodeType2" FaultDomain="fd:/2" UpgradeDomain="2" />
        <Node NodeName="_Node_3" IPAddressOrFQDN="localhost" NodeTypeRef="NodeType3" FaultDomain="fd:/3" UpgradeDomain="3" />
        <Node NodeName="_Node_4" IPAddressOrFQDN="localhost" NodeTypeRef="NodeType4" FaultDomain="fd:/4" UpgradeDomain="4" />
      </NodeList>
    </WindowsServer>
  </Infrastructure>
  <FabricSettings>
    <Section Name="Security">
      <Parameter Name="ClusterCredentialType" Value="None" />
      <Parameter Name="ServerAuthCredentialType" Value="None" />
    </Section>
    <Section Name="FailoverManager">
      <Parameter Name="ExpectedClusterSize" Value="4" />
      <Parameter Name="TargetReplicaSetSize" Value="3" />
      <Parameter Name="MinReplicaSetSize" Value="3" />
      <Parameter Name="ReconfigurationTimeLimit" Value="20" />
      <Parameter Name="BuildReplicaTimeLimit" Value="20" />
      <Parameter Name="CreateInstanceTimeLimit" Value="20" />
      <Parameter Name="PlacementTimeLimit" Value="20" />
    </Section>
    <Section Name="ReconfigurationAgent">
      <Parameter Name="ServiceApiHealthDuration" Value="20" />
      <Parameter Name="ServiceReconfigurationApiHealthDuration" Value="20" />
      <Parameter Name="LocalHealthReportingTimerInterval" Value="5" />
      <Parameter Name="IsDeactivationInfoEnabled" Value="true" />
      <Parameter Name="RAUpgradeProgressCheckInterval" Value="3" />
    </Section>
    <Section Name="ClusterManager">
      <Parameter Name="TargetReplicaSetSize" Value="3" />
      <Parameter Name="MinReplicaSetSize" Value="3" />
      <Parameter Name="UpgradeStatusPollInterval" Value="5" />
      <Parameter Name="UpgradeHealthCheckInterval" Value="5" />
      <Parameter Name="FabricUpgradeHealthCheckInterval" Value="5" />
    </Section>
    <Section Name="NamingService">
      <Parameter Name="TargetReplicaSetSize" Value="3" />
      <Parameter Name="MinReplicaSetSize" Value="3" />
    </Section>
    <Section Name="Management">
      <Parameter Name="ImageStoreConnectionString" Value="file:C:\SfDevCluster\Data\ImageStoreShare" />
      <Parameter Name="ImageCachingEnabled" Value="false" />
      <Parameter Name="EnableDeploymentAtDataRoot" Value="true" />
    </Section>
    <Section Name="Hosting">
      <Parameter Name="EndpointProviderEnabled" Value="true" />
      <Parameter Name="RunAsPolicyEnabled" Value="true" />
      <Parameter Name="DeactivationScanInterval" Value="60" />
      <Parameter Name="DeactivationGraceInterval" Value="10" />
      <Parameter Name="EnableProcessDebugging" Value="true" />
      <Parameter Name="ServiceTypeRegistrationTimeout" Value="20" />
      <Parameter Name="CacheCleanupScanInterval" Value="300" />
    </Section>
    <Section Name="HttpGateway">
      <Parameter Name="IsEnabled" Value="true" />
    </Section>
    <Section Name="PlacementAndLoadBalancing">
      <Parameter Name="MinLoadBalancingInterval" Value="300" />
    </Section>
    <Section Name="Federation">
      <Parameter Name="NodeIdGeneratorVersion" Value="V4" />
      <Parameter Name="UnresponsiveDuration" Value="0" />
    </Section>
    <Section Name="ApplicationGateway/Http">
      <Parameter Name="IsEnabled" Value="true" />
    </Section>
    <Section Name="FaultAnalysisService">
      <Parameter Name="TargetReplicaSetSize" Value="3" />
      <Parameter Name="MinReplicaSetSize" Value="3" />
    </Section>
    <Section Name="Trace/Etw">
      <Parameter Name="Level" Value="4" />
    </Section>
    <Section Name="Diagnostics">
      <Parameter Name="ProducerInstances" Value="ServiceFabricEtlFile, ServiceFabricPerfCtrFolder" />
      <Parameter Name="MaxDiskQuotaInMB" Value="10240" />
    </Section>
    <Section Name="ServiceFabricEtlFile">
      <Parameter Name="ProducerType" Value="EtlFileProducer" />
      <Parameter Name="IsEnabled" Value="true" />
      <Parameter Name="EtlReadIntervalInMinutes" Value=" 5" />
      <Parameter Name="DataDeletionAgeInDays" Value="3" />
    </Section>
    <Section Name="ServiceFabricPerfCtrFolder">
      <Parameter Name="ProducerType" Value="FolderProducer" />
      <Parameter Name="IsEnabled" Value="true" />
      <Parameter Name="FolderType" Value="ServiceFabricPerformanceCounters" />
      <Parameter Name="DataDeletionAgeInDays" Value="3" />
    </Section>
    <Section Name="TransactionalReplicator">
      <Parameter Name="CheckpointThresholdInMB" Value="64" />
    </Section>
  </FabricSettings>
</ClusterManifest>

Answers (2)

cassandrad

Why isn't it working?

It is not working because you set the IP address of your nodes to localhost, which makes them unreachable from other machines. That works for a local debug cluster, but for on-premises and Azure clusters you have to specify a valid, reachable IP address or fully qualified domain name.
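
You can confirm this from the manifest the question posted. The sketch below is hypothetical (`loopback_nodes` is an illustrative name, not a Service Fabric API). It reads the manifest XML and reports every node whose `IPAddressOrFQDN` is a loopback name, together with the `ClientConnectionEndpoint` port of its node type. It assumes the `http://schemas.microsoft.com/2011/01/fabric` default namespace seen in the question.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple

# Default namespace of the cluster manifest shown in the question.
NS = {"sf": "http://schemas.microsoft.com/2011/01/fabric"}
LOOPBACK_NAMES = {"localhost", "127.0.0.1", "::1"}


def loopback_nodes(manifest_xml: str) -> List[Tuple[str, str, int]]:
    """Return (node name, address, client port) for nodes bound to loopback."""
    root = ET.fromstring(manifest_xml)
    # Map each node type name to its ClientConnectionEndpoint port.
    client_ports = {}
    for node_type in root.iterfind("sf:NodeTypes/sf:NodeType", NS):
        endpoint = node_type.find("sf:Endpoints/sf:ClientConnectionEndpoint", NS)
        if endpoint is not None:
            client_ports[node_type.get("Name")] = int(endpoint.get("Port"))
    flagged = []
    for node in root.iterfind(".//sf:NodeList/sf:Node", NS):
        address = node.get("IPAddressOrFQDN", "")
        if address.lower() in LOOPBACK_NAMES:
            # -1 marks a node type that has no ClientConnectionEndpoint.
            port = client_ports.get(node.get("NodeTypeRef"), -1)
            flagged.append((node.get("NodeName"), address, port))
    return flagged
```

Against the manifest in the question, all five nodes would be flagged, because every one uses `IPAddressOrFQDN="localhost"`. For example, `_Node_0` would be reported with client port 19000.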

Also, though I'm not 100% sure, I would suggest specifying an FQDN instead of an IP address if you want your cluster to be accessible by name rather than by IP. I remember having trouble with this, but it is still not clear to me whether the FQDN or something else fixed it.
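
If you do switch `IPAddressOrFQDN` to an FQDN, that name must resolve to an address other than loopback on every machine that connects to the cluster. Here is a hedged sketch of that check. `resolves_off_loopback` is a hypothetical helper, and it inspects name resolution only, not reachability.

```python
import ipaddress
import socket


def resolves_off_loopback(name: str) -> bool:
    """True if `name` resolves to at least one non-loopback address.

    A node address (IP or FQDN) that only resolves to 127.0.0.1 or ::1
    cannot be reached from another machine.
    """
    try:
        infos = socket.getaddrinfo(name, None, proto=socket.IPPROTO_TCP)
    except socket.gaierror:
        return False  # the name does not resolve at all
    # info[4][0] is the host part of the sockaddr; strip any IPv6 scope id.
    addresses = {info[4][0].split("%")[0] for info in infos}
    return any(not ipaddress.ip_address(a).is_loopback for a in addresses)
```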

Alexandru

There were a few issues, but the biggest, as @cassandrad mentioned, was that the default deployment binds the nodes to localhost (IPAddressOrFQDN="localhost") rather than to the machine's IP address, so by default it only accepts local connections.
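
The netstat check from the first step can be made more explicit with a small parser. This is a sketch that assumes Windows `netstat -an` output. The `-n` flag keeps addresses numeric so they can be parsed, and each row has the four columns Proto, Local Address, Foreign Address, and State, with `LISTENING` as the state. The helper names are hypothetical.

```python
import ipaddress
from typing import List


def listening_addresses(netstat_output: str, port: int) -> List[str]:
    """Return local addresses LISTENING on `port` in Windows `netstat -an` text."""
    found = []
    for line in netstat_output.splitlines():
        parts = line.split()
        # Expected columns: Proto, Local Address, Foreign Address, State.
        if len(parts) < 4 or parts[0].upper() != "TCP" or parts[3] != "LISTENING":
            continue
        host, _, local_port = parts[1].rpartition(":")
        if local_port == str(port):
            # IPv6 hosts appear as [::1]; drop brackets and any scope id.
            found.append(host.strip("[]").split("%")[0])
    return found


def loopback_only(addresses: List[str]) -> bool:
    """True if there are listeners and every one is on a loopback address."""
    return bool(addresses) and all(ipaddress.ip_address(a).is_loopback for a in addresses)
```

If `loopback_only(listening_addresses(text, 19000))` is true, the client endpoint is bound only to loopback, which matches the symptom in the question. A listener on `0.0.0.0` or `192.168.1.102` would accept remote connections, subject to the firewall.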

Here are complete steps for fixing my issue:

  • I first ran netstat -a | FindStr "19000" in Command Prompt to check which bindings were active, to confirm what @cassandrad said.
  • After reading this guide, I decided to download the Service Fabric standalone package for Windows Server (which, by the way, works fine outside of Windows Server, on Windows 8.1 x64).
  • I copied ClusterConfig.Unsecure.DevCluster.json and modified it: under the nodes section, I changed every node's iPAddress to 192.168.1.102. I named the new file ClusterConfig.Unsecure.CustomDevCluster.json.
  • I ran CreateServiceFabricCluster.ps1. It asked me which JSON configuration to use, so I gave it ClusterConfig.Unsecure.CustomDevCluster.json.
  • The first run failed with a rather annoying, obscure error about fetching Newtonsoft.Json version 6.0.0.0, visible in the traces. The cause was that I did not have .NET Framework 4.6.2, so I downloaded and installed it.
  • The second run failed because a Microsoft Azure Service Fabric MSI was already installed, left over from when I had installed MicrosoftAzure-ServiceFabric-CoreSDK.exe. I went to Programs and Features and uninstalled Microsoft Azure Service Fabric (leaving the Microsoft Azure Service Fabric SDK installed).
  • I ran the script one last time, fingers crossed, and it finally worked.
  • Because it is an unsecured cluster, I could simply connect to it using Connect-ServiceFabricCluster "192.168.1.102:19000". To enable other authentication mechanisms, modify and use one of the other sample .json configurations.
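
The config edit from the third step can be scripted so that it is repeatable. This is a minimal sketch that assumes only the layout described there: a top-level `nodes` list whose entries carry an `iPAddress` key. `retarget_nodes` and `write_custom_config` are hypothetical names.

```python
import json
from pathlib import Path


def retarget_nodes(config: dict, ip_address: str) -> dict:
    """Return a copy of a cluster config with every node's iPAddress replaced."""
    updated = json.loads(json.dumps(config))  # deep copy of plain JSON data
    for node in updated.get("nodes", []):
        node["iPAddress"] = ip_address
    return updated


def write_custom_config(source: Path, target: Path, ip_address: str) -> None:
    """Read `source`, point all nodes at `ip_address`, and write `target`."""
    # utf-8-sig tolerates a byte-order mark, which Windows editors often add.
    config = json.loads(source.read_text(encoding="utf-8-sig"))
    target.write_text(json.dumps(retarget_nodes(config, ip_address), indent=4),
                      encoding="utf-8")
```

For example, `write_custom_config(Path("ClusterConfig.Unsecure.DevCluster.json"), Path("ClusterConfig.Unsecure.CustomDevCluster.json"), "192.168.1.102")` produces the custom file, which you then give to CreateServiceFabricCluster.ps1 when it asks for the configuration. Leaving the original sample untouched makes it easy to regenerate the file for a different IP.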
