Channel: Tomas Fojta – Tom Fojta's Blog

VMware Cloud Director – Storage IOPS Management – Part II


This is a follow-up to the article I posted about a year ago, describing the new IOPS management functionality in VMware Cloud Director (VCD) 10.2.

Storage IOPS is, next to compute, networking and storage capacity, a limited resource that service providers want to manage in order to fairly share the underlying physical resources in a multitenant environment.

As described in the original article, VCD already supported storage IOPS management, however the feature was quite hidden and available only via API. The recent release of VMware Cloud Director not only fully exposes the functionality in the UI, but also adds new capabilities. Let’s dive into it.

There are now two main mechanisms for managing IOPS.

vCenter Server managed IOPS

This mechanism relies on setting IOPS limits at the storage policy level directly in vCenter Server. That is possible with host-based and with vSAN-based storage policies. The mechanism is quite simple: when a VM disk is provisioned to such an IOPS-limited storage policy, it inherits the IOPS limit (a constant number per policy). You will not be able to set IOPS proportional to disk capacity.

vSAN Storage Policy with IOPS Limit
Host Based non vSAN Storage Policy with IOPS Limit

I would recommend using this mechanism only if you want to avoid noisy neighbors. The concept is not new; VCD has been able to use such vSAN policies for some time, and host-based policies were already supported in VCD 10.1. The only difference is that in 10.2 the tenant will see the limit set at the VM disk level but will not be able to change it.

Non-editable Disk IOPS

VCD Managed IOPS

This is a much more sophisticated mechanism where you can really manage IOPS as a pool of available capacity that you slice and allocate to tenant Org VDCs. This is the mechanism that was until now available only via API.

You start by tagging your datastores with their IOPS capacity; that has not changed and still must be done in vCenter Server via custom properties.

At the Provider VDC level you can then create IOPS-managed storage policies and define their service level in terms of disk IOPS defaults, maximums, or IOPS allocation based on disk size (0 means unlimited).

This storage policy configuration can be inherited or overridden at the Org VDC level. This is a big improvement compared to the old approach, where you always had to create such storage policies at the Org VDC level.

Another new capability is that you can disable the IOPS placement mechanism for such a storage policy. This is useful if you want to use Datastore Clusters. VCD will no longer try to place each virtual disk based on a particular datastore's available IOPS. The placement decision is instead made by vCenter Server, so you should enable Storage DRS with I/O balancing automation. In that case there is no need to tag individual datastores in vCenter with their IOPS capacity.

Some of the old caveats still apply:

  • Disk IOPS can be assigned only to regular VMs or named (independent) disks, not to VM templates.
  • Disk IOPS are always allocated against the Org VDC storage profile even if the VM is powered off. This means the cloud provider can oversubscribe IOPS at the Provider VDC storage profile level.
  • The system administrator can override IOPS limits when deploying/editing tenant VMs in the system context.

New Networking Features in VMware Cloud Director 10.2


From the networking perspective, the 10.2 release of VMware Cloud Director was a massive one. The NSX-V vs NSX-T gap has been closed, and in some cases NSX-T backed Org VDCs now provide more networking functionality than the NSX-V backed ones. The UI has been redesigned with new dedicated Networking sections, however some new features are currently available only via API.
Let me dive straight in so you do not miss any.

NSX-T Advanced Load Balancing (Avi) support

This is a big feature that requires its own blog post. Please read here. In short, NSX-T backed Org VDCs can now consume network load balancer services that are provided by the new NSX-T ALB / Avi.

Distributed Firewall and Data Center Groups

Another big feature combines Cross VDC networking, shared networks and distributed firewall (DFW) functionality. The service provider must first create a Compute Provider Scope. This is basically a tag, an abstraction of compute fault domains / availability zones, and it is defined either at the vCenter Server level or at the Provider VDC level.

The same can be done for each NSX-T Manager where you would define Network Provider Scope.

Once that is done, the provider can create Data Center Group(s) for a particular tenant. This is done from the new networking UI in the Tenant portal by selecting one or multiple Org VDCs. The Data Center Group will now become a routing domain with networks spanning all Org VDCs that are part of the group, with a single egress point (Org VDC Gateway) and the distributed firewall.

Routed networks will automatically be added to a Security Group if they are connected to the group's Org VDC Edge Gateway. Isolated networks must be added explicitly. An Org VDC can be a member of multiple Data Center Groups.

If you want the tenant to use DFW, it must be explicitly enabled and the tenant Organization has to have the correct rights. The DFW supports IP Sets and Security Groups containing network objects that apply rules to all connected VMs.

Note that only one Org VDC Edge Gateway can be added to the Data Center Group. This is due to the limitation that an NSX-T logical segment can be attached to and routed via only a single Tier-1 GW. The Tier-1 GW runs in active/standby mode and can theoretically span multiple sites, but only a single instance is active at a time (no multi-egress).

VRF-Lite Support

VRF-Lite is an object that allows slicing a single NSX-T Tier-0 GW into up to 100 independent virtual routing instances. Lite means that while these instances are very similar to a real Tier-0 GW, they support only a subset of its features: routing, firewalling and NAT.

In VCD, when a tenant requires direct connectivity to an on-prem WAN/MPLS with fully routed networks (instead of just NAT-routed ones), the provider in the past had to dedicate a whole external network backed by a Tier-0 GW to such a tenant. Now the same can be achieved with a VRF, which greatly improves the scalability of this feature.

There are some limitations:

  • a VRF inherits its parent Tier-0 deployment mode (HA A/A vs A/S, Edge Cluster), BGP local ASN and graceful restart setting
  • all VRFs share the physical bandwidth of their parent's uplinks
  • VRF uplinks and peering with upstream routers must be individually configured, utilizing VLANs from a VLAN trunk or unique Geneve segments (if the upstream router is another Tier-0)
  • as an alternative to the previous point, EVPN can be used, which allows a single MP-BGP session for all VRFs and upstream routers with VXLAN data plane encapsulation; the upstream routers obviously must support EVPN
  • the provider can import into VCD as an external network either the parent Tier-0 GW or its child VRFs, but not both (no mixed mode)

IPv6

VMware Cloud Director now supports dual stack IPv4/IPv6 (both for NSX-V and NSX-T backed networks). Currently this must be enabled via API version 35, either during network creation or via a PUT on the OpenAPI network object by specifying:

"enableDualSubnetNetwork": true

In the same payload you also have to add the 2nd subnet definition.

 

PUT https://{{host}}/cloudapi/1.0.0/orgVdcNetworks/urn:vcloud:network:c02e0c68-104c-424b-ba20-e6e37c6e1f73

...
    "subnets": {
        "values": [
            {
                "gateway": "172.16.100.1",
                "prefixLength": 24,
                "dnsSuffix": "fojta.com",
                "dnsServer1": "10.0.2.210",
                "dnsServer2": "10.0.2.209",
                "ipRanges": {
                    "values": [
                        {
                            "startAddress": "172.16.100.2",
                            "endAddress": "172.16.100.99"
                        }
                    ]
                },
                "enabled": true,
                "totalIpCount": 98,
                "usedIpCount": 1
            },
            {
                "gateway": "fd13:5905:f858:e502::1",
                "prefixLength": 64,
                "dnsSuffix": "",
                "dnsServer1": "",
                "dnsServer2": "",
                "ipRanges": {
                    "values": [
                        {
                            "startAddress": "fd13:5905:f858:e502::2",
                            "endAddress": "fd13:5905:f858:e502::ff"
                        }
                    ]
                },
                "enabled": true,
                "totalIpCount": 255,
                "usedIpCount": 0
            }
        ]
    }
...
    "enableDualSubnetNetwork": true,
    "status": "REALIZED",
...

 

The UI will still show only the primary subnet and IP address. The allocation of the secondary IP to a VM must be done either from its guest OS or via automated network assignment (DHCP, DHCPv6 or SLAAC). DHCPv6 and SLAAC are only available for NSX-T backed Org VDC networks; for NSX-V backed networks you could use IPv6 as the primary subnet (with an IPv6 pool) and IPv4 with DHCP addressing as the secondary.
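For illustration, assigning the secondary IPv6 address from a Linux guest OS could look like this (a minimal sketch matching the example payload above; the interface name ens192 is just a placeholder):

ip -6 addr add fd13:5905:f858:e502::2/64 dev ens192    # address from the secondary subnet
ip -6 route add default via fd13:5905:f858:e502::1     # gateway of the secondary subnet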

To enable IPv6 capability in NSX-T, the provider must enable it in the Global Networking Config.
VCD automatically creates ND (Neighbor Discovery) Profiles in NSX-T for each NSX-T backed Org VDC Edge GW. Via the /1.0.0/edgeGateways/{gatewayId}/slaacProfile API the tenant can then set the Edge GW profile to either DHCPv6 or SLAAC. For example:
PUT https://{{host}}/cloudapi/1.0.0/edgeGateways/urn:vcloud:gateway:5234d305-72d4-490b-ab53-02f752c8df70/slaacProfile
{
    "enabled": true,
    "mode": "SLAAC",
    "dnsConfig": {
        "domainNames": [],
        "dnsServerIpv6Addresses": [
            "2001:4860:4860::8888",
            "2001:4860:4860::8844"
        ]
    }
}

And here is the corresponding view from NSX-T Manager:

And finally a view of the deployed VM’s networking stack:

DHCP

Speaking of DHCP, NSX-T supports two modes: Network mode, where the DHCP service is attached directly to a network and needs an IP from that network, and Edge mode, where the DHCP service runs on a Tier-1 GW loopback address. VCD now supports both modes (via API only). DHCP in Network mode works for isolated networks and is portable with the network (meaning the network can be attached to or disconnected from the Org VDC Edge GW without DHCP service disruption). However, before you can deploy a DHCP service in Network mode you need to specify a Services Edge Cluster (for Edge mode that is not needed, as the service runs on the Tier-1 Edge GW). The cluster definition is done via a Network Profile at the Org VDC level.
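As an illustration, enabling DHCP in Network mode on an Org VDC network could look roughly like the call below. This is a hedged sketch based on the VCD 10.2 OpenAPI DHCP endpoint; the network URN, service IP address, pool range and lease time are placeholders.

PUT https://{{host}}/cloudapi/1.0.0/orgVdcNetworks/urn:vcloud:network:c02e0c68-104c-424b-ba20-e6e37c6e1f73/dhcp

{
    "enabled": true,
    "mode": "NETWORK",
    "ipAddress": "172.16.100.100",
    "leaseTime": 86400,
    "dhcpPools": [
        {
            "enabled": true,
            "ipRange": {
                "startAddress": "172.16.100.101",
                "endAddress": "172.16.100.150"
            }
        }
    ]
}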

In order to use DHCPv6, the network must be configured in Network mode and attached to an Org VDC Edge GW whose SLAAC profile is configured in DHCPv6 mode.

Other Features

  • vSphere Distributed Switch support for NSX-T segments (also known as Converged VDS), although this feature was already available in VCD 10.1.1+
  • NSX-T IPSec VPN support in UI
  • NSX-T L2VPN support, API only
  • port group backed external networks (used for NSX-V backed Org VDCs) can now have multiple port groups from the same vCenter Server instance (useful if you have vDS per cluster for example)
  • /31 external network subnets are supported
  • Org VDC Edge GW object now supports metadata

NSX-V vs NSX-T Feature Parity

Let me conclude with an updated chart comparing NSX-V and NSX-T features in VMware Cloud Director 10.2. I have highlighted the new additions in green.

Quotas and Quota Policies in VMware Cloud Director


In this article I want to highlight a neat new feature in VMware Cloud Director 10.2: the ability to assign quotas and create quota policies.

This can be done at multiple levels, both by the service provider and by the organization administrator.

The following resources today can be managed via quotas:

  • Memory
  • CPU
  • Storage
  • All VMs (includes vApp template VMs)
  • Running VMs
  • TKG Clusters

The list might expand in the future; you can find out which quota capabilities are available via the API.

The service provider can create quotas at the organization level in the Organization > Configure > Quotas section:

The org administrator can assign quotas to individual users or groups. This is done from the Administration > Access Control > User or Group > Set Quota section.
A quota assigned at the group level is inherited by each group user (so it is not enforced at the aggregate group level) but can be overridden with an individual user quota. Also, if a user is a member of multiple groups, the least restrictive combination of the participating group quotas is applied.

At the same place the user or org admin can see the actual user’s usage compared to the quota.

Org admins can use quotas to easily enforce good behavior of org users (not running too many VMs concurrently, not consuming too much storage, etc.), while system admins can set safety quotas at the org level when using Org VDC allocation models with unlimited consumption and pay-per-use billing.

One hidden feature, available only via API, is the ability to create more generic quota policies that combine (pool) multiple quota elements and can be assigned to organizations, groups or individual users. Think of quota policies like Power User vs Regular User, where the former can power on more VMs.

When a specific quota is assigned to a user/group/org object, a quota policy is created in the backend anyway, but it is specific just to that one object; editing a shared Power User quota policy, on the other hand, applies to every user that has that quota policy.

The feature comes with new specific rights, so it can easily be enabled or disabled:

  • Organization: Manage Quotas of Organization
  • Organization: Edit Quotas Policy
  • General: View Quota Policy Capabilities
  • General: Manage Quota Policy
  • General: View Quota Policy

NSX-T 3.1: Sharing Transport VLAN between Host and Edge Nodes


When NSX-T 3.1 was released a few days ago, the feature that I was most looking for was the ability to share Geneve overlay transport VLAN between ESXi transport nodes and Edge transport nodes.

Before NSX-T 3.1, in a collapsed design where Edge transport nodes run on ESXi transport nodes (in other words, NSX-T Edge VMs are deployed to an NSX-T prepared ESXi cluster), you could not share the same transport (TEP) VLAN unless you dedicated separate physical uplinks to Edge traffic and ESXi underlay host traffic. The reason is that Geneve encapsulation/decapsulation happens only on physical uplink ingress/egress, and that point is skipped on the intra-host datapath between the Edge and the host TEP VMkernel port.

This was quite annoying because the two transport VLANs need to route to each other with full jumbo frames (MTU > 1600). So in lab scenarios you had to have an additional router taking care of that, and I have seen issues multiple times caused by a misconfigured router MTU.

After upgrading my lab to NSX-T 3.1 I was eager to test it.

Here are the steps I used to migrate to single transport VLAN:

  1. The collapsed Edge Nodes need to use trunk uplinks created as NSX-T logical segments. My Edge Nodes used regular VDS port groups, so I renamed the old ones in vCenter and created new trunk segments in NSX-T Manager.
  2. (Optional) Create a new TEP IP Address Pool for the Edges. You can obviously use the ESXi host IP Pool, as they will now share the same subnet, or you can use static IP addressing. I opted for a new IP Address Pool with the same subnet as my ESXi host TEP IP Address Pool but a different range, so I can easily distinguish host and edge TEP IPs.
  3. Create a new Edge Uplink Profile whose transport VLAN matches the ESXi transport VLAN.
  4. Now repeat this process for each Edge node: edit the node in the Edge Transport Node Overview tab and change its Uplink Profile, IP Pool and uplinks to the ones created in steps #1, #2 and #3. Refresh and observe the tunnel health.
  5. Clean up the now unused Uplink Profile, IP Pool and VDS port groups.
  6. Deprovision the now unused Edge transport VLAN from the physical switches and from the physical router interface.

During the migration I saw one or two pings drop, but that was it. If you see tunnel issues, try putting the edge node briefly into NSX Maintenance Mode.
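A quick way to check that the shared transport VLAN carries jumbo frames end to end is to ping between TEPs from an ESXi host with large, non-fragmentable packets (a generic check; the VMkernel interface name and the Edge TEP IP below are placeholders):

esxcli network ip interface ipv4 get -N vxlan                  # list host TEP interfaces and IPs
vmkping ++netstack=vxlan -I vmk10 -d -s 1572 <Edge-TEP-IP>     # 1572 B payload, do-not-fragment set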

VMware Cloud Director Cells Behind Internet Proxy


VMware Cloud Director cells are usually deployed in the management cluster, and their access to the Internet might be limited due to security considerations. This can be a problem because certain features require outgoing access to external (Internet) resources:

  • Catalog subscription: the cell will need access to the published catalog URL
  • Multisite: if you associate multiple Organizations together, some API calls are fanned out by the cell to the respective associated API endpoints, so the cell needs to be able to access them (even its own external API endpoint)
  • Cell Appliance VAMI repository for patches or upgrades

The latest VCD release, 10.2.1, now supports an internet proxy, which means there is no need to provide full internet access to the management environment.

On the VCD appliance the proxy can be configured by editing the /etc/sysconfig/proxy file:

 

root@vcloud1 [ ~ ]# cat /etc/sysconfig/proxy
# Enable a generation of the proxy settings to the profile.
# This setting allows to turn the proxy on and off while
# preserving the particular proxy setup.
#
PROXY_ENABLED="yes"

# Some programs (e.g. wget) support proxies, if set in
# the environment.
# Example: HTTP_PROXY="http://proxy.provider.de:3128/"
HTTP_PROXY="http://proxy.fojta.com:3128"

# Example: HTTPS_PROXY="https://proxy.provider.de:3128/"
HTTPS_PROXY="http://proxy.fojta.com:3128"

You need to restart the vmware-vcd service to apply the configuration.
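On the appliance that would typically look like this (assuming the standard vmware-vcd service unit):

root@vcloud1 [ ~ ]# systemctl restart vmware-vcd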

Provider Networking in VMware Cloud Director


This is going to be a bit longer than usual and more of a summary / design option type blog post where I want to discuss provider networking in VMware Cloud Director (VCD). By provider networking I mean the part that must be set up by the service provider and that is then consumed by tenants through their Org VDC networking and Org VDC Edge Gateways.

With the introduction of NSX-T we also need to dive into the differences between NSX-V and NSX-T integration in VCD.

Note: The article is applicable to VMware Cloud Director 10.2 release. Each VCD release is adding new network related functionality.

Provider Virtual Datacenters

Provider Virtual Datacenter (PVDC) is the main object that provides compute, networking and storage resources for tenant Organization Virtual Datacenters (Org VDCs). When a PVDC is created, it is backed by vSphere clusters that should be prepared for NSX-V or NSX-T. During the PVDC creation the service provider must also select which Network Pool is going to be used: VXLAN backed (NSX-V) or Geneve backed (NSX-T). A PVDC can thus be backed by either NSX-V or NSX-T (not both at the same time, and not neither), and the backing cannot be changed after the fact.

Network Pool

Speaking of Network Pools: they are used by tenants to create routed/isolated networks on demand. Network Pools are independent from PVDCs and can be shared across multiple PVDCs (of the same backing type). There is an option to automatically create a VXLAN network pool during PVDC creation, but I would recommend against using it, as you lose the ability to manage the transport zone backing the pool on your own. A VLAN backed network pool can still be created, but it can be used only in a PVDC backed by NSX-V (the same goes for the very legacy port group backed network pool, now available only via API). Individual Org VDCs can (optionally) override the Network Pool assigned to their parent PVDC.

External Networks

Deploying virtual machines without the ability to connect to them over the network is not that useful. External networks are VCD objects that Org VDC Edge Gateways connect to in order to reach the outside world: the internet, dedicated direct connections or the provider's service area. An external network has one or more associated subnets and IP pools that VCD manages and uses to allocate external IP addresses to the connected Org VDC Edge Gateways.

There is a major difference in how external networks are created for NSX-V backed PVDCs and for NSX-T backed ones.

Port Group Backed External Network

As the name suggests, these networks are backed by an existing vCenter port group (or multiple port groups) that must be created upfront and is usually VLAN backed (but could be a VXLAN port group as well). These external networks are (currently) supported only in NSX-V backed PVDCs. An Org VDC Edge Gateway connected to such a network is represented by an NSX-V Edge Service Gateway (ESG) with an uplink in the port group. The uplinks are assigned IP address(es) from the allocated external IPs.

A directly connected Org VDC network attached to the external network can also be created (only by the provider); VMs connected to such a network have their uplink in the port group.

Tier-0 Router Backed External Network

These networks are backed by an existing NSX-T Tier-0 Gateway or Tier-0 VRF (note that if you import a Tier-0 VRF into VCD, you can no longer import its parent Tier-0, and vice versa). The Tier-0/VRF must be created upfront by the provider with the correct uplinks and routing configuration.

Only Org VDC Edge Gateways from an NSX-T backed PVDC can be connected to such an external network, and they are backed by a Tier-1 Gateway. The Tier-1 to Tier-0/VRF transit network is auto-plumbed by NSX-T using the 100.64.0.0/16 subnet. The allocated external network IPs are not explicitly assigned to any Tier-1 interface. Instead, when a service (NAT, VPN, load balancer) on the Org VDC Edge Gateway starts using an assigned external address, that address is advertised by the Tier-1 GW to the linked Tier-0 GW.

There are two main design options for the Tier-0/VRF.

The recommended option is to configure BGP on the Tier-0/VRF uplinks with the upstream physical routers. The uplinks are just redundant point-to-point transits. IPs assigned from any external network subnet are automatically advertised (when used) via BGP upstream. When the provider runs out of public IPs, you just assign an additional subnet. This makes the design very flexible, scalable and relatively simple.

Tier-0/VRF with BGP

An alternative is a design similar to the NSX-V port group approach, where the Tier-0 uplinks are directly connected to the external subnet port group. This can be useful when transitioning from NSX-V to NSX-T and there is a need to retain routability between NSX-V ESGs and NSX-T Tier-1 GWs on the same external network.

The picture below shows that the Tier-0/VRF has uplinks directly connected to the external network and a static route towards the internet. The Tier-0 will proxy ARP requests for the external IPs that are allocated and used by connected Tier-1 GWs.

Tier-0 with Proxy ARP

The disadvantage of this option is that you waste public IP addresses on the Tier-0 uplinks and router interfaces for each subnet you assign.

Tenant Dedicated External Network

If the tenant requires a direct link via MPLS or a similar technology, this is accomplished by creating a tenant dedicated external network. With an NSX-V backed Org VDC this is represented by a dedicated VLAN backed port group; with an NSX-T backed Org VDC it would be a dedicated Tier-0/VRF. Both provide connectivity to the MPLS router. With NSX-V the ESG would run BGP; with NSX-T, BGP has to be configured on the Tier-0. In VCD the NSX-T backed Org VDC Gateway can be explicitly enabled in dedicated mode, which gives the tenant (and also the provider) the ability to configure Tier-0 BGP.

There are separate rights for BGP neighbor configuration and route advertisement, so the provider can keep the BGP neighbor configuration as a provider managed setting.

Note that only one Org VDC Edge GW can be connected in the explicit dedicated mode. In case the tenant requires more Org VDC Edge GWs connected to the same (dedicated) Tier-0/VRF, the provider will not enable the dedicated mode and will instead manage BGP directly in NSX-T (as a managed service).

An often used scenario is when the provider directly connects an Org VDC network to such a dedicated external network without using an Org VDC Edge GW. This is however currently not possible in an NSX-T backed PVDC. Instead, you will have to import an Org VDC network backed by an NSX-T logical segment (overlay or VLAN).

Internet with MPLS

The last case I want to describe is when the tenant wants to access both the Internet and MPLS via the same Org VDC Edge GW. In an NSX-V backed Org VDC this is accomplished by attaching the internet and the dedicated external network port groups to the ESG uplinks and leveraging static or dynamic routing there. In an NSX-T backed Org VDC the provider has to provision a Tier-0/VRF that has transit uplinks both to MPLS and to the Internet. The external (Internet) subnet is assigned to this Tier-0/VRF with a small IP Pool for IP allocation that should not clash with any other IP Pools.

If the tenant has the route advertisement right assigned, then route filters should be set on the Tier-0/VRF uplinks to allow only the correct prefixes to be advertised towards the Internet or MPLS. The route filters can be configured either directly in NSX-T or in VCD (if the Tier-0 is explicitly dedicated).

The diagram below shows an example of an Org VDC that has two Org VDC Edge GWs, each having access to the Internet and MPLS. Org VDC GW 1 is using a static route to MPLS VPN B and also has the MPLS transit network accessible as an imported Org VDC network, while Org VDC GW 2 is using BGP to MPLS VPN A. Connectivity to the internet is provided by another layer of NSX-T Tier-0 GW, which allows overlay segments to be used as VRF uplinks and does not waste physical VLANs.

One comment on the usage of NAT in such a design. Usually the tenant wants to source NAT only towards the Internet but not towards the MPLS. On an NSX-V backed Org VDC Edge GW this is easily set on a per uplink interface basis. However, that option is not possible on a Tier-1 backed Org VDC Edge GW, as it has only one transit towards the Tier-0/VRF. Instead, a NO SNAT rule with a destination prefix must be used in conjunction with the SNAT rule.

An example:

NO SNAT: internal 10.1.1.0/22 destination 10.1.0.0/16
SNAT: internal 10.1.1.0/22 translated 80.80.80.134

The above example will source NAT the 10.1.1.0/22 network only towards the internet, while traffic destined to 10.1.0.0/16 is not translated.

Google Authentication with VMware Cloud Director (OAuth)


Several authentication mechanisms can be used for VMware Cloud Director users. Basic authentication is used for local users (stored in the VCD database) and LDAP users. SAML authentication can be used for integration with SAML compatible Identity Providers such as Microsoft AD FS, IBM Cloud Identity and VMware Workspace ONE Access (VIDM). OAuth authentication is supported as well, but because you (currently, as of VCD 10.2) have to use the API to configure it, it is not that widely known.

In this article I will show an example of such configuration with VMware Identity Manager (VIDM) and with Google Identity IdP. Yes, with VIDM you have the option to use SAML or OAuth.

OAuth authentication can be enabled by the tenant at the Organization level and can coexist with local, LDAP and SAML identity sources. The OAuth authentication endpoint must be reachable from the VCD cells. This is a big difference compared to SAML authentication, where the assertion token exchange is performed via the browser (only the client browser needs to reach the SAML IdP). OAuth is therefore more suitable when public IdPs are used (e.g. Google) or provider managed ones (which the VCD cells can reach internally).

VMware Identity Manager OAuth Configuration

Note I am using VIDM version 3.3.

  1. In VIDM as admin go to Catalog, Settings, Remote App Access and create a new Client
  2. Create the client. Pick a unique Client ID; the redirect URL is https://vcd.example.com/login/oauth?service=tenant:<org name> or https://vcd.example.com/login/oauth?service=provider. Generate the shared secret and select the Email, Profile, User and OpenID scopes.
  3. Now we need to find the OAuth endpoints and public key. In my VIDM configuration this can be found at https://vidm.example.com/SAAS/auth/.well-known/openid-configuration. This URL can differ based on the VIDM / Workspace ONE Access version.
    The address returns a JSON response from which we need: issuer, authorization_endpoint, token_endpoint, userinfo_endpoint, scopes and claims supported.
    The link to the public key is provided in jwks_uri (https://vidm.example.com/SAAS/API/1.0/REST/auth/token?attribute=publicKey&format=jwks). We will need the key in PEM format, so you can either convert it (e.g. https://8gwifi.org/jwkconvertfunctions.jsp) or specify PEM format in  the link (&format=pem  at the end of the URI). We will also need KeyID (kid value) and key algorithm (kty).
  4. Now we have all the necessary information to configure OAuth in VCD. We will use the PUT /admin/org/{id}/settings/oauth API call. In the payload we provide all the data collected in steps #2 and #3; the payload uses the same OrgOAuthSettings structure as shown in the Google example later in this post.
    Note the OIDCAttributeMapping section. Here we must specify claims providing more information about the user. VIDM currently does not support groups and roles, so those are hardcoded. You can see what user information is sent by accessing UserInfoEndpoint. This can be done easily with Postman OAuth2 authentication, where you first obtain the Access Token (orange button) and then do a GET against the UserInfoEndpoint.
  5. Lastly we need to import some users. This is done with the POST /admin/org/{id}/users API call with ProviderType set to OAUTH (see the sketch below).
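A minimal sketch of such an import call using the legacy XML API (the organization ID placeholder, user name and role href are illustrative only):

POST https://{{host}}/api/admin/org/{{org_id}}/users
Content-Type: application/vnd.vmware.admin.user+xml

<User xmlns="http://www.vmware.com/vcloud/v1.5" name="jane.doe@example.com">
    <IsEnabled>true</IsEnabled>
    <ProviderType>OAUTH</ProviderType>
    <Role href="https://{{host}}/api/admin/role/{{role_id}}" />
</User>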

Now we can log in as the VIDM user.

Google Identity OAuth Configuration

  1. Head over to Credentials section of Google API & Services: https://console.developers.google.com/apis/credentials
  2. Create Project, configure Consent Screen, Scopes and test users
  3. Create OAuth Client ID. Use the redirect URI https://vcd.example.com/login/oauth?service=tenant:<org name> or https://vcd.example.com/login/oauth?service=provider. Note generated Client ID and secret.
  4. Google OAuth endpoints and public keys can be retrieved from: https://accounts.google.com/.well-known/openid-configuration
    You will need to get both public keys and convert them to PEM. Now we can configure OAuth in VCD.
PUT https://{{host}}/api/admin/org/b813a16e-6821-4dc5-994f-955b10155107/settings/oauth


<OrgOAuthSettings xmlns="http://www.vmware.com/vcloud/v1.5" type="application/vnd.vmware.admin.organizationOAuthSettings+xml">
    <IssuerId>https://accounts.google.com</IssuerId>
    <OAuthKeyConfigurations>
        <OAuthKeyConfiguration>
            <KeyId>eea1b1f42807a8cc136a03a3c16d29db8296daf0</KeyId>
            <Algorithm>RSA</Algorithm>
            <Key>-----BEGIN PUBLIC KEY-----
MIIBIjANBgkqhkiG9w0BAQEFAAOCAQ8AMIIBCgKCAQEA0zNdxOgV5VIpoeAfj8TM
EGRBFg+gaZWz94ePR1yxTKzScHakH4F4wcMEyL0vNE+yW/u4pOl9E+hAalPa2tFv
4fCVNMMkmKwcf0gm9wNFWXGakVQ8wER4iUg33MyUGOWj2RGX1zlZxCdFoZRtshLx
8xcpL3F5Hlh6m8MqIAowWtusTf5TtYMXFlPaWLQgRXvoOlLZ+muzEuutsZRu+agd
OptnUiAZ74e8BgaKN8KNEZ2SqP6vE4w16mgGHQjEPUKz9exxcsnbLru6hZdTDvXb
X9IduabyvHy8vQRZsqlE9lTiOOOC9jwh27TXsD05HAXmNYiR6voekzEvfS88vnot
2QIDAQAB
-----END PUBLIC KEY-----</Key>
        </OAuthKeyConfiguration>
        <OAuthKeyConfiguration>
            <KeyId>03b2d22c2fecf873ed19e5b8cf704afb7e2ed4be</KeyId>
            <Algorithm>RSA</Algorithm>
            <Key>-----BEGIN PUBLIC KEY-----
MIIBIjANBgkqhkiG9w0BAQEFAAOCAQ8AMIIBCgKCAQEArKZ+1zdz/CoLekSynOty
Wv6cPSSkV28Kb9kZZHyYL+yhkKnH/bHl8OpWiGxQiKP0ulLRIaq1IhSMetkZ8FfX
H+iptIDu4lPb8gt0HQYkjcy3HoaKRXBw2F8fJQO4jQ+ufR4l+E0HRqwLywzdtAIm
NWmju3A4kx8s0iSGHGSHyE4EUdh5WKt+NMtfUPfB5v9/2bC+w6wH7zAEsI5nscMX
nvz1u8w7g2/agyhKSK0D9OkJ02w3I4xLMlrtKEv2naoBGerWckKcQ1kBYUh6WASP
dvTqX4pcAJi7Tg6jwQXIP1aEq0JU8C0zE3d33kaMoCN3SenIxpRczRzUHpbZ+gk5
PQIDAQAB
-----END PUBLIC KEY-----</Key>
        </OAuthKeyConfiguration>
    </OAuthKeyConfigurations>
    <Enabled>true</Enabled>
    <ClientId>**redacted**.apps.googleusercontent.com</ClientId>
    <ClientSecret>**redacted**</ClientSecret>
    <UserAuthorizationEndpoint>https://accounts.google.com/o/oauth2/v2/auth</UserAuthorizationEndpoint>
    <AccessTokenEndpoint>https://oauth2.googleapis.com/token</AccessTokenEndpoint>
    <UserInfoEndpoint>https://openidconnect.googleapis.com/v1/userinfo</UserInfoEndpoint>
    <Scope>email profile openid</Scope>
    <OIDCAttributeMapping>
        <SubjectAttributeName>email</SubjectAttributeName>
        <EmailAttributeName>email</EmailAttributeName>
        <FirstNameAttributeName>given_name</FirstNameAttributeName>
        <LastNameAttributeName>family_name</LastNameAttributeName>
        <GroupsAttributeName>groups</GroupsAttributeName>
        <RolesAttributeName>roles</RolesAttributeName>
    </OIDCAttributeMapping>
    <MaxClockSkew>600</MaxClockSkew>
</OrgOAuthSettings>
  • With the same API as described in step 5 of the VIDM configuration, import your OAuth users.

Recovering NSX-T Manager from File System Corruption


One of our labs had a temporary storage issue which left two NSX-T Managers (separate NSX-T installations) in a corrupted state. Here are some steps you can take to attempt to bring the NSX-T Manager appliance back to life. BTW, these steps might work for Edge Nodes as well.

The issue starts with the appliance having its file system in read-only mode. After a reboot you will see the message:
UNEXPECTED INCONSISTENCY: RUN fsck Manually

The first step is to go into the appliance GRUB menu that appears briefly after start-up, hit the e key, enter the root/VMware1 GRUB credentials (these are different from the regular appliance credentials), edit the line starting with linux, replace ro with rw and delete the rest of the line.

Continue the boot process by pressing Ctrl+x. Hopefully you are now able to get into the BusyBox shell and run fsck /dev/sda2 or similar to fix the corrupted partition. Reboot.

What can happen now is that the appliance boots, but again finds LVM corruption, goes into emergency mode and you see repeated Login incorrect messages.

Repeat the process with the GRUB edit. This time you will be asked to enter root password to go into maintenance mode.

Type the root password and follow this KB article by running the fsck /dev/mapper/nsx-tmp command. Reboot again.

Hopefully now the appliance starts properly.

What can also happen is that your root password has expired and you will not be able to enter maintenance mode. Although the official documentation has a process for resetting it, that process will not work in this case. The workaround is again to edit the linux line in the GRUB menu and replace ro with rw, but this time append init=/bin/bash. You should be able to get to a shell and reset the password with the passwd command (see the sketch below).
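Once that shell comes up, the generic Linux recovery steps would be roughly (a sketch, assuming the root file system is already mounted read-write thanks to the GRUB edit):

passwd root      # set a new root password
sync             # flush the change to disk
reboot -f        # force a reboot back into the normal boot process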

Good luck with the recovery, and do not forget to set up backups and disable password expiration.


VMware Cloud Provider Lifecycle Manager


VMware Cloud Provider Lifecycle Manager is a new product just released in version 1.1. Version 1.0 was not generally available and is thus not widely known. Let me therefore briefly describe what it is and what it can do.

As the name indicates, its main goal is to simplify the deployment and lifecycle of VMware's Cloud Provider solutions. Currently in scope are:

  • VMware Cloud Director (10.1.x or 10.2.x)
  • Usage Meter (4.3 and 4.4)
  • vRealize Operations Tenant App (2.4 and 2.5)
  • RabbitMQ (Bitnami based)

The product itself ships as a stateless Docker image that can be deployed as a container, for example in a Photon OS VM. It has no GUI, but provides a REST API. The API calls support the following actions:

  • Deployment of an environment that can consist of one or more products (VCD, UM, …)
  • Upgrade of an environment and product
  • Certificate management
  • Node management (adding, removing, redeploying nodes)
  • Integration management (integration of specific products with others)

The image below shows most of the Postman Collection API calls available:

The whole environment (or a product subset of it) is described in JSON format supplied in the API payload. The example below shows a payload that deploys VCD cells and includes the necessary certificates, the target vSphere environment and the integrations with vSphere, NSX-T and RabbitMQ, including the creation of a Provider VDC.

{
    "environmentName": "{{vcd_env_id}}",
    "products": [
        {
            "properties": {
                "installationId": 1,
                "systemName": "vcd-1-vms",
                "dbPassword": "{{password}}",
                "keystorePassword": "{{password}}",
                "clusterFailoverMode": "MANUAL",
                "publicAddress": {
                    "consoleProxyExternalAddress": "{{vcd_lb_ip}}:8443",
                    "restApiBaseHttpUri": "http://{{vcd_lb_ip}}",
                    "restApiBaseUri": "https://{{vcd_lb_ip}}",
                    "tenantPortalExternalHttpAddress": "http://{{vcd_lb_ip}}",
                    "tenantPortalExternalAddress": "https://{{vcd_lb_ip}}"
                },
                "adminEmail": "admin@vcd-test.com",
                "adminFullName": "admin",
                "nfsMount": "{{vcd_nfs_mount}}"
            },
            "certificate": {
                "product": {
                    "certificate": "{{vcd_cert}}",
                    "privateKey": "{{vcd_cert_key}}"
                },
                "restApi": {
                    "certificate": "{{vcd_cert}}"
                },
                "tenantPortal": {
                    "certificate": "{{vcd_cert}}"
                }
            },
            "integrations": [
                {
                    "integrationId": "vcd-01-to-vc-01",
                    "datacenterComponentType": "VCENTER",
                    "hostname": "{{vcenter_hostname}}.{{domainName}}",
                    "integrationUsername": "administrator@vsphere.local",
                    "integrationPassword": "{{vc_password}}",
                    "properties": {
                        "providerVdcs": {
                                "PVDC-1": {
                                "description": "m01vc01-comp-rp",
                                "highestSupportedHardwareVersion": "vmx-14",
                                "isEnabled": true,
                                "clusterName": "{{vc_cluster}}",
                                "resourcePoolname": "{{pvdc_resource_pool}}",
                                "nsxIntegration": "vcd-01-to-nsx-01",
                                "storageProfile":[
                                    "{{pvdc_storage_profile}}"
                                ],
                                "networkPoolname":"NP-1"
                            }
                        }
                    }
                },
                {
                    "integrationId": "vcd-01-to-nsx-01",
                    "datacenterComponentType": "NSXT",
                    "hostname": "{{nsxt_hostname}}.{{domainName}}",
                    "integrationUsername": "admin",
                    "integrationPassword": "{{nsx_password}}",
                    "properties": {
                        "networkPools": {
                            "NP-1": "{{pvdc_np_transport_zone}}"
                        },
                        "vcdExternalNetworks": {
                            "EN-1": {
                                "subnets": [
                                    {
                                        "gateway": "192.168.91.1",
                                        "prefixLength": 24,
                                        "dnsServer1": "",
                                        "ipRanges":  [
                                            {
                                                "startAddress": "192.168.91.150",
                                                "endAddress": "192.168.91.200"
                                            }
                                        ]
                                    }
                                ],
                                "description": "ExternalNetworkCreatedViaVCDBringup",
                                "tier0Name": "{{pvdc_ext_nw_tier0_gw}}"
                            }
                        }
                    }
                },
                {
                    "integrationId": "vcd-01-to-rmq-01",
                    "productType": "RMQ",
                    "hostname": "{{rmq_lb_name}}.{{domainName}}",
                    "port": "{{rmq_port_amqp_ssl}}",
                    "integrationUsername": "svc_vcd",
                    "integrationPassword": "{{password}}",
                    "properties": {
                        "amqpExchange": "systemExchange",
                        "amqpVHost": "/",
                        "amqpUseSSL": true,
                        "amqpSslAcceptAll": true,
                        "amqpPrefix": "vcd"
                    }
                }
            ],
            "productType": "VCD",
            "productId": "{{vcd_product_id}}",
            "version": "10.1.2",
            "license": "{{vcd_license}}",
            "adminPassword": "{{password}}",
            "nodes": [
                {
                    "hostName": "{{vcd_cell_1_name}}.{{domainName}}",
                    "vmName": "{{vcd_cell_1_name}}",
                    "rootPassword": "{{password}}",
                    "gateway": "{{vcd_mgmt_nw_gateway}}",
                    "nics": [
                        {
                            "ipAddress": "{{vcd_cell_1_ip}}",
                            "networkName": "vcd-dmz-nw",
                            "staticRoutes": []
                        }, {
                            "ipAddress": "{{vcd_cell_1_mgmt_ip}}",
                            "networkName": "vcd-mgmt-nw",
                            "staticRoutes": []
                        }
                    ]
                },
                {
                    "hostName": "{{vcd_cell_2_name}}.{{domainName}}",
                    "vmName": "{{vcd_cell_2_name}}",
                    "rootPassword": "{{password}}",
                    "gateway": "{{vcd_mgmt_nw_gateway}}",
                    "nics": [
                        {
                            "ipAddress": "{{vcd_cell_2_ip}}",
                            "networkName": "vcd-dmz-nw",
                            "staticRoutes": []
                        }, {
                            "ipAddress": "{{vcd_cell_2_mgmt_ip}}",
                            "networkName": "vcd-mgmt-nw",
                            "staticRoutes": []
                        }
                    ]
                }
            ]
        }
    ],
    "deploymentInfrastructures": {
        "infra1": {
            "vcenter": {
                "vcenterName": "mgmt-vc",
                "vcenterHost": "{{vcenter_hostname}}.{{domainName}}",
                "vcenterUsername": "administrator@vsphere.local",
                "vcenterPassword": "{{vc_password}}",
                "datacenterName": "{{vc_datacenter}}",
                "clusterName": "{{vc_cluster}}",
                "resourcePool": "{{vc_res_pool}}",
                "datastores": [
                    "{{vc_datastore}}"
                ],
                "networks": {
                    "vcd-dmz-nw": {
                        "portGroupName": "{{vcd_dmz_portgroup}}",
                        "gateway": "{{vcd_dmz_gateway}}",
                        "subnetMask": "{{vcd_dmz_subnet}}",
                        "domainName": "{{domainName}}",
                        "searchPath": [
                            "{{domainName}}"
                        ],
                        "useDhcp": false,
                        "dns": [
                            "{{dns}}"
                        ],
                        "ntp": [
                            "{{ntp}}"
                        ]
                    },
                    "vcd-mgmt-nw": {
                        "portGroupName": "{{vcd_mgmt_nw_portgroup}}",
                        "gateway": "{{vcd_mgmt_nw_gateway}}",
                        "subnetMask": "{{vcd_mgmt_nw_subnet}}",
                        "useDhcp": false
                    }
                }
            }
        }
    }
}

The JSON payload structure is similar for other products. It starts with the environment definition and then continues with a specific product and its product type (VCD, RMQ, TenantApp, Usage Meter), each with its own set of properties. The integrations section defines, for example, which tenant VC and NSX should be registered, the RabbitMQ integration, etc. Then follows the description of each node to be deployed, referring to the deployment infrastructure section at the end of the JSON, which describes the vSphere environment where the nodes are deployed.

During the bring-up the Lifecycle Manager performs a set of tests and validations to check that the payload is correct and that the referenced environments are accessible. Then it proceeds with the actual deployment process. For that it needs access to a file repository of OVA images (for the bring-up) or patch/upgrade files (for lifecycle operations). These must be manually downloaded to the Docker VM or mounted via NFS.

For day 2 operations (certificate changes, node manipulation, etc.) an environment must first be imported (as mentioned before, the Lifecycle Manager is stateless and forgets everything when rebooted). During the import the same payload as for deployment is provided, and checks are performed to verify that the actual environment matches the imported one. Once the state is in the container memory, day 2 commands can be run. And a six-cell VMware Cloud Director deployment can be upgraded with a single API call!

The actual deployment architecture is quite flexible. The Lifecycle Manager itself does not prescribe or deploy any networks, load balancers or NFS shares; all of those must be prepared up front. I have tested deployment on top of VMware Cloud Foundation 4 (see here), but that is not a hard requirement. Brownfield environments are not supported, but nothing is really stopping you from describing your existing environment in the JSON and importing it.

If you plan to deploy and manage VMware Cloud Director at scale, give it a try. And as this is the first public release, there is a lot to look forward to in the future.

How to Migrate VMware Cloud Director from NSX-V to NSX-T (part 2)


This is an update to the original article with the same title, How to Migrate VMware Cloud Director from NSX-V to NSX-T, published last year, and it covers the new enhancements in VMware NSX Migration for VMware Cloud Director version 1.2.1, which was released yesterday.

The tool’s main purpose is to automate the migration of NSX-V backed VMware Cloud Director Organization Virtual Data Centers to an NSX-T backed Provider Virtual Data Center. The original article describes exactly how this is accomplished and what the impact on migrated workloads is from the networking and compute perspective.

The migration tool is continually developed, and new features are added to either enhance its usability (improved rollback, simplified L2 bridging setup) or to support more use cases based on new features in VMware Cloud Director (VCD). And then there is a new assessment mode! Let me go into more detail.

Directly Connected Networks

VCD release 10.2.2 added support for directly connected Organization VDC networks in NSX-T backed Org VDCs. Such networks are not connected to a VDC Gateway; instead they are connected directly to a port group backed external network. The typical usage is for service networks, backup networks or colocation/MPLS networks where routing via the VDC Gateway is not desired.

The migration tool now supports migration of these networks. Let me describe how it is done.

A VCD external network in an NSX-V backed PVDC is port group backed. It can be backed by one or more port groups that are typically manually created VLAN port groups in vCenter Server, or they can be VXLAN backed (the system admin would create an NSX-V logical switch directly in NSX-V and then use its DVS port groups for the external network). The system administrator can then create in the Org VDC a directly connected network attached to this external network. It inherits its parent's IPAM (subnet, IP pools), and when a tenant connects a VM to it, the VM is simply wired to the backing port group.

The migration tool first detects whether the migrated Org VDC direct network is connected to an external network that is also used by other Org VDCs, and behaves differently based on that.

Colocation / MPLS use case

If the external network is not used by any other Org VDC and the backing port group(s) are of VLAN type (if more port groups are used they must have the same VLAN), then the tool will create an NSX-T logical segment in the VLAN transport zone (specified in the YAML input spec) and import it into the target Org VDC as an imported network. The reason why a direct connection to an external network is not used is to limit external network sprawl, as the imported network feature perfectly matches the original use case intent. After the migration the source external network is not removed automatically; the system administrator should clean it up, including the backing vCenter port groups, at their convenience.

Note that no bridging is performed between the source and target network as it is expected the VLAN is trunked across source and target environments.

The diagram below shows the source Org VDC on the left and the target one on the right.

Service Network Use Case

If the external network is used by other Org VDCs, the imported VLAN segment method cannot be used, as each imported Org VDC network must be backed by its own logical segment and have its own IPAM (subnet, pool). In this case the tool will simply create a directly connected Org VDC network in the target VDC connected to the same external network as the source. This requires that the external network is scoped to the target PVDC: if the target PVDC is using a different virtual switch, you first need to create a regular VLAN backed port group there and then add it to the external network (currently API only). Also, only a VLAN backed port group can be used, as no bridging is performed for such networks.

Assessment Mode

The other big feature is the assessment mode. The main driver for this feature is to let service providers see how ready their environment is for the NSX-V to NSX-T migration and how much redesign will be needed. The assessment can be run against VCD 10.0, 10.1 or 10.2 environments and requires only VCD API access (the environment does not yet need to be prepared for NSX-T).

During the assessment the tool checks all (or a specified subset of) NSX-V backed Org VDCs and assesses every feature used there that impacts migration viability. It then provides a detailed and a summarized report where you can see what portion of the environment *could* be migrated (once upgraded to the latest VCD 10.2.2). This is provided in Org VDC, VM and used RAM units.

The picture below shows example of the summary report:

Note that if there is a single vApp in a particular Org VDC that cannot be migrated, the whole Org VDC is counted as not migratable (in all metrics: VM and RAM). Some features are categorized as blocking; they are simply not supported by either NSX-T backed Org VDCs or the migration tool (yet), but some issues can be mitigated or fixed (see the remediation recommendations in the user guide).

Conclusion

As mentioned, the migration tool is continuously developed and improved. Together with the next VMware Cloud Director version we can expect additional coverage of currently unsupported features; shared network support especially is high on the radar.

New Networking Features in VMware Cloud Director 10.3


The previous VMware Cloud Director release, 10.2, brought many new networking features, and the current 10.3 release continues in the same fashion. Let me give you a brief rundown.

UI Enhancements

The UI has been enhanced to surface formerly API only features such as the ability to configure dual stack IPv4/IPv6 networks:

or configure DHCP in gateway or network mode:

The service provider can now assign/change primary IP address of Org VDC Edge Gateway in the UI:

The Org VDC NSX-T Edge Cluster used to deploy the DHCP service in network mode and vApp Edges (more on those later) can now be set in the UI (previously via the Network Profile API > Services Edge Cluster).

It is also possible to configure (extend) an external network port group backing without using API.


New NSX-T Backed Provider VDC Features

As NSX-T backed PVDCs now support both Tier-0/VRF and port group backing for external networks, the Tier-0/VRF GWs have been separated into their own tab to avoid confusion.

The port group backed external networks can be either traditional VDS port groups or NSX-T segments. The latter option gives the ability to use the NSX-T distributed firewall on such an external network (managed by the provider directly in NSX-T).

Distributed Firewall now supports dynamic groups that can be defined utilizing VM Tag or VM name.

vApps now support routed vApp networks, including a DHCP service on isolated vApp networks. This is achieved by deploying standalone Tier-1 GWs that are connected to Org VDC networks via a service interface. The Org VDC network must be overlay backed (not VLAN backed). vApp fencing is still not supported, as NSX-T does not provide this functionality.

There are a few additional small enhancements, ranging from support for guest VLAN tagging and reflexive NAT to DHCP pool management.

Provider VDC with no NSX

The creation of a Provider VDC no longer requires a network pool specification. Such a PVDC will thus not provide any NSX-V or NSX-T features (routing, DHCP, firewalling, load balancing). Org VDC networks can then be backed by a VLAN network pool or use VDS backed imported networks.

NSX-V vs NSX-T Feature Parity

Let me conclude with traditional NSX-V / NSX-T VCD feature comparison chart (new updates highlighted in green).

How to Unregister NSX-V Manager from VMware Cloud Director


After a successful migration from NSX-V to NSX-T in VMware Cloud Director you might wish to unregister the NSX-V Manager and completely delete it from the vCenter. This is not so easy, as the whole VCD model was built on the assumption that the vCenter Server and NSX-V Manager are tied together and thus also retired together. This is obviously no longer the case, as you can now use NSX-T backed PVDCs or not use NSX at all (a new feature as of VCD 10.3).

VMware Cloud Director adds API support for unregistering the NSX-V Manager without removing the vCenter Server. To do so, you need to use the OpenAPI PUT VirtualCenter call: first run a GET call with the VC URN to retrieve its current configuration payload, remove the nsxVManager element, and then PUT it back.

Example:

GET https://{{host}}/cloudapi/1.0.0/virtualCenters/urn:vcloud:vimserver:cd0471d4-e48f-4669-8852-de1fdd2648aa

Response:

{
    "vcId": "urn:vcloud:vimserver:cd0471d4-e48f-4669-8852-de1fdd2648aa",
    "name": "vc-01a",
    "description": "",
    "username": "vcd@vsphere.local",
    "password": "******",
    "url": "https://vc-01a.corp.local",
    "isEnabled": true,
    "vsphereWebClientServerUrl": null,
    "hasProxy": false,
    "rootFolder": null,
    "vcNoneNetwork": null,
    "tenantVisibleName": "Site A",
    "isConnected": true,
    "mode": "IAAS",
    "listenerState": "CONNECTED",
    "clusterHealthStatus": "GREEN",
    "vcVersion": "7.0.0",
    "buildNumber": null,
    "uuid": "1da63a23-534a-4315-b3fa-29873d542ae5",
    "nsxVManager": {
        "username": "admin",
        "password": "******",
        "url": "http://192.168.110.24:443",
        "softwareVersion": "6.4.8"
    },
    "proxyConfigurationUrn": null
}

PUT https://{{host}}/cloudapi/1.0.0/virtualCenters/urn:vcloud:vimserver:cd0471d4-e48f-4669-8852-de1fdd2648aa

{
    "vcId": "urn:vcloud:vimserver:cd0471d4-e48f-4669-8852-de1fdd2648aa",
    "name": "vc-01a",
    "description": "",
    "username": "vcd@vsphere.local",
    "password": "******",
    "url": "https://vc-01a.corp.local",
    "isEnabled": true,
    "vsphereWebClientServerUrl": null,
    "hasProxy": false,
    "rootFolder": null,
    "vcNoneNetwork": null,
    "tenantVisibleName": "Site A",
    "isConnected": true,
    "mode": "IAAS",
    "listenerState": "CONNECTED",
    "clusterHealthStatus": "GREEN",
    "vcVersion": "7.0.0",
    "uuid": "1da63a23-534a-4315-b3fa-29873d542ae5",
    "proxyConfigurationUrn": null
}

In order for the NSX-V Manager removal to succeed you must make sure that:

  • Org VDCs using the vCenter Server do not have any NSX-V objects (VXLAN networks, Edge Gateways, vApp or DHCP Edges)
  • Org VDCs using the vCenter Server do not use VXLAN network pool
  • There is no VXLAN network pool managed by the to-be-removed NSX-V Manager

If all of the above is satisfied, you will not need to remove existing Provider VDCs (even if they were using NSX-V in the past). They will become NSX-less (so you will not be able to use NSX-T objects in them). NSX-T backed PVDCs will not be affected at all.
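
A quick way to validate the network pool prerequisites is to list the network pools via CloudAPI and check their backing type. This is just a sketch, assuming a valid bearer token in $VCD_TOKEN and the poolType field name of the 36.x CloudAPI:

curl -sk -H "Accept: application/json;version=36.2" \
     -H "Authorization: Bearer $VCD_TOKEN" \
     "https://{{host}}/cloudapi/1.0.0/networkPools" | jq '.values[] | {name, poolType}'
# any pool reporting a VXLAN type is still backed by the to-be-removed NSX-V Manager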

Layer 2 VPN to the Cloud – Part III

I feel like it is time for another update on VMware Cloud Director (VCD) capabilities for establishing L2 VPN between an on-prem location and an Org VDC. The previous blog posts were written in 2015 and 2018 and do not reflect the changes related to the usage of NSX-T as the underlying cloud network platform.

The primary use case for L2 VPN to the cloud is workload migration, where the L2 VPN tunnel is temporarily established until the migration of all VMs on a single network is done. The secondary use case is Disaster Recovery, but I feel that running L2 VPN permanently is not the right approach.

But that is not the topic of today’s post. VCD has supported setting up L2 VPN on the tenant’s Org VDC Gateway (Tier-1 GW) since version 10.2, however it is still a hidden, API-only feature (the GUI is finally coming soon … in VCD 10.3.1). The actual setup is not trivial, as the underlying NSX-T technology requires an IPSec VPN tunnel to be established first to secure the L2 VPN client-to-server communication. VMware Cloud Director Availability (VCDA) version 4.2 is an add-on disaster recovery and migration solution for tenant workloads on top of VCD, and it simplifies the setup of both the server (cloud) and client (on-prem) L2 VPN endpoints from its own UI. To reiterate, VCDA is not needed to set up L2 VPN, but it makes it much easier.

The screenshot above shows the VCDA UI plugin embedded in the VCD portal. You can see three L2 VPN sessions have been created on VDC Gateway GW1 (NSX-T Tier-1 backed) in the ACME-PAYG Org VDC. Each session uses a different L2 VPN client endpoint type.

The on-prem client can be an existing NSX-T Tier-0 or Tier-1 GW, an NSX-T autonomous edge, or a standalone Edge client. Each requires a different type of configuration, so let me discuss them separately.

NSX-T Tier-0 or Tier-1 Gateway

This is mostly suitable for tenants who are running an existing NSX-T environment on-prem. They will need to set up both the IPSec and L2 VPN tunnels directly in NSX-T Manager, which is the most complicated process of the three options. On either the Tier-0 or Tier-1 GW they will first need to set up the IPSec VPN and L2 VPN client services, then the L2 VPN session must be created with the local and remote endpoint IPs and the Peer Code, which must be retrieved beforehand via the VCD API (it is not available in the VCDA UI, but will be available in the VCD UI in 10.3.1 or newer). The peer code contains all necessary configuration for the parent IPSec session in Base64 encoding.

Lastly, the local NSX-T segments to be bridged to the cloud can be configured for the session. The parent IPSec session will be created automagically by NSX-T and after a while you should see a green status for both the IPSec and L2 VPN sessions.

Standalone Edge Client

This option leverages the very light (150 MB) OVA appliance that can be downloaded from the NSX-T download website and actually works with both NSX-V and NSX-T L2 VPN server endpoints. It does not require any NSX installation. It provides no UI and its configuration must be done at deployment time via OVF parameters. Again the peer code must be provided.

Autonomous Edge

This is the preferred option for non-NSX environments. An autonomous edge is a regular NSX-T edge node that is deployed from OVA but is not connected to NSX-T Manager. During the OVA deployment the Is Autonomous Edge checkbox must be checked. It provides its own UI and much better performance and configurability. Additionally, the client tunnel configuration can be done via the VCDA on-premises appliance UI: you just need to deploy the autonomous edge appliance and VCDA will discover it and let you manage it from then on via its UI.

With this option there is no need to retrieve the Peer Code, as the VCDA plugin will retrieve all necessary information from the cloud site.

Marking Devices as SSD in vSphere 7 via CLI

Mostly as a note to myself, here are the CLI commands that allow marking storage devices as SSD so they can be used for vSAN (useful for nested environments).

# mark the local devices as SSD via the high performance plug-in (HPP) claim option
esxcli storage hpp device set -d mpx.vmhba0:C0:T0:L0 -M true
esxcli storage hpp device set -d mpx.vmhba0:C0:T1:L0 -M true
# verify which devices are user-marked as SSD
esxcli storage hpp device usermarkedssd list
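
If there are more devices to mark (typical for larger nested labs), a small loop can handle them all at once. This is a rough sketch, assuming every local mpx.* device should be marked:

# mark every locally attached mpx.* device as SSD (adjust the grep filter to your lab)
for DEV in $(esxcli storage core device list | grep '^mpx\.'); do
  esxcli storage hpp device set -d "$DEV" -M true
done
# then verify with the usermarkedssd list command shown above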

Upgrading VMware Cloud Director with Single API Call

Today I upgraded two VMware Cloud Director environments (each with 3 appliances) to version 10.3.2 with two API calls – one per environment. All that thanks to VMware Cloud Provider Lifecycle Manager.

curl --location --request PUT 'https://172.28.59.10:9443/api/v1/lcm/environment/vcd-env-2/product/vcd-1/upgrade?action=UPGRADE' \
--header 'Content-Type: application/json' \
--header 'JSESSIONID: 4E908BE08C282AF45B1CF5BB6736FE32' \
--data-raw '{
    "upgradeDetails": {
        "targetVersion": "10.3.2",
        "additionalProperties": {
            "keepBackup": true
        }
    }
}'

As I have blogged about VMware Cloud Provider Lifecycle Manager (VCP LCM) in the past, I just want to highlight how it handles frequent updates of the solutions it manages. VCP LCM is now in version 1.2 and is deployed as an appliance. It is updated about twice a year. However, when one of the solutions it manages (VMware Cloud Director, Usage Meter, Tenant App) has a new update, a small LCM interop update bundle is released (VCP LCM download page, Drivers and Tools section) that provides support for updating the newly released solution(s). That way there is no lag or need to wait for a new (big) VCP LCM release.

So in my case all I had to do was just download and apply (unzip and execute) the new LCM interop bundle, download the VCD 10.3.2 update file to my VCP LCM repo (NFS) and trigger the API update call mentioned above.

The interop bundles are versioned independently from VCP LCM itself, are cumulative, and check whether the underlying VCP LCM supports the bundle (for example, LCM interop bundle 1.2.1 can be installed on top of VCP LCM 1.2 or 1.2.0.1 but not on 1.1). This can be seen in the interop_bundle_version.properties file (inside the zipped .lcm file).

product.version=1.2.0,1.2.0.1
vcplcm_interop_bundle.build_number=19239142
vcplcm_interop_bundle.version=1.2.1

I should mention that VCP LCM only supports environments that it created. It does have import functionality, but that is meant to re-import environments previously deployed by VCP LCM, as it does not (currently) keep their state when it is rebooted.

So what is actually happening when the update is triggered with the API call? At a high level: VCP LCM will first check that the to-be-updated environment (VCD installation) is running properly, that it can access all its cells, etc. Then it shuts down the VCD service and database and creates a snapshot of all cells for a quick rollback if anything goes wrong. Then it restarts the database and creates a regular backup which is saved to the VCD transfer share. Update binaries are then uploaded and executed on every cell, followed by the database schema upgrade. Cells are rebooted and checks are performed that VCD came up properly with the correct version. If so, the snapshots can be removed and optionally the regular backup as well.

Happy upgrades!


VMware Cloud Director Troubleshooting

Recently I participated in the Feature Friday YouTube show that discussed VMware Cloud Director troubleshooting. You can watch it here:

vROps Tenant App Upgrade Issue

While performing the vROps Tenant App 2.6.2 upgrade in my lab I encountered the following error:
Failed to install updates(Error while running installation tests).

A quick check of /opt/vmware/var/log/vami/updatecli.log shows that the appliance is running out of free space on the root (/) partition.

24/02/2022 15:01:34 [INFO] Running /opt/vmware/var/lib/vami/update/data/job/32/test_command
Verifying packages…
Preparing packages…
installing package tenant-app-8.6.0-18724818.noarch needs 1231MB on the / filesystem
24/02/2022 15:01:41 [ERROR] Failed with exit code 56576

The reason this is happening is that the Tenant App runs as Docker containers and the older image versions have not been purged. In my particular case I have over 7 GB of Docker images on the filesystem:

root@tenantapp [ /var/lib/docker/overlay2 ]# du -h -d 0
7.5G    .

root@tenantapp [ /var/lib/docker/overlay2 ]# docker image ls
REPOSITORY                                 TAG                 IMAGE ID            CREATED             SIZE
vmware/vrops-vcd-tenant-app-db-cassandra   2.6.2-19235005      057345d369fd        5 weeks ago         634MB
vmware/vrops-vcd-tenant-app-db-cassandra   latest              057345d369fd        5 weeks ago         634MB
vmware/vrops-vcd-tenant-app-ui             2.6.2-19235005      4e90d15d3116        5 weeks ago         396MB
vmware/vrops-vcd-tenant-app-ui             latest              4e90d15d3116        5 weeks ago         396MB
vmware/vrops-vcd-tenant-app-plugin         2.6.2-19235004      de4cb469fb65        5 weeks ago         309MB
vmware/vrops-vcd-tenant-app-plugin         latest              de4cb469fb65        5 weeks ago         309MB
vmware/vrops-vcd-tenant-app-db-cassandra   2.6.1-18326916      3b7ef9b0c10c        7 months ago        597MB
vmware/vrops-vcd-tenant-app-ui             2.6.1-18326916      b66e34b5d59b        7 months ago        368MB
vmware/vrops-vcd-tenant-app-plugin         2.6.1-18326915      f97bc56c3d61        7 months ago        286MB
vmware/vrops-vcd-tenant-app-db-cassandra   2.6.0-17922920      0d5eb9de1cb7        10 months ago       581MB
vmware/vrops-vcd-tenant-app-ui             2.6.0-17922920      3ffdeee597ca        10 months ago       354MB
vmware/vrops-vcd-tenant-app-plugin         2.6.0-17922919      b23bd4eb6a2d        10 months ago       268MB
vmware/vrops-vcd-tenant-app-db-cassandra   2.5.0-16990343      af72dbf16623        16 months ago       536MB
vmware/vrops-vcd-tenant-app-ui             2.5.0-16990343      62b09bd2a0a2        16 months ago       252MB
vmware/vrops-vcd-tenant-app-plugin         2.5.0-16941875      1217f67efd9d        17 months ago       190MB
vmware/vrops-vcd-tenant-app-db-cassandra   2.4.0-15996298      a0d906a5cc5a        22 months ago       494MB
vmware/vrops-vcd-tenant-app-ui             2.4.0-15996298      777fe7bc0c1f        22 months ago       240MB
vmware/vrops-vcd-tenant-app-plugin         2.4.0-15996297      b85369dbf061        22 months ago       180MB
vmware/vrops-vcd-tenant-app-db-cassandra   2.3.0-14826918      556121e468da        2 years ago         466MB
vmware/vrops-vcd-tenant-app-ui             2.3.0-14826918      eb77c613e9ad        2 years ago         224MB
vmware/vrops-vcd-tenant-app-plugin         2.3.0-14826917      e598e66d4818        2 years ago         158MB

After checking with Tenant App engineering, the problem has been fixed in the newest (8.6.1) version, which does purge the old images upon a successful upgrade. But if you hit the issue you will need to clean up the old images with the following command:

docker image rm -f <image ID>
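
Alternatively, since the current Tenant App containers keep their images “in use”, Docker’s built-in pruning should remove only the unused older versions – an assumption worth verifying with docker ps before running it:

# removes all images not referenced by any container (i.e. the old Tenant App versions)
docker image prune -a -f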

BTW if you delete the wrong images you can always recreate them with the following commands:

docker load -i /opt/vmware/app/vrops-vcd-tenant-app-ui.tar.gz
docker load -i /opt/vmware/db/vrops-vcd-tenant-app-db-cassandra.tar.gz
docker load -i /opt/vmware/plugin/vrops-vcd-tenant-app-plugin.tar.gz

How to Move (Live) vApps Across Org VDCs

VMware Cloud Director has a secret, not well known, API-only feature that allows moving vApps across Org VDCs while they are running. This feature was purposefully built for the NSX-V to NSX-T Migration Tool, but it can be used for other use cases as well, hence the reason to shed more light on it here.

We should start by mentioning that vApp migration across Org VDCs has been around since forever – in the UI you can select an existing vApp and you will find the Move command in the action menu. But that is something completely different – that method performs a (vSphere) cloning operation in the background and then deletes the source VM(s). Thus it is slow, requires the vApp to be powered off and creates a new identity for the vApp and its VMs after the move (their UUIDs will change). The UI is using the API method POST /vdc/{id}/action/cloneVApp with the flag IsSourceDelete set to true.

So the above method is *not* the subject of this article – instead we will talk about the API method POST /vdc/{id}/action/moveVApp.

The main differences are:

  • vMotion (live, shared-nothing and cross vCenter) is used
  • identity of vApp and VM does not change (UUID is retained)
  • vApp can be in running state
  • VMs can be connected to Named (independent) disks
  • Fast provisioning (linked clones) support

The moveVApp API is fairly new and still evolving. For example, VMware Cloud Director 10.3.2 added support for moving routed vApps. Movement of running encrypted vApps will be supported in the future. So be aware there might be limitations based on your VCD version.

The vApp can be moved across Org VDCs, Provider VDCs, clusters and vCenters of the same tenant, but it will not work across associated Orgs, for example. It also cannot be used for moving vApps across clusters/resource pools within the same Org VDC (for that use the Migrate VM UI/API). Obviously the underlying vSphere platform must support vMotion across the involved clusters or vCenters. A change of NSX backing (V to T) is also supported.

The API method is called against the target Org VDC endpoint with a quite elaborate payload that must describe which vApp is being moved, how the target network configuration will look (obviously the parent Org VDC networks will change) and what storage, compute or placement policies will be used by every vApp VM at the target.

Note that if a VM has media (an ISO) connected, it must be accessible to the target Org VDC (the ISO is not migrated).

An example is worth 1000 words:

POST https://{{host}}/api/vdc/5b2abda9-aa2e-4745-a33b-b4b8fa1dc5f4/action/moveVApp

Content-Type:application/vnd.vmware.vcloud.MoveVAppParams+xml
Accept:application/*+xml;version=36.2

<?xml version="1.0"?>
<MoveVAppParams xmlns="http://www.vmware.com/vcloud/v1.5" xmlns:ns7="http://schemas.dmtf.org/ovf/envelope/1" xmlns:ns8="http://schemas.dmtf.org/wbem/wscim/1/cim-schema/2/CIM_ResourceAllocationSettingData" xmlns:ns9="http://www.vmware.com/schema/ovf">
  <Source href="https://vcd-01a.corp.local/api/vApp/vapp-96d3a015-4a08-4c59-93fa-384b41d4e453"/>
  <NetworkConfigSection>
    <ns7:Info>The configuration parameters for logical networks</ns7:Info>
       <NetworkConfig networkName="vApp-192.168.40.0">
            <Configuration>
                <IpScopes>
                    <IpScope>
                        <IsInherited>false</IsInherited>
                        <Gateway>192.168.40.1</Gateway>
                        <Netmask>255.255.255.0</Netmask>
                        <SubnetPrefixLength>24</SubnetPrefixLength>
                        <IsEnabled>true</IsEnabled>
                        <IpRanges>
                            <IpRange>
<StartAddress>192.168.40.2</StartAddress>
<EndAddress>192.168.40.99</EndAddress>
                            </IpRange>
                        </IpRanges>
                    </IpScope>
                </IpScopes>
                <ParentNetwork href="https://vcd-01a.corp.local/api/admin/network/1b8a200b-7ee7-47d5-81a1-a0dcb3161452" id="1b8a200b-7ee7-47d5-81a1-a0dcb3161452" name="Isol_192.168.33.0-v2t"/>
                <FenceMode>natRouted</FenceMode>
                <RetainNetInfoAcrossDeployments>false</RetainNetInfoAcrossDeployments>
                <Features>
                    <FirewallService>
                        <IsEnabled>true</IsEnabled>
                        <DefaultAction>drop</DefaultAction>
                        <LogDefaultAction>false</LogDefaultAction>
                        <FirewallRule>
                            <IsEnabled>true</IsEnabled>
                            <Description>ssh-VM6</Description>
                            <Policy>allow</Policy>
                            <Protocols>
<Tcp>true</Tcp>
                            </Protocols>
                            <DestinationPortRange>22</DestinationPortRange>
                            <DestinationVm>
<VAppScopedVmId>88445b8a-a9c4-43d5-bfd8-3630994a0a88</VAppScopedVmId>
<VmNicId>0</VmNicId>
<IpType>assigned</IpType>
                            </DestinationVm>
                            <SourcePortRange>Any</SourcePortRange>
                            <SourceIp>Any</SourceIp>
                            <EnableLogging>false</EnableLogging>
                        </FirewallRule>
                        <FirewallRule>
                            <IsEnabled>true</IsEnabled>
                            <Description>ssh-VM5</Description>
                            <Policy>allow</Policy>
                            <Protocols>
<Tcp>true</Tcp>
                            </Protocols>
                            <DestinationPortRange>22</DestinationPortRange>
                            <DestinationVm>
<VAppScopedVmId>e61491e5-56c4-48bd-809a-db16b9619d63</VAppScopedVmId>
<VmNicId>0</VmNicId>
<IpType>assigned</IpType>
                            </DestinationVm>
                            <SourcePortRange>Any</SourcePortRange>
                            <SourceIp>Any</SourceIp>
                            <EnableLogging>false</EnableLogging>
                        </FirewallRule>
                        <FirewallRule>
                            <IsEnabled>true</IsEnabled>
                            <Description>Allow all outgoing traffic</Description>
                            <Policy>allow</Policy>
                            <Protocols>
<Any>true</Any>
                            </Protocols>
                            <DestinationPortRange>Any</DestinationPortRange>
                            <DestinationIp>external</DestinationIp>
                            <SourcePortRange>Any</SourcePortRange>
                            <SourceIp>internal</SourceIp>
                            <EnableLogging>false</EnableLogging>
                        </FirewallRule>
                    </FirewallService>
                    <NatService>
                        <IsEnabled>true</IsEnabled>
                        <NatType>portForwarding</NatType>
                        <Policy>allowTraffic</Policy>
                        <NatRule>
                            <Id>65537</Id>
                            <VmRule>
<ExternalIpAddress>192.168.33.2</ExternalIpAddress>
<ExternalPort>2222</ExternalPort>
<VAppScopedVmId>e61491e5-56c4-48bd-809a-db16b9619d63</VAppScopedVmId>
<VmNicId>0</VmNicId>
<InternalPort>22</InternalPort>
<Protocol>TCP</Protocol>
                            </VmRule>
                        </NatRule>
                        <NatRule>
                            <Id>65538</Id>
                            <VmRule>
<ExternalIpAddress>192.168.33.2</ExternalIpAddress>
<ExternalPort>22</ExternalPort>
<VAppScopedVmId>88445b8a-a9c4-43d5-bfd8-3630994a0a88</VAppScopedVmId>
<VmNicId>0</VmNicId>
<InternalPort>22</InternalPort>
<Protocol>TCP</Protocol>
                            </VmRule>
                        </NatRule>
                    </NatService>
                </Features>
                <SyslogServerSettings/>
                <RouterInfo>
                    <ExternalIp>192.168.33.2</ExternalIp>
                </RouterInfo>
                <GuestVlanAllowed>false</GuestVlanAllowed>
                <DualStackNetwork>false</DualStackNetwork>
            </Configuration>
            <IsDeployed>true</IsDeployed>
        </NetworkConfig>
  </NetworkConfigSection>
  <SourcedItem>
    <Source href="https://vcd-01a.corp.local/api/vApp/vm-fa47982a-120a-421a-a321-62e764e10b80"/>
    <InstantiationParams>
      <NetworkConnectionSection>
        <ns7:Info>Network Connection Section</ns7:Info>
        <PrimaryNetworkConnectionIndex>0</PrimaryNetworkConnectionIndex>
                <NetworkConnection network="vApp-192.168.40.0" needsCustomization="false">
                    <NetworkConnectionIndex>0</NetworkConnectionIndex>
                    <IpAddress>192.168.40.2</IpAddress>
                    <IpType>IPV4</IpType>
                    <ExternalIpAddress>192.168.33.3</ExternalIpAddress>
                    <IsConnected>true</IsConnected>
                    <MACAddress>00:50:56:28:00:30</MACAddress>
                    <IpAddressAllocationMode>POOL</IpAddressAllocationMode>
                    <SecondaryIpAddressAllocationMode>NONE</SecondaryIpAddressAllocationMode>
                    <NetworkAdapterType>VMXNET3</NetworkAdapterType>
                </NetworkConnection>
      </NetworkConnectionSection>
    </InstantiationParams>
    <StorageProfile href="https://vcd-01a.corp.local/api/vdcStorageProfile/bdf68bda-8ab9-4ec1-970a-fafc34cdcf5b"/>
  </SourcedItem>
    <SourcedItem>
    <Source href="https://vcd-01a.corp.local/api/vApp/vm-a1f87b29-60e7-45ee-86e2-5b749a81ed19"/>
    <InstantiationParams>
      <NetworkConnectionSection>
        <ns7:Info>Network Connection Section</ns7:Info>
        <PrimaryNetworkConnectionIndex>0</PrimaryNetworkConnectionIndex>
                <NetworkConnection network="vApp-192.168.40.0" needsCustomization="false">
                    <NetworkConnectionIndex>0</NetworkConnectionIndex>
                    <IpAddress>192.168.40.3</IpAddress>
                    <IpType>IPV4</IpType>
                    <ExternalIpAddress>192.168.33.2</ExternalIpAddress>
                    <IsConnected>true</IsConnected>
                    <MACAddress>00:50:56:28:00:37</MACAddress>
                    <IpAddressAllocationMode>POOL</IpAddressAllocationMode>
                    <SecondaryIpAddressAllocationMode>NONE</SecondaryIpAddressAllocationMode>
                    <NetworkAdapterType>VMXNET3</NetworkAdapterType>
                </NetworkConnection>
      </NetworkConnectionSection>
    </InstantiationParams>
    <StorageProfile href="https://vcd-01a.corp.local/api/vdcStorageProfile/bdf68bda-8ab9-4ec1-970a-fafc34cdcf5b"/>
  </SourcedItem>
</MoveVAppParams>

In our case this is a routed two-VM vApp where both VMs are connected to the same routed vApp network named vApp-192.168.40.0, with a set of port forwarding NAT rules and FW policies configured on the vApp router.

  • As said above it is a POST call against the target Org VDC – in our case 5b2abda9-aa2e-4745-a33b-b4b8fa1dc5f4.
  • The payload starts with the source vApp (vapp-96d3a015-4a08-4c59-93fa-384b41d4e453).
  • Then follows the NetworkConfig section. Here we are describing the target vApp network topology. In general this section should be identical to the source vApp payload, with the only difference being that the ParentNetwork must refer to an Org VDC network from the target Org VDC. So in our case we are describing the subnet and IP pools of the vApp network (vApp-192.168.40.0), its new parent Org VDC network (Isol_192.168.33.0-v2t) and the way these two are connected (bridged or natRouted). As we are using a routed vApp it is natRouted in our case. Then follow (optional) routed vApp features such as firewall policies or NAT rules. They should be pretty self explanatory and again they are usually identical to the source vApp NetworkConfig section. Note that VM object rules use VAppScopedVmId, which is a random-looking UUID that changes every time the vApp is moved.
    We should highlight that IP addresses allocated to the vApp (its VMs or vApp routers) from the source Org VDC network are retained during the migration (and must be available in the target Org VDC network static IP pool).
  • After the NetworkConfigSection follow the details of every vApp VM (SourcedItem) – to which vApp network(s) defined above the VM network interface(s) will connect (with which IP/MAC and IPAM mode) and which storage, placement and compute policies (StorageProfile, VdcComputePolicy and ComputePolicy) it should use. For the NIC section you usually take the equivalent info from the source VM. The vApp network name must be the one defined in the NetworkConfig section. For the policies you must obviously use the target Org VDC policies, as these will change.
  • BTW the storage policy can also be defined at the disk level with the DiskSettings element (the following excerpt shows the case when a named disk is connected):
            <DiskSettings>
                <DiskId>2016</DiskId>
                <SizeMb>8</SizeMb>
                <UnitNumber>0</UnitNumber>
                <BusNumber>1</BusNumber>
                <AdapterType>3</AdapterType>
                <ThinProvisioned>true</ThinProvisioned>
                <Disk href="https://vcd-01a.corp.local/api/disk/567bdd04-4905-4a62-95e7-9f4850f85240" id="urn:vcloud:disk:567bdd04-4905-4a62-95e7-9f4850f85240" type="application/vnd.vmware.vcloud.disk+xml" name="Disk1"/>
                <StorageProfile href="https://vcd-01a.corp.local/api/vdcStorageProfile/1f8bf2df-d28c-4bec-900c-726f20507b5b"/>
                <overrideVmDefault>true</overrideVmDefault>
                <iops>0</iops>
                <VirtualQuantityUnit>byte</VirtualQuantityUnit>
                <resizable>true</resizable>
                <encrypted>false</encrypted>
                <shareable>false</shareable>
                <sharingType>None</sharingType>
            </DiskSettings>

The actual vApp migration triggers an async operation that takes some time to complete. If you observe what is happening in VCD and vCenter you will see that a new temporary “-generated” vApp is created in the target Org VDC and the VMs are first migrated there. In the case of routed vApps the vApp routers (edge service gateways or Tier-1 gateways) must be deployed as well. When all the vApp VMs are moved, the source vApp is removed, the target vApp with the same identity is created and the VMs from the generated vApp are relocated there. If all goes as expected the generated vApp is removed.
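
The moveVApp call itself returns immediately and the progress can be followed by polling the associated Task – a sketch, assuming you extract the task ID from the response of the POST call:

GET https://{{host}}/api/task/<task-id>
Accept: application/*+xml;version=36.2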

Shout-out to Julian – the engineering brain behind this feature.

Console Proxy Traffic Enhancements

VMware Cloud Director provides direct access to tenants’ VM consoles by proxying the vSphere console traffic from the ESXi hosts running the workload, through the VCD cells and load balancer, to the end-user browser or console client. This is a fairly complex process that requires a dedicated TCP port (by default 8443), a dedicated certificate and a load balancer configuration without SSL termination (SSL pass-through).

The dedicated certificate requirement is especially annoying, as any change to this certificate cannot be done at the load balancer level but must be performed on every cell in the VCD server group, and those need to be restarted.

However, VMware Cloud Director 10.3.3 for the first time showcases the newly improved console proxy. It is still an experimental feature and therefore not enabled by default, but it can be enabled in the Feature Flags section of the provider Administration.

By enabling it, you switch to the enhanced console proxy implementation, which gives you the following benefits:
  • Console proxy traffic now goes over the default HTTPS port 443 together with UI/API traffic. That means no need for a dedicated port/IP/certificate.
  • This traffic can be SSL terminated at the load balancer. That means no need for the specific load balancing configuration that required SSL pass-through on port 8443.
  • The Public Addresses Console Proxy section is irrelevant and no longer used.

The following diagram shows the high-level implementation (credit and shout-out goes to Francois Misiak – the brain behind the new functionality).

As this feature has not yet been tested at scale it is marked as experimental, but it is expected to become the default console proxy mechanism starting with the next major VMware Cloud Director release. Note that you will still be able to revert to the legacy one if needed.

Control System Admin Access to VMware Cloud Director

When VMware Cloud Director is deployed in a public environment, it is a good practice to restrict system admin access to specific networks so no brute force attack can be triggered against the publicly available UI/API endpoints.

There is actually a relatively easy way to achieve this via any web application firewall (WAF) with a URI access filter. The strategy is to protect only the provider authentication endpoints, which is much easier than trying to distinguish between provider and tenant URIs.

As the access (attack) can be done either through the UI or the API, the solution should address both. Let us first talk about the UI. The tenants and the provider use specific URLs to access their system/org context, but we do not really need to care about this at all. The UI is actually using the (public) APIs, so there is nothing needed to harden the UI specifically if we harden the API endpoint. Well, the OAuth and SAML logins are an exception, so let me tackle them separately.

So how can you authenticate to VCD via API?

Integrated Authentication

The integrated basic authentication consisting of login/password is used for VCD local accounts and LDAP accounts. The system admin (provider context) uses /cloudapi/1.0.0/sessions/provider API endpoint while the tenants use /cloudapi/1.0.0/sessions.
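
For illustration, this is roughly how a provider login against the new endpoint looks with curl – the session JWT is returned in the X-VMWARE-VCLOUD-ACCESS-TOKEN response header (the administrator@System account name is just an example):

curl -sk -X POST -u 'administrator@System:<password>' \
     -H "Accept: application/json;version=36.2" \
     -D - -o /dev/null \
     "https://{{host}}/cloudapi/1.0.0/sessions/provider" | grep -i access-token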

The legacy API endpoint /api/sessions (common to both providers and tenants) has been deprecated since API version 33.0 (introduced in VCD 10.0). Note that deprecated does not mean removed – it is still available even with API version 36.x, so you can expect it to be around for some time as VCD keeps APIs backward compatible for a few years.

You might notice that there is in the Feature Flags section the possibility to enable “Legacy Login Removal”.

Feature Flags

Enabling this feature will disable the legacy login both for tenants and providers, however only if you use the alpha API version (in the case of VCD 10.3.3.1 it is 37.0.0-alpha-1652216327). So this is really only useful for testing your own tooling where you can force the usage of that particular API version. The UI and any 3rd party tooling will still use the main (supported) API versions, where the legacy endpoint will still work.

However, you can forcefully disable it for the provider context for any API version with the following cell-management-tool (CMT) command (run it from any cell; no service restart is needed):

/opt/vmware/vcloud-director/bin/cell-management-tool manage-config -n vcloud.api.legacy.nonprovideronly -v true

The provider will then need to use only the new /cloudapi/1.0.0/sessions/provider endpoint. So be careful, as it might break some legacy tools!
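
If a legacy integration does break, the same command should revert the change by setting the property back to false:

/opt/vmware/vcloud-director/bin/cell-management-tool manage-config -n vcloud.api.legacy.nonprovideronly -v false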

API Access Token Authentication

This is a fairly new method of authentication to VCD (introduced in version 10.3.1) that uses a once-generated secret token for API authentication. It is mainly used by automation or orchestration tools. The actual method of generating a session token requires access to the tenant or provider OAuth API endpoints:

/oauth/tenant/<tenant_name>/token

/oauth/provider/token

This makes it easy to disable the provider context via a URI filter.
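
For completeness, exchanging an API access token for a session bearer token is a simple form POST against these endpoints – a sketch of the provider variant, assuming the token was already generated in the UI or via API:

curl -sk -X POST "https://{{host}}/oauth/provider/token" \
     -H "Accept: application/json" \
     -d "grant_type=refresh_token&refresh_token=<api-token>"
# the response contains an access_token that is then used as a Bearer token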

SAML/OAuth Authentication via UI

Here we must distinguish between the API and UI behavior. For SAML, the UI is using the /login/org/<org-name>/… endpoint. The provider context is using the default SYSTEM org as the org name, so we must filter URIs starting with /login/org/SYSTEM.

For OAuth the UI is using the same endpoints as the API access token authentication (/oauth/tenant vs /oauth/provider) as well as /login/oauth?service=provider.

For API SAML/OAuth logins the /cloudapi/1.0.0/sessions vs /cloudapi/1.0.0/sessions/provider endpoints are used.

WAF Filtering Example

Here is an example of how to set up URI filtering with VMware NSX Advanced Load Balancer.

  1. We obviously need to set up the VCD cell (SSL) pool and a Virtual Service for the external IP and port 443 (SSL).
  2. The virtual service application profile must be set to System-Secure-HTTP, as we need to terminate SSL sessions on the load balancer in order to inspect the URI. That means the public SSL certificate must be uploaded to the load balancer as well. The cells can actually use self-signed certs, especially if you use the new console proxy that does not require SSL pass-through and works on port 443.
  3. In the virtual service go to Policies > HTTP Request and create the following rule:
    Rule Name: Provider Access
    Client IP Address: Is Not: <admin subnets>
    Path: Criteria – Begins with:
    /cloudapi/1.0.0/sessions/provider
    /oauth/provider
    /login/oauth?service=provider
    /login/org/SYSTEM
    Content Switch: Local response – Status Code: 403.
WAF Access Rule
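
A quick way to verify the rule is to attempt a provider login from outside the allowed subnets and check for the 403 response (hypothetical credentials, same endpoint as above):

curl -sk -o /dev/null -w "%{http_code}\n" -X POST \
     -u 'administrator@System:<password>' \
     -H "Accept: application/json;version=36.2" \
     "https://{{host}}/cloudapi/1.0.0/sessions/provider"
# expected: 403 from a blocked subnet, 200 from an allowed admin subnet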

And this is what you can observe when trying to log in via integrated authentication from non-authorized subnets:

And here is an example of SAML login:
