
vCloud Director – vCenter Server Relationship


vCloud Director as a cloud management platform needs resources to provision the target workloads to. These resources are provided by vCenter Server (compute, storage, networks) and NSX Manager (networks and networking services).

In the past, vCloud Director required a tight grip on those resources, so the recommendation and best practice was to dedicate them to a particular vCloud Director instance. System admins were discouraged from running additional workloads not managed by vCloud Director on them. However, that has changed recently, hence the need for this blog article.

vCenter Server Extension

vCloud Director used to register itself as a vCenter Server extension. That allowed it to ‘protect’ VCD-managed VMs with a special icon and a warning pop-up during vCenter edits of such VMs.

vCloud Director specific VM icon

Today, vCloud Director is quite resilient against changes made to a particular VM directly in vCenter Server, so there is no more need for those warnings. vCloud Director 9.5 thus no longer registers itself as a vCenter Server extension, so you will no longer see these icons and pop-up warnings.

As a side note, you will also see a change in VM naming. The long UUID is no longer appended to the VM name and is replaced by a shorter suffix of four random characters.

Host Preparation

In the past, during creation of a Provider VDC the system admin was asked for ESXi host credentials. This was needed to upload a cloud agent VIB that was used for certain features (thumbnails, VCDNI network encapsulation). All these features were either replaced by different mechanisms or deprecated, so there is no need to upload any vCloud VIBs to ESXi hosts anymore.

Additionally, a custom attribute system.service… used to be set on each vCloud Director managed host and each vCloud Director managed VM. This provided a way to control where vCenter DRS could vMotion VMs, through the host-to-VM compatibility mechanism. Disabling a host would remove the custom property. A vCloud Director VM could not be vMotioned to an unprepared host as vCenter would complain that the host is incompatible with the VM.

 

Host Custom Attributes
VM Custom Attribute

vMotion to Unprepared Host Error

In vCloud Director 9.5 this mechanism was completely eliminated. When a host is put into maintenance mode, it is considered unavailable for vCloud Director, therefore there is no longer a need to disable it first in vCloud Director. You will no longer see any host preparation dialog and the Hosts section is simplified to the bare minimum.

So What About the Relationship?

As you can see, the vCloud Director – vCenter Server relationship is now very loose. In fact it is no longer monogamous, meaning a vCenter Server can be married to (associated with) multiple vCloud Director instances at the same time.

Why would you do that?

I can think of three use cases, but obviously our smart service providers will come up with more.

Use case 1: Test & Dev

Do you need to test a new vCloud Director release? Or provide a test instance of vCloud Director for your internal developers? There is no need to spin up a whole vSphere + NSX environment with storage, etc. Just deploy one VM with the vCloud Director bits (you can even use the appliance if you have an external DB and NFS ready) and point it to the existing vSphere/NSX endpoints.

Use case 2: Whitelabeling / Reselling

This enables a three-tier mode where the SP provides the infrastructure and the reseller gets its own (branded) vCloud Director instance to resell to their end customers. The SP needs to set up one big vSphere/NSX infrastructure and have an automated way to deploy VCD instances on top of it. The reseller gets its own instance with system-admin-equivalent rights and manages its own tenants.

Use case 3: Uber Org Admin Role

Some end customers request a bigger role than Org Admin. They want to create their own organizations and Org VDCs to better align with their business groups. The SP can dedicate a whole VCD instance to such a customer, without the need to provision dedicated vSphere/NSX as well.

Any Caveats, Recommendations?

  • Segment the vSphere environment with clusters and resource pools for each VCD instance.
  • Use different VCD instance names and IDs.
  • Use separate accounts for both vCenter Server and NSX for each VCD instance. Give each account permission only to the resources it should see (use the vCenter No Access privilege on clusters/RPs/folders it should not see).
  • Dedicate storage resources to each VCD instance.
  • Use a separate NSX transport zone for each VCD instance.
  • Monitor the load of multiple VCD listeners on the single VC and scale out VCs if needed. VMware does not test this kind of setup at scale.

Fun Fact

You can in fact vMotion a running VM from one vCloud Director environment to another. To do so, move it in vCenter from the source Org VDC resource pool to the destination Org VDC resource pool. You must also move it out of the VM folder (remember the No Access privilege?) so it becomes visible to the target VCD. Obviously it needs to be connected to the right target networks.

Finally, you will need to remove the original vCloud UUID (with PowerCLI or similar) and let it be auto-discovered by the target VCD. There is no auto-removal from the original VCD, so you will need to use the process described here.

 

 

 


Postman and vCloud Director 9.5 Access Token Authentication


Quick post on how to configure Postman to use the new vCloud API 31.0 bearer token authentication instead of the deprecated authorization token header.

    1. Create your environment, if you have not done so yet, by clicking the gear icon in the top right corner. Specify the environment name and a host variable with the FQDN of the vCloud Director instance.
    2. Select the environment in the pull down selection box next to the gear icon.
    3. Create new POST request with URL https://{{host}}/api/sessions
      In Headers section add Accept header: application/*+xml;version=31.0
    4. Go to the Tests section and add the following code snippet:
      var bearer = postman.getResponseHeader("X-VMWARE-VCLOUD-ACCESS-TOKEN")
      pm.environment.set("X-VMWARE-VCLOUD-ACCESS-TOKEN",bearer)
      

    5. In the Authorization section, select Basic Auth type and provide username (including @org) and password.
    6. Click Send. You should see Status: 200 OK and the response Headers and Body. Save the request into existing or new collection.

      If you did not get 200 OK, fix the error (credentials, or typo).
    7. Notice that the Headers section of the response contains the X-VMWARE-VCLOUD-ACCESS-TOKEN. We do not need to copy it manually for subsequent API calls; it has already been picked up and saved into an environment variable by the code provided in step #4.
    8. Create new API call. For example: GET https://{{host}}/api/org. Keep the same Accept header. Go to Authorization tab and change the type to Bearer Token and in the token field provide {{X-VMWARE-VCLOUD-ACCESS-TOKEN}}
    9. Click Send. You should get response Status: 200 OK and a list of all Organizations the user is authorized in. Save the new call into collection as Get Organizations.

    Create additional calls in your collection as needed by repeating steps #8-9. You can now reuse your collection anytime, also on different environments. Log in first with the POST Login call while specifying the correct credentials and then run any other calls from the collection.
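The same login flow can be reproduced outside Postman, for example with curl (a minimal sketch; the host name and credentials are placeholders):

curl -ki -u 'user@org:password' -H 'Accept: application/*+xml;version=31.0' -X POST https://vcloud.example.com/api/sessions

The value of the X-VMWARE-VCLOUD-ACCESS-TOKEN response header then goes into the Authorization header of any subsequent call:

curl -k -H 'Accept: application/*+xml;version=31.0' -H 'Authorization: Bearer <token>' https://vcloud.example.com/api/org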

vCloud Director 9.5 and VMware Identity Manager Integration


About six months ago I blogged about VMware Identity Manager (VIDM) federation with vCloud Director. That article is still fully valid (start there if you have not read it yet), however with the introduction of the new tenant HTML 5 user interface I want to describe how you can now choose which UI (legacy or new HTML5) the user will be redirected to.

When a vCloud Director organization is federated with an external IdP there are two different workflows for the login process:

  • In the first workflow the user goes to vCloud Director URL and is redirected to the external IdP to authenticate. After the authentication the user is redirected back to vCloud Director. Now depending on which URL the user initially used, she will be redirected to legacy UI (https://vcloud.example.com/cloud/org/coke) or HTML 5 UI (https://vcloud.example.com/tenant/coke).
  • In the second workflow, the user authenticates to the external IdP first and is then presented with a catalog of federated apps accessible through Single Sign-On. Below is an example of the VMware Workspace One catalog.

    Clicking an app tile will redirect and sign the user in directly to the particular app.

The VIDM integration as described in the previous post will however always redirect the user to the legacy UI. So how to force the usage of the new HTML 5 UI?

This is done by adding the Relay State URL to the configuration of the Web App in VIDM. The tricky part is that (at least as of version 9.5) vCloud Director expects the parameter to be Base64 encoded.

So in my example, the HTML 5 URL for the particular organization I want the user to be redirected to is: https://vcloud.fojta.com/tenant/coke which is Base64 encoded to: aHR0cHM6Ly92Y2xvdWQuZm9qdGEuY29tL3RlbmFudC9jb2tl and that is what must be entered in the Relay State URL field.
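If you want to generate the Base64 value yourself, any standard tool will do; for example on Linux or macOS (the -n switch prevents a trailing newline from being encoded):

echo -n 'https://vcloud.fojta.com/tenant/coke' | base64

which returns exactly the aHR0cHM6Ly92Y2xvdWQuZm9qdGEuY29tL3RlbmFudC9jb2tl string used above.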

I can now create two Web App tiles for the user, so she can choose to which UI to go.

vCloud OpenAPI – Large Payload Issue with Load Balancer


With vCloud Director version 9, a new API (cloudapi) based on the OpenAPI specification was introduced alongside the legacy XML based API. In vCloud Director 9.5 the API Explorer enables consumption of the API directly from the vCloud UI endpoint (read here). Most of the new features use this OpenAPI, such as H5 UI branding, extensions, vRealize Orchestrator service integrations, Cross-VDC networking and Roles management.

OpenAPI is very simple to use, JSON based, with links provided in headers. However, there might be issues when a load balancer with SSL termination is involved, as due to the header or payload size the response will not get through the load balancer.

One such issue is documented in the vCloud Director 9.5 release notes. Attempting to edit Global Roles in the new H5 UI will fail with an error:

unexpected character at line 1 column 1 of the JSON data.

In my case I am using NSX Edge Load balancer with SSL termination and below is the error screenshot:

There are multiple workarounds described in the release notes, but none of them actually worked for me:

  • increasing the header maximum on the Edge LB as described in KB 52553 did not help, as the number of headers is not the only issue in this particular scenario – the body payload size is as well
  • limiting the maximum page size in vCloud Director with cell-management-tool manage-config -n restapi.queryservice.maxPageSize -v 25 fixes the above API call, but the subsequent call made by the UI ignores the setting and the response will not get through the LB again.

After some investigation and troubleshooting I discovered that there is a way to increase the Edge LB buffer size above the default 32 KB with a call similar to the one in KB 52553:

PUT https://<NSX-Manager>/api/4.0/edges/<Edge-ID>/systemcontrol/config

<systemControl>
    <property>lb.global.tune.http.maxhdr=1024</property>
    <property>lb.global.tune.bufsize=65536</property>
</systemControl>
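A minimal way to push this payload is with curl against the NSX Manager API (the credentials and the payload file name are placeholders):

curl -k -u 'admin:password' -X PUT -H 'Content-Type: application/xml' -d @systemcontrol.xml https://<NSX-Manager>/api/4.0/edges/<Edge-ID>/systemcontrol/config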

The above systemControl change (NSX 6.4) was enough to fix the issue for me and I can now edit Global Roles in the UI:

vCloud Director 9.5 Appliance Tips


With vCloud Director 9.5, VMware for the first time released vCloud Director in a fully supported appliance format. It is the first iteration of a longer process to provide the whole solution in appliance format, therefore an external NFS share, database (PostgreSQL/MS SQL) and RabbitMQ are still needed, but this will change in future releases. I would therefore advise using the 9.5 appliance today only for greenfield environments and not mixing it with RHEL/CentOS based vCloud Director setups.

If you are going to deploy the appliance here are some tips:

  • Use the vSphere Web Client (FLEX) or OVFTool to deploy the appliance. The HTML5 client is not supported.
  • OVF appliance networking (DNS/gateway) is provided through the Network Profile of the particular port group the appliance is going to be connected to. If it does not exist, the vSphere Web Client will create it the first time you deploy an appliance to the port group.
  • The appliance is deployed with only one vNIC and one IP address. That means NFS and the DB must be accessible from that vNIC (directly or via a routed connection). API/UI and the Console Proxy share the same IP, but the Console Proxy uses port 8443, so you must adjust your Console Proxy load balancer pool to this port.
  • The appliance uses a vcloud user with ID 1002, which most likely differs from the RHEL/CentOS vcloud user ID and will cause NFS permission issues. That’s why I do not recommend a mixed setup.
  • The appliance copies the responses.properties file to the NFS share for other cells to use to connect to the database. Note that the file contains encrypted database login credentials but also the encryption key, so make sure access to the NFS share is controlled.
  • If you need to change the appliance network configuration after the fact, use the following command: /opt/vmware/share/vami/vami_config_net. The appliance currently has no admin UI.
  • The appliance is Photon based, so you can install additional packages with the tdnf install command.

vSphere Replication 8.1.1: vcta Issue


VMware has recently released vCloud Availability 2.0.1.1 update that adds vCloud Director 9.5.0.1 compatibility. The tenant needs to install vSphere Replication 8.1.1 which supports vSphere 6.7U1 all the way down to 6.0U3.

The on-prem upgrade from an older vSphere Replication appliance (e.g. 6.5.1) is side-by-side, meaning you deploy the new 8.1.1 appliance and it connects to the existing one to migrate the data over.

I have noticed that with the new 8.1.1 appliance my cloud replications were not active.

The reason for that was that the vcta service was not running on the appliance. This service is responsible for establishing the tunnel with the cloud endpoint and transferring the replicated data. Note that the service is not needed for regular vSphere to vSphere replications.

In a lab environment where you need to apply a custom endpoint certificate as described here, you might not notice this issue immediately, as the service is started manually after the certificate change with service vcta restart. However, after an appliance reboot the service will be down again.

The fix is easy, just enable the service with:

systemctl enable vcta

command from the appliance CLI (via console or SSH if you enabled it before). This is one more thing to remember when setting up cloud replications next to the ESXi vr2c-firewall.vib issue I documented here.
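To double check that the fix will survive a reboot, standard systemd commands can be used on the appliance:

systemctl is-enabled vcta
systemctl status vcta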

IPv6 Support Overview in vCloud Director 9.5


vCloud Director version 9.5 is the first release to provide networking IPv6 support. In this article I want to go into a little bit more detail on the level of IPv6 functionality than in my What’s New in vCloud Director 9.5 post.

IPv6 functionality is mostly driven by the underlying networking platform support which is provided by NSX. The level of IPv6 support in NSX-V is changing from release to release (for example NAT64 feature was introduced in NSX version 6.4). Therefore my feature list assumes the latest NSX 6.4.4 is used.

Additionally, it should be noted that vCloud Director 9.5 also supports NSX-T in a very limited way. Currently no Layer 3 functionality is supported for NSX-T based Org VDC networks, which are imported based on pre-existing logical switches as isolated networks with IPv4-only subnets.

Here is the feature list (vCloud Director 9.5.0.1 and NSX 6.4.4).

Supported:

  • Create External network with IPv6 subnet (provider only). Note: mixing of IPv4 and IPv6 subnets is supported.
  • Create Org VDC network with IPv6 subnet (direct or routed). Note: distributed Org VDC networks are not supported with IPv6
  • Use vCloud Director IPAM (static/manual IPv6 assignments via guest customization)
  • IPv6 (static only) routing via Org VDC Edge Gateway
  • IPv6 firewall rules on Org VDC Edge Gateway or Org VDC Distributed Firewall via IP Sets
  • NAT 64 (IPv6-to-IPv4) on Org VDC Edge Gateway
  • Load balancing on Org VDC Edge Gateway: IPv6 VIP and/or IPv6 pool members

 

Unsupported:

  • DHCP6, SLAAC (RA)
  • Routed vApp networks with IPv6 subnets
  • Isolated Org VDC/vApp networks with IPv6 subnets
  • OSPF v3, IPv6 BGP dynamic routing on Org VDC Edge Gateway
  • Distributed IPv6 Org VDC networks
  • Dual stacking IPv4/IPv6 on OrgVDC networks

 

Experimental/Untested:

  • L2 VPN (tunnel only)
  • SSL VPN (tunnel only)
  • IPSec VPN (tunnel + inner subnets)

Resource Consumption of Org VDC Allocation Types


The reference table below summarizes how the different vCloud Director Org VDC allocation types consume vSphere resources. In other words: how the choice of allocation model for a particular Org VDC and its parameters (allocation, guarantees, quota, vCPU speed) translates to resource pool and VM resource settings (CPU/RAM) – reservations and limits.

click for large version

Notes:

  • valid for vCloud Director 9.5 and older (down to 5.5)
  • RP … resource pool
  • Elastic … Org VDC can be divided across multiple RPs across clusters
  • Although there are currently only three Org VDC allocation types, the Allocation Pool type can be elastic or non-elastic based on a vCloud Director instance-wide setting in General Settings

Upgrade PostgreSQL version 9 to 10


I had to perform multiple PostgreSQL database upgrades from version 9 to version 10. The database was used for vCloud Director but I believe it is generic enough for other purposes.

The base operating system I am using is CentOS 7.

Here follows the step-by-step procedure:

  1. Create database backup:
    su - postgres
    pg_dumpall > /tmp/pg9backup
    exit
  2. Shutdown and uninstall old PostgreSQL v9:
    systemctl stop postgresql-9.5.service
    yum remove postgresql*
  3. Archive old datafiles (you will need them later):
    mv /var/lib/pgsql/data/ /data.old
  4. Install new PostgreSQL v10:
    yum -y install https://download.postgresql.org/pub/repos/yum/10/redhat/rhel-7-x86_64/pgdg-centos10-10-2.noarch.rpm
    yum -y install postgresql10-server
    systemctl enable postgresql-10
  5. Initiate and start DB:
    su - postgres
    /usr/pgsql-10/bin/initdb
    cp /data.old/pg_hba.conf /var/lib/pgsql/10/data/
    cp /data.old/postgresql.conf /var/lib/pgsql/10/data/
    exit
    systemctl start postgresql-10
  6. Restore backup
    su - postgres
    psql -d postgres -f /tmp/pg9backup
  7. Reboot server. If everything works, you can delete your pg9backup and data.old archive.
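Before deleting the backup and the data.old archive, a quick sanity check that the new instance really runs version 10 and that the databases were restored (psql is part of the postgresql10-server installation):

su - postgres
psql -c "SELECT version();"
psql -l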

PostgreSQL: Beware of the bloat!


During the last weekend my vCloud Director lab died. And the reason was PostgreSQL DB filled up all the disk space. How could that happen in my small lab with one running vApp?

When updating rows, the PostgreSQL database actually creates new ones and does not immediately delete the (now dead) old rows. That is done in a separate process called vacuuming.

vCloud Director has one pretty busy table named activity_parameters that is continuously updated. And as you can see from the screenshot below (as reported by the pgAdmin table statistics), the table size is 26 MB but it is actually taking up 24 GB of hard disk space due to the dead rows.

Another quick way to check DB size via psql CLI is:

\c vcloud
SELECT pg_size_pretty (pg_total_relation_size('activity_parameters'));

Vacuuming takes time and can therefore be tuned in postgresql.conf via a few parameters which VMware documents specifically for vCloud Director here or here. Make sure you apply them (I did not). Another issue that could prevent vacuuming from happening is a stale long-running transaction on the table.

The fix:

  • short term: add more disk space
  • long term: make sure postgresql.conf is properly configured
    autovacuum = on
    track_counts = on
    autovacuum_max_workers = 3
    autovacuum_naptime = 1min
    autovacuum_vacuum_cost_limit = 2400
  • manually vacuum the activity_parameters table with the following psql CLI command:
    VACUUM VERBOSE ANALYSE activity_parameters;

And do not forget to monitor free disk space on your PostgreSQL host.
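A quick way to spot which tables are accumulating dead rows is the standard pg_stat_user_tables statistics view (vcloud being the database name used in the psql example above):

su - postgres
psql -d vcloud -c "SELECT relname, n_live_tup, n_dead_tup, last_autovacuum FROM pg_stat_user_tables ORDER BY n_dead_tup DESC LIMIT 10;"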

 

NSX-T 2.4: Force Local Account Login


NSX-T supports Role Based Access Control by integrating with VMware Identity Manager which provides access to 3rd party Identity Sources such as LDAP, AD, SAML2, etc.

When NSX-T version 2.3 is integrated with VIDM, you get a choice during login of which type of account you are going to provide (remote or local).

NSX-T version 2.4 no longer provides this option and will always default to the SAML source (VIDM). To force a login with a local account, use this specific URL:

https://<NSX-T_FQDN/IP>/login.jsp?local=true

vCenter Server Issue: Recent Tasks Show xxx.label


I had an annoying issue in my lab. Some time ago, when I performed the vSphere 6.7 PSC convergence, my vCenter stopped displaying proper names of tasks in the vSphere Client UI (both Flex and H5) and showed only their placeholders with names like xxx.label.

While there are some KB and communities articles about the issue (and fix), none of them was applicable to my situation (running vCenter Server 6.7 U1). I thought that VCSA patches or even deploying a new appliance with a backup restore would fix it, but it did not.

After a little research I found out that the issue is caused by a missing catalog.zip file in the /etc/vmware-vpx/locale/ folder. I had another lab with exactly the same vCenter Server build deployed, so I just copied the file and transferred it to my vCenter Server Appliance. After a service restart via the VAMI UI, task names were back.

I do not know the root cause, but if you have the same issue, give it a go.

vCloud Director Federation with IBM Cloud Identity


IBM Cloud Identity is a cloud SaaS Single Sign-On solution supporting multifactor authentication and identity governance. In this article I will describe how to integrate it with vCloud Director, where vCloud Director acts as a service provider and IBM Cloud Identity as an identity provider.

I have already written numerous posts on how to federate vCloud Director with Microsoft Active Directory Federation Services, VMware Identity Manager and vCenter Single Sign-On. What makes the integration different for IBM Cloud Identity is that it does not accept the vCloud Director metadata XML for a simple service provider setup, and thus the integration requires more steps.

IBM Cloud Identity is a SaaS service and can be set up for free in a few minutes. It is pretty straightforward, so I will skip that part.

As usual, in vCloud Director as Organization Administrator we must prepare the organization for federation. It means making sure that in Administration > Settings > Federation the Entity ID is not empty and an up-to-date certificate is generated that will be used to trust and secure the SAML2 assertion exchange between the IdP and vCloud Director. The vCloud Director autogenerated self-signed certificate always has one-year validity, which means once a year it must be regenerated (and the IdP reconfigured). The Organization Administrator is alerted via email when the expiration date is approaching. With the vCloud API it is possible to provide your own publicly trusted certificate (with possibly longer validity).

Now we can download the metadata XML from the link provided on the same screen. As mentioned above we unfortunately cannot just upload it to IBM Cloud Identity, instead we need to manually retrieve the correct information from the downloaded spring_saml_metadata.xml file.

We will need the federation certificate (<ds:X509Certificate>) saved as a properly formatted PEM file:
-----BEGIN CERTIFICATE-----

-----END CERTIFICATE-----
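Once saved, the PEM file can be sanity checked with openssl (the file name is just an example):

openssl x509 -in vcd-federation-cert.pem -noout -subject -enddate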

Assertion consumer service URL which is in my case: https://vcloud.fojta.com/cloud/org/ibm/saml/SSO/alias/vcd

and entityID – in my case IBM.

Now in IBM Cloud Identity we can set up the application:

  1. Upload the vCloud Director federation certificate in Settings  > Certificates > Add Signer Certificate:
  2. Create new application in Applications > Add > Custom Application and set up General details like Description, icon and Application Owners.
  3. Now in Sign-On submenu we can enter all details we have collected from vCloud Director:
    – Sign-on Method: SAML2.0
    – Provider ID: <EntityID>
    – Assertion Consumer Service URL
    – optionally check the Use identity provider initiated single sign-on checkbox and provide the Target URL as a Base64 encoded string (in my case I used the H5 tenant endpoint URL https://vcloud.fojta.com/tenant/ibm, which Base64 encoded translates to: aHR0cHM6Ly92Y2xvdWQuZm9qdGEuY29tL3RlbmFudC9pYm0)
    – Service Provider SSO URL (same as Assertion Consumer Service URL)
    – check Sign authentication response and pick Signature Algorithm RSA_SHA256
    – check Validate SAML request signature and pick the certificate from step #1
    – optionally check Encrypt assertion
    – Name Identifier: preferred_username
    – NameID Format: urn:oasis:names:tc:SAML:1.1:nameid-format:emailAddress

  4. Uncheck: Send all known user attributes in the SAML assertion and instead provide custom list of Attributes to be used. vCloud Director supports the following attributes:
    UserName
    EmailAddress
    http://schemas.xmlsoap.org/ws/2005/05/identity/claims/surname
    http://schemas.xmlsoap.org/ws/2005/05/identity/claims/givenname
    Groups
    Role
    You can configure the mapping for each of them. I used preferred_username for the vCloud Director username (alternatively email could be used as well) and did not map the Role attribute, as I will manage roles in vCloud Director and not leverage the Defer to Identity Provider role.
  5. Configure Access Policies and Entitlements to specify which users/groups can use vCloud Director.
  6. After saving the application configuration, we can retrieve its SAML2.0 federation metadata from the link provided on the right side of the Sign-On screen.
  7. Back in vCloud Director > Administration > Settings > Federation check Use SAML Identity Provider and upload the downloaded metadata.xml file from the previous step.
  8. Finally we need to import SAML users/groups and assign their role.

If everything is done correctly you should be able to log in both from IBM Cloud Identity and from vCloud Director with the IdP user.

What’s New in vCloud Director 9.7


With an impressive cadence, after less than 6 months there is a new major release of vCloud Director – version 9.7. As usual, I will go through the new features from a technical perspective and provide additional links to related blog posts in the future.

Just for reminder, you can find older What’s New in vCloud Director blog posts here: 9.5, 9.1.

Tenant User Interface Evolution

The legacy Flex UI is still here, but there are fewer and fewer reasons to use it. My guess is that 95% of the Flex features are ported to the HTML5 UI, which also provides additional exclusive features.

  • Branding: the service provider can now (with CloudAPI) change the color scheme, theme, logos (including favicon and login page) and title not only globally but also individually for each tenant. The provider can also add structured custom links and override existing links (help, about and the standalone VMware Remote Console download). The custom links can also be dynamic with the inclusion of (self-explanatory) custom variables: ${TENANT_NAME}, ${TENANT_ID} and ${SESSION_TOKEN}
  • The new ribbon offers a quick glance at the content of all Organizations the logged-in user has access to.
  • The Recent Tasks pane provides immediate info about what is going on (the last 50 items in the past 12 hours)
  • Global search provides a quick way to find a particular object across VDCs or even sites:
  • The vApp Network Diagram now shows the vApp logical networking
  • Besides these large changes there are many small enhancements that bring the tenant UI to nearly full coverage of the legacy Flex UI. Users are actively encouraged to start using the tenant UI with the yellow banner on top of the legacy UI:

Service Provider Admin UI

  • The Service Provider Admin HTML5 UI adds access to Cloud Resources, vSphere Resources and Blocking Tasks, continuing the process of adding features available from the Admin Flex UI. Some items are still read-only. On the other hand, some (new) features are only available in the H5 UI: for example adding an NSX-T Manager, the Flex allocation model, etc.

New Compute Features

Flexible allocation model

The service provider can create (in the H5 UI only) a completely new type of Org VDC – the Flexible Allocation Model. The new model covers all the legacy allocation models (see here), plus the provider can define a completely new way for VMs and the VDC to consume vSphere compute resources.

It is also possible to change the allocation model of existing Org VDCs. For now, though, that is possible only for Org VDCs created in the 9.7 release.

Compute Policies

While compute policies were already introduced in the previous release, the functionality is now enhanced and additionally simplified by including the tenant part in the H5 UI.

They are used to control resource allocation, for example for licensing, performance or availability use cases. The provider defines (via vCloud OpenAPI) policies that the tenant can then assign to deployed VMs.

The provider can for example designate Microsoft Windows licensed hosts, create an appropriate policy and assign it to Windows templates. Any VM deployed from such a template will be placed only on those hosts.

Similarly, the provider can define a high performance compute policy, which results in a higher CPU limit and reservation for the VM. The tenant can choose and apply such a policy to a subset of her workloads.

The tenant could also use this feature to select the site placement of each particular VM for Org VDCs backed by a vSphere Metro Cluster.

New Networking Features

Edge Cluster

The service provider now has the ability to control the placement of each Org VDC Edge Gateway node (both for compute and storage) in order to get better resiliency and secure higher SLAs. This functionality is currently available only through vCloud OpenAPI (look for networkProfile). The provider first creates an Edge Cluster by specifying a resource pool and storage policy pair for the primary and secondary Edge Cluster. The Edge Cluster is then assigned to Org VDCs. The Org VDC Edge Gateway nodes are automatically deployed into the Edge Cluster resource pools/datastores.

Legacy Edge deprecation

Org VDC Edge Gateways can no longer be deployed in the legacy mode (without advanced networking enabled). Existing legacy Edges must be upgraded to advanced, otherwise they are not manageable by vCloud Director. This is usually a non-disruptive operation (unless they are still on version 5.5). The upgrade can be performed in bulk (or per org) with the following cell-management-tool commands:

./cell-management-tool edge-ip-allocation-updates --host <vcd-host> --user administrator --status
./cell-management-tool edge-ip-allocation-updates --host <vcd-host> --user administrator --update-ip-allocations

NSX-T Support

There is no new NSX-T related functionality other than the ability to register NSX-T Manager via UI and support for NSX-T 2.4 (Policy APIs).

SDDC Proxy

This is a completely new feature that allows using vCloud Director as a proxy to a dedicated SDDC (vCenter Server with optional NSX). The provider can thus offer multitenant shared services together with dedicated infrastructure with direct access to its management components. vCloud Director becomes the Centralized Point of Management (CPOM).

This is quite a powerful feature and probably deserves its own blog post, but briefly, here is how it works.

The service provider deploys a dedicated SDDC and registers its vCenter Server (one that is not going to be used for any Provider VDC) into vCloud Director. Then, with vCloud OpenAPI, the provider creates an SDDC object pointing to the vCenter Server and publishes it to an organization. This creates an SDDC proxy which, similarly to the console proxy, securely proxies the tenant all the way to the dedicated vCenter Server (UI/API management endpoints) without the need to expose it to the internet. Additional proxies can be added if needed for additional endpoints (NSX-T Manager, Site Replication Appliance, etc.). The proxying is configured in the user's browser by downloading a proxy configuration (PAC file) and is protected with a time-limited access token.

The SDDC appears as another tile in the Datacenter UI.

In order for the tenant to see the new tile type, the CPOM extension must be enabled by the provider in the UI plugins (which now can be done from the UI).

A vCenter Server published as an SDDC is shown in the Provider Admin UI as Tenant published; a VC used for a PVDC is shown as Service Provider published.

It is possible to override limits, certificate security and the ability to use a vCenter Server both as an SDDC and for a VCD Provider VDC with the vCloud API /api/admin/extension/settings/cpom.

The SDDC feature also introduces 3 new rights (View SDDC, Manage SDDC and Manage SDDC Proxy).

Appliance

After the introduction of the vCloud Director appliance in the 9.5 release, the new 9.7 appliance provides not only the cell functionality but also an embedded PostgreSQL database, which can be deployed in an active – standby configuration, with synchronous physical replication and semi-automated failover provided by the embedded replication manager and manually triggered through the appliance 'promote' UI. Usage of an external database with the appliance is no longer supported.

The appliance now uses two vNICs – eth0: Public Network for external traffic (and vCloud Director services such as UI/API, console proxy and the internal messaging bus) and eth1: Private Network for internal traffic – this is the one the embedded DB will use. It is recommended to use two different networks; however, both vNICs can also be connected to the same network if that is the expected networking topology, but they will need two IPs. Static routes can be configured easily.

Edit (2019/03/29): Correction, the two IPs must be from different subnets due to Photon routing firstboot issue.

It comes in three different sizes:

  • Cell only (no DB): 2 vCPUs, 8 GB RAM
  • Cell with embedded DB small: 2 vCPUs, 12 GB RAM
  • Cell with embedded DB large: 4 vCPUs, 24 GB RAM

Note that there is no DB-only node. The cell services run on all nodes and for high availability three nodes are the recommended minimum. NFS is still a hard requirement for any appliance deployment (even with one node). Neither Cassandra DB (for VM metrics) nor RabbitMQ (for external integrations) is provided by the appliance; both still need to be deployed separately if required.

vCloud Director binaries for non-appliance deployment are still provided, but mixing RHEL/CentOS nodes with appliance nodes is not supported. It is possible (but not that easy) to migrate your existing RHEL environment to an appliance based one. The process requires an upgrade to version 9.7 first and then a migration, so environments still using an Oracle DB (which is not supported with 9.5 or 9.7) cannot go straight to the embedded database and will need a deployment of an external PostgreSQL DB as an intermediate step. A straight upgrade from the 9.5 appliance to the 9.7 appliance is not supported either and also involves a migration.

Installation of certain agents (like vRealize Log Insight) into the appliance is supported as long as they are on the compatibility list but in general the appliance should be considered as a black box unlike RHEL/CentOS cells that can easily run additional software.

Backup of the database is triggered via create-db-backup command from the primary appliance. No automated backup scheduling is available at this time.

Other

  • Microsoft SQL is still a supported database for vCloud Director, but is marked for future deprecation. PostgreSQL version 9.x is no longer supported; version 10 is now required.
  • API versions: 32.0, 31.0, 30.0 and 29.0 are supported; 28.0, 27.0, 26.0, 25.0, 24.0, 23.0, 22.0, 21.0 and 20.0 are supported but marked for deprecation.
  • CloudAPI API Explorer has moved to new location (same as vCloud Director 9.5.0.2). The user must log in to use a provider or tenant specific links:
    /api-explorer/provider
    /api-explorer/tenant/<org-name>
  • A compatible, fully supported, HashiCorp blessed Terraform provider 2.1 has been released here, accompanied by a Golang SDK.
  • pyvcloud Python SDK and vcd-cli have been updated.
  • New vRealize Orchestrator vCloud Director plugin has been released.
  • Scalability and resilience enhancements of the VC Proxy (listener) and StatsFeeder (used for VM metric collection). These services are now distributed across all vCloud Director cells (there is no longer a 5 minute failover of the listener when the cell running it dies). This also manifests as multiple VC connection alert emails during start-up or a VC reconnect action.

 

vCloud Director Appliance – Phone Home Configuration


All VMware software contains the Customer Experience Improvement Program (CEIP) feature, also known as Phone Home. You can read here what it does, but in general it sends VMware anonymized configuration, feature and performance data over the internet. VMware can thus track which features customers use or do not use, which versions they are on, which database they use, at what scale, etc.

This is usually opt-in feature, so the user is asked during the installation if they want to enable CEIP. Below is a screenshot from vCloud Availability deployment.

However, during the vCloud Director 9.7 appliance based deployment (as opposed to the binary interactive or unattended one) there is no configuration of this feature and it defaults to the always-enabled state.

In case you want to disable it, or see how it is actually configured, use the cell-management-tool configure-ceip command on any of the cells (no service restart is necessary).

root@vcd1 [ ~ ]# /opt/vmware/vcloud-director/bin/cell-management-tool configure-ceip --status
Participation enabled
root@vcd1 [ ~ ]# /opt/vmware/vcloud-director/bin/cell-management-tool configure-ceip --disable
Participation disabled

Patching vCloud Director 9.7 Appliance


vCloud Director 9.7.0.1 patch has just been released and it is the first opportunity to patch the appliance edition of vCloud Director. Let me describe the process.

I have a three appliance deployment with each node running the embedded database in an active – standby – standby configuration. While in theory you could treat the appliance as a regular Linux deployment and use the same patching process that was used for years by simply running vmware-vcloud-director-distribution-9.7.0-13635483.bin, this would patch just the vCloud Director binaries, but not the appliance packages. Therefore we must follow a completely different process.

It should also be noted that currently we cannot use the automated orchestrated upgrade procedure or appliance UI. Hopefully both will come in the future as the appliance version matures.

Download the Appliance upgrade file: VMware_vCloud_Director_9.7.0.4264-13635483_update.tar.gz and unpack it to a transfer directory that is available to all the cells.

mkdir /opt/vmware/vcloud-director/data/transfer/update

tar xzf VMware_vCloud_Director_9.7.0.4264-13635483_update.tar.gz -C /opt/vmware/vcloud-director/data/transfer/update

Now on each cell we have to set the repo, check if we need to update, shut down the vCloud Director service and patch:

vamicli update --repo file:///opt/vmware/vcloud-director/data/transfer/update/

vamicli update --check

/opt/vmware/vcloud-director/bin/cell-management-tool -u administrator cell -s

vamicli update --install latest

Note that during the whole process the embedded database is still running on each node, so until the vcd service shutdown on the last node, vCloud Director is still functional.

Once the last node is patched we can upgrade the database schema. Before we do that, we will make a database backup. This is done from the primary DB node (which node is primary can be checked in the vCD Database Availability UI running on each node on port 5480).

/opt/vmware/appliance/bin/create-db-backup

The backup is created in the pgdb-backup folder in the transfer share (e.g. /opt/vmware/vcloud-director/data/transfer/pgdb-backup/db-backup-2019-05-20-090502.tgz).

Now we can finally proceed with the database schema upgrade:

/opt/vmware/vcloud-director/bin/upgrade

If everything went right we can start vcd service on each cell and enjoy our updated vCloud Director instance.

service vmware-vcd start

vCloud Director 9.7 Appliance Tips


About half a year ago I published a blog post with a similar title related to the vCloud Director 9.5 appliance. The changes between appliance versions 9.5 and 9.7 are so significant that I am dedicating a whole new article to the new appliance.

Introduction

The main difference compared to the 9.5 version is that vCloud Director 9.7 now comes with an embedded PostgreSQL database option that supports replication, with manually triggered, semi-automated failover. An external database is no longer supported with the appliance. Service providers can still use the Linux installable version of vCloud Director with external PostgreSQL or Microsoft SQL databases.

The appliance is provided as a single OVA file that contains 5 different configurations (flavors): primary node (small and large), standby node (small and large) and vCloud Director cell application node.

All node configurations include the vCloud Director cell application services; the primary and standby ones also include the database and the replication manager binaries. It is possible to deploy a non-HA DB architecture with just the primary and cell nodes, however for production the DB HA is recommended and requires a minimum of 3 nodes: one primary and two standbys. The reason two standbys are needed is that, at the moment replication is configured, the PostgreSQL database will not process any write requests unless it is able to synchronously replicate them to at least one standby node. This also has implications for how to remove nodes from the cluster, which I will get to.

I should also mention that primary and standby nodes, once deployed, are equivalent from the appliance perspective, so a standby node can become primary and vice versa. There is always only one primary DB node in the cluster.

NFS transfer share is required and is crucial for sharing information among the nodes about the cluster topology. In the appliance-nodes folder on the transfer share you will find data from each node (name, IP addresses, ssh keys) that are used to automate operations across the cluster.

Contrary to other HA database solutions, there is no network load balancing or single floating IP used here; instead, for database access all vCloud Director cells are always pointed to the eth1 IP address of the (current) primary node. During failover the cells are dynamically repointed to the IP of the new node that takes the primary role.

Speaking about network interfaces, the appliance has two – eth0 and eth1. Both must be used and must have different subnets. The first one (eth0) is primarily used for the vCloud Director services (http – ports 80, 443, console proxy – port 8443, jmx – ports 61611, 61616), while the second one's (eth1) primary role is database communication (port 5432). You can use both interfaces for other purposes (ssh, management, ntp, monitoring, communication with vSphere / NSX, ...). Make sure you follow the correct order during their configuration – it is easy to mix up the subnets or port groups.

Appliance Deployment

Before you start deploying the appliance(s), make sure the NFS transfer share is prepared and empty. Yes, it must be empty. When the primary node is deployed, responses.properties and other files are stored on the share and used to bootstrap other appliances in the server group and the database cluster.

The process always starts with the primary node (small or large). I would recommend large for production and small for everything else. Quite a lot of data must be provided in the form of OVF properties (transfer share path, networking, appliance and DB passwords, vCloud Director initial configuration data). As it is easy to make a mistake, I recommend snapshotting the VM before the first power-on so you can always revert back and fix whatever was wrong (the inputs can be changed in the vCenter Flex UI, VM Edit Settings, vApp Options).

To see if the deployment succeeded or why it failed, examine the following log files on the appliance:

firstboot: /opt/vmware/var/log/firstboot
vcd setup: /opt/vmware/var/log/vcd/setupvcd

config data can be checked in: /opt/vmware/etc/vami/ovfEnv.xml

A successful deployment of the primary node results in a single node vCloud Director instance with a non-replicated DB running on the same node and with the responses.properties file saved to the transfer share, ready for the other nodes. The file contains database connection information, certificate keystore information and the secret to decrypt encrypted passwords. Needless to say, this is pretty sensitive information, so make sure access to the NFS share is restricted.

A note about certificates: the appliance generates its own self-signed certificates for the vCloud Director UI/API endpoints (http) and console proxy access and stores them in the user certificates.ks keystore in /opt/vmware/vcloud-director, which is protected with the same password as the initial appliance root password. This is important, as the encrypted keystore password in the responses.properties file will be used for the configuration of all other appliances and you thus must deploy them with the same appliance root password. If not, you might end up with a half-working node, where the database will be working but the vcd service will not, due to failed access to the certificates.ks keystore.

To deploy additional appliance nodes you use the standby or pure VCD cell node configurations – for HA DB at least two standbys. As all these nodes run the VCD service, deploying additional pure VCD cell nodes is needed only for large environments. The size of the primary and the standbys should always be the same.

Database Cluster Operations

The database appliance currently provides a very simple UI on port 5480 showing the cluster state, with the only available operation being the promotion of a standby node, and that only if the primary has failed (you cannot use the UI to promote a standby while the primary is running).

Here is a cheat sheet of other database related operations you might need to do through CLI:

  • Start, stop and reload configuration of database on a particular node:
    systemctl start vpostgres.service
    systemctl stop vpostgres.service
    systemctl reload vpostgres.service
  • Show cluster status as seen by particular node:
    sudo -i -u postgres /opt/vmware/vpostgres/10/bin/repmgr -f /opt/vmware/vpostgres/10/etc/repmgr.conf cluster show
  • Planned DB failover (for example for node maintenance). On the standby cell run:
    sudo -i -u postgres /opt/vmware/vpostgres/current/bin/repmgr standby switchover -f /opt/vmware/vpostgres/current/etc/repmgr.conf --siblings-follow

Location of important database related files:
psql (DB CLI client): /opt/vmware/vpostgres/current/bin/psql
configuration, logs and data files: /var/vmware/vpostgres/current/pgdata
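Besides repmgr, replication can also be verified on the primary node with the standard pg_stat_replication view, using the psql client path above:

sudo -i -u postgres /opt/vmware/vpostgres/current/bin/psql -c "SELECT application_name, state, sync_state FROM pg_stat_replication;"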

How to Rejoin Failed Database Node to the Cluster

The only supported way is to deploy a new node. You should deploy it as a standby node and, as mentioned in the deployment chapter, it will automatically bootstrap and replicate the database content. That can take some time depending on the database size. You will also need to clean up the old failed VCD cell in the vCloud Director Admin UI – Cloud Cells section.

There is an unsupported way to rejoin a failed node without redeploying it, but use it at your own risk – all commands are run on the failed node:

Stop the DB service:
systemctl stop vpostgres.service

Delete stale DB data:
rm -rf /var/vmware/vpostgres/current/pgdata

Clone DB from the primary (use its eth1 IP):
sudo -i -u postgres /opt/vmware/vpostgres/current/bin/repmgr -h <primary_database_IP> -U repmgr -d repmgr -f /opt/vmware/vpostgres/current/etc/repmgr.conf standby clone

Start the DB service:
systemctl start vpostgres.service

Add the node to repmgr cluster:
sudo -i -u postgres /opt/vmware/vpostgres/current/bin/repmgr -h <primary_database_IP> -U repmgr -d repmgr -f /opt/vmware/vpostgres/current/etc/repmgr.conf standby register --force

How to Remove Failed Standby Node from the Cluster

On the primary node find the failed node ID via the repmgr cluster status command:
sudo -i -u postgres /opt/vmware/vpostgres/10/bin/repmgr -f /opt/vmware/vpostgres/10/etc/repmgr.conf cluster show

Now unregister the failed node by providing its ID (e.g. 13416):
sudo -i -u postgres /opt/vmware/vpostgres/10/bin/repmgr -f /opt/vmware/vpostgres/10/etc/repmgr.conf standby unregister --node-id=13416

Clean up failed VCD cell in Cloud Cells VCD Admin UI.

How to Revert from DB Cluster to Single DB Node Deployment

As mentioned in the introduction, if you shut down both (all) standby nodes, your primary database will stop serving write I/O requests. So how do you get out of this pickle?

First, unregister both (deleted) standbys via the previous mentioned commands:

sudo -i -u postgres /opt/vmware/vpostgres/10/bin/repmgr -f /opt/vmware/vpostgres/10/etc/repmgr.conf cluster show
sudo -i -u postgres /opt/vmware/vpostgres/10/bin/repmgr -f /opt/vmware/vpostgres/10/etc/repmgr.conf standby unregister --node-id=<id1>
sudo -i -u postgres /opt/vmware/vpostgres/10/bin/repmgr -f /opt/vmware/vpostgres/10/etc/repmgr.conf standby unregister --node-id=<id2>

Delete appliance-nodes subfolders on the transfer share corresponding to these nodes. Use grep -R standby /opt/vmware/vcloud-director/data/transfer/appliance-nodes to find out which folders should be deleted.

For example:
rm -Rf /opt/vmware/vcloud-director/data/transfer/appliance-nodes/node-38037bcd-1545-49fc-86f2-d0187b4e9768

And finally edit postgresql.conf and change the synchronous_standby_names line to synchronous_standby_names = ''. This disables the wait for a transaction commit to reach at least one standby.

vi /var/vmware/vpostgres/current/pgdata/postgresql.conf

Reload DB config: systemctl reload vpostgres.service.  The database should start serving write I/O requests.

Upgrade and Migration to Appliance

Moving to the 9.7 appliance with embedded DB, whether from Linux cells or from the 9.5 appliance, requires a migration. Unfortunately, it is not possible to simply upgrade the 9.5 appliance to 9.7 due to the embedded database design.

The way to get to the 9.7 appliance is to first upgrade the existing environment to 9.7, then deploy a brand new 9.7 appliance based environment and transplant the old database content into it.

It is not a simple process. I recommend testing it up front on a clone of production so you are not surprised during the actual migration maintenance window. The procedure is documented in the official docs; I will provide only the high level process and my notes.

  • Upgrade the existing setup to the 9.7(.0.x) version. Shut down the VCD service and back up the database, global.properties, responses.properties and certificate files. Shut down the nodes if we are going to reuse their IPs.
  • Prepare a clean NFS share and deploy a single node appliance based VCD instance. I prefer to do the migration on a single node instance and then expand it to multi-node HA when the transplant is done.
  • Shut down the vcd service on the appliance and delete its vcloud database so we can start with the transplant.
  • We will restore the database (if the source is MS SQL we will use the cell-management-tool migration) and overwrite the global.properties and responses.properties files. Do not overwrite the user certificates.ks file.
  • Now we will run the configure script to finalize the transplant. At this point on the 9.7.0.1 appliance I hit a bug related to SSL DB communication. In case your global.properties file contains a vcloud.ssl.truststore.password line, comment it out and run the configure script with SSL disabled. This is my example:
    /opt/vmware/vcloud-director/bin/configure --unattended-installation --database-type postgres --database-user vcloud \
    --database-password "VMware1!" --database-host 10.0.4.62 --database-port 5432 \
    --database-name vcloud --database-ssl false --uuid --keystore /opt/vmware/vcloud-director/certificates.ks \
    --keystore-password "VMware1!" --primary-ip 10.0.1.62 \
    --console-proxy-ip 10.0.1.62 --console-proxy-port-https 8443

    Update 2019/05/24: The correct way to resolve the bug is to also copy the truststore file from the source (if the file does not exist, which can happen if the source was freshly upgraded to 9.7.0.1 or later, start the vmware-vcd service at least once). The official docs will be updated shortly. The configure script can then be run with ssl set to true:

    /opt/vmware/vcloud-director/bin/configure --unattended-installation --database-type postgres --database-user vcloud \
    --database-password "VMware1!" --database-host 10.0.4.62 --database-port 5432 \
    --database-name vcloud --database-ssl true --uuid --keystore /opt/vmware/vcloud-director/certificates.ks \
    --keystore-password "VMware1!" --primary-ip 10.0.1.62 \
    --console-proxy-ip 10.0.1.62 --console-proxy-port-https 8443

    Note that the keystore password is the initial appliance root password! We are still reusing the appliance autogenerated self-signed certificates at this point.

  • If this went right, start the vcd service and deploy additional nodes as needed.
  • On each node replace the self-signed certificate with a CA signed one.

Backup and Restore

The backup of the appliance is very easy, the restore less so. The backup is triggered from the primary node with the command:

/opt/vmware/appliance/bin/create-db-backup

It creates a single tar file with the database content and additional data needed to fully restore the vCloud Director instance. The problem is that partial restores (that would reuse existing nodes) are nearly impossible (at least in the HA DB cluster scenario) and the restore involves basically the same procedure as the migration.

CA Certificate Replacement

There are probably many ways to accomplish this. You can create your own keystore and import certificates from it with the cell-management-tool certificates command into the existing appliance /opt/vmware/vcloud-director/certificates.ks keystore. Or you can replace the appliance certificates.ks file and re-run the configure command. See here for a deep dive.

Note that the appliance UI (on port 5480) uses different certificates. These are stored in /opt/vmware/appliance/etc/ssl. I will update this post with the procedure once it is available.

External DB Access

In case you need to access the vCloud Director database externally, you must edit the pg_hba.conf file with the IP address or subnet of the external host. However, the pg_hba.conf file is dynamically generated and any manual changes will be quickly overwritten. The correct procedure is to create a new file (with any name) on the DB appliance node in the /opt/vmware/appliance/etc/pg_hba.d folder, with a line similar to:

host all all 10.0.2.0/24 md5

Which means that any host from 10.0.2.0/24 subnet will be able to log in via password authentication method with any database user account and access any database.

There is currently not an easy way to use network load balancer to always point to the primary node. This is planned for the next vCloud Director release.

Postgres User Time Bomb

Both the vCloud Director 9.7 and 9.7.0.1 appliance versions unfortunately have a time-bomb issue where the postgres user account expires 60 days after the appliance creation (not its deployment). When that happens, the repmgr commands triggered via ssh stop working, so for example a UI-initiated failover with the promote button will not work.

The 9.7 appliance postgres user expires May 25 2019, the 9.7.0.1 appliance postgres user expires July 9 2019. The fix is, as root on each DB appliance, to run the following command:
chage -M -1 -d 1 postgres

You can check the postgres account status with:
chage -l postgres

 

 

Load Balancing vCloud Director with NSX-T


I have just had a chance to set up for the first time a vCloud Director installation fronted by an NSX-T based load balancer (version 2.4.1). In the past I have blogged about how to load balance vCloud Director cells with NSX-V:

Load Balancing vCloud Director Cells with NSX Edge Gateway

vCloud OpenAPI – Large Payload Issue with Load Balancer

NSX-T differs quite a lot from NSX-V, therefore the need for this article. The load balancer instance is deployed into the NSX-T Edge Cluster, which is a set of virtual or physical NSX-T Edge Nodes. There are also strict sizing guidelines related to the size and number of LB instances and the size of the Edge Nodes – see the official docs.

Certificates

Import your VCD public certificate in the NSX Manager UI: System > Certificates > Import Certificate. You will need to provide a name, the full certificate chain and the private key, and set it as a Service Certificate. If it is signed by an Enterprise CA, import the CA certificate first.

Monitor

Create a new monitor in Networking > Load Balancing > Monitors > Add Active Monitor HTTPs (a quick manual check of the monitored endpoint is shown after the list):

  • protocol HTTPs
  • monitoring port 443
  • default timers
  • HTTP Request Configuration: GET /cloud/server_status, HTTP Request Version: 1
  • HTTP Response Configuration: HTTP response body: Service is up.
  • SSL Configuration: Enabled, Client Certificate: None
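
To sanity check what the monitor will see, you can query the endpoint on one of the cells directly (the cell IP below is a placeholder; -k skips certificate validation):

curl -k https://10.0.1.61/cloud/server_status
# expected response body: Service is up.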

Profiles

Application Profile

Networking > Load Balancing > Profiles > Select Profile Type: Application > Add Application Profile > HTTP

Here in the UI we can set only the Request Header Size and the Request Body Size. Set both to 65535. We will later use the API to also configure the Response Header Size.

Persistence and SSL Profiles

I will reuse the existing default-source-ip-lb-persistence-profile and default-balanced-client-ssl-profile.

Server Pools

Networking > Load Balancing > Server Pools > Add Server Pool

  • Algorithm: Least Connection
  • Active Monitor: picked the one created before
  • Select members: Enter individual members (do not enter a port, as we will reuse the pool for multiple ports)

 

Virtual Servers

We will add two virtual servers. One for UI/API and another for VM Remote Console connections. For both I have picked the same IP address from the cell logical segment. Ports will be different (443 vs 8443).

vCloud UI

  • Add virtual server: L7 HTTP
  • Ports: 443
  • Ignore Load Balancer placement for now
  • Server Pool: the one we created before
  • Application Profile: the one we created before
  • Persistence: default-source-ip-lb-persistence-profile
  • SSL Configuration: Client SSL: Enabled, Default Certificate: the one we imported before, Client SSL Profile: default-balanced-client-ssl-profile
    Server SSL: Enabled, Client Certificate: None, Server SSL Profile: default-balanced-client-ssl-profile

vCloud Console

  • Add virtual server: L4 TCP
  • Ports: 8443
  • Ignore Load Balancer placement for now
  • Server Pool: the one we created before
  • Application Profile: default-tcp-lb-app-profile
  • Persistence: disabled

Load Balancer

Now we can create the load balancer instance and associate the virtual servers with it. Create the LB instance on the Tier-1 Gateway that routes to your VCD cell network. Make sure the Tier-1 Gateway runs on an Edge Node of the proper size (see the docs link above).

Networking > Load Balancing > Load Balancers > Add Load Balancer

  • Size: small
  • Tier 1 Gateway
  • Add Virtual Servers: add the 2 virtual servers created in the previous step

Now that the load balancer is up and running, you should see all green in the status column. We are not done yet, though.

First, we need to increase the response header size, as the vCloud Director OpenAPI sends huge headers with links. Without this, you would get H5 UI errors (Nginx 502 Bad Gateway) and some API calls would fail. This can currently be done only with the NSX Policy API. Fire up Postman or curl and do a GET and then a PUT on the following URI:

NSX-manager/policy/api/v1/infra/lb-app-profiles/<profile-name>

In the payload, change the response_header_size to at least 50000 bytes.
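
A curl sketch of that sequence (the NSX Manager address, credentials and the profile name vcd-http-profile are placeholders; the PUT must send back the full object returned by the GET, including its _revision, with only the size changed):

# read the current application profile and save it
curl -k -u admin:'password' \
  https://nsx-manager.example.com/policy/api/v1/infra/lb-app-profiles/vcd-http-profile \
  -o profile.json

# bump the response header size (requires jq)
jq '.response_header_size = 50000' profile.json > profile-new.json

# write the modified profile back
curl -k -u admin:'password' -X PUT -H "Content-Type: application/json" \
  -d @profile-new.json \
  https://nsx-manager.example.com/policy/api/v1/infra/lb-app-profiles/vcd-http-profile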

And finally, we need to set up NAT so that our load balanced virtual servers are available both from the outside world (on the Tier-0 Gateway) and from the internal networks. This is quite network topology specific, but do not forget that the cells themselves must be able to properly reach the public (load balanced) URL configured in the vCloud Director public addresses.

vCloud Director Object Storage Extension – Deep Look


Last week VMware released another product that extends vCloud Director and enables Cloud Service Providers to offer additional services on top of the vCloud Director out-of-the-box IaaS. Where vCloud Availability adds Disaster Recovery and migration services to vCloud Director, Container Service Extension adds the ability to deploy Kubernetes clusters, and vRealize Operations Tenant App brings advanced workload monitoring, the newly released vCloud Director Object Storage Extension offers tenants easy access to scalable, cheap, durable and network accessible storage for their applications.

As the name suggests, it is an extension that lives side by side with vCloud Director and requires a 3rd party object storage provider. In the 1.0 release the only supported storage provider is Cloudian HyperStore, however other storage providers (cloud or on-prem) are coming in future releases. The extension provides a multitenant S3 compatible API endpoint as well as a user interface plugin for vCloud Director.

Use Cases

The object storage service is fully in the service provider's competence: the provider decides its parameters (SLAs, scalability) and upsells it to existing or new vCloud Director tenants.

The tenants can provision storage buckets and directly upload/download objects into them via the UI, or use S3 APIs or S3 compatible solutions to do so. Objects can also be accessed via an S3 path-style URL for easy sharing.

Additionally, tenants can provision application credentials and use them in their (stateless) workloads to persist application configuration or logs and to access unstructured data (web servers).

Tight integration with vCloud Director also enables using the object storage as an archival or distribution resource for vCloud Director vApps and Catalogs. Tenants can capture existing vApps to a dedicated object storage bucket and later restore them to their Org VDCs.

Alternatively, a whole vCloud Director Organization Catalog of vApp templates and ISO images can be captured to the bucket, or created from scratch by uploading individual ISO and OVA objects, and then used by the same or another Organization, even in a different vCloud Director instance, via the catalog subscription mechanism.

S3 API Compatibility

The solution supports the S3 API with AWS Signature V4, which means existing applications can easily leverage the Object Storage service without the need for rewrites. The screenshots below show usage of the S3 Browser freeware Windows client to manage the files.
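
As another example, the standard AWS CLI can be pointed at the extension endpoint; the endpoint URL below is a placeholder and the access/secret keys are the ones generated for your user in the Object Storage UI:

# configure the credentials generated by the Object Storage Extension
aws configure set aws_access_key_id <your-access-key>
aws configure set aws_secret_access_key <your-secret-key>

# list buckets and upload an object via the S3 compatible endpoint
aws s3 ls --endpoint-url https://object-storage.provider.example.com
aws s3 cp ./backup.tar.gz s3://my-bucket/backup.tar.gz --endpoint-url https://object-storage.provider.example.com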

Objects can be tagged and assigned metadata; buckets can be tagged as well. Server side encryption can be configured by the Org Admin at the tenant level or via the API at the object level. The SSE-S3 (server managed keys) and SSE-C (client supplied keys) methods are supported. Access Control List (ACL) permissions can be set at the bucket or object level. Buckets can be shared within the tenant (with a subset or all users) or made public.

Security credentials (a pair of access and secret keys) are of two types: user credentials (can manage all of the user's buckets and objects) and application credentials (can manage only a subset of buckets). Object Storage Extension automatically creates a user credential for each tenant user, however additional user or application credentials can be created. Credentials can be disabled and/or deleted.

The full set of supported S3 APIs is documented via the swagger UI on the extension endpoint (/docs) or here.

Provider Management

While the object storage tenant consumption APIs are standardized (AWS S3 APIs), each storage platform uses different admin APIs. The Object Storage Extension currently does not expose provider APIs. Tenant administration (service entitlement) is done from the vCloud Director provider UI.

Other administration (quotas, usage metering, platform monitoring, etc.) is done directly through the Cloudian Management Console, to which the provider admin is redirected from the vCloud Director UI, or optionally through the Cloudian HyperStore Admin APIs. This will change in later releases when more storage providers are supported.

Roles

The Object Storage Extension uses three different user personas: provider administrator, tenant administrator and tenant user. The provider administrator manages tenant access to the service and the storage platform. The tenant administrator has access to all buckets and objects of a particular tenant and can monitor consumption at the organization, user or bucket level. The tenant user can only access her own buckets and objects or the ones shared with her.

The user personas map to users based on their vCloud Director rights. The mapping in general corresponds to the System Administrator / Organization Administrator / other non-Organization Administrator global roles, unless these were changed in vCloud Director.

Provider Administrator (system context):

  • General: Administrator View
  • Provider VDC: View
  • Organization: View
  • UI Plugins: View

Tenant Administrator:

  • General: Administrator View
  • Organization VDC: View
  • UI Plugins: View
  • excludes: Provider VDC: View

Tenant User:

  • UI Plugins: View
  • excludes: Administrator: View

Architecture

The Object Storage Extension has a 1:1 relationship with the vCloud Director instance and a 1:1 relationship with the storage provider (Cloudian HyperStore). Each vCloud Director Organization that is enabled to consume the service will have a unique counterpart at the storage platform (Cloudian HyperStore business groups). The same is valid for users. As it is vCloud Director that provides authentication to the service, it is fully multitenant.

The diagram below (taken from the official documentation) shows all the components needed for the Object Storage Extension, including the traffic flows. vCloud Director 9.1 and newer is supported. Next to the vCloud Director cells you will need to deploy one or more (for HA and scalability) RHEL/CentOS/Oracle Linux VM nodes (dark green in the picture) that will run the Object Storage Extension service, which is provided as an RPM package. These VMs are essentially stateless and persist all their data in a PostgreSQL DB. This could be the vCloud Director external PostgreSQL DB (if possible) or a dedicated database just for the Object Storage Extension.

The service needs its own public IP address as it runs (by default) on port 443. S3 API clients or the vCloud Director UI plugin will access this endpoint. vCloud API extensibility is not used, but vCloud Director HTML 5 UI extensibility is.

The extension VM nodes need access to the vCloud API endpoint for user authentication and for the vApp/Catalog import/export functionality. Additionally, they need fast access to the underlying object storage platform (in our case Cloudian HyperStore). Cloudian HyperStore is fully distributed, with a minimum supported deployment of three (fully equivalent) storage nodes, and scales essentially indefinitely. Each storage node also provides the UI/API functionality. Fast L4 load balancing should be used to forward the extension calls to all storage nodes. Multiple APIs (S3, IAM and Admin), each running on a separate TCP port, need to be accessible, as well as the Cloudian Management Console for the Provider UI plugin redirection (this is the only service that needs to be set up with sticky sessions).

As can be seen, the Object Storage Extension is in the datapath of the object transfers that are persisted on the storage nodes. The overhead is less than 10% compared to accessing Cloudian directly (with TLS sessions), however the extension nodes must be sized properly (it is a CPU intensive workload) so they do not become a bottleneck. Both scale-out and scale-up options are possible.

The Cloudian HyperStore storage nodes can be deployed in three different configurations. For small environments or testing, they can be deployed as virtual appliances running on vSphere (CentOS + HyperStore binary) leveraging shared (more expensive) or local disk storage (HyperStore replicates objects across storage nodes, so it does not need highly available shared storage). Other options are to deploy Cloudian HyperStore on dedicated bare metal hardware or to purchase hardware appliances directly from Cloudian. It is up to the service provider to decide which form factor to use to tailor the deployment to their particular use case.

Conclusion

As this is a new product VMware is keen on collecting feedback from vCloud Director service providers on which additional storage platforms and new features should be added in the next version. You can engage with the product team via the VMware Communities website.

vCloud Director – Storage IOPS Management


It is a little known fact that besides compute (capacity and performance), storage capacity and external network throughput rate, vCloud Director can also manage storage IOPS (input / output or read and write operations per second) performance at provisioned virtual disk granularity. This post summarizes the current capabilities.

Cloud providers usually offer different tiers of storage that are available to tenants for consumption. IOPS management helps them to differentiate these tiers and enforce the virtual disk performance based on the IOPS metric. This eliminates the noisy neighbor problem, but also makes both consumption and capacity management more predictable.

vCloud Director relies on vSphere to control the maximum IOPS a VM has access to on a particular storage policy through the Storage I/O Control functionality, which is supported on VMFS (block) and NFS datastores (not vSAN). In vSphere this is defined at the virtual hard disk level, but is enforced at the VM level. vSphere, however, does not manage the available IOPS capacity of a datastore the same way it can do with compute. That's where vCloud Director comes in.

The cloud provider first needs to create a new vSphere custom field (iopsCapacity) and use it to define the IOPS capacity of each vCloud Director managed datastore. This is done via the vCenter Managed Object Browser (MOB) UI and is described in KB 2148300.

Definition of Custom Field iopsCapacity in vCenter MOB UI
Configuring datastore IOPS capacity in vCenter MOB UI

vCloud Director consumes vSphere datastores through storage policies. In my case I have a tag based storage policy named 2_IOPS/GB and, as the name suggests, the intention is to provide two provisioned IOPS per each GB of capacity. A 40 GB hard disk thus should provide 80 IOPS.

Once the storage policy is synced with vCloud Director we can add it to a Provider VDC and consume it in its Org VDCs. vCloud Director will keep track of the storage policy IOPS capacity and how much has been allocated. That information is available via the vCloud API when retrieving the Provider VDC storage profile representation:
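
A hedged example of retrieving it with curl (the vCloud Director FQDN, API version, session token and storage profile UUID are all placeholders; the exact element names in the response may differ slightly between versions):

# retrieve the Provider VDC storage profile via the admin API
curl -k \
  -H "Accept: application/*+xml;version=31.0" \
  -H "x-vcloud-authorization: <session token>" \
  https://vcloud.example.com/api/admin/pvdcStorageProfile/<uuid>
# look for the IopsCapacity (total) and the allocated IOPS values in the returned XML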

Note that the pvdcStorageProfile IopsCapacity is the total IopsCapacity of all the datastores tagged in vCenter that belong to the storage policy.

The actual definition of the storage policy parameters is done via a PUT call at the Org VDC level, again with the API, on the Org VDC storage profile representation. The cloud provider supplies an IopsSetting element that consists of the following parameters:

  • Enabled: True if this storage profile is IOPS-based placement enabled.
  • DiskIopsMax: the max IOPS that can be given to any disk (value 0 means unlimited)
  • DiskIopsDefault: the default IOPS given to any/all disks associated with this VdcStorageProfile if the user doesn’t specify one
  • StorageProfileIopsLimit: the max IOPS that can be used by this VdcStorageProfile. In other words: maximum IOPS that can be assigned across all disks associated with this VdcStorageProfile
  • DiskIopsPerGbMax: similar to DiskIopsMax but instead of a specific value, it’s the ratio of size (in GB) to IOPS. If set to 1, then a 1 GB disk is limited to 1 IOPS; if set to 10, then a 1 GB disk is limited to 10 IOPS, etc.

When a user deploys a VM utilizing an IOPS enabled storage policy, she can set the specific requested IOPS for each disk through the API (0 is treated as unlimited), or set nothing and vCloud Director will set the default limit based on the DiskIopsDefault or DiskIopsPerGbMax x DiskSizeInGb value, whichever is lower. The requested value must always be smaller than DiskIopsMax and also smaller than DiskIopsPerGbMax x DiskSizeInGb. The DiskIopsMax and DiskIopsDefault values must also be lower than StorageProfileIopsLimit.

In my case I wanted to always set the IOPS limit to 2 IOPS per GB, so I configured the Org VDC storage policy in the following way:
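
A sketch of the corresponding API call; the endpoint, API version, token, UUID and the concrete values other than DiskIopsPerGbMax are illustrative placeholders, and the IopsSetting fragment is merged into the full Org VDC storage profile representation retrieved by a prior GET:

# update the Org VDC storage profile with the IOPS settings
curl -k -X PUT \
  -H "Accept: application/*+xml;version=31.0" \
  -H "Content-Type: application/vnd.vmware.admin.vdcStorageProfile+xml" \
  -H "x-vcloud-authorization: <session token>" \
  -d @orgvdc-storage-profile.xml \
  https://vcloud.example.com/api/admin/vdcStorageProfile/<uuid>

with the relevant part of orgvdc-storage-profile.xml looking roughly like this:

<IopsSetting>
  <Enabled>true</Enabled>
  <DiskIopsMax>4000</DiskIopsMax>
  <DiskIopsDefault>1000</DiskIopsDefault>
  <StorageProfileIopsLimit>100000</StorageProfileIopsLimit>
  <DiskIopsPerGbMax>2</DiskIopsPerGbMax>
</IopsSetting>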

And this is the provisioned VM as seen in the vCloud Director UI

and in the vCenter UI.

Additional observations:

  • Datastore clusters cannot be used together with IOPS storage policies. The reason is that when datastore clusters are used, it is vCenter that is responsible for placing the disk on a specific datastore and, as mentioned above, vCenter does not track IOPS capacity at the datastore level, whereas the vCloud Director placement engine takes into account both the datastore capacity (GB) and the IOPS capacity when finding a suitable datastore for a disk.
  • vSAN is not supported as it does not support SIOC. vSAN advanced storage policies allow specifying IOPS limits per object and can be used instead.
  • Disk IOPS can be assigned only to regular VMs, not to VM templates.
  • The disk IOPS will always be allocated against the Org VDC storage profile even if the VM is powered off. This means the cloud provider can oversubscribe IOPS at the Provider VDC storage profile level.
  • System administrator can override IOPS limits when deploying/editing tenant VMs in the system context.
  • Some vCloud Director versions have a bug where the UI sends 0 (unlimited) IOPS for a disk instead of null (undefined), which might result in a provisioning error if it is not compliant with the policy limit.