
Hands on with Azure Arc enabled data services on AKS HCI - part 3

This is part 3 in a series of articles on my experiences of deploying Azure Arc enabled data services on Azure Stack HCI AKS.

  • Part 1 discusses installation of the tools required to deploy and manage the data controller.

  • Part 2 describes how to deploy and manage a PostgreSQL hyperscale instance.

  • Part 3 describes how we can monitor our instances from Azure.

In the preview version, if you want to view usage data and metrics for your Arc enabled data services instances, you have to run some manual azdata CLI commands: one to dump the data to a JSON file, and another to upload it to your Azure subscription. This can be automated by running them as a scheduled task or cron job.
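For example, here is a minimal sketch of scheduling such a script with a Windows scheduled task; the script path, task name and run time are placeholders of my own, not from the documentation:

$action  = New-ScheduledTaskAction -Execute 'powershell.exe' -Argument '-NoProfile -File C:\scripts\Upload-ArcMetrics.ps1'
$trigger = New-ScheduledTaskTrigger -Daily -At 2am
# Registers the task for the current user; adjust the principal/credentials to suit your environment
Register-ScheduledTask -TaskName 'Upload-ArcDataMetrics' -Action $action -Trigger $trigger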

Pre-requisites

Before we can upload any data, we need to make sure that some pre-reqs are in place.

If you’ve been following the previous two parts, the required tools should already be in place: the Azure (az) and Azure Data (azdata) CLIs.

We also need to make sure that the necessary resource provider (Microsoft.AzureArcData) is registered. Following the instructions here shows how you can do it using the Azure CLI, and it’s straightforward. The documents don’t show how you can use PowerShell to achieve the same outcome, so here are the commands needed, if you want to try it out:

$Subscription = '<your subscriptionName>'
$ResourceProviderName = 'Microsoft.AzureArcData'

$AzContext = Get-AzContext

if (-not ($AzContext.Subscription.Name -eq $Subscription)) {
    Connect-AzAccount -Subscription $Subscription
}


$resourceProviders = Get-AzResourceProvider -ProviderNamespace $ResourceProviderName  
$resourceProviders | Select-Object ProviderNamespace, RegistrationState
$resourceProviders | Where-Object RegistrationState -eq 'NotRegistered' | Register-AzResourceProvider

Get-AzResourceProvider -ProviderNamespace $ResourceProviderName  | Select-Object ProviderNamespace, RegistrationState

The next recommended pre-req is to create a service principal that can be used to automate the upload of the data. This is easy to do using the Azure CLI, as detailed here.

az ad sp create-for-rbac --name azure-arc-metrics

Make a note of the appId (SPN_CLIENT_ID), password (SPN_CLIENT_SECRET) and tenant (SPN_TENANT_ID) values that are returned after the command has been run.
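If you’d rather not copy these by hand, a short PowerShell sketch like the one below captures them straight into the environment variables used by the upload commands later on (the variable names match the script further down; Out-String simply joins the CLI’s JSON output before parsing):

$sp = az ad sp create-for-rbac --name azure-arc-metrics | Out-String | ConvertFrom-Json
$Env:SPN_CLIENT_ID     = $sp.appId
$Env:SPN_CLIENT_SECRET = $sp.password
$Env:SPN_TENANT_ID     = $sp.tenant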

Run the following command to get the Subscription Id.

az account show --query "{SubscriptionId:id}"


Make a note of the output.
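If you want it in a variable rather than just on screen (handy for the role assignment in the next step), something like this does the trick:

$subId = az account show --query id --output tsv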



The next thing to do is to assign the service principal to the Monitoring Metrics Publisher role in the subscription.

az role assignment create --assignee <appId> --role "Monitoring Metrics Publisher" --scope subscriptions/<Subscription ID>
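Using the values captured in the earlier sketches ($sp and $subId are my own variable names, not part of the documented command), the same assignment looks like this in PowerShell:

az role assignment create --assignee $sp.appId --role "Monitoring Metrics Publisher" --scope "/subscriptions/$subId"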

Next, we need a Log Analytics workspace (if you don’t already have one). Run the following Azure CLI commands to create a resource group and workspace:

az group create --location EastUs --name AzureArcMonitoring
az monitor log-analytics workspace create --resource-group AzureArcMonitoring --workspace-name AzureArcMonitoring

From the output, take note of the Workspace Id.
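The workspace ID is the customerId property of the workspace, so it can also be pulled straight into the environment variable used later, for example:

$Env:WORKSPACE_ID = az monitor log-analytics workspace show --resource-group AzureArcMonitoring --workspace-name AzureArcMonitoring --query customerId --output tsv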

Finally, we need to retrieve the shared key for the workspace. Run the following:

az monitor log-analytics workspace get-shared-keys --resource-group AzureArcMonitoring --workspace-name AzureArcMonitoring

Take note of the primary or secondary key.
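Again, this can go straight into the environment variable the log upload expects, for example:

$Env:WORKSPACE_SHARED_KEY = az monitor log-analytics workspace get-shared-keys --resource-group AzureArcMonitoring --workspace-name AzureArcMonitoring --query primarySharedKey --output tsv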

Retrieving data

Retrieving data from our data controller and uploading to Azure is currently a two step process.

The following three commands will export the usage, metrics and log data to your local system:

azdata arc dc export --path c:\temp\arc-dc-usage.json --type usage --force
azdata arc dc export --path c:\temp\arc-dc-metrics.json --type metrics --force
azdata arc dc export --path c:\temp\arc-dc-logs.json --type logs --force

Note: Be careful using the --force switch, as it will overwrite the specified file. If you haven’t uploaded the existing data from that file to Azure, there is a potential you will miss those collected metrics for the period.
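If you’re worried about losing data to --force, one option (my own sketch, not part of the documented flow) is to timestamp each export file so nothing gets overwritten:

$stamp = Get-Date -Format 'yyyyMMdd-HHmmss'
azdata arc dc export --path "c:\temp\arc-dc-metrics-$stamp.json" --type metrics --force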

For the second step, we have to upload the data to our Azure subscription.

You can check the resource group that the data controller was deployed to via the data controller dashboard in ADS.

Once you have the service principal details, run the following commands:

azdata arc dc upload --path c:\temp\arc-dc-usage.json
azdata arc dc upload --path c:\temp\arc-dc-metrics.json
azdata arc dc upload --path c:\temp\arc-dc-logs.json

For each command, it will prompt for the tenant ID, client ID and client secret for the service principal. Azdata currently does not allow you to specify these as parameters, but you can set them as environment variables so that you are not prompted for them. For the log upload, a Log Analytics workspace ID and shared key are also required.

Here’s an example PowerShell script that you can use to automate the connection to your data controller, retrieve the metrics and also upload to your Azure subscription:

$Env:SPN_AUTHORITY='https://login.microsoftonline.com'
$Env:SPN_TENANT_ID = "<SPN Tenant Id>"
$Env:SPN_CLIENT_ID = "<SPN Client Id>"
$Env:SPN_CLIENT_SECRET = "<SPN Client secret>"
$Env:WORKSPACE_ID = "<Your LogAnalytics Workspace ID>"
$Env:WORKSPACE_SHARED_KEY = "<Your LogAnalytics Workspace Shared Key>"
$Subscription = "<Your SubscriptionName>"


$Env:AZDATA_USERNAME = "<Data controller admin user>"
$Env:AZDATA_PASSWORD = "<Data controller admin password>"
$DataControllerEP = "https://<DC IP>:30080"

# Find your contexts: kubectl config get-contexts
$kubeContextName = 'my-workload-cluster-admin@my-workload-cluster'

$dataPath = 'c:\temp'

kubectl config use-context $kubeContextName

azdata login -e $DataControllerEP

az login --service-principal -u $Env:SPN_CLIENT_ID -p $Env:SPN_CLIENT_SECRET --tenant $Env:SPN_TENANT_ID

if ($Subscription) {
    az account set --subscription $Subscription
}

if (-not (test-path -path $dataPath)) {
    mkdir $dataPath
}

cd $dataPath

azdata arc dc export --type metrics --path metrics.json --force
azdata arc dc upload --path metrics.json

azdata arc dc export --type usage --path usage.json --force
azdata arc dc upload --path usage.json

azdata arc dc export --type logs --path logs.json --force
azdata arc dc upload --path logs.json

After you have uploaded the data to Azure, you should be able to start to view it in the Azure Portal.

At the time of writing, if you try and use the link from the Azure Arc Data Controller Dashboard in ADS, it will throw an error:


The reason for this is that the URI constructed by the Arc extension isn’t targeting the correct resource provider. It uses Microsoft.AzureData, when the correct one is Microsoft.AzureArcData. I assume this will be fixed in an upcoming release of the extension, as the namespace has changed very recently. In the meantime, it can be manually patched by doing the following (correct for version 0.6.5 of the Azure Arc extension):

Edit:

%USERPROFILE%\.azuredatastudio\extensions\microsoft.arc-0.6.5\dist\extension.js

Find and replace all instances of Microsoft.AzureData with Microsoft.AzureArcData (there should be 4 in total).
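If you’d rather script the edit than work through the minified file by hand, a quick PowerShell find/replace along these lines should do it (a sketch: it takes a backup first and assumes the file is UTF-8):

$extFile = "$env:USERPROFILE\.azuredatastudio\extensions\microsoft.arc-0.6.5\dist\extension.js"
Copy-Item $extFile "$extFile.bak"
# Replace the old resource provider namespace with the new one and write the file back
(Get-Content $extFile -Raw) -replace 'Microsoft\.AzureData', 'Microsoft.AzureArcData' | Set-Content $extFile -Encoding UTF8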


If the file has no formatting (it’s minified .js), you can just do the find/replace as-is. I used a VS Code extension (JS-CSS-HTML Formatter) to beautify the formatting and make it easier to read.

Save the file and restart the Azure Data Studio Session. When you now click on Open in Azure Portal, it should open as expected.

As it is, we can’t actually do that much from the portal, but enhanced capabilities will come in time. We can, however, view a bit more about our PostgreSQL instances, in a view that looks similar to the Azure Data Studio dashboard:

When looking at Metrics, make sure you select the correct namespace - in my example it’s postgres01.

The integration with the portal and the uploaded logs is a bit hit and miss. I found that clicking on the Logs link from my resource did not point me to the Log Analytics workspace I specified when I ran my script, so I had to manually target it before I could query the logs.



Thanks for reading this series, and I hope it will help others get around some of the small gotchas I encountered when evaluating this exciting technology stack!

Hands on with Azure Arc enabled data services on AKS HCI - part 2

This is part 2 in a series of articles on my experiences of deploying Azure Arc enabled data services on Azure Stack HCI AKS.

  • Part 1 discusses installation of the tools required to deploy and manage the data controller.

  • Part 2 describes how to deploy and manage a PostgreSQL hyperscale instance.

  • Part 3 describes how we can monitor our instances from Azure.

First things first, the PostgreSQL extension needs to be installed within Azure Data Studio. You do this from the Extension pane. Just search for ‘PostgreSQL’ and install, as highlighted in the screen shot below.


I found that the latest version of the extension (0.2.7 at the time of writing) threw an error. The issue lies with the OSS DB Tools Service that gets deployed with the extension, and you can see the error from the message displayed below.


After doing a bit of troubleshooting, I figured out that VCRUNTIME140.DLL was missing from my system. Well, actually, the extension does have a copy of it, but it’s not part of the PATH, so it can’t be used. Until a new version of the extension resolves this issue, there are two options you can take to work around it.

  1. Install the Visual C++ 2015 Redistributable to your system (Preferred!)

  2. Copy VCRUNTIME140.DLL to %SYSTEMROOT% (more hacky; do this at your own risk - see the sketch after this list)

    You can find a copy of the DLL in the extension directory:

%USERPROFILE%\.azuredatastudio\extensions\microsoft.azuredatastudio-postgresql-0.2.7\out\ossdbtoolsservice\Windows\v1.5.0\pgsqltoolsservice\lib\_pydevd_bundle
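As a quick sanity check (my own sketch, not from the extension docs), you can test whether the DLL is already visible system-wide, and copy the extension’s bundled copy if you go with the second option:

# True if a system-wide copy of the runtime DLL already exists
Test-Path "$env:SystemRoot\System32\vcruntime140.dll"

# Option 2 only - copy the DLL shipped with the extension (at your own risk)
Copy-Item "$env:USERPROFILE\.azuredatastudio\extensions\microsoft.azuredatastudio-postgresql-0.2.7\out\ossdbtoolsservice\Windows\v1.5.0\pgsqltoolsservice\lib\_pydevd_bundle\vcruntime140.dll" -Destination $env:SystemRoot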

Make sure to restart Azure Data Studio and verify that the problem is resolved by checking the output from the ossdbToolsService. The warning message doesn’t seem to impair the functionality of the extension, so I ignored it.

Now we’re ready to deploy a PostgreSQL cluster. Within ADS, we have two ways to do this.

1. Via the data controller management console:


2. From the Connections pane, via the ‘New Deployment…’ option. Click on the ellipsis (…) to present the option.

Whichever option you choose, the screens presented next are similar to one another. The example I’ve shown is via the ‘New Connection’ path, which shows more deployment types. Installing via the data controller dashboard, the list is filtered to what can be deployed to the cluster (the Azure Arc options).

Select PostgreSQL Hyperscale server groups - Azure Arc (preview), make sure that the T&C’s acceptance is checked and then click on Select.

The next pane that’s displayed is where we define the parameters for the PostgreSQL instance. I’ve highlighted the options you must fill in as a minimum. In the example, I’ve set the number of workers to 3. By default it is set to 0; if you leave it as the default, a single worker is deployed.

Note: If you’re deploying more than one instance to your data controller, make sure to set a unique port for each server group. The default is 5432.

Clicking on Deploy runs through the generated Jupyter notebook.

After a short period (minutes), you should see it has successfully deployed.

ADS doesn’t automatically refresh the data controller instance, so you have to manually do this.

Once refreshed, you will see the instance you have deployed. Right click and select Manage to open the instance management pane.

As you can see, it looks and feels similar to the Azure portal.

If you click on the Kibana or Grafana dashboard links, you can see the logs and performance metrics for the instance.

Note: The username and password are what you set for the data controller, not the password you set for the PostgreSQL instance.

Example Kibana logs

Grafana Dashboard

From the management pane, we can also retrieve the connection strings for our PostgreSQL instance. It gives you the details for use with various languages.

Finally, in Settings, Compute + Storage in theory allows you to change the number of worker nodes and the configuration per node. In reality, this is read-only from within ADS: changing any of the values and saving them has no effect. If you do want to change the config, you need to drop to the azdata CLI. Jump here to see how to do it.

In order to work with databases and tables on the newly deployed instance, we need to add our new PostgreSQL server to ADS.

From Connection strings on our PostgreSQL dashboard, make a note of the host IP address and port; we’ll need these to add our server instance.

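If you prefer to pull this from the cluster rather than the dashboard, listing the services in the data controller’s namespace shows the external IP and port; look for the service exposing the port you chose for the server group (5432 by default). The namespace placeholder below is mine:

kubectl get svc -n <data controller namespace>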

From the Connections pane in ADS, click on Add Connection.

From the new pane enter the parameters:

Parameter            Value
Connection type      PostgreSQL
Server name          The name you gave to the instance
User name            postgres
Password             The password you specified for the PostgreSQL deployment
Database name        Default
Server group         Default
Name (optional)      Leave blank

Click on Advanced, so that you can specify the host IP address and port.

Enter the host IP address previously noted, and set the port (the default is 5432).


Click on OK and then Connect. If all is well, you should see the new connection.

Scaling Your Instance

As mentioned before, if you want to modify the running configuration of your instance, you’ll have to use the azdata CLI.

First, make sure you are connected and logged in to your data controller.

azdata login --endpoint https://<your dc IP>:30080

Enter the data controller admin username and password when prompted.

To list the PostgreSQL servers that are deployed, run the following command:

azdata arc postgres server list

To show the configuration of the server:

azdata arc postgres server show -n postgres01

Digging through the JSON, we can see that the only resource requested is memory. By default, each node will use 0.25 cores.

I’m going to show how to increase the memory and cores requested. For this example, I want to set 1 core and 512Mi of memory:

azdata arc postgres server edit -n postgres01 --cores-request 1 --memory-request 512Mi

If we show the config for our server again, we can see it has been updated successfully.
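To avoid scrolling through the full JSON, the --query flag can narrow the output to just the resource requests. The exact JSON path below is an assumption based on the spec layout in my deployment, so check it against your own show output first:

azdata arc postgres server show -n postgres01 --query "spec.scheduling.default.resources.requests"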


You can also increase the number of workers using the following example:

azdata arc postgres server edit -n postgres01 --workers 4

Note: With the preview, reducing the number of workers is not supported.

If you do make any changes via azdata, you will need to close existing management panes for the instance and refresh the data controller instance within ADS for them to be reflected.


Currently, there does not appear to be a method to increase the allocated storage via ADS or the CLI, so make sure you provision your storage sizes sufficiently at deployment time.

You can deploy more than one PostgreSQL server group to your data controller; the only things you will need to change are the name and the port used.


You can use this command to show a friendly table of the port that the server is using:

azdata arc postgres server show -n postgres02 --query "{Server:metadata.name, Port:spec.service.port}" --output table

In the next post, I’ll describe how to upload logs and metrics to Azure for your on-prem instances.