
The Concept of Cloud Adjacency


Several years ago, when we first started theorizing about hybrid cloud, it was clear that to do our definition of hybrid cloud properly, you needed a fast, low latency connection between your private cloud and the public clouds you were using. After talking to major enterprise customers, we identified five cases where this type of arrangement made sense and actually accelerated cloud adoption.

  1. Moving workloads but not data - If you expected to move workloads between cloud providers, and between public and private clouds, to take advantage of capacity and price moves, moving the workloads quickly and efficiently meant, in some cases, not moving the data.
  2. Regulatory and Compliance Reasons - Say the data is sensitive and must remain under corporate physical ownership while at rest, but you still want to use the cloud for applications around that data (possibly including front-ending it).
  3. Application Support - A case where you have a core application that can't move to cloud due to support requirements (EPIC Health is an example of this type of application), but you have other surround applications that can move to cloud, provided they have a fast, low latency connection back to this main application.
  4. Technical Compatibility - A case where there is a technical requirement, say real time storage replication to a DR site or very high IOPS, that Azure can't for one reason or another handle. In this case, that data can be placed on a traditional storage infrastructure and presented to the cloud over the high bandwidth, low latency connection.
  5. Cost - There are some workloads which today are significantly more expensive to run in cloud than on existing large on-premises servers. These are mostly very large compute platforms that scale up rather than out, especially in cases where a customer already owns this type of equipment and it isn't fully depreciated. Again, it may make sense to run the surround systems that fit on commodity two socket boxes in public cloud, while retaining the large four socket, high memory instances on premises until fully depreciated.

 

All of these scenarios require a low latency, high bandwidth connection to the cloud, as well as a solid connection back to the corporate network. In these cases, you have a couple of options for getting this type of connection.

  1. Place the equipment into your own datacenter and buy high bandwidth Azure ExpressRoute and, if needed, Amazon Direct Connect connections from an established carrier like AT&T, Level 3, Verizon, etc.
  2. Place the equipment into a carrier colocation facility and create a high bandwidth connection from there back to your premises. This places the equipment closer to the cloud but introduces latency between you and the equipment. This works because most applications assume latency between the user and the application, but are far less tolerant of latency between the application tier and the database tier, or between the database and its underlying storage. Additionally, you can place a traditional WAN accelerator (Riverbed or the like) on the link between the colocation facility and your premises.

 

For simply connecting your network to Azure, both options work equally well. If you have one of the five scenarios above, however, the latter option (colocation) is better. Depending on the colocation provider, the latency will be low to very low (2-40ms depending on location). All of the major carriers offer a colocation arrangement that is close to the cloud edge. At Avanade, I run a fairly large hybrid cloud lab environment, which allows us to perform real world testing and demonstrations of these concepts. In our case, we've chosen to host the lab with one of these colocation providers, Equinix. Equinix has an interesting story in that they are a carrier neutral facility. In other words, they run the datacenter and provide a brokerage (the Cloud Exchange), but don't actually sell private line circuits; you connect through their brokerage directly to the other party. This is interesting because it means I don't pay a 10 Gb local loop fee, I simply pay for access to the exchange. Right now I buy two 10 Gb ports to the exchange, and over those I have 10 Gb of ExpressRoute and 10 Gb of Amazon Direct Connect delivered to my edge routers. The latency between a VM running within my private cloud in the lab and a VM in Amazon or Azure is sub-2ms. This is incredibly fast, fast enough to allow me to showcase each of the above scenarios.
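
If you want to sanity-check that kind of latency yourself, a quick sketch (assuming Windows PowerShell 5.1, where Test-Connection exposes a ResponseTime property, and a hypothetical target IP reachable over the private circuit):

# Measure round-trip latency from an on-premises lab VM to a cloud VM.
# 10.20.0.4 is a placeholder private IP; ICMP must be allowed end to end.
Test-Connection -ComputerName 10.20.0.4 -Count 20 |
    Measure-Object -Property ResponseTime -Average -Maximum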

 

We routinely move workloads between providers without moving the underlying data by presenting the data disks over iSCSI from our NetApp FAS to the VM running at the cloud provider. When we move the VM, we migrate the OS and system state, but the data disk is simply detached and reattached at the new location. This is possible due to the low latency connection and is a capability that NetApp sells as NetApp Private Storage (NPS). This capability is also useful for the regulatory story of keeping data at rest under our control: the data doesn't live in the cloud provider's datacenter, and we can physically pinpoint its location at all times. Further, this meets some of the technical compatibility scenario. Because my data is backed by a capable SAN with proper SAN based replication and performance characteristics, I can run workloads that I may not otherwise have been able to run in cloud due to IOPS, cost for performance, or feature challenges with cloud storage.
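
As a rough sketch of what the reattach looks like from inside the guest, assuming the in-box Windows iSCSI initiator cmdlets; the portal address and target IQN below are placeholders for illustration:

# Point the cloud VM's iSCSI initiator at the NPS array in the colo.
New-IscsiTargetPortal -TargetPortalAddress '10.0.1.50'
Connect-IscsiTarget -NodeAddress 'iqn.1992-08.com.netapp:example-target' -IsPersistent $true

# Bring the data disk online; its contents are untouched by the VM move.
Get-Disk | Where-Object { $_.BusType -eq 'iSCSI' -and $_.IsOffline } |
    Set-Disk -IsOffline $false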

 

Second, I have compute located in this adjacent space, some of it very large quad socket machines with multiple TB of RAM running very large line of business applications. Those LOB applications have considerable surround applications that run just fine on public cloud compute, but the big machine isn't cost effective to run on public cloud since I've already invested in the hardware. Or perhaps I'm not allowed to move it to public cloud due to support constraints. Another example of this is non-x86 platforms. Let's say I have an IBM POWER or mainframe platform. I don't want to retire it or re-platform it due to cost, but I have a number of x86 based surround applications that I'd like to move to cloud. I can place my mainframe within the cloud adjacent space and access those resources as if they too were running in the public cloud.

 

As you can see, cloud adjacent space opens up a number of truly hybrid scenarios. We're big supporters of the concept when it makes sense, and while there are complexities, it can unlock moving additional workloads to public cloud.

Writing your own custom data collection for OMS using SCOM Management Packs


[Unsupported] OMS is a great data collection and analytics tool, but at the moment it also has some limitations. Microsoft has been releasing Integration Pack after Integration Pack and adding features at a decent pace, but unlike the equivalent SCOM Management Packs, the IPs are somewhat of a black box. A while back, I got frustrated with the lack of configuration options in OMS (then Op Insights) and decided to "can opener" some of the features. I figured the best way to do this was to examine some of the management packs that OMS installs into a connected SCOM Management Group and start digging through their contents. Now granted, the OMS team has lately added a lot more customization options than existed when I originally traced this out: you can now add custom performance collection rules, custom log collections, and more, all from within the OMS portal itself. However, there are still several advantages to being able to create your own OMS collection rules in SCOM directly. These include:

  • Additional levels of configuration customization beyond what's offered in the OMS portal, such as the ability to target any class you want or to use more granular filter criteria than the portal offers.

  • The ability to migrate your collection configuration from one SCOM instance to another. OMS doesn’t currently allow you to export custom configuration.

  • The ability to do bulk edits through the myriad of tools and editors available for SCOM management packs (it’s much easier to add 50 collection rules to an existing SCOM MP using a simple RegEx find/replace than it is to hand enter them into the OMS SaaS portal).

Digging into the Management Packs automatically loaded into a connected SCOM instance from OMS, we find that there are quite a few. A lot of them still bear the old "System Center Advisor" filenames and display strings from before Advisor got absorbed into OMS, but the IPs also add a bunch of new MPs that include "IntelligencePacks" in their IDs, making them easier to filter by. Many of the type definitions are stored in a pack named (unsurprisingly) Microsoft.IntelligencePacks.Types.
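
If you want to pull these packs down for inspection yourself, here's a minimal sketch using the SCOM OperationsManager PowerShell module (the export path is arbitrary):

# List the OMS-installed packs; the newer ones carry "IntelligencePack" in their IDs.
Import-Module OperationsManager
Get-SCOMManagementPack | Where-Object { $_.Name -like '*IntelligencePack*' } |
    Select-Object Name, Version, Sealed

# Export the types pack to disk so its module definitions can be read.
Get-SCOMManagementPack -Name 'Microsoft.IntelligencePacks.Types' |
    Export-SCOMManagementPack -Path 'C:\MPExport'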

Here is the Event Collection Write Action Module Type Definition:

Event Collector Type

Now let’s take a look at one of the custom event collection rules I created in the OMS portal to grab all Application log events. These rules are contained in the Microsoft.IntelligencePack.LogManagement.Collection MP:

Application Log Collection Rule

We can see that the event collection rule for OMS looks an awful lot like a normal event collection rule in SCOM. The ID is automatically generated according to a naming convention that OMS keeps track of: Microsoft.IntelligencePack.LogManagement.Collection.EventLog. followed by a unique ID string that identifies each specific rule to OMS. The only real difference between this rule and a standard event collection rule is the write action, which is the custom write action we saw defined in the Types pack, designed to write the event to OMS instead of to the SCOM Data Warehouse. So all you need to do to create your own custom event data collection rule in SCOM is add a reference to the Types MP to your custom MP like:

<Reference Alias="IPTypes">
  <ID>Microsoft.IntelligencePacks.Types</ID>
  <Version>7.0.9014.0</Version>
  <PublicKeyToken>31bf3856ad364e35</PublicKeyToken>
</Reference>

…and then either replace the write action in a standard event collection rule with the following, or add it as an additional write action (you can actually collect to both databases using a single rule):

<WriteAction ID="HttpWA" TypeID="IPTypes!Microsoft.SystemCenter.CollectCloudGenericEvent" />

Now granted, OMS lets you create your own custom event log collection rules using the OMS portal, but at the moment the level of customization and filtering available there is pretty limited: you can specify the name of the log and select any combination of three event levels (ERROR, WARNING, and INFORMATION). In SCOM, you can modify the collection rules to filter events down based on any additional criteria you can express in a standard SCOM management pack. In large enterprises, this can help keep your OMS consumption costs down by leaving out events that you know you do not need to collect.
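
As a quick way to size a proposed filter before committing it to a rule, you can sketch the event volume locally (Level 2 is Error, 3 is Warning; the log name and window are just examples):

# Count last week's Application log errors and warnings on this machine
# to estimate what a filtered collection rule would actually send to OMS.
Get-WinEvent -FilterHashtable @{
    LogName   = 'Application'
    Level     = 2, 3
    StartTime = (Get-Date).AddDays(-7)
} | Measure-Object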

If we look at Performance Data next, we see that there are two custom Write Action types that are of interest to us:

Perf Data Collector Type

…and…

Perf Agg Data Collector Type

…which are the Write Action Modules used for collecting custom performance data and the aggregates for that performance data, respectively. If we take a look at some of the performance collection rules that use these types, we can see how we can use them ourselves. Surprisingly, in the current iteration of OMS they get stored in the Microsoft.IntelligencePack.LogManagement.Collection MP along with the event log collection rules. Here’s an example of the normal collection rule generated in SCOM by adding a rule in OMS:

Perf Data Collector Rule

And just like what we saw with the Event Collection rules, the only difference between this rule and a normal SCOM Performance Collection rule is that instead of the write action to write the data to the Operations Manager DB or DW, we have a “write to cloud” Write Action. So all we need to do in order to add OMS performance collection to existing performance rules is add a reference to the Types MP:

<Reference Alias="IPTypes">
  <ID>Microsoft.IntelligencePacks.Types</ID>
  <Version>7.0.9014.0</Version>
  <PublicKeyToken>31bf3856ad364e35</PublicKeyToken>
</Reference>

And then add the custom write action to the Write Actions section of any of our existing collection rules. As with the Event Collection rules, we can use multiple write actions, so a single rule is capable of writing to the Operations Manager database, the data warehouse, and OMS.

<WriteAction ID="WriteToCloud" TypeID="IPTypes!Microsoft.SystemCenter.CollectCloudPerformanceData_PerfIP" />

Now in addition to the standard collection rule we also have an aggregate collection rule that looks like this:

App Log Agg Collection Rule

This rule looks almost exactly like the previous collection rule, except for two big differences. One is that it uses the aggregate "write to cloud" Write Action type shown above rather than the raw one. The other is an additional Condition Detection, not present in the standard collection rule, that performs the aggregation:

<ConditionDetection ID="Aggregator" TypeID="IPPerfCollection!Microsoft.IntelligencePacks.Performance.PerformanceAggregator">
  <AggregationIntervalInMinutes>30</AggregationIntervalInMinutes>
  <SamplingIntervalSeconds>10</SamplingIntervalSeconds>
</ConditionDetection>

Changing the value for AggregationIntervalInMinutes allows you to change the aggregation interval, which is something that you cannot do in the OMS portal. Otherwise, the native custom performance collection feature of OMS is pretty flexible and allows you to use any Object/Counter/Instance combination you want. However, if your organization already uses SCOM, there's a good chance that you already have a set of custom SCOM MPs that you use for performance data collection. Adding a single write action to these existing rules, plus an optional aggregation rule, across 100 pre-existing rules is likely easier for an experienced SCOM author than hand-entering 100 custom performance collection rules into the OMS portal. The other benefits of doing it this way include the ability to bulk edit (changing the threshold for all the counters, for example, becomes a simple find/replace instead of manually changing each rule) and the ability to export this configuration. OMS lets you export data, but not configuration; any time you spend hand-entering configuration into the OMS portal would have to be repeated for any other OMS workspace you want to use it in. A custom SCOM MP, however, can be put into source control and re-used in as many different environments as you like.
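
As a sketch of the kind of bulk edit described above (the file path is hypothetical), changing the aggregation interval across every rule in an exported unsealed MP is a one-liner:

# Change every 30-minute aggregation interval to 15 minutes in one pass.
$path = 'C:\MPs\Custom.PerfCollection.xml'
(Get-Content $path -Raw) -replace '<AggregationIntervalInMinutes>30<', '<AggregationIntervalInMinutes>15<' |
    Set-Content -Path $path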

Note: When making modifications to any rules, do not make changes to the unsealed OMS-managed MPs in SCOM. While these changes probably won't break anything, OMS is the source of record for the content of those MPs. If you make a change in SCOM, it will be disconnected from the OMS config and will likely be overwritten the next time you make a change in OMS.

One last thing. Observant readers may have noticed that every rule I posted is disabled by default. OMS does this for every custom rule and then enables the rules through an override contained within the same unsealed management pack, so this is normal. This is presumably because adjusting an override to enable/disable something is generally considered a "lighter" touch than editing a rule directly, although I don't see any options in the portal to disable any of the collection rules (only delete them).
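
You can see those enabling overrides for yourself; a minimal sketch with the SCOM cmdlets (output properties may vary slightly by SCOM version):

# List the overrides OMS created in its unsealed collection pack.
$mp = Get-SCOMManagementPack -Name 'Microsoft.IntelligencePack.LogManagement.Collection'
Get-SCOMOverride -ManagementPack $mp |
    Select-Object Name, Property, Value, Enforced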

Azure Resource Tagging Best Practices


Out in the field, we are frequently asked to help customers understand how tags should be used. Many organizations are worried that tags, being inherently unstructured, will cause confusion. As a result, we've come up with a structured way of thinking about and applying tags across your subscription. You can use Azure Resource Manager Policy to enforce this tagging philosophy.

What is a tag?

Tags provide a way to logically organize resources with properties that you define. Tags can be applied to resource groups or resources directly, and can then be used to select resources or resource groups from the console, web portal, PowerShell, or the API. Tags are also particularly useful when you need to organize resources for billing or management.
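
For example, here's a minimal sketch of selecting by tag with the current Az PowerShell module (the older AzureRM-era cmdlets differ slightly in name; the tag names and values are the ones recommended later in this post):

# Find every resource group tagged as a production environment...
Get-AzResourceGroup -Tag @{ EnvironmentType = 'Prod' }

# ...or every individual resource carrying a specific billing identifier.
Get-AzResource -TagName 'BillingIdentifier' -TagValue '34821'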

Key Info

  • You can only apply tags to resources that support Resource Manager operations
    • VMs, virtual networks, and storage created through the classic deployment model must be re-deployed through Resource Manager to support tagging
      • A good way around this is to tag the resource group they belong to instead.
    • All ARM resources support tagging
  • Each resource or resource group can have a maximum of 15 tags.
  • Tags are key/value pairs; the name is limited to 512 characters and the value is limited to 256 characters
  • Tags are free-form text, so consistent, correct spelling is very important
  • Tags defined on resource groups exist only on the group object and do not flow down to the resources under them
    • Through that relationship you can easily find resources by filtering on the tagged resource group
    • We recommend keeping tags at the resource group level unless they are resource specific.
  • Each tag is automatically added to the subscription-wide taxonomy
    • Application or resource specific tags will "pollute" the tag list for the entire subscription.

Best Practices

We think it's important for a customer to use at least some tags in a structured way. Given the limit on the number of tags, we recommend tagging at the resource group level. We don't feel there is currently a need to set tags on individual resources, as you can easily trace down from the resource group.

Primarily, we recommend that a Service/Application/Environment hierarchy, an environment type, and a billing identifier be reflected in the tags. To do this, we recommend spending the time to define an application hierarchy that spans everything in your subscription. This hierarchy is a key component of both management and billing and allows you to organize the resource groups logically. It's also important that this hierarchy contain additional metadata about the application, like its importance to the organization and the contact in case there is an issue. By storing that metadata outside the tags, say in a traditional CMDB, you cut down on the number of tags you need, reduce the complexity of tag enforcement, and reduce the risk of tag drift.

Once a taxonomy is agreed on, create the tags for Service/Application and Environment and set them on each resource group. Then set an Environment Type tag to Dev, Test, or Production so you can target all Dev, all Test, or all Production environments later in policy and through automation.
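
As a sketch of what that policy enforcement can look like (assuming the current Az module and policy syntax, both of which have evolved since ARM Policy first shipped; names and the subscription ID are placeholders), a definition that denies any resource group missing the EnvironmentType tag:

$rule = @'
{
  "if": {
    "field": "tags['EnvironmentType']",
    "exists": "false"
  },
  "then": { "effect": "deny" }
}
'@
# Mode "All" is needed for the policy to evaluate resource groups themselves.
$def = New-AzPolicyDefinition -Name 'require-environmenttype-tag' -Mode All -Policy $rule
New-AzPolicyAssignment -Name 'require-environmenttype-tag' -PolicyDefinition $def -Scope '/subscriptions/00000000-0000-0000-0000-000000000000'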

For the billing identifier, we recommend some type of internal charge code or order number that corresponds to a General Ledger (GL) line item to bill that resource group's usage to. Just these few tags would enable you to determine billing usage for, say, VMs running in production for the finance department.

There are several ways to retrieve billing information and the corresponding tags. More information can be found here: https://azure.microsoft.com/en-us/documentation/articles/resource-group-using-tags/#tags-and-billing
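
If you'd rather pull this from PowerShell, here's a rough sketch with the usage cmdlet (the module and output shape have varied across versions, so treat this as a starting point rather than a recipe):

# Pull a month of raw usage; each record's InstanceData JSON includes the
# tags on the resource, which is what ties usage back to a BillingIdentifier.
$usage = Get-UsageAggregates -ReportedStartTime '2016-05-01' -ReportedEndTime '2016-06-01'
$usage.UsageAggregations |
    ForEach-Object { $_.Properties } |
    Select-Object MeterName, Quantity, InstanceData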

Recommended Tags

To be prescriptive, we recommend these tags be set on each resource group:

 

  • AppTaxonomy (Required) - Provides information on who owns the resource group and what purpose it serves within their application. Tag value: Org\App\Environment. Example: USOPS\Finance\Payroll\PayrollTestEnv1
  • MaintenanceWindow (Optional) - Provides a window during which patching and other impacting maintenance can be performed. Tag value: window in UTC, "day:hour:minute-day:hour:minute". Example: Tue:04:00-Tue:04:30
  • EnvironmentType (Required) - Provides information on what the resource group is used for (useful for maintenance, policy enforcement, chargeback, etc.). Tag value: Dev, Test, UAT, or Prod. Example: Test
  • BillingIdentifier (Required) - Provides a charge code or cost center to attribute the bill for the resources to. Tag value: cost center. Example: 34821
  • ExpirationDate (Optional) - Provides a date by which the environment is expected to be removed, so reporting can confirm whether an environment is still needed. Tag value: expiration date in UTC. Example: 2016-06-15T00:00
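
Putting the recommendations together, here's a minimal sketch of stamping these tags onto a resource group with the current Az module (the group name is hypothetical, and the values are the examples from the list above; note that Set-AzResourceGroup replaces the group's entire tag set):

# Apply the recommended tag set to a resource group in one call.
Set-AzResourceGroup -Name 'PayrollTestEnv1-rg' -Tag @{
    AppTaxonomy       = 'USOPS\Finance\Payroll\PayrollTestEnv1'
    EnvironmentType   = 'Test'
    BillingIdentifier = '34821'
}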

 

Over time we’ll have additional posts, scripts and other items that will build on this tagging structure.

New! You may find this post, which includes a PowerShell script, useful for reporting on resource group tags.