Vulnerability Management in Microsoft Azure

22 February 2024

This article describes how you can use the native tools in Azure, and how an organization can establish a basic process for vulnerability management of virtual machines, focusing on vulnerabilities that can’t be fixed by the standard automatic update management in Azure.

Vulnerability management is the process of detecting, assessing, mitigating, and reporting on security vulnerabilities that exist within an organization’s technical platform as well as within the software it develops. Cloud Service Providers are responsible for doing this for resources that are created by using a Platform-as-a-Service (or serverless) offering. Cloud customers, however, are responsible for vulnerability management if they use an Infrastructure-as-a-Service offering. In the case of Azure, this means that those who use Azure virtual machines (VMs) have to patch them themselves.

Azure includes tools like Azure Update Manager, Microsoft Defender for Cloud (formerly known as Azure Security Center), Azure Resource Graph, and Log Analytics. The Microsoft Defender security portal also contains a vulnerability management feature, but it requires a license to activate and use. These tools are commonly used to ensure that VMs have the latest security updates, so that an organization can minimize the window of opportunity for attackers. Most of the tools mentioned are available in the Azure portal, but in some cases you might want to extract information using their corresponding Azure APIs, either to automate tasks or to create a holistic overview of what your security posture really is.

Governance (system management) of solutions based on virtual machines (VMs) includes change management, which consists of performing minor updates and major upgrades. A minor update would be to change the software version from 1.1.0 to version 1.1.1, while a major upgrade would be a change from version 1.1 to version 2.0. A major upgrade represents a significantly greater risk, requiring more testing and verification to ensure the change doesn’t have a negative impact on an application’s functionality or availability. Such upgrades usually have to be performed manually, and new vulnerabilities are continuously discovered in all software. The challenge here is that if a high-risk vulnerability is discovered in software, and a major upgrade is required to solve it, that fix might take a long time for the vendor to deliver. Minor updates can usually be remediated in a shorter time.

Automated Patch Management vs. Manual Patching

The standard approach to Azure vulnerability management is to use Azure Update Manager to schedule Windows updates and security patches to be installed automatically. Microsoft releases the majority of these updates on the second Tuesday of each month, commonly known as “Patch Tuesday”. Organizations usually create “patch groups” based on separate schedules, for example one patch group for non-production servers and another for servers in production environments. Using different schedules allows organizations to install patches in the test environment first, and then in production if everything went well during testing. Some also want to wait a few days before initiating patching, just to avoid installing a patch that has caused technical issues for other organizations. Performing automatic updates this way ensures that security patches are installed as soon as possible, and it’s the easiest way to ensure that the majority of new vulnerabilities are remediated (fixed) automatically each month.

Some software components still require manual patching, since automatically patching them can cause unforeseen technical issues for applications that depend on them. This especially concerns frameworks like .NET, .NET Core, and Node.js. In practice, frameworks can’t be automatically patched due to the large number of software dependencies they have. For example, upgrading these frameworks automatically could cause a Line-of-Business (LOB) application to fail if that application’s vendor hasn’t ensured that it’s possible to upgrade without causing issues. Because third-party software often requires manual updates, it can end up never being updated and eventually become unsupported. This is referred to as End-of-Support software, and it usually means that the software vendor ceases to provide support and security updates for it.

Because manual patching is required, organizations have to track which changes need to be made manually each month. This also includes verifying that the patch actually remediates the vulnerability it’s supposed to fix. In cases where the vulnerability can’t be immediately remediated, it’s necessary to mitigate the risk in other ways to make an exploit of that vulnerability less likely. Mitigation means implementing some form of compensating control, such as blocking Internet access to and from a system, restricting users from accessing an application without MFA, or placing VMs in a dedicated subnet with a Network Security Group (NSG) attached to it.

Security Controls and Azure Security

Security for Azure environments relies on the principles of “Zero Trust”, “Assume Breach”, and “Defense in Depth”. Defense in depth is based on the idea of using multiple layers of security controls to protect an IT system. The National Security Agency (NSA) originally conceived this layering tactic, based on the military strategy of the same name. The purpose of this concept is to provide redundancy in the event an individual security control fails. Redundancy in this case could mean combining security awareness training for personnel, physical security, well-defined processes, and technical countermeasures. Technical countermeasures are security controls that address different aspects of IT security such as authentication, authorization, encryption, network access control, integrity management, vulnerability management, malware protection, and security monitoring. Together, these layers provide the defense this principle calls for.

Applying the defense in depth principle to Azure includes using several security-related services, configuring resources according to security benchmarks, and performing security monitoring. This includes configuring security-related options in Entra ID to protect user accounts (IAM), implementing network security controls (Azure Firewall, Application Gateway with WAF, Azure Front Door), configuring additional encryption options based on Customer-Managed Keys stored in a Key Vault, hardening resource configurations, performing security event monitoring using Microsoft Sentinel, and applying software security tools to Azure DevOps (such as SCA, SAST, and DAST) to ensure source code doesn’t contain different types of vulnerabilities. Hardening operating systems such as Windows using Group Policy is also necessary, as is hardening services such as IIS, SQL Server, and so on. A special case worth mentioning is containers, which are deployed from images, such as Nginx for example. Many organizations customize these images to deploy both the operating system and services (web servers) into a container app. These images also include software that contains vulnerabilities, which means the images must be updated so that new, secure containers can be deployed to replace the old, insecure ones.

Defining a Process for Vulnerability Management

Implementing a security control starts with defining a set of requirements that the organization must adhere to, providing a set of tools that the staff can use, ensuring that those tools make automation possible, and ensuring that the automation makes it possible to review and track Key Performance Indicators (KPIs) or metrics. The last part is necessary to create reports that describe how effective a security control is and how well it’s functioning as a process.

Defining requirements is usually done in an information security policy, standard, or guideline. Requirements should be applicable both to on-premises systems and to systems running on services provided by a Cloud Service Provider.

Subscribe to our newsletter here, and you’ll receive free access to our Vulnerability Management guideline which includes detailed requirements:



A process can be described as people using tools to process information with the purpose of achieving a goal. A process for managing vulnerabilities on VMs can consist of the following sub-processes:

  • Collect information about installed software
  • Collect information on vulnerabilities present in software
  • Prioritize security risks based on criticality
  • Remediate risks that can be fixed
  • Mitigate risks by implementing compensating controls
  • Create reports based on current state and trend
  • Manage residual risks

These activities are a major part of working with security operations when an organization wants to minimize the risks they are facing. The main goal here is to prevent security incidents that might be caused by hackers exploiting vulnerabilities in software.

Collecting Software Information

Vulnerability management requires you to know what software, and which version, you have installed. This information is particularly important for LOB applications, and for any software that’s exposed to the Internet. Many organizations already use a Configuration Management Database (CMDB) that collects this information and stores it in a central repository as Configuration Items (CIs).

Another way of collecting information about software installed on VMs in Azure is to enable the “Inventory” feature in the Azure portal. Go to the Automation Account you’re using to collect the information and select “Inventory” under “Configuration Management”. Then click the “Software” tab to see what’s installed.

This table lists the installed software grouped by name and version, including high-level information like name, version, publisher, last refreshed time, and the number of VMs with that software. Clicking on a row displays details such as its properties and the names of the VMs it’s installed on. You can also search for a particular application or application group by name, version, and publisher.

Using Log Analytics as a Scanning Tool

The information presented in the “Inventory” feature is stored in a Log Analytics workspace in a log table called “ConfigurationData”. This means that it’s also possible to search through the data using the Kusto Query Language (KQL). A simple KQL query could look like this:

ConfigurationData
| where ConfigDataType == "Software"
| where SoftwareType == "Application"
| where TimeGenerated between (startofday(ago(1d)) .. now())
| distinct Computer, SoftwareName, Publisher, CurrentVersion, _ResourceId

The log data is also available if you query the Azure Log Analytics API. The URI for the API call is:

https://api.loganalytics.io/v1/workspaces/{yourLogAnalyticsWorkspaceId}/query

Log Analytics workspaces have a unique “Workspace ID”, and you have to be authorized to query them. This API is based on POST requests, so the KQL query is sent to the API as UTF-8 encoded “application/json” content. The response is formatted as a table, so it contains a set of columns (basically just the column names), and then all the rows which contain the actual data.

This means that you can query all Log Analytics workspaces to find out which workspaces have the ConfigurationData log table and then extract the information you need.
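
As a minimal sketch of what such a call could look like, the Python snippet below posts the inventory query shown earlier to the Log Analytics query API and flattens the column/row response into dictionaries. The workspace ID and the access token are placeholders you have to supply yourself (for example via the azure-identity library), and error handling is kept to a minimum.

# Minimal sketch: query the Azure Log Analytics API for installed software.
# Assumes you already have a valid access token for https://api.loganalytics.io
# (for example obtained with azure-identity). Placeholders must be replaced.
import requests

WORKSPACE_ID = "<your-log-analytics-workspace-id>"   # placeholder
ACCESS_TOKEN = "<your-access-token>"                 # placeholder

KQL_QUERY = """
ConfigurationData
| where ConfigDataType == "Software"
| where SoftwareType == "Application"
| where TimeGenerated between (startofday(ago(1d)) .. now())
| distinct Computer, SoftwareName, Publisher, CurrentVersion, _ResourceId
"""

def query_log_analytics(workspace_id, query, token):
    """POST a KQL query and return the rows as a list of dictionaries."""
    url = f"https://api.loganalytics.io/v1/workspaces/{workspace_id}/query"
    response = requests.post(
        url,
        headers={"Authorization": f"Bearer {token}",
                 "Content-Type": "application/json"},
        json={"query": query},
        timeout=30,
    )
    response.raise_for_status()
    table = response.json()["tables"][0]                 # first result table
    columns = [column["name"] for column in table["columns"]]
    return [dict(zip(columns, row)) for row in table["rows"]]

if __name__ == "__main__":
    for item in query_log_analytics(WORKSPACE_ID, KQL_QUERY, ACCESS_TOKEN):
        print(item["Computer"], item["SoftwareName"], item["CurrentVersion"])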

Collecting Vulnerability Information

Vulnerability management requires you to quickly act on issues that are continually changing in an effort to minimize an attacker’s window of opportunity. The default way of finding vulnerabilities on VMs is to use the Azure portal. In Azure, vulnerability scanning is performed by Microsoft Defender for Cloud.

Third-party vulnerability scanning tools can also be used to perform similar vulnerability assessments, usually covering network devices, web application source code, and other components in your technical platform as well. Many larger organizations use a vulnerability scanner instead of the built-in vulnerability management tools available in Azure. In those situations, reviewing scan results at least every month is a common task, but newly detected vulnerabilities should be reviewed each week when possible if you’re using a third-party vulnerability management solution.

Defender for Cloud Vulnerability Assessments

Start by reviewing the environment settings in Defender for Cloud, and the settings of each subscription, to verify that the correct Defender plans are enabled for the subscriptions you want to protect. Enabling the Microsoft Defender for Servers CWP (Cloud Workload Protection) plan will provide the necessary information, as it’s the cloud-native solution for performing vulnerability assessments. Consider using Azure Policy to configure the “Configure Microsoft Defender for Cloud plans” policy to ensure that all future subscriptions are protected by default.
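
If you want to verify the enabled plans programmatically rather than in the portal, a sketch like the one below lists the Defender plans (pricing tiers) for a subscription through the Microsoft.Security/pricings REST endpoint. The subscription ID, the token, and the api-version are placeholders and assumptions you have to adjust to your environment.

# Sketch: list Defender for Cloud plans and their pricing tier for one subscription.
# Assumes a valid ARM access token (scope https://management.azure.com) and that
# the chosen api-version is still supported in your environment.
import requests

SUBSCRIPTION_ID = "<your-subscription-id>"   # placeholder
ACCESS_TOKEN = "<your-access-token>"         # placeholder
API_VERSION = "2023-01-01"                   # assumption - verify current versions

url = (f"https://management.azure.com/subscriptions/{SUBSCRIPTION_ID}"
       f"/providers/Microsoft.Security/pricings?api-version={API_VERSION}")

response = requests.get(url, headers={"Authorization": f"Bearer {ACCESS_TOKEN}"},
                        timeout=30)
response.raise_for_status()

for plan in response.json().get("value", []):
    # "Standard" means the paid Defender plan is enabled for that resource type,
    # "Free" means only the foundational (free) tier is active.
    print(plan["name"], plan["properties"]["pricingTier"])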

After this is done, follow the recommendations provided by Defender in the “Remediate vulnerabilities” recommendations group. There you’ll see the two main recommendations, “Machines should have a vulnerability assessment solution” and “Machines should have vulnerability findings resolved”. There might be “unhealthy resources” listed there that should be investigated. You can create an “exemption” for a resource where the recommendation isn’t applicable for some reason. This should help increase the Secure Score percentage, so that you can showcase the improvements being made, all while keeping track of, and being transparent about, the exemptions you’ve made.

Other Vulnerability Scanning Tools in Azure

The Azure Resource Graph Explorer is another way of identifying vulnerabilities in the Azure portal. This option involves querying the “securityresources” table and looking at the specific sub-assessments that contain VM vulnerabilities. It’s possible to download the result as a CSV file so that you can do some final touches in Excel.

Here is a query that generates a result containing the current vulnerabilities:

securityresources
| where type == "microsoft.security/assessments"
| where * contains "vulnerability findings"
| summarize by assessmentKey=name
| join kind=inner (
    securityresources
    | where type == "microsoft.security/assessments/subassessments"
    | extend assessmentKey = extract(".*assessments/(.+?)/.*", 1, id)
) on assessmentKey
| project assessmentKey, subassessmentKey=name, id, parse_json(properties), resourceGroup, subscriptionId, tenantId
| extend description = properties.description,
         displayName = properties.displayName,
         resourceId = properties.resourceDetails.id,
         resourceSource = properties.resourceDetails.source,
         category = properties.category,
         severity = properties.status.severity,
         code = properties.status.code,
         timeGenerated = properties.timeGenerated,
         remediation = properties.remediation,
         impact = properties.impact,
         vulnId = properties.id,
         additionalData = properties.additionalData

You can also query the Azure Resource Graph API to find information on vulnerabilities in your Azure platform. This is also a POST request, and with this API you have to supply content that includes the query as well as a list of subscription IDs for which you want to extract the vulnerability data. The URI for this API is:

https://management.azure.com/providers/Microsoft.ResourceGraph/resources
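
As a rough sketch, such a call could look like the Python snippet below, which posts a Resource Graph query together with a list of subscription IDs. The api-version value is an assumption, and the shortened query is only a placeholder; in practice you would send the full vulnerability query shown above.

# Sketch: run an Azure Resource Graph query against a list of subscriptions.
# Assumes a valid ARM access token and that the chosen api-version is supported.
import requests

ACCESS_TOKEN = "<your-access-token>"                               # placeholder
SUBSCRIPTION_IDS = ["<subscription-id-1>", "<subscription-id-2>"]  # placeholders
API_VERSION = "2021-03-01"                                         # assumption

ARG_QUERY = """
securityresources
| where type == "microsoft.security/assessments/subassessments"
| project id, name, properties
"""  # shortened placeholder - use the full vulnerability query in practice

url = ("https://management.azure.com/providers/Microsoft.ResourceGraph/resources"
       f"?api-version={API_VERSION}")

response = requests.post(
    url,
    headers={"Authorization": f"Bearer {ACCESS_TOKEN}",
             "Content-Type": "application/json"},
    json={"subscriptions": SUBSCRIPTION_IDS, "query": ARG_QUERY},
    timeout=30,
)
response.raise_for_status()

result = response.json()
print("Total records:", result.get("totalRecords"))
# The shape of "data" depends on the api-version and requested resultFormat;
# with recent versions it is a list of objects (one per row).
for row in result.get("data", []):
    print(row.get("name"))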

In addition, you can work with vulnerabilities using the Microsoft Defender portal which is available at: https://security.microsoft.com

It’s important to note that this portal is not the same as the Defender for Cloud service in the Azure portal. The Microsoft Defender portal includes a Vulnerability Management feature, which requires the “Microsoft Defender Vulnerability Management Add-on”. This add-on is available with the “Enterprise Mobility + Security E5” and “Microsoft Defender Vulnerability Management” licenses, which means that you might need to buy additional licenses to use it.

Prioritizing Risks

You should also rate newly discovered vulnerabilities according to risk, so you can prioritize their remediation. Common methods for meeting this objective include using your scanning tool’s own risk ratings or a dedicated risk scoring system like the Common Vulnerability Scoring System (CVSS). Vulnerabilities found in applications or services that are exposed to the Internet should have the highest priority, as the probability that they’ll be attacked is highest. It’s also important to analyze whether a vulnerability can be exploited remotely over a network, as some can only be exploited by someone who has logged on locally to a VM. Also prioritize risks in production environments, and in any environment that contains sensitive production data used for testing purposes.
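
To make this prioritization concrete, the hypothetical Python sketch below ranks findings by CVSS base score, Internet exposure, remote exploitability, and environment. The field names and weights are assumptions about how your own list of findings might be structured, not an Azure API.

# Hypothetical prioritization sketch: rank findings by CVSS score and exposure.
# The fields and weights are assumptions about your own data model, not an Azure API.
from dataclasses import dataclass

@dataclass
class Finding:
    title: str
    cvss_score: float           # CVSS base score, 0.0 - 10.0
    internet_facing: bool       # exposed to the Internet?
    remotely_exploitable: bool  # exploitable over the network?
    environment: str            # "production" or "non-production"

def priority(finding):
    """Higher value = handle first."""
    score = finding.cvss_score
    if finding.internet_facing:
        score += 3.0            # Internet-facing systems get the highest priority
    if finding.remotely_exploitable:
        score += 1.5
    if finding.environment == "production":
        score += 1.0
    return score

findings = [
    Finding("Log4Shell in LOB application", 10.0, True, True, "production"),
    Finding("Local privilege escalation", 7.8, False, False, "non-production"),
]

for finding in sorted(findings, key=priority, reverse=True):
    print(f"{priority(finding):5.1f}  {finding.title}")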

An example of the importance of properly managing vulnerabilities in Internet-facing applications was the zero-day vulnerability in the Java-based logging utility Log4j. Information about this threat was published on December 9, 2021, but the vulnerability had been present in the software since 2013. The vulnerability (dubbed Log4Shell) allowed hackers to remotely execute code on services that used vulnerable versions of this software component internally. Apache assigned Log4Shell the maximum CVSS severity rating of 10, due to its potential for exploiting millions of servers. Cybersecurity firm Tenable characterized it as “the single biggest, most critical vulnerability of the last decade”, and LunaSec described it as “a design failure of catastrophic proportions”.

Remediating Risks

Remediation is the elimination of a potential threat to a system, often by patching software. Another method of manually remediating server vulnerabilities is uninstalling the affected software, although this approach is only an option if the organization can tolerate the loss of that software. In other cases, a new major version of software may introduce a vulnerability that doesn’t exist in older versions. A common solution then is rolling back to the latest version that doesn’t have the vulnerability. However, you must also consider the other updates that might be available before deciding to roll back. In some cases, an update hasn’t been applied in a long time, or the vendor doesn’t support an upgrade path. When this happens, it may be necessary to roll back to the latest major version of the software.

Locating software to uninstall requires you to identify what’s installed, for example by configuring the “Inventory” feature for VMs in Azure. This data is collected into a Log Analytics workspace, where you can query the “ConfigurationData” log to review the installed software, their updates, and additional information about services. In addition, you need to initiate a dialog with LOB application vendors to determine which updates they support. This process allows you to update the application to a new version or patch the existing version.
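
If you also want to see which operating system updates are still missing, and assuming Update Management is writing to the same workspace, a query against the Update table can be posted with the query_log_analytics helper sketched earlier. The table and column names below come from the Update Management solution; verify that they exist in your workspace.

# Sketch: list missing security and critical updates per VM, reusing the
# query_log_analytics() helper, WORKSPACE_ID and ACCESS_TOKEN from the earlier example.
# Assumes Update Management / Azure Update Manager populates the Update table.
MISSING_UPDATES_QUERY = """
Update
| where Classification in ("Security Updates", "Critical Updates")
| where UpdateState == "Needed"
| summarize MissingUpdates = count() by Computer, Classification
"""

for row in query_log_analytics(WORKSPACE_ID, MISSING_UPDATES_QUERY, ACCESS_TOKEN):
    print(row["Computer"], row["Classification"], row["MissingUpdates"])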

Mitigating Risks

Risk mitigation is the process of either decreasing the probability that an incident could occur, or reducing the potential damage if it does occur. The goal is to ensure that the organization considers a risk acceptable based on a cost-benefit analysis.

Zero-day vulnerabilities similar to Log4Shell might not have an immediate means of true remediation if an updated version of the software isn’t available. You might have to wait until the vendor releases a patch before you can truly fix such a vulnerability, but you can implement workaround measures in the meantime. Mitigation generally involves making the vulnerable application inaccessible through various means, like blocking access to it from the Internet, blocking access to it from the internal network, or making it inaccessible by simply shutting it down. Some vulnerabilities take time to fix, so your immediate priority in mitigation should be to prevent hackers from exploiting the vulnerability. Doing so might affect your customers, employees, or the organization as a whole, which is why communication is a huge part of security incident response processes.
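
As an illustration of the first option, the sketch below adds a deny rule for inbound Internet traffic to an existing Network Security Group through the ARM REST API. The resource names, rule priority, and api-version are placeholders and assumptions you have to adapt, and such a rule should only be a temporary measure agreed upon with the application owner.

# Sketch: block inbound Internet traffic on an existing NSG as a temporary mitigation.
# Resource names, rule priority and api-version are assumptions - adapt before use.
import requests

ACCESS_TOKEN = "<your-access-token>"        # placeholder
SUBSCRIPTION_ID = "<your-subscription-id>"  # placeholder
RESOURCE_GROUP = "<your-resource-group>"    # placeholder
NSG_NAME = "<your-nsg-name>"                # placeholder
API_VERSION = "2023-04-01"                  # assumption - verify current versions

url = (f"https://management.azure.com/subscriptions/{SUBSCRIPTION_ID}"
       f"/resourceGroups/{RESOURCE_GROUP}/providers/Microsoft.Network"
       f"/networkSecurityGroups/{NSG_NAME}/securityRules/Deny-Internet-Inbound"
       f"?api-version={API_VERSION}")

rule = {
    "properties": {
        "description": "Temporary mitigation: block all inbound Internet traffic",
        "protocol": "*",
        "sourceAddressPrefix": "Internet",
        "sourcePortRange": "*",
        "destinationAddressPrefix": "*",
        "destinationPortRange": "*",
        "access": "Deny",
        "direction": "Inbound",
        "priority": 100,
    }
}

response = requests.put(url, headers={"Authorization": f"Bearer {ACCESS_TOKEN}"},
                        json=rule, timeout=30)
response.raise_for_status()
print("Provisioning state:", response.json()["properties"]["provisioningState"])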

In the case of exploits not based on zero-day vulnerabilities, users may be able to perform their own risk mitigation to a degree. In Azure, for example, users can protect their user accounts, as identity is the primary security perimeter when using cloud services. The most important tools include resetting passwords, using Multi-Factor Authentication (MFA), enabling Entra ID Conditional Access policies, or using physical FIDO2-compatible security keys to strengthen MFA. Some privileged users should also have to use Privileged Identity Management (PIM), depending on whether they have a role such as Global Administrator or Owner in Azure.

Azure Vulnerability Scanning and Reporting

The primary purpose of reporting regarding vulnerability management is to ensure the responsible parties are remediating vulnerabilities over time. These reports should provide details about current vulnerabilities, such as whether they affect critical LOB systems or Internet-facing systems. Reports should also provide information such as the type of vulnerability, its severity, and the software versions that are affected. In addition, vulnerability reports should include aggregate information, such as the total number of vulnerabilities that have been detected and how many VMs they affect. This information is used to decide what level of risk the organization is willing to accept, which is called “risk appetite”.

Vulnerability reports should always be based on metrics, so it’s possible to track the performance of the vulnerability management security control over time. This practice is necessary to reduce the number of existing vulnerabilities, and to detect new vulnerabilities in a timely manner.

Common KPIs in vulnerability reports include Time to Detect (TTD), which is the average time between a vulnerability being disclosed and its detection by the affected organization. Security teams should strive to detect vulnerabilities before they’re exploited, which generally means the TTD should be no more than a few days, rather than weeks or even months. The Time to Resolve (TTR) KPI measures the average time needed to mitigate the vulnerability once it’s detected, which determines the extent to which an attack can escalate. The Average Window of Exposure (AWE) is the average time between the public disclosure of a vulnerability and the time when the affected systems are completely remediated.
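
A small sketch of how these KPIs could be computed from your own tracking data is shown below; the record structure and dates are hypothetical examples of how disclosure, detection, and remediation might be tracked in your organization.

# Hypothetical KPI sketch: compute TTD, TTR and AWE from tracked vulnerability records.
# The record structure and dates are example assumptions, not data from any Azure API.
from datetime import date
from statistics import mean

# (disclosed, detected, remediated) dates per vulnerability - example data
records = [
    (date(2024, 1, 9), date(2024, 1, 11), date(2024, 1, 20)),
    (date(2024, 1, 9), date(2024, 1, 15), date(2024, 2, 2)),
]

ttd = mean((detected - disclosed).days for disclosed, detected, _ in records)
ttr = mean((remediated - detected).days for _, detected, remediated in records)
awe = mean((remediated - disclosed).days for disclosed, _, remediated in records)

print(f"Time to Detect (TTD): {ttd:.1f} days")
print(f"Time to Resolve (TTR): {ttr:.1f} days")
print(f"Average Window of Exposure (AWE): {awe:.1f} days")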

Risk Management 

Many vulnerabilities can’t be immediately remediated, requiring either a risk exemption or a formal risk acceptance. A risk exemption should only be valid for a limited duration of time, based on what an owner or vendor can commit to. Assume for this example that remediation requires the application vendor to develop and test a new version of software, which the vendor agrees to do in six months. In this case, the exemption period should be at least six months, triggering a review at the end of that period to determine if the vulnerability has been fixed. If it hasn’t, it may be necessary to escalate the issue to a higher level of authority within your organization.

Risk acceptance might be the only option when an organization uses old End-of-Life LOB applications that a vendor no longer provides support or updates for. In those cases, the risk must be formally accepted by a manager (CEO, CIO), a business function (Governance, Risk, and Compliance), or similar. Risk acceptance is done by an authority that can make that decision and who is accountable if a security incident happens because the vulnerability is still there. This usually requires presenting information about the business requirements, the security issues, and whether or how the risk has been mitigated.

Some vulnerabilities may represent an acceptable risk, such as those in “End-of-Support” or “End-of-Life” software. In these cases, the vendor may no longer support the software with updates, typically because the vendor is trying to transition its customers to a new product or service. For example, vendors that have provided on-premise Enterprise applications often develop a SaaS version. In other cases, a vendor may stop supporting its software if it goes out of business or is acquired by another company. It may still be possible to mitigate a vulnerability to the point that it’s an acceptable risk, even while continuing to use the affected system.

Conclusions

Accessing information on installed software for the purpose of vulnerability management is often challenging, largely due to the amount of software typically installed on a VM and the quantity of data needed to identify and keep track of vulnerabilities. It requires technical knowledge to collect and analyze this information if the organization doesn’t have a proper Configuration Management Database (CMDB).

Many organizations have difficulties getting the necessary insights into the current state of VM vulnerabilities, and lacking that information makes it impossible to determine what should be done. Many security incidents are caused by hackers exploiting old, well-known vulnerabilities that were present in the organization’s technical platform without anyone being aware of it. In some cases, an organization might never be able to ascertain which vulnerability made a security breach possible.

Even though most software can be updated automatically, manual updates are also necessary. This means having to manage, monitor, and track some vulnerabilities over extended periods of time. Zero-day exploits are also a major challenge in vulnerability management, especially as they tend to be detected or exploited outside business hours, when key staff members might be sick or on vacation. This ties into security incident management processes.

The solution to these challenges is to develop a mature process for managing vulnerabilities. Key components of such a process include clearly stated requirements, easy access to the necessary information, skilled and well-trained staff, assigned responsibilities for addressing each vulnerability, and ongoing dialogs with vendors. Having the right tools makes it possible to generate reports automatically, to monitor trends to see if the number of high-risk vulnerabilities in the technical platform is decreasing, and to quickly detect and analyze new vulnerabilities. It also helps when handling a security incident to determine if a vulnerability targeted by a zero-day exploit is present in the technical platform. Still, achieving a mature process requires a combination of all of this. It takes time to put in place and to ensure that it’s truly effective in detecting as well as addressing security risks.

Feel free to use the requirements in our Vulnerability Management guideline as a starting point, or use it to review your own requirements. Subscribe to our newsletter here to get access to it, and also to other guidelines that we provide to our subscribers:



Additional Reading on Microsoft Learn

Read more on the topics mentioned in this article by visiting these links: