Microsoft Azure FinOps and Best Practices
15 January 2024
Financial operations, commonly known as FinOps, combines traditional principles of cloud financial management with cloud computing, cloud infrastructure, and cloud operations. Its goal is to provide organizations with a better understanding of their cloud spending, thus helping them make better decisions on allocating costs and keep dynamic costs under control. FinOps is key when using cloud services to increase business value.
This article discusses the alignment of FinOps with Azure, the FinOps Framework specifically, Microsoft’s best practices for reducing costs in Azure, and also provides insights into cost reduction based on news from Microsoft Ignite 2023.
Getting Started with FinOps in Azure
Cost management is a crucial new IT governance control that needs to be established when using cloud services. Cost has become an important non-functional requirement that architects need to consider, along with more established ones like performance, availability, reliability, and security.
FinOps in Azure largely correlates to using Azure Cost Management (ACM) to identify cost-saving opportunities, track and analyze cloud spending, and allocate costs to specific teams, products, and projects. This requires close collaboration between organizational functions like business, finance, and IT to establish and enforce the policies needed to track, analyze and optimize cloud costs. In addition, organizations use FinOps to align these expenditures with their business goals, while balancing operations costs with performance. This practice helps companies achieve those goals without spending more than necessary on cloud resources.
The FinOps Framework
FinOps helps organizations adopt the mindset and technical solutions that will best achieve financial gains through the use of a Cloud Service Provider such as Microsoft Azure. These benefits include reduced cloud expenses, improved accountability, and greater agility in making decisions. The FinOps Framework, published by the FinOps Foundation, provides an operating model for how to establish and succeed with FinOps. It describes the FinOps principles an organization can adhere to and the FinOps capabilities an organization can establish.
Microsoft joined the FinOps Foundation in February 2023 and provides information on best practices and additional guidance regarding the framework. See the “Microsoft FinOps” reference at the end of this article if you want to read more about Microsoft’s approach to FinOps.
Azure Cost Management
ACM is a free solution that is available in the Azure portal. It’s the main service used for cost management, billing, and analysis of current Azure spending. In its dashboard, ACM provides customers with cost visibility, information about cost trends, and their overall costs across the whole cloud environment, meaning all Azure subscriptions and cost data for all resources. ACM can also export reports in Excel or CSV format, and it can offer insights into an organization’s use of AWS.
ACM continuously monitors a customer’s resources and provides ongoing cost-related information about them. ACM is also integrated with Azure Advisor, which in turn generates cost recommendations based on usage patterns. The cost-related recommendations in Azure Advisor should always be reviewed.
Cost Management Benefits
The two key benefits of ACM are improved visibility over cloud spending and mapping those costs to specific departments or initiatives. For example, ACM helps users analyze their previous cloud usage and resulting expenses, allowing them to predict future expenses. Users can view trends on a daily, monthly or annual scale to identify opportunities for cost savings. This data shows the actual resource units that determine the bill because it comes directly from Azure.
ACM maps cloud costs to departments or initiatives by using billing scopes, such that each department or initiative is a separate cost entity that pays for Azure services. However, customers can also create a cost model that structures resources based on tags that each cost entity applies to its resources. Once an organization defines its billing scopes and cost-related tags, each business entity can use ACM to analyze the costs associated with its budget. Azure Policy is commonly used to set restrictions on the use of resources for specific projects, teams, and users as a way to mitigate cost-related risks.
FinOps and DevOps
FinOps should also be an integral part of DevOps. This means that you should continuously reevaluate costs, configurations and which Azure offers, plans, tiers, and SKUs are used. This is also the only way to quickly detect an anomaly and remediate the issue. Quite a few capabilities in the FinOps Framework rely on them being part of DevOps processes.
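To make the idea of continuously checking for anomalies more concrete, here is a minimal sketch of a statistical check on daily cost figures. It is an illustration only: the function name find_cost_anomalies, the threshold of two standard deviations, and the sample figures are our own assumptions, and the code isn’t meant to reflect how any Azure service detects anomalies.

```python
from statistics import mean, stdev


def find_cost_anomalies(daily_costs, threshold=2.0):
    """Return the indexes of days whose cost deviates from the mean
    by more than `threshold` sample standard deviations."""
    if len(daily_costs) < 2:
        return []
    average = mean(daily_costs)
    spread = stdev(daily_costs)
    if spread == 0:
        return []  # all days cost the same, nothing stands out
    return [
        index
        for index, cost in enumerate(daily_costs)
        if abs(cost - average) / spread > threshold
    ]


# Hypothetical daily costs; the seventh day (index 6) is a spike.
sample_costs = [100, 102, 98, 101, 99, 100, 250, 101]
anomalies = find_cost_anomalies(sample_costs)
```

A check like this could run as a scheduled step in a pipeline, with the flagged days handed to whoever owns the affected resources.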
Azure FinOps Guide: Cost Management Guideline
We’ve put together a Cost Management guideline (available in PDF format) based on the FinOps Framework that Microsoft has adopted. The guideline is a detailed example of how an organization could enforce the necessary requirements to implement FinOps tools and practices. Our FinOps guide mentions the best Microsoft Azure services to use (ACM, Azure Monitor, Advisor, Policy), different analysis tools (Pricing & TCO Calculators), additional tools, which tags to assign to resources, and how costs can be kept down as efficiently as possible.
This guide goes into great detail regarding how to implement the cost optimization strategies that are mentioned in this article. We hope that it will help you get started with FinOps if you haven’t already done so, or at least provide some interesting information that you might not be aware of. It is especially valuable if teams across your organization don’t yet work with FinOps in a consistent way.
Subscribe to our newsletter here, and you’ll receive free access to our Cost Management guideline:
Cost Optimization and Cost Savings
IT teams have many options for deploying assets in Azure, including which services to use and which configuration options to select. These decisions have a profound effect on costs, so it’s important to choose only those resource types, resource features, and capacity that will provide a positive return on investment (ROI). Additional options for minimizing Azure costs include using Microsoft’s cost-related calculators and implementing automation.
Options to Limit Cloud Expenditure
Options for limiting spending include using Azure Spot virtual machines (VMs), serverless technologies, Static Web Apps, Azure DevTest Labs, and reserved instances, and committing to savings plans. Keeping up to date with Azure-related news will help you discover new ways to reduce spending.
Azure Spot VMs
Azure Spot VMs can significantly reduce costs by utilizing Azure’s unused capacity. However, these savings come with the condition that Azure will reclaim the capacity when needed, which could result in Spot instances getting evicted. Users can set the eviction policy for their Spot VMs to Deallocate or Delete. The Deallocate option allows for the possibility of starting the VM at a later time. The Delete option means the VM and its underlying disks are deleted once it’s evicted.
Using Spot VMs instead of standard Azure virtual machines is best for non-critical workloads that can handle disruption, where cost efficiency has a higher priority than uninterrupted processing. These workloads typically include batch processing jobs, development and testing environments, and large workloads.
Serverless Technologies
The Logic Apps (Consumption mode) and Azure Functions services offer serverless computing, which is often an effective means of reducing Azure costs. Software deployed in this way runs on demand, as determined by user-defined triggers. Customers only pay for this service when the software runs, making it most suitable for compute-intensive workloads that run intermittently.
Assume for this example that you have an application that resizes images once users upload them to a website. Such an application requires a lot of compute power, but it only needs to run periodically. If this application is deployed as a serverless function, it would only need to run when images were available to resize. This approach would cost less than hosting the service on a VM, which typically has higher CPU resource requirements. In this scenario, the VM would incur its operating cost, whether it was needed or not.
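A rough way to reason about this trade-off is to compare a pay-per-execution bill with a flat monthly VM bill. The sketch below does that with made-up numbers: PRICE_PER_EXECUTION and VM_MONTHLY_COST are hypothetical values chosen for illustration, not Azure list prices, and real serverless pricing has more components than a single per-run charge.

```python
# Hypothetical prices for illustration only; not Azure list prices.
PRICE_PER_EXECUTION = 0.0004  # assumed cost of one image-resize run, in USD
VM_MONTHLY_COST = 70.0        # assumed flat cost of an always-on VM, in USD


def serverless_monthly_cost(executions):
    """Cost of running the function `executions` times in a month."""
    return executions * PRICE_PER_EXECUTION


def cheaper_option(executions):
    """Return which hosting model is cheaper for a given monthly volume."""
    if serverless_monthly_cost(executions) < VM_MONTHLY_COST:
        return "serverless"
    return "vm"


# Number of monthly executions at which both options cost the same.
break_even_executions = VM_MONTHLY_COST / PRICE_PER_EXECUTION
```

With these assumed prices, an application that resizes images only occasionally stays well below the break-even point, which is the situation the example above describes.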
Containers
Migrating an application from Azure VMs to containers can cut cloud-hosting costs under certain circumstances. Containers require fewer resources to maintain virtualization than VMs, allowing a server to host more containers.
Assume that you currently have multiple servers hosting WordPress, such that each server runs on a different VM. Deploying those servers as containers instead of VMs by using Azure Kubernetes Service (AKS) typically allows you to consolidate the workloads of three or four VMs onto a single host server. This approach results in the distribution of WordPress instances between containers, thus cutting costs significantly. The cost of AKS per host server is the same as that of standard Azure VM pricing, but the reduction in servers can cut costs by up to 75 percent. However, AKS also charges an additional fee per hour if you want guaranteed uptime.
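The figure of up to 75 percent follows from simple arithmetic when the price per host server stays the same. Here is a small sketch of that calculation. The function name is our own, and the optional hourly fee for guaranteed uptime is deliberately left out.

```python
def consolidation_savings(servers_before, servers_after):
    """Fraction of server cost saved when the per-server price is unchanged."""
    if servers_before <= 0 or servers_after < 0:
        raise ValueError("server counts must be positive")
    return 1 - servers_after / servers_before


# Four VMs consolidated onto one host server: 1 - 1/4 = 0.75
four_to_one = consolidation_savings(4, 1)
# Three VMs consolidated onto one host server: 1 - 1/3, roughly 0.67
three_to_one = consolidation_savings(3, 1)
```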
Azure DevTest Labs
Azure DevTest Labs is a service that allows users to easily create, use and manage VMs in an Infrastructure-as-a-Service (IaaS) or Platform-as-a-Service (PaaS) environment. It’s specifically designed for development and testing environments, as it offers pre-configured bases, artifacts, and Azure Resource Manager templates. DevTest Labs promotes cost control, operational efficiency and consistent behavior of VMs, which is particularly useful for development, test and training labs.
For example, users can create policies to limit the size and number of VMs, schedule automatic startups and shutdowns, and monitor costs. They can also customize their operating environments, including the integration of various tools. DevTest Labs users can also create on-demand environments by integrating deployment pipelines into their Azure platform.
Reserved Instances
On-demand capacity reservations can be combined with Azure Reserved VM Instances (RIs) to significantly reduce costs. The Reserved option allows users to reserve VM instances for a term of one or three years. Microsoft reports that the pricing for this deployment is as low as 28 percent of on-demand pricing.
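To see what “as low as 28 percent of on-demand pricing” could mean for one always-on VM, consider the following sketch. The hourly rate of 0.10 USD is a hypothetical figure, and the 28 percent ratio is the best case quoted above, so actual savings depend on the VM size, region, and term.

```python
HOURS_PER_YEAR = 8760


def yearly_costs(on_demand_hourly, reserved_ratio=0.28):
    """Return (on-demand cost, reserved cost, savings) per year for one
    always-on VM. `reserved_ratio` is the reserved price as a fraction of
    the on-demand price; 0.28 is the best case mentioned in the text."""
    on_demand = on_demand_hourly * HOURS_PER_YEAR
    reserved = on_demand * reserved_ratio
    return on_demand, reserved, on_demand - reserved


# Hypothetical on-demand rate of 0.10 USD per hour.
on_demand, reserved, savings = yearly_costs(on_demand_hourly=0.10)
```

Under these assumptions the VM costs about 876 USD per year on demand and about 245 USD reserved. A reservation only pays off if the VM really runs for most of the term.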
Region Selection to Optimize Costs
A cloud region is a group of data centers within a defined geographic area, thus determining the physical location where your workloads are hosted. The cost of running Azure can be highly dependent on cloud region, such that North American regions are usually the least expensive. Selecting the right Azure region can therefore save a significant amount of money, but it can also affect performance. In particular, a region that’s further from your end users is more likely to suffer from network issues like packet loss and latency. As a result, a cheaper Azure region may not be worth the savings if it results in workloads that are hosted thousands of miles from their end users.
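The trade-off between price and distance can be expressed as a simple rule: pick the cheapest region that still meets a latency limit. The sketch below illustrates this with placeholder data. The region names, monthly costs, and latency figures are invented for the example and don’t describe real Azure regions.

```python
from dataclasses import dataclass


@dataclass
class RegionOption:
    name: str
    monthly_cost: float  # hypothetical workload estimate in USD
    latency_ms: float    # hypothetical round-trip latency to end users


def pick_region(options, max_latency_ms):
    """Return the cheapest region within the latency limit, or None."""
    eligible = [o for o in options if o.latency_ms <= max_latency_ms]
    return min(eligible, key=lambda o: o.monthly_cost, default=None)


options = [
    RegionOption("region-a", monthly_cost=800.0, latency_ms=180.0),
    RegionOption("region-b", monthly_cost=950.0, latency_ms=35.0),
    RegionOption("region-c", monthly_cost=1000.0, latency_ms=20.0),
]

# region-a is cheapest overall but too far away for a 50 ms limit.
best = pick_region(options, max_latency_ms=50.0)
```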
FinOps Related Tools
Azure provides the following tools to forecast and optimize cloud spending:
Cost Analysis: This feature can help administrators understand an Azure workload’s actual cost once it’s running. It also predicts future spending based on current configurations. Consider using the “cm-resource-parent” tag, since it helps with analysis. Assigning that tag to resources, with the resource ID of a parent resource as the value, groups them together in the GUI. This way, sub-resources such as a VM disk or a storage account used for logs appear “under” the virtual machine that they belong to, instead of appearing on separate rows, which makes it harder to see the total cost of the virtual machine.
Azure Advisor: Advisor makes recommendations regarding cost optimization, focusing largely on reducing the cost of VM instances. Its coverage of other resource types, like databases, is more limited.
Budgets: Administrators can configure budgets in the Azure portal to support organizational goals on cloud spending. A budget sets a spending threshold for a scope, such as a subscription or resource group, and generates an alert when that threshold is exceeded.
Azure Policy: This can enforce rules regarding the use of Azure resources, including who can create them and how they have to be tagged. Azure Policy isn’t specifically designed to optimize costs, but it can do so by preventing the development of shadow IT. There are also other tools, such as “Azure-orphan-resources” or “Start/Stop VMs v2”, which can help administrators identify workloads that are still running after they’re no longer needed. See the references below for additional information about these.
Besides these tools, using an Azure naming convention also helps by telling you whether a resource is used for development, testing, or other non-production purposes. This should make it easier to decide which resources to deallocate or delete to reduce cost.
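The grouping that the “cm-resource-parent” tag enables in Cost Analysis can be pictured with a small sketch. It is not how the portal is implemented. It only rolls up a list of cost rows by parent so that the total per virtual machine becomes visible. The row layout and the shortened resource IDs are assumptions made for the example.

```python
from collections import defaultdict

PARENT_TAG = "cm-resource-parent"


def total_cost_by_parent(cost_rows):
    """Sum costs per parent resource. Rows without the parent tag are
    counted under their own resource ID."""
    totals = defaultdict(float)
    for row in cost_rows:
        parent_id = row.get("tags", {}).get(PARENT_TAG, row["resource_id"])
        totals[parent_id] += row["cost"]
    return dict(totals)


# Hypothetical rows with shortened IDs; real Azure resource IDs are longer.
rows = [
    {"resource_id": "/vm/web01", "cost": 60.0, "tags": {}},
    {"resource_id": "/disk/web01-os", "cost": 10.0,
     "tags": {PARENT_TAG: "/vm/web01"}},
    {"resource_id": "/storage/web01logs", "cost": 5.0,
     "tags": {PARENT_TAG: "/vm/web01"}},
    {"resource_id": "/vm/batch01", "cost": 40.0, "tags": {}},
]

totals = total_cost_by_parent(rows)
```

Without the tag, the disk and the storage account would show up as separate rows. With it, their costs are added to the virtual machine they belong to.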
Other Microsoft Tools
Microsoft also provides these additional tools which are not part of Azure per se:
Azure Pricing Calculator can anticipate the cost of an Azure workload before deploying it. This tool allows administrators to identify cost-saving opportunities by comparing the cost of different configurations and service types.
The Total Cost of Ownership (TCO) Calculator lets you estimate the cost savings you can realize by migrating your workloads to Azure. You can define your workloads regarding servers, databases, storage, and networking to generate a report with estimated costs.
Cost Management Integrations and APIs
It’s also possible to work with third-party cost management solutions that use Azure’s REST APIs, or to use Microsoft Power BI, to perform cost analysis. Some cost-related Azure APIs are, however, only available to customers who have an Enterprise Account. Despite this, everyone should at least be able to use the following Azure APIs:
- Microsoft.CostManagement/query (POST request)
- Microsoft.Consumption/aggregatedcost (GET request)
- Microsoft.Consumption/budgets (GET request)
- Microsoft.Consumption/tags (GET request)
- Microsoft.Billing/billingPeriods (GET request)
- Microsoft.Billing/invoices (GET request)
Also look into which permissions might be required if you want to use one of these APIs. You should always follow the security principle of “least privilege”. Additional information about each of these APIs is available on Microsoft’s “Azure REST API reference” website.
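As an illustration of the first API in the list, the sketch below builds the URL and JSON body for a Microsoft.CostManagement/query POST request at subscription scope. It only constructs the request and doesn’t send it, so authentication is left out. The API version, the field values in the body, and the placeholder subscription ID reflect our understanding of the public REST reference and should be treated as assumptions to verify there.

```python
import json

MANAGEMENT_ENDPOINT = "https://management.azure.com"
API_VERSION = "2023-03-01"  # assumption; check the REST reference


def build_cost_query(subscription_id):
    """Return (url, body) for a month-to-date cost query that is
    aggregated per day and grouped by resource group."""
    scope = f"/subscriptions/{subscription_id}"
    url = (
        f"{MANAGEMENT_ENDPOINT}{scope}"
        f"/providers/Microsoft.CostManagement/query"
        f"?api-version={API_VERSION}"
    )
    body = {
        "type": "Usage",
        "timeframe": "MonthToDate",
        "dataset": {
            "granularity": "Daily",
            "aggregation": {
                "totalCost": {"name": "PreTaxCost", "function": "Sum"}
            },
            "grouping": [{"type": "Dimension", "name": "ResourceGroup"}],
        },
    }
    return url, body


# Placeholder subscription ID, not a real one.
url, body = build_cost_query("00000000-0000-0000-0000-000000000000")
payload = json.dumps(body)  # the string that would be sent as the POST body
```

An actual call would send this payload with a bearer token in the Authorization header, using an identity that has only the read permissions it needs on the chosen scope.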
Resource Tagging and Cost Management
Tags are key-value metadata that are used to provide additional information about a resource. The establishment of a Resource Tagging guideline should always include information about which tags should be used to help with optimizing costs. Centrally defining requirements for tagging is generally the only way to ensure that important information is provided for resources in Azure. For example, administrators can use tags to identify the user who created a resource or the cost center that a resource belongs to. Admins can also search existing tags in the “Tags” service in Azure, which is an effective method of tracking the resources a workload consumes.
Resource tagging can identify workloads that are strong candidates for elimination to save money. Assume, for example, that you have multiple VMs running in a development and testing environment. You could define a tag as environment:dev and apply it to those VMs. During an organizational holiday, you could then search for resources with this tag and turn off those resources to avoid paying for VMs when they aren’t needed.
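The holiday scenario boils down to filtering an inventory by tag. Here is a minimal sketch of that step, using an in-memory list instead of a real Azure query. The resource names, the inventory layout, and the “costcenter” tag are hypothetical, and actually stopping the VMs would be done with your usual Azure tooling.

```python
def resources_with_tag(resources, key, value):
    """Return the resources whose tag `key` has exactly the given value."""
    return [r for r in resources if r.get("tags", {}).get(key) == value]


# Hypothetical inventory; in practice this would come from Azure.
inventory = [
    {"name": "vm-dev-01", "tags": {"environment": "dev"}},
    {"name": "vm-prod-01", "tags": {"environment": "prod"}},
    {"name": "vm-dev-02", "tags": {"environment": "dev", "costcenter": "1234"}},
    {"name": "vm-untagged", "tags": {}},
]

# Candidates to turn off during the holiday.
to_stop = [r["name"] for r in resources_with_tag(inventory, "environment", "dev")]
```

Note that the untagged VM is not selected, which is one reason why centrally defined tagging requirements matter.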
Insights from Microsoft Ignite 2023
Microsoft Ignite is an annual event that includes the Book of News for that year. This publication includes announcements on a range of Microsoft topics, including Azure. The 2023 Book of News includes many updates that can help Azure users cut costs.
Azure Cosmos DB
Azure Cosmos DB has received many updates in 2023 that will help developers deliver applications more efficiently, while also reducing their production cost. These changes include dynamic scaling for each partition or region, which is available as a preview for new users. Dynamic scaling allows users to optimize Azure for scale and cost in cases where partitioning is currently used to scale individual containers.
This capability is most useful for databases with high performance requirements or when Cosmos DB is configured for multiple regions, typically for the purpose of globally distributing data. It provides developers with greater flexibility in cutting costs by scaling database requirements at a more granular level, whether the scaling is based on region or data partition. Dynamic scaling is most beneficial for users with hot partitions in their databases or global operations.
Cosmos DB for MongoDB vCore is generally available to the public, which allows developers to build applications in Azure that are compatible with MongoDB. This option combines native Azure integrations, low total cost of ownership (TCO) and the vCore architecture familiar to many MongoDB developers, which is especially beneficial when building new applications or migrating existing ones. Azure Cosmos DB for MongoDB vCore also includes a free service tier, which helps developers explore this platform at no cost.
Large Language Models
AKS can now run Machine Learning (ML) workloads like large language models (LLMs) at a lower cost and with less manual configuration. The Kubernetes AI toolchain operator automates the deployment of LLMs on AKS across available resources by selecting an infrastructure with the optimal size. This approach allows for easy distribution of inferencing across multiple VMs with a lower GPU count, thus increasing the number of regions in which workloads can operate and reducing their total operating cost. Users can also select preset LLMs with images hosted by AKS, which greatly reduces the time needed to set up inference services.
Windows Autopatch
Windows Autopatch is the only update management solution for Windows enterprise customers as of 2023. This solution serves as a fully automated managed service, which cuts costs by allowing customers to reallocate their resources to areas with greater value. Windows Autopatch is part of Microsoft Intune, allowing it to deliver a high degree of control in managing updates. In addition to Windows, Autopatch also updates Microsoft 365, Microsoft Edge and Microsoft Teams. Windows Update for Business will also merge into Autopatch over time.
Azure Data Lake Storage Gen2
Azure Data Lake Storage Gen2 (ADLS Gen2) shortcuts are generally available as of 2023, allowing data engineers to connect external data lakes in ADLS Gen2 to OneLake through a live connection with the target data. This update allows users to reuse data from ADLS Gen2 without copying it, thus eliminating data problems such as duplication. ADLS Gen2 shortcuts reduce integration costs across an enterprise by making data available for consumption through customized applications like Power BI visuals or LLMs. These shortcuts also increase the value of ADLS Gen2 data lakes by making them interoperable with Azure Databricks.
Azure M-series Mv3
The Azure M-series Mv3 family is the next generation of VMs with optimized memory, providing users with a lower TCO, more uptime and faster insights. These benefits result from the improved price-performance ratio of direct Azure IaaS deployments running SAP HANA workloads with SAP RISE on Azure. Fourth-generation Intel® Xeon® Scalable processors power Mv3 VMs, along with Azure Boost. These VMs also scale for SAP workloads, with sizes ranging up to 32 TB.
Premium SSD v2
Premium SSD v2 is available in preview at a low cost, offering up to 64 TB storage with latencies below one millisecond and up to 80K input/output operations per second (IOPS). This level of performance is suitable for IO-intensive workloads, providing flexibility in managing the cost of Tier-1 Azure production environments. IOPS scaling is also available in preview, which allows customers to scale their I/O performance up to 20K IOPS. Users can quickly perform transient operations like data loads and migrations, and then scale their I/O back down to cut costs.
Regarding Cloud Expenditure and Cloud Efficiency
Azure FinOps plays a key role in helping organizations analyze Azure usage, right-size resources, and control their cloud costs by optimizing spending for Microsoft Azure cloud services. The right strategies provide greater visibility into these costs, helping to allocate costs, plan budgets and implement best practices in cloud spending. These practices also allow organizations to make informed decisions about their cloud investments, increase business agility and improve their overall management of cloud resources.
At the end of the day, an organization that establishes a FinOps practice should be able to detect cost anomalies, prevent cost overruns, allocate costs accurately, achieve cost savings over time, and succeed with cloud optimization. This means that it will increase its cloud efficiency and provide business teams with additional value. Not doing so will likely lead to significant costs for maintaining resources and capacity that aren’t needed. Therefore, make sure your approach to cloud services includes the FinOps principles and capabilities that Microsoft promotes.