Innovating as cloud engineers: efficiencies, enablers and empowerment

Engineers at Macquarie
Macquarie Engineering Blog
6 min read · Nov 13, 2023


By Pratibha Pratibha and Carlo Bonzon, Cloud Engineers at Macquarie

Overview

The best thing about being software engineers at Macquarie is that we are empowered to explore new and better ways of doing things in the cloud space.

In our roles we are given the opportunity to continuously test and learn from our ideas and collaborate with teams in Sydney, Manila and Gurugram.

We are supported to try out new technologies and methodologies, with enough time to pilot and implement scalable solutions for our team. Our work on finding efficiencies in our cloud usage surfaced many learnings that we'd like to share.

Macquarie Group was an early adopter of a public cloud strategy amongst Australian financial institutions. At the end of 2021 our team successfully completed migrating all the servers we support to the Amazon Web Services (AWS) cloud. Coming from software development backgrounds, we found it a proud moment.

The migration to cloud greatly improved our agility and scalability, letting us respond to the changing demands of different projects while still meeting Business-As-Usual (BAU) requirements.
While it was beneficial to be able to scale up our servers easily when needed, we knew that to realise all of the cloud's benefits we also had to use it cost-efficiently. With that in mind, we started focusing on ways to maximise value without compromising the performance of our applications.

Cultural enablers

One of the things that really helps us to work in this innovative and proactive way is the support we receive to try out new technologies and methodologies. Our manager makes sure we have enough time to test and implement our ideas. If we're ever stuck, we are supported to connect with our teams in Sydney, Gurugram or Manila. As we have numerous environments to manage, we plan changes in a phased manner with achievable timelines.

We also have internal resources like dedicated pages where teams collaborate and share knowledge. We can create, share and discuss files, ideas, diagrams and more in these supportive environments. We have internal Cloud Team channels as well, which are very helpful for acknowledging and resolving any cloud-related issues. We also follow the central cloud team's communications, which notify us about new technologies and share tips to make our infrastructure more robust.

Our platform

As engineers in cloud infrastructure, we are excited by researching and exploring different ways to optimise our cloud usage without sacrificing quality or process standards.

The main application that our team migrated is a desktop application for High Performance Computing (HPC). It is used to generate regulatory reports, so performance to meet Service Level Agreements (SLAs) is one thing that cannot be compromised. Our platform mainly consists of Amazon Elastic Compute Cloud (EC2) and Relational Database Service (RDS) for SQL Server. We have several non-production environments that are used for different projects. Below are the key steps we took as part of our cloud journey.

Finding the right size

One of the key benefits of cloud computing is that you pay based on your usage, so it's important to find the right EC2 instance type for your workload and not pay for resources you don't need.

At present, AWS offers over 80 instance series for EC2 and 16 for RDS. To choose the type and size best suited for Macquarie, we needed to know the nature of our workload, our Central Processing Unit (CPU) and memory usage, and our requirements for disk and network bandwidth. Monitoring through CloudWatch helped us discover the right type of instance for our workload. After observing that our workload is memory intensive, we changed the servers from the M series (general purpose instances) to the R series (memory optimised instances), which delivered significant cost savings on our cloud usage (CPU is more expensive than memory).
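As a rough illustration of the kind of check we ran (not our exact tooling), the sketch below pulls average CPU and memory utilisation from CloudWatch with Python and boto3. CPUUtilization is published automatically for every EC2 instance, while memory metrics such as mem_used_percent are only available if the CloudWatch agent is installed; the instance ID is a placeholder.

```python
import boto3
from datetime import datetime, timedelta, timezone

cloudwatch = boto3.client("cloudwatch")
INSTANCE_ID = "i-0123456789abcdef0"  # placeholder instance ID


def average_metric(namespace, metric_name, days=14):
    """Average value of a metric for the instance over the last `days` days."""
    end = datetime.now(timezone.utc)
    stats = cloudwatch.get_metric_statistics(
        Namespace=namespace,
        MetricName=metric_name,
        Dimensions=[{"Name": "InstanceId", "Value": INSTANCE_ID}],
        StartTime=end - timedelta(days=days),
        EndTime=end,
        Period=3600,  # one datapoint per hour keeps us under the 1,440-datapoint limit
        Statistics=["Average"],
    )
    points = stats["Datapoints"]
    return sum(p["Average"] for p in points) / len(points) if points else None


# CPUUtilization comes from the AWS/EC2 namespace by default.
# mem_used_percent requires the CloudWatch agent (CWAgent namespace), and the
# dimensions it is published under depend on the agent configuration.
cpu = average_metric("AWS/EC2", "CPUUtilization")
mem = average_metric("CWAgent", "mem_used_percent")
print(f"Average CPU: {cpu}%  Average memory: {mem}%")
```

If memory utilisation consistently sits much higher than CPU over a sustained window, that is a signal a memory-optimised family may be a better fit than a general purpose one.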

With applications running on AWS, we found it helpful to always be on the lookout for newer generations of the instance type you are using. We had a scenario where we switched to a newer generation and downgraded the server to a lower specification. As a result, the on-demand hourly rate was halved, and despite the lower specification, the processing time of our workload improved. It turned out we were not using much of the CPU, and the lower-specification instance has more network bandwidth, which our workload really needs. While on the topic of downgrading, if you observe that you are not using over 50% of CPU or memory for a certain period, or in our case for a certain number of processing cycles, then the instance may already be a good candidate for a downgrade.
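Before switching generations, the specifications of the candidate types can be compared with the EC2 DescribeInstanceTypes API; a minimal sketch, where the instance types shown are examples rather than the ones from our migration:

```python
import boto3

ec2 = boto3.client("ec2")

# Example comparison of an older and a newer-generation memory-optimised type;
# the specific types here are illustrative only.
response = ec2.describe_instance_types(InstanceTypes=["r5.2xlarge", "r6i.xlarge"])

for it in response["InstanceTypes"]:
    print(
        it["InstanceType"],
        "vCPUs:", it["VCpuInfo"]["DefaultVCpus"],
        "Memory (MiB):", it["MemoryInfo"]["SizeInMiB"],
        "Network:", it["NetworkInfo"]["NetworkPerformance"],
    )

# On-demand pricing is not returned by this API; confirm the saving in the
# console or via the AWS Price List API before switching.
```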

Something to be mindful of when right-sizing is the Input/Output Operations Per Second (IOPS) of your Elastic Block Store (EBS) volumes. While IOPS has a direct impact on the performance of data-intensive applications such as databases, it is still important not to overprovision it, to avoid paying for resources you are not utilising. We used the VolumeReadOps and VolumeWriteOps metrics in CloudWatch to calculate the optimal IOPS for our workload.
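As a sketch of that calculation (the volume ID is a placeholder): CloudWatch reports VolumeReadOps and VolumeWriteOps as operation counts per period, so dividing the summed count by the period length in seconds gives the IOPS for that datapoint.

```python
import boto3
from datetime import datetime, timedelta, timezone

cloudwatch = boto3.client("cloudwatch")
VOLUME_ID = "vol-0123456789abcdef0"  # placeholder EBS volume ID
PERIOD = 300                         # seconds per datapoint


def iops_datapoints(metric_name, days=3):
    """Per-datapoint IOPS for the volume: ops count in the period / period seconds."""
    end = datetime.now(timezone.utc)
    stats = cloudwatch.get_metric_statistics(
        Namespace="AWS/EBS",
        MetricName=metric_name,
        Dimensions=[{"Name": "VolumeId", "Value": VOLUME_ID}],
        StartTime=end - timedelta(days=days),  # 3 days at 300s stays under the datapoint limit
        EndTime=end,
        Period=PERIOD,
        Statistics=["Sum"],
    )
    return [p["Sum"] / PERIOD for p in stats["Datapoints"]]


read_iops = iops_datapoints("VolumeReadOps")
write_iops = iops_datapoints("VolumeWriteOps")
if read_iops and write_iops:
    print("Peak read IOPS:", max(read_iops))
    print("Peak write IOPS:", max(write_iops))
```

The peak values give an indication of how much provisioned IOPS the workload actually needs, with some headroom on top.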

AWS also offers Compute Optimiser, which customers can activate to get recommendations on EC2 instances and other services that can be right-sized (the default recommendations are included at no extra charge, with a paid option for enhanced infrastructure metrics).
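If Compute Optimiser has been opted in for the account, its recommendations can also be retrieved programmatically; a minimal sketch with boto3:

```python
import boto3

# Requires Compute Optimizer to be opted in for the account or organisation.
optimizer = boto3.client("compute-optimizer")

response = optimizer.get_ec2_instance_recommendations()

for rec in response["instanceRecommendations"]:
    print(rec["instanceArn"])
    print("  Finding:", rec["finding"])                  # e.g. Overprovisioned / Optimized
    print("  Current type:", rec["currentInstanceType"])
    for option in rec["recommendationOptions"]:
        print("  Recommended:", option["instanceType"],
              "(performance risk:", option.get("performanceRisk"), ")")
```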

Stopping idle instances

We learned that it was important to build awareness among the team to stop servers or environments when they are not being used. This practice is simple yet directly led to cost savings in our cloud usage. At Macquarie, we are lucky to have a central cloud team that develops tools for us to easily manage our cloud components. One of them is a dashboard where we can see all our servers and stop or start them with one click; we can also set a schedule, or call Application Programming Interfaces (APIs) to do it programmatically.
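The dashboard and scheduler above are internal Macquarie tooling, but the underlying idea can be sketched with plain boto3: find instances carrying an agreed tag (the tag name here is hypothetical) and stop them, for example from a scheduled job outside business hours.

```python
import boto3

ec2 = boto3.client("ec2")


def stop_tagged_instances(tag_key="AutoStop", tag_value="true"):
    """Stop all running instances carrying the agreed tag (tag name is illustrative)."""
    reservations = ec2.describe_instances(
        Filters=[
            {"Name": f"tag:{tag_key}", "Values": [tag_value]},
            {"Name": "instance-state-name", "Values": ["running"]},
        ]
    )["Reservations"]

    instance_ids = [
        instance["InstanceId"]
        for reservation in reservations
        for instance in reservation["Instances"]
    ]
    if instance_ids:
        ec2.stop_instances(InstanceIds=instance_ids)
    return instance_ids


# Could be triggered on a schedule, e.g. an EventBridge rule invoking a Lambda each evening.
print("Stopped:", stop_tagged_instances())
```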

Turning off resiliency features in non-production environments

In an Amazon RDS Multi-AZ (Availability Zone) deployment, Amazon RDS automatically creates a primary database (DB) instance and synchronously replicates the data to an instance in a different AZ. When it detects a failure, Amazon RDS automatically fails over to the standby instance without manual intervention. This feature is a must-have for our production environment, but not necessary for non-critical environments, so we have turned it off on our non-production servers to reduce operational cost.
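Converting a non-production RDS instance from Multi-AZ to Single-AZ is a single API call (or a console change); a minimal sketch with a placeholder identifier, noting that the change modifies the instance and should be scheduled accordingly:

```python
import boto3

rds = boto3.client("rds")

# Placeholder identifier for a non-production DB instance.
rds.modify_db_instance(
    DBInstanceIdentifier="my-nonprod-sqlserver",
    MultiAZ=False,
    ApplyImmediately=True,  # or False to apply during the next maintenance window
)
```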

Other features worth assessing in non-production servers are snapshots and backups. Snapshots (point-in-time copies of your drives) and backups can be expensive, especially for large amounts of data.
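A quick way to check whether old snapshots are accumulating is to list the ones owned by the account and total their source volume sizes; a sketch, with an arbitrary 90-day age threshold:

```python
import boto3
from datetime import datetime, timedelta, timezone

ec2 = boto3.client("ec2")
cutoff = datetime.now(timezone.utc) - timedelta(days=90)  # arbitrary age threshold

old_snapshots = []
paginator = ec2.get_paginator("describe_snapshots")
for page in paginator.paginate(OwnerIds=["self"]):
    for snapshot in page["Snapshots"]:
        if snapshot["StartTime"] < cutoff:
            old_snapshots.append(snapshot)

total_gib = sum(s["VolumeSize"] for s in old_snapshots)
print(f"{len(old_snapshots)} snapshots older than 90 days, "
      f"{total_gib} GiB of source volumes")

# Snapshots are incremental, so billed storage is usually less than the summed
# volume sizes, but this is still a useful signal of accumulation.
```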

Taking advantage of spot instances

We found spot instances a cost-effective option for reducing our AWS spend, depending on the type of workload. Spot instances use spare EC2 capacity and are offered at steep discounts, which makes them suitable for non-production servers that perform background processing that is not time-bound.
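For such workloads, a spot instance can be requested by adding InstanceMarketOptions to an ordinary RunInstances call; a sketch with placeholder AMI and subnet identifiers:

```python
import boto3

ec2 = boto3.client("ec2")

# The AMI ID and subnet below are placeholders.
response = ec2.run_instances(
    ImageId="ami-0123456789abcdef0",
    InstanceType="r6i.large",
    MinCount=1,
    MaxCount=1,
    SubnetId="subnet-0123456789abcdef0",
    InstanceMarketOptions={
        "MarketType": "spot",
        "SpotOptions": {
            "SpotInstanceType": "one-time",
            "InstanceInterruptionBehavior": "terminate",
        },
    },
)
print("Launched spot instance:", response["Instances"][0]["InstanceId"])

# The workload should checkpoint its progress, since AWS can reclaim the
# capacity with a two-minute interruption notice.
```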

Sharing our learnings

After successfully finding these efficiencies in our cloud usage, our work was celebrated on our internal channels, and we were asked to present our AWS work at our division-wide technology showcase. Being more technically focused, we prefer coding to talking, so presenting our work to a wide audience pushed us out of our comfort zones.

While it was a challenging and new experience, we enjoyed the chance to share what we'd learned so that other teams can benefit from our findings. As the AWS infrastructure Subject Matter Experts, we need to make sure we are continuously researching, testing and implementing, and helping other parts of the organisation do the same.

What’s next

All of the methods discussed above are simple to implement, but they have allowed us to greatly reduce our cloud spending. Thanks to our supportive teams, we were given the space to test and share our learnings across Macquarie.

We are continuing to optimise our cloud usage throughout Macquarie as part of our cloud-first strategy. Technology is evolving daily, so we stay up to date on the latest features and offerings from AWS and are always willing to explore and adapt.
