Moving from a Network Attached Storage model to an Argo CD-based deployment process

Engineers at Macquarie
Macquarie Engineering Blog
7 min read · Sep 13, 2023


By Kip Hamiltons, Software Engineer at Macquarie

Our bespoke Macquarie trading risk platform is central to operations in our Commodities and Global Markets (CGM) business, so when we make changes, it’s a significant undertaking.

CGM is currently moving from a Network Attached Storage (NAS)-based delivery to an Argo CD-based GitOps deployment process. The change will deliver many benefits, including increased auditability through traceable and reproducible execution environments, and simplified deployment pipelines. Both the new and old systems currently exist in parallel as we work through the multi-stage process to migrate fully to the new system.

Moving from a common, shared platform to containerised applications requires a big change in mindset and can be a new concept for some engineers. By delivering this change, we are able to demonstrate what is possible, while also providing a template for other teams to follow.

The existing system: the NAS

Macquarie’s CGM business has a significant set of application servers distributed across key regions and global markets. These servers run a diverse set of applications to support the business, such as risk calculations and pricing model evaluations.

The brief

The global network of servers needed a way of sharing files and data between each other. The solution, devised before Amazon Web Services (AWS) had been founded, was to use a NAS. This method uses a dedicated file system server (or server cluster), which other servers can mount and use as if the files were local.

This model of using a shared file system across fleets of servers was very common in banks and other complex environments developed pre-cloud. This solution has proven to be very resilient, allowing our business to move with more agility and less overhead than other traditional deployment tooling.

The current NAS configuration has a server per region, with each region having some region-specific config values, such as nearby database and Application Programming Interface (API) endpoints.
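To make the per-region setup concrete, here is a minimal sketch of region-specific config lookup. The region names and endpoint values are purely illustrative, not Macquarie's actual configuration.

```python
# Hypothetical per-region configuration; names and hosts are invented
# for illustration only.
REGION_CONFIG = {
    "emea": {"db_host": "db.emea.example.internal", "api_url": "https://api.emea.example.internal"},
    "amer": {"db_host": "db.amer.example.internal", "api_url": "https://api.amer.example.internal"},
    "apac": {"db_host": "db.apac.example.internal", "api_url": "https://api.apac.example.internal"},
}

def config_for(region: str) -> dict:
    """Return the nearby database and API endpoints for a region."""
    try:
        return REGION_CONFIG[region]
    except KeyError:
        raise ValueError(f"unknown region: {region}") from None
```

Each application server resolves its endpoints through a lookup like this, so regional servers talk to nearby infrastructure rather than crossing the globe.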

The opportunity: GitOps with Argo CD

The most important features of our NAS and control systems are the strict auditability of changes and the ability to manage replication. GitOps-based approaches are alternatives which afford even stricter auditability, file sharing across applications and other benefits too.

GitOps processes have a git repository as the single source of truth for making and tracking all changes. Using GitOps workflows enables you to set up reproducible infrastructure — including your file system — deterministically, which means that anyone can follow the steps specified in the git records and arrive at the same results. Git’s usage of cryptographic hashes provides an auditable, verifiable history of file changes, which solves the last piece of the puzzle.
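The tamper-evidence property comes straight from git's content addressing: every file's identity is a hash of its bytes, so any change to the content changes the hash. A short sketch of how git derives a file ("blob") hash:

```python
import hashlib

def git_blob_sha(content: bytes) -> str:
    """Compute the SHA-1 that git assigns to a file's contents (a "blob").

    Git hashes a header ("blob <size>\\0") followed by the raw bytes.
    Any change to the bytes yields a different hash, which is what makes
    the history auditable and verifiable.
    """
    header = f"blob {len(content)}\0".encode()
    return hashlib.sha1(header + content).hexdigest()
```

Commits hash their trees, and trees hash their blobs, so a single commit hash pins down the exact state of every file in the repository.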

Argo CD is a Cloud Native Computing Foundation project, providing a GitOps solution for orchestrating a Kubernetes application and automating its deployment processes. Argo CD meets our auditability requirements and provides improved availability and resilience, due to it being an elastically scalable Kubernetes application. It also brings health checks and monitoring, a clean web-app interface, and SSO authentication to the table.

Our GitOps distribution using Argo CD involves releasing container images with the application code to the AWS Elastic Container Registry, alongside config maps containing configuration data. We integrate automated checks into merge requests to enforce up-to-date business approvals, as well as check that change records have been filed and approved.
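The merge-request gate boils down to a simple conjunction of conditions. A hedged sketch of the decision logic (the field names are assumptions, not our actual schema):

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class MergeRequest:
    business_approved: bool            # up-to-date business approval on file
    change_record_id: Optional[str]    # ID of the filed change record, if any
    change_record_approved: bool       # change record signed off

def can_merge(mr: MergeRequest) -> bool:
    """CI gate: block the merge unless a current business approval exists
    and an approved change record has been filed."""
    return (mr.business_approved
            and mr.change_record_id is not None
            and mr.change_record_approved)
```

Running a check like this on every merge request keeps the approval evidence attached to the same git history that drives the deployment.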

Challenges using Argo CD

Argo CD is not a panacea which solves all deployment problems. A complete transition requires designing new workflows for application operations staff. Business approvals currently managed by email need to be mapped into our new delivery paradigm, preferably without adding manual steps.

Applications which are not yet containerised cannot be transitioned to Argo CD, so the existing NAS-based system is here to stay for the near future for those applications. A GitOps solution not designed around Kubernetes can be considered for these cases, perhaps leveraging existing CI/CD pipelines.

The transition

For many years, our developers have been able to safely assume precise file locations of libraries and data. This has been baked into our codebases, with many references directly in source code, as well as in library search paths, environment variables set by unknown means, and many other implicit and explicit references.

Project stages — we’re right in the middle at the moment. We have launched Argo CD alongside the existing system and next we will consolidate the regions.

The first phase of the transition was analysing the code dependencies on NAS paths. Finding the direct references to paths is quite straightforward; cases where paths are constructed indirectly or provided as API parameters require more careful follow-up. Enumerating and documenting each of these produced a list of categories of reference which could be followed up in phase two.
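The direct-reference scan is essentially a pattern match over source files. A minimal sketch, assuming a hypothetical mount prefix (`/mnt/nas/` is invented for illustration):

```python
import re

# Hypothetical NAS mount prefix; the real mount points differ.
NAS_PATH = re.compile(r"(/mnt/nas/[\w./-]+)")

def find_nas_references(source: str) -> list:
    """Return direct NAS path literals found in a source file.

    Indirectly constructed paths (string concatenation, paths passed as
    API parameters) will not match and need manual analysis, as noted above.
    """
    return NAS_PATH.findall(source)
```

Run across the codebase, a scan like this produces the raw list that phase two then groups into categories of reference.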

We devised three general strategies for dealing with the files. Our preferred option was to see if we could remove the file dependency. If we find there isn’t an easy way to take out the file, then we consider its size. For large files, we opted to mount them in the containers, which solves the problem at the cost of adding overhead for container retrieval and instance startup times. Smaller files were agglomerated into one volume which we mount into an Init Container, then copy to their final destinations.
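The triage above can be summarised as a small decision function. The size threshold here is an invented placeholder, not our actual cut-off:

```python
SIZE_THRESHOLD = 10 * 1024 * 1024  # 10 MiB; illustrative cut-off only

def strategy_for(removable: bool, size_bytes: int) -> str:
    """Pick one of the three file-handling strategies described above."""
    if removable:
        return "remove dependency"
    if size_bytes > SIZE_THRESHOLD:
        return "bake into container image"
    return "copy from shared volume during pod init"
```

Encoding the triage this way made it easy to apply the same rules consistently across the hundreds of files the analysis surfaced.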

The thinking behind copying small files during pod initialisation is that we can keep them in the app of apps repository, which affords us the benefits of the GitOps model. If we placed large files which change frequently into the repository too, such as report configuration, then they would balloon the size of the repository and diminish any performance benefits we might realise with a git-based model.

Once each file was accounted for and dealt with, we turned to ironing out the initialisation. We orchestrate the pods with a customised control plane application, which processes the template manifests for each of the Kubernetes resources our application pods need. This service provides the API endpoints for interacting with the application, and scales the pods and resources as required. Handling region-specific behaviour proved to be the only complication, which we solved using region-selecting environment variables and setup scripts.
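The region-selection mechanism is simple in principle: setup scripts read an environment variable and branch on it. A sketch with an invented variable name (`APP_REGION` is an assumption):

```python
import os

def resolve_region(default: str = "apac") -> str:
    """Read the region-selecting environment variable set on the pod.

    The variable name is illustrative; the control plane injects the real
    one into each pod's manifest, and setup scripts branch on its value.
    """
    return os.environ.get("APP_REGION", default)
```

Because the control plane templates the manifests, the same image runs in every region and only this one injected value differs.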

Our process for handling NAS references.

Developing new release processes

The next phase of the Argo CD adoption was to define and roll out updated release processes. Changing the infrastructure paradigm calls for a rethink of all our deployment processes. Processes must be approved, automations implemented, documentation written, and developers and operations staff trained.

We can make REST API calls to our source control to run the git tasks required to facilitate Argo CD deployments. A few automated steps were added to the existing pipeline, including a bot triggered by webhooks. The bot approves merge requests based on combined business change management and source control information.
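As a sketch of what the bot's approval call might look like, here is a GitLab-style request builder. The endpoint shape follows GitLab's public API, which is an assumption on my part about the source-control system; the function only constructs the request rather than sending it:

```python
from urllib import request

def approve_merge_request(base_url: str, project_id: int,
                          mr_iid: int, token: str) -> request.Request:
    """Build (but do not send) a GitLab-style merge request approval call.

    POST /projects/:id/merge_requests/:iid/approve is GitLab's documented
    endpoint; the bot would send this once the change-management checks pass.
    """
    url = f"{base_url}/api/v4/projects/{project_id}/merge_requests/{mr_iid}/approve"
    return request.Request(url, method="POST",
                           headers={"PRIVATE-TOKEN": token})
```

The webhook-triggered bot runs its checks, then fires a call like this so no human has to click approve when the records are already in order.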

There was an additional manual button click added to initiate rollouts, but the end result is very minor changes to operations workflows, with the potential to cut them down further once the migration away from the NAS is complete.

Launching and transitioning

The ideal transition would be making GitOps the single source of truth over the NAS in a single atomic step, while maintaining the NAS in parallel as issues are remediated.

Afterwards, we would decommission the NAS, archiving its change history and records. More work must be done before the source of truth can be changed to GitOps repositories though. The Argo CD regions represent the minimum set required today. We have to consolidate the extra regions which the NAS specifies to match those, as demonstrated in the graphic below.

We are able to incrementally bring extra regions into alignment with the regions we’re keeping by changing configuration options over time until the configurations are copies of each other. At that point the redundant regions — which don’t exist in the Argo CD system — can be safely removed.
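Treating each region's configuration as a flat key-value map, the "are we aligned yet?" question becomes a diff. A minimal sketch:

```python
def config_drift(keep: dict, redundant: dict) -> dict:
    """Return the keys where a redundant NAS region still differs from the
    region we are keeping, mapped to (kept value, redundant value) pairs.

    An empty result means the two configurations are copies of each other
    and the redundant region can be safely removed.
    """
    return {k: (keep.get(k), redundant.get(k))
            for k in set(keep) | set(redundant)
            if keep.get(k) != redundant.get(k)}
```

Tracking this drift set over time gives a concrete, shrinking checklist for each region being consolidated.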

We have to consolidate the NAS regions to match Argo CD regions, before the NAS regions can be removed.

We opted for a cron job style service periodically running on our custom job scheduling platform to reconcile differences between the NAS and Argo CD configuration while the systems exist in parallel. The service effectively provides us with a stopgap while we consolidate the regions on the NAS and migrate the necessary files to the new ecosystem.
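One pass of that reconciler can be sketched as follows, assuming the configurations are key-value maps and the NAS remains the source of truth while the two systems run in parallel:

```python
def reconcile(nas_cfg: dict, argo_cfg: dict) -> dict:
    """One pass of the periodic reconciliation job.

    Returns an updated Argo CD configuration with any drifted or missing
    keys overwritten from the NAS, which stays authoritative until the
    cutover. The real job also reports what it changed; this sketch only
    computes the merged result.
    """
    merged = dict(argo_cfg)
    merged.update(nas_cfg)
    return merged
```

The scheduling platform runs this on a fixed interval, so any out-of-band change to the NAS shows up in the Argo CD side within one cycle.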

What’s next?

That’s where we’re up to now. We have accounted for each of the file dependencies that our risk platform uses, and we’ve split them into their categories. The files we could fit into the Argo CD delivery paradigm have each been moved into it, while the large ones have been moved into container images. Future work includes migrating workloads onto the new iteration of the platform and moving datasets into cloud-native data storage, such as AWS S3.

I have really enjoyed this project. Upgrading old infrastructure and developing shiny, new processes, while improving our dependency management and system architecture has been a great experience.

I had the opportunity to dive into the pre-cloud landscape the business previously operated in and learn about the advantages that modern tooling affords. Overall, applying GitOps processes by using a system with Argo CD has been a fantastic learning experience and I have grown my skillset considerably because of this project.
