Imagine orchestrating a symphony of 2800 processor cores, each contributing to solving a complex puzzle. This was the essence of my recent foray into High-Performance Computing (HPC), a journey that not only pushed the boundaries of computational capabilities but also expanded my understanding of problem-solving in the digital age. At the heart of this adventure was a complex problem, more specifically an O(N²) algorithm, a computational beast that demanded both creativity and raw power to tame.

Let’s Have a Look at the Complexity
Imagine you’re at a huge party, and you want to make sure everyone meets everyone else. If there are just 10 people, it’s pretty straightforward: 45 introductions and you’re done. But what if there are 1000 people? Suddenly, you’re organizing nearly half a million introductions. That’s what I mean by O(N²): as the number of guests (or data points) increases, the number of introductions (or calculations) grows quadratically, since there are N*(N-1)/2 pairs to cover. For our research, this meant our computations could get really intense, really fast, necessitating a powerhouse computing solution.
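To make that growth concrete, here is a tiny, self-contained sketch (plain shell arithmetic, nothing from the actual project) that counts the pairwise introductions for a few guest-list sizes:

# Pairwise "introductions" for N guests: N*(N-1)/2 meetings
for N in 10 100 1000; do
  echo "$N guests -> $(( N * (N - 1) / 2 )) introductions"
done

Running it prints 45, 4,950, and 499,500 respectively, which is exactly why the computation explodes long before the guest list does.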

The Odyssey Begins
The challenge was akin to hosting a grand party where everyone needed to meet everyone else; with thousands of guests, the complexity skyrocketed. This problem required a powerhouse solution, and HPC was our venue, with GNU Parallel acting as the master of ceremonies.
Stop Right There, You Could Have Used Dask
In the vast landscape of HPC, choosing the right tool for the job is crucial. GNU Parallel emerged as my hero, not by chance, but by necessity. The HPC platform I was working with had its quirks and restrictions, leading me to compile GNU Parallel from its source code (that will be another blog post). This process, while a bit more hands-on, gave me unparalleled flexibility and optimization tailored to our HPC environment. GNU Parallel let me distribute our party introductions (a.k.a. computational tasks) across 2800 cores with a single terminal command.
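For the curious, building GNU Parallel from source typically follows the standard GNU autotools flow sketched below; the exact steps and install prefix I used on the cluster will be the subject of that separate post, so treat this only as the general shape of the build:

# Fetch, build, and install GNU Parallel into a user-writable prefix (no root needed)
wget https://ftpmirror.gnu.org/parallel/parallel-latest.tar.bz2
tar -xjf parallel-latest.tar.bz2
cd parallel-*/
./configure --prefix="$HOME/.local"
make && make install
# The resulting binary lands in $HOME/.local/bin/parallel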

How Did It Feel?
Transitioning my project to HPC, and specifically to 2800 cores, was like stepping into the cockpit of a supersonic jet. Every dial and switch (or in this case, every line of code and computing core) had to be perfectly aligned for takeoff. The primary hurdle wasn’t just the sheer scale of operations but ensuring that all these introductions at the party happened smoothly, without guests (or data points) waiting too long. It was a dance of data distribution, task synchronization, and a lot of computational heavy lifting.
It was like planning a mate in five in chess. Since HPC systems schedule work through queues, and you cannot simply tweak the code and rerun it without waiting for your job to come up again, designing a SLURM script calls for delicate planning, careful testing, and a measure of hope. Like a mate in five, a smooth operation leading to an errorless execution brings joy to humankind.
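As a taste of that planning, here is a minimal sketch of the kind of SLURM batch script such a submission involves; the job name, resource numbers, and log paths are illustrative placeholders, not the actual configuration I used:

#!/bin/bash
#SBATCH --job-name=pairwise_run        # hypothetical job name
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=128            # illustrative core count, not the real allocation
#SBATCH --time=12:00:00                # misjudge the wall-clock limit and you wait in the queue again
#SBATCH --output=logs/%x_%j.out        # stdout file; the logs/ directory must already exist
#SBATCH --error=logs/%x_%j.err         # stderr file; same caveat

# GNU_PARALLEL_PATH, SCRIPT_PATH, and ARGS_TXT are set as shown in the next section.
# One Python invocation per line of the args file, spread across the allocated cores.
"$GNU_PARALLEL_PATH" --colsep ' ' python "$SCRIPT_PATH" {1} {2} :::: "$ARGS_TXT"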
La Casa de la Computadora: El Plan
I enjoyed watching the TV show La Casa de Papel. While it started my Spanish journey with Duolingo, I also liked its meticulously planned operations. The snippet below, which you can think of as the first move of a mate in five, illustrates the streamlined power of GNU Parallel, enabling the simultaneous processing of tasks across thousands of cores with remarkable simplicity and efficiency.
# Placeholder paths: GNU_PARALLEL_PATH points to the parallel executable built from source
GNU_PARALLEL_PATH=/path/to/library/
SCRIPT_PATH=/path/to/script/
ARGS_TXT=/path/to/args
# Each line of the args file supplies the {1} and {2} arguments for one run of the script
"$GNU_PARALLEL_PATH" --colsep ' ' python "$SCRIPT_PATH" {1} {2} :::: "$ARGS_TXT"
- Setting the paths ($GNU_PARALLEL_PATH, $SCRIPT_PATH, $ARGS_TXT) was like gathering our crew, ensuring each member knew their role and position. Here, the paths guide GNU Parallel to the necessary tools and scripts, positioning them for the task at hand.
- The command itself, invoking GNU Parallel with --colsep ' ' and pointing it to our Python script alongside the arguments from $ARGS_TXT, mirrors the precise coordination of the heist team. Each placeholder, {1} and {2}, represents an argument for the script, akin to the roles assigned to each team member, ensuring they act in concert under the guidance of GNU Parallel, our Professor. The :::: $ARGS_TXT part is the signal to proceed, launching our carefully laid plans into action and distributing tasks across the cores as seamlessly as the crew executing their roles within the Mint.
In essence, this code was the blueprint of the operation, guiding the computational power at my disposal with precision and strategy to achieve the objective. Just as the heist in La Casa de Papel relied on the synergy of each team member’s unique skills, this script relied on the synergy of GNU Parallel and Python, executing the plan flawlessly in the vast network of the HPC system.
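To make the placeholders tangible, here is a hedged sketch of how an arguments file might drive that expansion; the file name, column contents, and script name below are purely hypothetical:

# A hypothetical args file: one line per task, two space-separated columns
cat > args.txt <<'EOF'
station_001.csv 2010
station_002.csv 2011
station_003.csv 2012
EOF

# With --colsep ' ', every line becomes one invocation, e.g.
#   python process.py station_001.csv 2010
parallel --colsep ' ' python process.py {1} {2} :::: args.txt

GNU Parallel keeps as many of these invocations running as there are available cores, which is what turns one modest-looking line into a full-scale heist.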

“Houston, We Have a Problem” Moment
Not everything goes as planned; sometimes you may strategize for a mate in five, but find the game stretching beyond your calculated moves. An issue for which Googling does not yield many results, a 0:53 error code on a debug cluster, threatened to unravel the meticulous fabric of our computational plan. At first glance, this hurdle might have seemed insurmountable, akin to hitting a wall in an intricate chess game where every move you have planned suddenly comes into question. Had I prematurely executed the full code on the main cluster, the repercussions could have been significant, resulting in lost time and resources.
This moment, while fraught with tension, turned into a pivotal learning opportunity. The issue, stemming from inaccessible error and output paths in the SLURM script, was a reminder of the fundamental principle of computational work: attention to detail is paramount. Solving this puzzle not only allowed our project to proceed but also underscored the importance of vigilance and thoroughness in every step of the process.
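In practice the remedy was mundane: SLURM will not create missing directories for the files named in --output and --error, so the submission simply has to make sure they exist before the job is queued. A minimal illustration, with hypothetical paths:

# Create the log directory referenced by --output/--error before submitting
mkdir -p /path/to/project/logs
sbatch job.sh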
Reflecting on this “Houston, We Have a Problem” moment, I realized it was more than just a technical glitch; it was a testament to the resilience and adaptability required in high-performance computing. This experience taught me the value of persistence and the critical nature of problem-solving—a skill that extends beyond the realms of HPC and into every aspect of research and innovation. It reinforced my belief that facing challenges head-on, with patience and determination, can lead to breakthroughs that were once deemed beyond reach.

Reflections and Future Horizons:
Embarking on this venture into the depths of HPC and GNU Parallel transcended a mere technical pursuit; it evolved into an expedition of self-discovery. This odyssey underscored a crucial lesson: armed with the right tools and an unwavering quest for knowledge, the realm of possibilities unfurls boundlessly. Looking forward, the invaluable insights garnered from this journey shine a guiding light on future endeavors, heralding further innovation in machine learning in climate and atmospheric sciences.
Moreover, this experience has endowed me with a pivotal skill set, primed for scaling to tackle more complex challenges and harnessing even more formidable HPC resources.
This chapter, however, was distinct. Previously, my voyages into the world of HPC were accompanied by mentors and guides, offering nudges in the right direction or a quick fix for unforeseen obstacles. This time, the safety net was gone. I navigated the vast seas of code, configurations, and computations solo, without the familiar comforts of guidance or straightforward solutions for missing pieces. Standing on the precipice of independence, I executed the task with my own hands, steering through challenges with no external aid.
The triumph was not just in the successful execution but in the silent, profound realization of my own capability and potential. The anticipation of embracing this uncharted territory again is not just a prospect but a promise to myself—a promise of diving back into the complexities of HPC with a spirit tempered by experience and a heart eager for the endless discoveries that lie ahead.

Conclusion:
Stepping into the world of High-Performance Computing opened new vistas in my research, underscoring the transformative potential of technology when harnessed with purpose and passion. As we continue to explore these digital frontiers, the lessons learned and the milestones achieved will serve as beacons, guiding us toward a future where our computational dreams can become a reality.
Stay Connected:
Join me as we venture further into this exciting domain, pushing the boundaries of what’s possible. For more stories, insights, and updates on my journey through machine learning in climate and atmospheric sciences, follow my blog and connect with me on LinkedIn. Together, let’s explore the boundless possibilities that lie ahead.