»Ê¼Ò»ªÈË

XClose

Advanced Research Computing

Home
Menu

The importance of collaboration: the latest engagement between DiRAC and ARC

When time is scarce on a research project, it is important to continuously plan and effectively collaborate with the whole team. A good example of this is a recent project involving ARC and DiRAC.

Thirring-rhmc graphic

26 April 2024

Background

ARC recently worked on theÌýÌý±è°ù´ÇÂá±ð³¦³ÙÌý"Spontaneous Symmetry Breaking in 3d Models of Fermions"Ìýwith Prof Simon Hands (PI) which, due to a funding deadline, had to be delivered in 5 weeks. This project aims to explore the phase diagram of a relativistic field theory of fermions using a code base developed by Prof Hands et al, known as thirring-rhmc. However, the collaboration with ARC and Prof Hands covered a much smaller scope.Ìý

Aims

Our aim was to migrate the work of a PhD student (Dr Jude Worthy) into the default branch of the thirring-rhmc code base. Once this was completed, the intention was that some performance improvements could be investigated as part of the project.ÌýJude’s work implemented a higher accuracy but consequently lower performance formulation (Wilson kernel) of something already implemented in the code base (Shamir kernel). This reduction in performance is the reason for the desire to gain some performance improvements.ÌýHowever, from the PI’s initial comments, it was clear that the key aim remained the code migration.

“…I’m increasingly convinced it only makes sense to pursue this research program further if an improved formulation is employed, so the Shamir -> Wilson transition is essential." - Professor Simon Hands, Principal Investigator

Ìý

The Problems

Several obstacles threatened the success of this project. Development on the original version of thirring-rhmc had continued throughout Jude’s PhD but unfortunately git had not been used to develop the Wilson kernel. Therefore, the two codes had diverged significantly with no clear indication as to what degree. Due to this divergence, it was vital to develop a continuous testing suite to have any chance of success. However, the outputs of thirring-rhmc are statistical in nature and can, whilst remaining correct, vary significantly with only slight changes to the code. Therefore, a lot of domain specific knowledge would be required to design these tests.Ìý

What we did

This project’s strict time constraints required us to take a methodical approach to planning our work. For each task, we defined a clear definition of done and ensured we understood how that individual piece of work helped progress towards our key aim. Continuously planning our tasks in this way was essential to our success.Ìý

The lack of clarity around what changes in the Wilson kernel were significant meant our first task was to set up reliable unit tests. With these tests in place, we could confidently alter the code and catch any breaking changes we might introduce. Helpfully, some stale tests were already present in the repository. With Simon’s domain knowledge, we were able to update these existing tests to create a working test suite. When these tests failed and highlighted issues we couldn’t solve independently, we were able to quickly reach a solution through regular communication with Simon. Simon’s domain knowledge was an invaluable asset throughout the project. As a bonus, we were able to demonstrate the confidence regular testing gave us when carrying out large refactors and migrations. This will hopefully increase the chances of Simon’s research team continuing to maintain and build upon these tests, therefore preventing the tests going stale again. This is a great example of how close collaboration between RSEs and Researchers can benefit both parties.Ìý

The OutcomesÌý

This close collaboration and communication with Simon helped to quickly increase our knowledge of the code base and research domain. Due to this better understanding, we identified the likely causes of two known issues with Jude’s code. Most notably, we identified that the inflated value of an input parameter was a key reason for the Wilson kernels reduced performance.

Conclusion

To conclude, RSEs and Researchers work best together when they communicate effectively. Siloing the domain knowledge of these two parties only reduces the chances of success; our projects are collaborations and can only succeed if we work in this way from the very beginning.Ìý

Links