James C. Sutherland, Matthew Might, Christopher Earl & Tony Saad
Programmers of high-performance scientific computing applications have traditionally had to be cognizant of numerous issues when writing software:
- Physics model formulation
- Numerical discretization of the model
- Computer hardware targeted by the application
These represent three distinct domains of computational science and engineering: physics/science, applied mathematics, and computer science.
The goal of Nebo is to provide expressive syntax that allows application programmers to focus primarily on physics model formulation rather than on details of discretization and hardware. Nebo then handles details of discretization and targeting various hardware architectures (CPU, multicore, GPU, etc.). Nebo is designed to work on-node; it does not perform inter-node (MPI) operations. However, it can manage fields on both CPU and GPU simultaneously, and supports both synchronous and asynchronous data transfers between CPU and GPU.
Nebo is a domain-specific language embedded in C++. Because it lives within C++, existing C++ code interoperates with Nebo directly, which provides a viable migration path: codes can adopt Nebo incrementally without a full refactor.
Our goals for Nebo are:
- Provide syntax that expresses intent, not implementation.
- Achieve performance that matches or exceeds the hand-tuned code that Nebo replaces.
- Future-proof application code by ensuring extensibility, enabling migration to multicore and GPU architectures through auto-generation of optimized "back-end" code.
- Embed within C++ to maintain compatibility, interoperability, and portability.
- Provide a strongly typed interface with type inference to produce robust, correct code (incorrect code doesn't compile).
As an example of what Nebo does, consider evaluating the right-hand side (RHS) of a transport equation of the form ∂φ/∂t = −∇·(F_conv + F_diff), where the convective and diffusive fluxes each have x, y, and z components.
Within Nebo, this is written in the following C++ code:
rhs <<= - divX( xConvFlux + xDiffFlux )
- divY( yConvFlux + yDiffFlux )
- divZ( zConvFlux + zDiffFlux );
where divX, divY and divZ are operators that implement the discrete divergence in each coordinate direction. The above Nebo statement will work on serial CPU, multicore CPU and GPU (via CUDA kernels).
The following figure shows the speedup obtained over a single CPU core for the multithreaded and GPU backends of Nebo for problem sizes of 64³ and 128³. Notably, we observe a 14-16x speedup on GPU, without the user having to write any different code.
Finally, we have shown that, even in serial CPU mode, Nebo is 4-10x faster than the existing code in the Uintah framework on a canonical CFD problem.