Big Data, Data Science, Distributed Systems, GPU, Scientific Libraries (Numpy/Pandas/SciKit/...)
The search for faster computing remains of great importance to the software community. Relatively inexpensive modern hardware, such as GPUs, allows users to run highly parallel code on thousands, or even millions, of cores on distributed systems.
Building efficient GPU software is not a trivial task, and often requires a significant number of engineering hours to attain the best performance. Similarly, distributed computing systems are inherently complex. In recent years, several libraries have been developed to solve such problems. However, they often target a single aspect of computing, such as GPU computing with CuPy or distributed computing with Dask.
Libraries like Dask and CuPy tend to provide great performance while hiding the underlying complexity from non-experts, which makes them strong candidates for developers writing software for a wide range of applications. Unfortunately, they are often difficult to combine, at least efficiently.
With the recent introduction of NumPy community standards and protocols, such as the __array_function__ protocol (NEP 18), it has become much easier to integrate libraries that share the already well-known NumPy API. Such changes allow libraries like Dask, known for its easy-to-use parallelization and distributed computing capabilities, to defer some of that work to other libraries such as CuPy, giving users the benefits of both distributed and GPU computing with little to no change to existing software built on the NumPy API.
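As a rough illustration of how such a protocol lets one library defer work to another, the sketch below uses NumPy's __array_function__ hook with a toy array type. DeviceArray is a hypothetical stand-in for a GPU array class such as CuPy's; it is not part of any library and keeps its data in an ordinary NumPy array so the example runs without a GPU.

```python
import numpy as np


class DeviceArray:
    """Hypothetical stand-in for a GPU array type (assumption, not a real API)."""

    def __init__(self, data):
        # A real GPU library would keep this buffer in device memory.
        self._data = np.asarray(data)

    def __array_function__(self, func, types, args, kwargs):
        # NumPy calls this hook instead of running its own implementation
        # whenever a DeviceArray is passed to a NumPy API function.
        # Simplification: only top-level positional arguments are unwrapped.
        unwrapped = [a._data if isinstance(a, DeviceArray) else a for a in args]
        result = func(*unwrapped, **kwargs)
        # Re-wrap array results so they stay in the "device" type.
        return DeviceArray(result) if isinstance(result, np.ndarray) else result


x = DeviceArray([[1.0, 2.0], [3.0, 4.0]])
total = np.sum(x)              # dispatched through DeviceArray.__array_function__
col_sums = np.sum(x, axis=0)   # array result comes back as a DeviceArray
print(total, type(col_sums).__name__)
```

The calling code only ever uses the NumPy API, yet the array type decides how each function is carried out. This is the general idea that allows a library operating chunk by chunk, as Dask does, to hand the per-chunk computation to whatever array type the chunks happen to be.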
Type: Talk (30 mins); Python level: Beginner; Domain level: Beginner
Peter Andreas Entschev is a senior system software engineer in the AI Infrastructure group at NVIDIA, where he works on the RAPIDS stack, building GPU-enabled distributed software. Before NVIDIA, he worked on real-time computer vision systems for various applications. He holds an MSc in electrical engineering and applied computer science from the Federal University of Technology - Paraná, Brazil.