Presentation Slides, PDFs, Source Code and other presenter materials are available at: https://github.com/cppcon/cppcon2016
Much attention has been given to what modern optimizing compilers can do with your code, but little is ever said as to how to make the compiler invoke these optimizations. Of course, the answer is compiler switches! But which ones are needed to generate the best code? How many switches does it take to get the best performance? How do different compilers compare when using the same set of switches? I explore all of these questions and more to shed light on the interplay between C++ compilers and modern hardware drawing on my work in high performance scientific computing.
Enabling modern optimizing compilers to exploit current-generation processor features is critical to success in this field. Yet, modernizing aging codebases to utilize these processor features is a daunting task that often results in non-portable code. Rather than relying on hand-tuned optimizations, I explore the ability of today's compilers to breathe new life into old code. In particular, I examine how industry-standard compilers like those from gcc, clang, and Intel perform when compiling operations common to scientific computing without any modifications to the source code. Specifically, I look at streaming data manipulations, reduction operations, compute-intensive loops, and selective array operations. By comparing the quality of the code generated and time to solution from these compilers with various optimization settings for several different C++ implementations, I am able to quantify the utility of each compiler switch in handling varying degrees of abstractions in C++ code. Finally, I measure the effects of these compiler settings on the up-and-coming industrial benchmark High Performance Conjugate Gradient that focuses more on the effects of the memory subsystem than current benchmarks like the traditional High Performance LinPACK suite.
University of Wisconsin-Madison
I am a third-year PhD candidate working in computational astrophysics. My undergraduate work was in computer science, physics, and mathematics, and I have an M.S. in physics. Fundamentally, my interests lie in developing software systems to try to answer difficult scientific questions combining modern parallel programming techniques in C++ with heterogeneous and massively parallel hardware. As such, I have a keen interest in the application of high performance computing to scientific problems (often called "scientific computing"). I spend most of my days attempting to design and build flexible, abstract software for parallel hardware in C++. Currently, I am part of a collaboration including the University of Washington and the University of Illinois at Urbana-Champagne working on the development of the cosmological N-body code CHArm N-body GrAvity solver (ChaNGa). Although it has excellent scaling properties (up to 512K processors with 93% efficiency), the node-level performance is sub-optimal. I am now working with a CS PhD candidate at UIUC to replace much of the C++98 codebase with C++11 and incorporate GPU computing using the CUDA runtime.
Videos Filmed & Edited by Bash Films: http://www.BashFilms.com