lib An Effective Fusion and Tile Size Model for Optimizing Image Processing Pipelines
lib Bridging the Gap between Deep Learning and Sparse Matrix Format Selection
lib Cache-Tries: Concurrent Lock-Free Hash Tries with Constant-Time Operations
lib Communication-avoiding parallel minimum cuts and connected components
lib Efficient parallel determinacy race detection for two-dimensional dags
lib Featherlight On-the-fly False-sharing Detection
lib Harnessing Epoch-based Reclamation for Efficient Range Queries
lib Interval-Based Memory Reclamation
lib Juggler: A Dependency-Aware Task Based Execution Framework for GPUs
lib Making Pull-Based Graph Processing Performant
lib Optimizing N-dimensional, winograd-based convolution for manycore CPUs
lib PAM: Parallel Augmented Maps
lib Register Optimizations for Stencils on GPUs
lib swSpTRSV: a Fast Sparse Triangular Solve with Sparse Level Tile Layout on Sunway Architectures
lib VerifiedFT: a verified, high-performance precise dynamic race detector