Professional CUDA C Programming

Special-order title - will be reserved | Delivery time: special-order title, available within 10 business days

Our previous price: 59,50 €

Now 59,49 €*

All prices incl. VAT | Free shipping
Item no.:
9781118739327
Published:
2014
Release date:
07.10.2014
Pages:
528
Author:
John Cheng
Weight:
885 g
Dimensions:
236x187x30 mm
Language:
German
Description:

John Cheng, PhD, is a Research Scientist at BGP International in Houston. He has developed seismic imaging products with GPU technology and many high-performance parallel production applications on heterogeneous computing platforms.
 
Max Grossman is an expert in GPU computing with experience applying CUDA to problems in medical imaging, machine learning, geophysics, and more.
 
Ty McKercher has worked at NVIDIA since 2008, helping customers adopt GPU acceleration technologies.
Break into the powerful world of parallel GPU programming with this down-to-earth, practical guide
 
Designed for professionals across multiple industrial sectors, Professional CUDA C Programming presents the fundamentals of CUDA, a parallel computing platform and programming model designed to ease GPU programming, in an easy-to-follow format, and teaches readers how to think in parallel and implement parallel algorithms on GPUs. Each chapter covers a specific topic and includes workable examples that demonstrate the development process, allowing readers to explore both the "hard" and "soft" aspects of GPU programming.
 
Computing architectures are experiencing a fundamental shift toward scalable parallel computing, motivated by application requirements in industry and science. This book demonstrates the challenges of efficiently utilizing compute resources at peak performance and presents modern techniques for tackling these challenges, while increasing accessibility for professionals who are not necessarily parallel programming experts. The CUDA programming model and tools empower developers to write high-performance applications on a scalable, parallel computing platform: the GPU. However, CUDA itself can be difficult to learn without extensive programming experience. Recognized CUDA authorities John Cheng, Max Grossman, and Ty McKercher guide readers through essential GPU programming skills and best practices in Professional CUDA C Programming, including:
* CUDA Programming Model
* GPU Execution Model
* GPU Memory Model
* Streams, Events, and Concurrency
* Multi-GPU Programming
* CUDA Domain-Specific Libraries
* Profiling and Performance Tuning
 
The book makes complex CUDA concepts easy to understand for anyone with knowledge of basic software development, with exercises designed to be both readable and high-performance. For the professional seeking entrance to parallel computing and the high-performance computing community, Professional CUDA C Programming is an invaluable resource, with the most current information available on the market.
Professional CUDA C Programming provides down-to-earth coverage of the complex topic of parallel computing, a topic increasingly essential in everyday computing. This entry-level programming book for professionals turns complex subjects into easy-to-comprehend concepts and easy-to-follow steps.
FOREWORD
PREFACE
INTRODUCTION

CHAPTER 1: HETEROGENEOUS PARALLEL COMPUTING WITH CUDA
Parallel Computing
Sequential and Parallel Programming
Parallelism
Computer Architecture
Heterogeneous Computing
Heterogeneous Architecture
Paradigm of Heterogeneous Computing
CUDA: A Platform for Heterogeneous Computing
Hello World from GPU
Is CUDA C Programming Difficult?
Summary

CHAPTER 2: CUDA PROGRAMMING MODEL
Introducing the CUDA Programming Model
CUDA Programming Structure
Managing Memory
Organizing Threads
Launching a CUDA Kernel
Writing Your Kernel
Verifying Your Kernel
Handling Errors
Compiling and Executing
Timing Your Kernel
Timing with CPU Timer
Timing with nvprof
Organizing Parallel Threads
Indexing Matrices with Blocks and Threads
Summing Matrices with a 2D Grid and 2D Blocks
Summing Matrices with a 1D Grid and 1D Blocks
Summing Matrices with a 2D Grid and 1D Blocks
Managing Devices
Using the Runtime API to Query GPU Information
Determining the Best GPU
Using nvidia-smi to Query GPU Information
Setting Devices at Runtime
Summary

CHAPTER 3: CUDA EXECUTION MODEL
Introducing the CUDA Execution Model
GPU Architecture Overview
The Fermi Architecture
The Kepler Architecture
Profile-Driven Optimization
Understanding the Nature of Warp Execution
Warps and Thread Blocks
Warp Divergence
Resource Partitioning
Latency Hiding
Occupancy
Synchronization
Scalability
Exposing Parallelism
Checking Active Warps with nvprof
Checking Memory Operations with nvprof
Exposing More Parallelism
Avoiding Branch Divergence
The Parallel Reduction Problem
Divergence in Parallel Reduction
Improving Divergence in Parallel Reduction
Reducing with Interleaved Pairs
Unrolling Loops
Reducing with Unrolling
Reducing with Unrolled Warps
Reducing with Complete Unrolling
Reducing with Template Functions
Dynamic Parallelism
Nested Execution
Nested Hello World on the GPU
Nested Reduction
Summary

CHAPTER 4: GLOBAL MEMORY
Introducing the CUDA Memory Model
Benefits of a Memory Hierarchy
CUDA Memory Model
Memory Management
Memory Allocation and Deallocation
Memory Transfer
Pinned Memory
Zero-Copy Memory
Unified Virtual Addressing
Unified Memory
Memory Access Patterns
Aligned and Coalesced Access
Global Memory Reads
Global Memory Writes
Array of Structures versus Structure of Arrays
Performance Tuning
What Bandwidth Can a Kernel Achieve?
Memory Bandwidth
Matrix Transpose Problem
Matrix Addition with Unified Memory
Summary
 
CHAPTER 5: SHARED MEMORY AND CONSTANT MEMO
