Professional CUDA C Programming

Name: Professional CUDA C Programming
Brand: John Wiley & Sons Inc
SKU: 9781118739327
Price: 59.49 EUR
Availability: InStock


John Cheng

John Wiley & Sons Inc

Special-order title - will be reserved | Delivery time: special-order title, available within 10 working days

Our previous price: 59.50 €

Now 59.49 €*

All prices incl. VAT | Free shipping

Item no.:

9781118739327

Published:

2014

Publication date:

7 October 2014

Pages:

528

Author:

John Cheng

Weight:

885 g

Format:

236x187x30 mm

Language:

German

Description:

John Cheng, PhD, is a Research Scientist at BGP International in Houston. He has developed seismic imaging products with GPU technology and many high-performance parallel production applications on heterogeneous computing platforms.

Max Grossman is an expert in GPU computing with experience applying CUDA to problems in medical imaging, machine learning, geophysics, and more.

Ty McKercher has worked at NVIDIA since 2008, helping customers adopt GPU acceleration technologies.

Break into the powerful world of parallel GPU programming with this down-to-earth, practical guide

Designed for professionals across multiple industrial sectors, Professional CUDA C Programming presents the fundamentals of CUDA, a parallel computing platform and programming model designed to ease GPU programming, in an easy-to-follow format, and teaches readers how to think in parallel and implement parallel algorithms on GPUs. Each chapter covers a specific topic and includes workable examples that demonstrate the development process, allowing readers to explore both the "hard" and "soft" aspects of GPU programming.

Computing architectures are experiencing a fundamental shift toward scalable parallel computing, motivated by application requirements in industry and science. This book demonstrates the challenges of efficiently utilizing compute resources at peak performance and presents modern techniques for tackling these challenges, while making them accessible to professionals who are not necessarily parallel programming experts. The CUDA programming model and tools empower developers to write high-performance applications on a scalable, parallel computing platform: the GPU. However, CUDA itself can be difficult to learn without extensive programming experience. Recognized CUDA authorities John Cheng, Max Grossman, and Ty McKercher guide readers through essential GPU programming skills and best practices in Professional CUDA C Programming, including:
* CUDA Programming Model
* GPU Execution Model
* GPU Memory Model
* Streams, Events, and Concurrency
* Multi-GPU Programming
* CUDA Domain-Specific Libraries
* Profiling and Performance Tuning

The book makes complex CUDA concepts easy to understand for anyone with knowledge of basic software development, with exercises designed to be both readable and high-performance. For the professional seeking entrance to parallel computing and the high-performance computing community, Professional CUDA C Programming is an invaluable resource, with the most current information available on the market.

Professional CUDA C Programming provides down-to-earth coverage of the complex topic of parallel computing, a topic increasingly essential in everyday computing. This entry-level programming book for professionals turns complex subjects into easy-to-comprehend concepts and easy-to-follow steps.

FOREWORD

PREFACE

INTRODUCTION

CHAPTER 1: HETEROGENEOUS PARALLEL COMPUTING WITH CUDA

Parallel Computing

Sequential and Parallel Programming

Parallelism

Computer Architecture

Heterogeneous Computing

Heterogeneous Architecture

Paradigm of Heterogeneous Computing

CUDA: A Platform for Heterogeneous Computing

Hello World from GPU

Is CUDA C Programming Difficult?

Summary

CHAPTER 2: CUDA PROGRAMMING MODEL

Introducing the CUDA Programming Model

CUDA Programming Structure

Managing Memory

Organizing Threads

Launching a CUDA Kernel

Writing Your Kernel

Verifying Your Kernel

Handling Errors

Compiling and Executing

Timing Your Kernel

Timing with CPU Timer

Timing with nvprof

Organizing Parallel Threads

Indexing Matrices with Blocks and Threads

Summing Matrices with a 2D Grid and 2D Blocks

Summing Matrices with a 1D Grid and 1D Blocks

Summing Matrices with a 2D Grid and 1D Blocks

Managing Devices

Using the Runtime API to Query GPU Information

Determining the Best GPU

Using nvidia-smi to Query GPU Information

Setting Devices at Runtime

Summary

CHAPTER 3: CUDA EXECUTION MODEL

Introducing the CUDA Execution Model

GPU Architecture Overview

The Fermi Architecture

The Kepler Architecture

Profile-Driven Optimization

Understanding the Nature of Warp Execution

Warps and Thread Blocks

Warp Divergence

Resource Partitioning

Latency Hiding

Occupancy

Synchronization

Scalability

Exposing Parallelism

Checking Active Warps with nvprof

Checking Memory Operations with nvprof

Exposing More Parallelism

Avoiding Branch Divergence

The Parallel Reduction Problem

Divergence in Parallel Reduction

Improving Divergence in Parallel Reduction

Reducing with Interleaved Pairs

Unrolling Loops

Reducing with Unrolling

Reducing with Unrolled Warps

Reducing with Complete Unrolling

Reducing with Template Functions

Dynamic Parallelism

Nested Execution

Nested Hello World on the GPU

Nested Reduction

Summary

CHAPTER 4: GLOBAL MEMORY

Introducing the CUDA Memory Model

Benefits of a Memory Hierarchy

CUDA Memory Model

Memory Management

Memory Allocation and Deallocation

Memory Transfer

Pinned Memory

Zero-Copy Memory

Unified Virtual Addressing

Unified Memory

Memory Access Patterns

Aligned and Coalesced Access

Global Memory Reads

Global Memory Writes

Array of Structures versus Structure of Arrays

Performance Tuning

What Bandwidth Can a Kernel Achieve?

Memory Bandwidth

Matrix Transpose Problem

Matrix Addition with Unified Memory

Summary

CHAPTER 5: SHARED MEMORY AND CONSTANT MEMO
