currybab's blog

PMPP Lecture 04: GPU Architecture Summary

source: Lecture 04 - GPU Architecture

GPU Architecture

How are threads executed on the GPU?

Synchronization

Scheduling Consideration

Transparent Scalability

Thread Scheduling

Warps

Why use SIMD?

Latency Hiding

Occupancy

Occupancy Constraints

Querying Available Resources

    #include <cstdio>
    #include <cuda_runtime.h>

    // Signature: cudaError_t cudaGetDeviceProperties(cudaDeviceProp* prop, int device);
    int main() {
        cudaDeviceProp prop;
        if (cudaGetDeviceProperties(&prop, 0) != cudaSuccess) return 1;  // device 0
        printf("Max threads per block: %d\n", prop.maxThreadsPerBlock);
        printf("Max threads per multiprocessor: %d\n", prop.maxThreadsPerMultiProcessor);
    }
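The limits reported by `cudaGetDeviceProperties` connect back to the occupancy constraints: occupancy is the ratio of resident warps to the maximum number of warps an SM can hold. The following is a rough host-only sketch of that arithmetic, not the lecture's own code. The `SmLimits` struct and both function names are hypothetical, and the sketch considers only the thread and block limits, ignoring register and shared-memory limits. On a real device the three fields would presumably be filled from `prop.maxThreadsPerMultiProcessor`, `prop.maxBlocksPerMultiProcessor`, and `prop.warpSize`.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical container for per-SM limits (illustrative, not a CUDA type).
struct SmLimits {
    int maxThreadsPerSm;  // assumed to come from prop.maxThreadsPerMultiProcessor
    int maxBlocksPerSm;   // assumed to come from prop.maxBlocksPerMultiProcessor
    int warpSize;         // assumed to come from prop.warpSize
};

// Number of warps one block occupies; a partially filled warp still takes a full slot.
int warpsPerBlock(const SmLimits& lim, int threadsPerBlock) {
    assert(threadsPerBlock > 0 && lim.warpSize > 0);
    return (threadsPerBlock + lim.warpSize - 1) / lim.warpSize;
}

// Blocks that can be resident on one SM, limited by warp slots and by the block limit.
// Register and shared-memory constraints are deliberately left out of this sketch.
int residentBlocks(const SmLimits& lim, int threadsPerBlock) {
    int maxWarps = lim.maxThreadsPerSm / lim.warpSize;
    int byWarps = maxWarps / warpsPerBlock(lim, threadsPerBlock);
    return std::min(byWarps, lim.maxBlocksPerSm);
}

// Occupancy = resident warps / maximum warps per SM.
double occupancy(const SmLimits& lim, int threadsPerBlock) {
    int maxWarps = lim.maxThreadsPerSm / lim.warpSize;
    int resident = residentBlocks(lim, threadsPerBlock) * warpsPerBlock(lim, threadsPerBlock);
    return static_cast<double>(resident) / maxWarps;
}
```

With assumed example limits of 2048 threads, 32 blocks, and a warp size of 32 per SM, a block size of 768 fits only two blocks (48 of 64 warps), so `occupancy` returns 0.75. A block size of 32 hits the block limit first and returns 0.5. Block sizes of 256 or 1024 fill all 64 warp slots.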

#blog #cuda #gpu #pmpp