Monday, 26 August 2013

Calling a global function inside a CUDA kernel

I'm trying to write a CUDA kernel that performs a matrix
multiplication, like this:
__device__ void Matrix_Multi(Matrix A, Matrix B, Matrix C);
__global__ void foo(type para) {
    ....
    Matrix_Multi(A, B, C);
    ....
}
I want to accelerate the matrix multiplication itself. I see two options:
first, use the cuBLAS library; second, write a separate kernel for the
matrix multiplication and call it from inside foo().
I could not get either approach to work. Can anyone help?
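
For reference, here is a minimal sketch of the closest thing to the second option that works on any CUDA device: the multiplication stays a __device__ function, and every thread of foo() computes one element of the result. It rests on several assumptions that are not in the code above: the Matrix struct (row-major, float, device pointer), the parameters of foo(), and the host wrapper multiply_on_gpu() are all hypothetical names made up for illustration, and error checking is left out for brevity. Launching a real __global__ kernel from inside foo() is a different technique (dynamic parallelism), which needs compute capability 3.5 or newer and relocatable device code, and is not shown here.

```cuda
#include <cassert>
#include <cstddef>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical row-major matrix type; the post does not define Matrix.
struct Matrix {
    int rows;
    int cols;
    float* data;  // device pointer to rows * cols floats
};

// Device helper: the calling thread computes one element of C = A * B.
__device__ void Matrix_Multi(Matrix A, Matrix B, Matrix C) {
    int row = blockIdx.y * blockDim.y + threadIdx.y;
    int col = blockIdx.x * blockDim.x + threadIdx.x;
    if (row >= C.rows || col >= C.cols) return;  // thread lies outside C

    float sum = 0.0f;
    for (int k = 0; k < A.cols; ++k) {
        sum += A.data[row * A.cols + k] * B.data[k * B.cols + col];
    }
    C.data[row * C.cols + col] = sum;
}

// Kernel: a __device__ function is called like an ordinary function.
__global__ void foo(Matrix A, Matrix B, Matrix C) {
    // ... other per-thread work would go here ...
    Matrix_Multi(A, B, C);
}

// Hypothetical host wrapper: multiplies an m x k matrix by a k x n matrix.
// CUDA return codes are ignored here; real code should check them.
std::vector<float> multiply_on_gpu(const std::vector<float>& a,
                                   const std::vector<float>& b,
                                   int m, int k, int n) {
    assert(a.size() == static_cast<std::size_t>(m) * k);
    assert(b.size() == static_cast<std::size_t>(k) * n);

    Matrix A{m, k, nullptr}, B{k, n, nullptr}, C{m, n, nullptr};
    std::size_t bytesA = a.size() * sizeof(float);
    std::size_t bytesB = b.size() * sizeof(float);
    std::size_t bytesC = static_cast<std::size_t>(m) * n * sizeof(float);

    cudaMalloc(&A.data, bytesA);
    cudaMalloc(&B.data, bytesB);
    cudaMalloc(&C.data, bytesC);
    cudaMemcpy(A.data, a.data(), bytesA, cudaMemcpyHostToDevice);
    cudaMemcpy(B.data, b.data(), bytesB, cudaMemcpyHostToDevice);

    // One thread per element of C, rounded up to whole 16 x 16 blocks.
    dim3 block(16, 16);
    dim3 grid((n + block.x - 1) / block.x, (m + block.y - 1) / block.y);
    foo<<<grid, block>>>(A, B, C);

    // This blocking copy also waits for the kernel to finish.
    std::vector<float> c(static_cast<std::size_t>(m) * n);
    cudaMemcpy(c.data(), C.data, bytesC, cudaMemcpyDeviceToHost);

    cudaFree(A.data);
    cudaFree(B.data);
    cudaFree(C.data);
    return c;
}
```

From host code, multiply_on_gpu({1, 2, 3, 4}, {5, 6, 7, 8}, 2, 2, 2) should return {19, 22, 43, 50}. The first option is different in kind: cuBLAS routines such as cublasSgemm are normally called from host code rather than from inside a kernel, so that route would mean splitting foo() around the multiplication.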
