Thursday, March 27, 2008

Hello World with CUDA

Writing Hello World in CUDA is a bit difficult (CUDA does not support strings).

But the following vector addition program may be a good starting point.

Simple code to add two vectors:

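#include <stdio.h>
#include <stdlib.h>

/* Kernel: each thread adds one pair of elements */
__global__ void add_arrays_gpu( float *in1, float *in2, float *out, int Ntot)
{
    int idx=blockIdx.x*blockDim.x+threadIdx.x;
    /* Threads past the end of the arrays do nothing */
    if ( idx < Ntot )
        out[idx]=in1[idx]+in2[idx];
}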

int main()
{
    /* pointers to host memory */
    float *a, *b, *c;
    /* pointers to device memory */
    float *a_d, *b_d, *c_d;
    int N=18;
    int i;

    /* Allocate arrays a, b and c on host */
    a = (float*) malloc(N*sizeof(float));
    b = (float*) malloc(N*sizeof(float));
    c = (float*) malloc(N*sizeof(float));

    /* Allocate arrays a_d, b_d and c_d on device */
    cudaMalloc ((void **) &a_d, sizeof(float)*N);
    cudaMalloc ((void **) &b_d, sizeof(float)*N);
    cudaMalloc ((void **) &c_d, sizeof(float)*N);

    /* Initialize arrays a and b */
    for (i=0; i<N; i++)
    {
        a[i]= (float) i;
        b[i]=-(float) i;
    }

    /* Copy data from host memory to device memory */
    cudaMemcpy(a_d, a, sizeof(float)*N, cudaMemcpyHostToDevice);
    cudaMemcpy(b_d, b, sizeof(float)*N, cudaMemcpyHostToDevice);

    /* Compute the execution configuration: enough blocks to cover all N elements */
    int block_size=8;
    dim3 dimBlock(block_size);
    dim3 dimGrid ( (N/dimBlock.x) + (!(N%dimBlock.x)?0:1) );

    /* Add arrays a and b, store result in c */
    add_arrays_gpu<<<dimGrid,dimBlock>>>(a_d, b_d, c_d, N);

    /* Copy data from device memory to host memory */
    cudaMemcpy(c, c_d, sizeof(float)*N, cudaMemcpyDeviceToHost);

    /* Print c */
    for (i=0; i<N; i++)
        printf(" c[%d]=%f\n",i,c[i]);

    /* Free the memory */
    free(a); free(b); free(c);
    cudaFree(a_d); cudaFree(b_d); cudaFree(c_d);

    return 0;
}
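
The execution configuration in the program can be checked on the CPU. What follows is only a minimal sketch in plain C: the helper names grid_size and add_arrays_cpu_emulated are made up for illustration and are not part of the original program or of the CUDA API. Two ordinary loops stand in for blockIdx.x and threadIdx.x, so the index arithmetic is the same as in the kernel.

```c
#include <assert.h>

/* Hypothetical helper: number of blocks needed to cover n elements,
   using the same rounding-up division as the dimGrid expression. */
int grid_size(int n, int block_size)
{
    assert(n >= 0 && block_size > 0);
    return n / block_size + (n % block_size ? 1 : 0);
}

/* Hypothetical CPU stand-in for add_arrays_gpu. The outer loop plays
   the role of blockIdx.x and the inner loop the role of threadIdx.x.
   Returns how many "threads" passed the idx < n bounds check. */
int add_arrays_cpu_emulated(const float *in1, const float *in2, float *out,
                            int n, int block_size)
{
    int blocks = grid_size(n, block_size);
    int active = 0;
    for (int block = 0; block < blocks; block++) {
        for (int thread = 0; thread < block_size; thread++) {
            int idx = block * block_size + thread;
            if (idx < n) {  /* same bounds check as in the kernel */
                out[idx] = in1[idx] + in2[idx];
                active++;
            }
        }
    }
    return active;
}
```

With N=18 and block_size=8, grid_size gives 3 blocks, so 24 threads are launched in total. The last 6 fail the idx < Ntot check, and that check is what stops them from writing past the end of the arrays.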


Running the Code

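Copy the code into a file with a .cu extension, for example add_vector.cu
Compile it with nvcc: nvcc -o add_vector add_vector.cu
Run it: ./add_vector

If you don't have a CUDA-capable GPU, compile it in emulation mode (only old CUDA toolkits support this; later releases dropped the -deviceemu option):
nvcc -deviceemu -o add_vector_emu add_vector.cu
Run it: ./add_vector_emu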
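
Because the program fills a[i] with i and b[i] with -i, every element of c should come back as zero, so a correct run prints 18 lines of the form c[i]=0.000000. If you prefer to check this in code rather than by eye, a host-side comparison along these lines could be used. It is only a sketch; count_mismatches is a made-up helper name, not something the original program or CUDA provides.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: compares the array copied back from the device
   against a[i] + b[i] computed on the host and returns how many
   elements differ. Exact comparison is used here because the test
   values are small integers; other data may need a tolerance. */
int count_mismatches(const float *a, const float *b, const float *c, int n)
{
    assert(a != NULL && b != NULL && c != NULL && n >= 0);
    int mismatches = 0;
    for (int i = 0; i < n; i++) {
        if (c[i] != a[i] + b[i])
            mismatches++;
    }
    return mismatches;
}
```

In the program above it would be called right after the device-to-host cudaMemcpy, as count_mismatches(a, b, c, N), and should return 0.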


ajith said...

I am new to CUDA programming, so I wanted to start with this code.
First, which compiler do I need to compile this program?
I copied the code onto my system and tried to compile it, but it gives an error.
The error is "nvcc fatal : No input files specified; use option --help for more information"

Klaus said...

Hello ajith,

Have you installed the SDK?

In order to run cuda programs like this one you must either have a GPU or run it in emulation mode.
But first you should get NVIDIA Cuda from here:

Look for the installation instructions on the site. The process is different for each system.

cheers. :)

ajith said...

Thank you for the reply. It came after a long time, but not too late for me, since I still have not managed to install CUDA.
I asked some experts in this area, and they told me that installing CUDA is difficult on some operating systems. I have Fedora Linux, and on this OS it is apparently very difficult to install.
I still do not know what the problem is.
Thank you for the reply
Best Wishes :)

Mihir said...

Have you tried the following link?

sushant said...

I got the following error on executing the code in device emulation mode:

error: expected an expression at this line
add_arrays_gpu<< >>(a_d, b_d, c_d, N);

Conrad said...

Some of the code was not copied over correctly. Here is a link to a working copy:

Cuda said...

Is this parallel programming?

Mihir said...

you can say that.


python cookbook said...

Hi All,

I am very new to CUDA programming. I installed the CUDA driver, SDK, CUDA toolkit and Visual Studio 2008.
I searched a lot and followed some online docs, but could not figure out how to start CUDA programming with Visual Studio 2008.

Can anyone please point me to a tutorial on how to start working with CUDA in Visual Studio 2008? A walkthrough for creating a simple "hello world" or "adding two numbers" project and compiling it using both CUDA and VS2008 would be great.

If anyone can write down the steps I should follow, I would be very grateful.

Please help me.

Mihir said...

I don't use VS2008, as I program on Linux, but refer to the following link, if that helps.


ANCHAL said...

Where should I copy the code to compile it?

ANCHAL said...

Please help me out, I am new to this. Is it necessary to have Visual Studio 2005 or 2008?

Ankit said...

Hey python cookbook, I am also stuck with using CUDA. I have downloaded the CUDA toolkit too and have Visual Studio 2010. Can you tell me whether you have solved your problem, and guide me? Any help from this forum will be highly appreciated.


Mihir said...

You don't need Visual Studio to compile the CUDA program (Visual Studio is just an IDE, for editing). What I would recommend is to install the NVCC compiler (CUDA SDK) from the NVIDIA site and run the program from the command line.

(Remember you need to set the path variable to point to NVCC.exe)

If I find time I may put together a step-by-step tutorial to get CUDA working with Visual Studio.


Anonymous said...

How do we compile CUDA Fortran code? I believe that nvcc is just for C code. Is PGI Fortran the only compiler available?

Filipe said...

I tried to compile and got the message:

Visual Studio configuration file 'vsvars32.bat' could not be found for installation at './../../..'

Does anybody know how to proceed in this case?

Anonymous said...

I've installed the CUDA SDK and toolkit on Ubuntu but still don't know how to compile a program. Can you give me a complete description of where to write programs, how to compile them, and how to use CUDA in emulation mode? Thank you.

Animesh Saxena said...

I am more of a console-based programmer, so Windows is like an alien system to me. If I try to compile from the terminal, I get the following message:

animesh-saxenas-macbook-pro:smokeParticles animeshsaxena$ nvcc -I /Developer/GPU\ Computing/C/common/inc/
Undefined symbols:
"_main", referenced from:
start in crt1.10.6.o
ld: symbol(s) not found
collect2: ld returned 1 exit status

I am able to run sample CUDA programs so CUDA driver is configured for sure!
Any suggestions what I am missing?

BarnyardBill said...

I am going to repost an earlier comment in case you overlooked it.

"Some of the code was not copied over correctly. Here is a link to a working copy:"

It works. I am a beginner, so I can confirm that it worked :)

galmeida said...

Extremely helpful. Your strength is exactly what is most needed and lacking on the net: Objectivity.

The example was simple and complete enough for an introduction and the execution instructions were as simple as possible.

Most users on the net are desperate for attention and always write far more than needed, making useless tutorials. Most content on the net is totally useless actually, but certainly not yours!

Mihir said...

Thanks for the compliments.

Li Hengfeng said...

Hi, this link doesn't seem to work now. Where can I find the copy?


Anonymous said...

A couple of errors in the code:

The for loops should read:
for (i=0; i<N; i++)

The kernel launch should read:
add_arrays_gpu<<<dimGrid,dimBlock>>>(a_d, b_d, c_d, N);

And you need a ';' at the end of the last cudaFree(c_d);

I think that is it.