Friday, April 25, 2008

2D Sine wave on GPU +CUDA Programming



(x_amp =1, y_amp=1, x_wave=1, y_wave=1)

Worked out how to use 2D arrays to produce a sine wave with CUDA programming on the GPU.

Gnuplot is used for plotting the image.

The time saving in computation is high (I don't have a numerical value yet).




(x_amp =0.5, y_amp=0.5, x_wave=2, y_wave=2)

-- Mihir Patel.


Gnu plot + splot +3d data


The application program of choice is GNUplot. To demonstrate how to use gnuplot,
we will give an example.


The example will be to create a three dimensional plot of the Breit-Wigner
T-matrix:




where ER and Gamma are constants; Im E is the imaginary part
of the energy; Re E is the real part of the energy; and Im T is the imaginary
part of the T matrix. For this example, we will set ER = 180 MeV and Gamma = 120
MeV. Re E is the x axis, Im E is the y axis, and Im T is the z axis.


Here is the C code I used to generate the data file in
the proper gnuplot format. The Data file consists of Z values calculated as a
function of x and y.

Here is the data file that I created with the above code.

The most difficult part of using gnuplot is producing a data file that gnuplot
can read. We recommend producing a data file like this:




This file actually represents the coordinates of a matrix like this:

(This is a sample of a data file with only nine data points)



(x1 y1)   (x1 y2)   (x1 y3)

(x2 y1)   (x2 y2)   (x2 y3)

(x3 y1)   (x3 y2)   (x3 y3)



The blank lines mark the end of each row.


To make a data file to plot, copy and compile the above code or write your own.
To use the above code, you will need to redirect the standard output to a
file:


prompt> ./a.out > data


This will create a file called "data" in your current directory.


To start GNUplot, simply type:


prompt> gnuplot


To make a 3D surface plot of a file called "data" type:


gnuplot> splot 'data'


You should now have a plot which looks something like the plot below:





These plots usually don't look that good; it is better to use the lines
option.



gnuplot> splot 'data' with lines



This will produce a plot like this:






Plots like these still look a little confusing since you can't tell which part
is behind another. It is usually best to use the hidden3d option.



gnuplot> set hidden3d


gnuplot> splot 'data' with lines



The plot should now look like the plot below:






Now we can't see the bottom spike. To rotate the plot, we can use the set view
option:



gnuplot> set view 70,10



gnuplot> splot 'data' with lines



Now you will have a plot which looks like the plot below:






This rotates the view 70 degrees about the x axis and 10 degrees about the
z axis of the virtual coordinate system. For explanation of
this coordinate system, see the bottom of the tutorial.



Next, we can change the scaling on the z axis to eliminate the wasted space
below the plot:



gnuplot> set zrange [-25:150]



This option forces the specified axis to be in the specified range.
Here is the resultant plot:






Now we are ready to label the axes and give the plot a title.


gnuplot> set xlabel "Re E"

gnuplot> set ylabel "Im E"

gnuplot> set zlabel "Im T"

gnuplot> set title "Breit-Wigner T-Matrix"


gnuplot> splot 'data' with lines



Here is the plot:






For this angle, the y axis tic marks seem too crowded, so we space them
10 units apart, starting at 0:



gnuplot> set ytics 0,10


gnuplot> splot 'data' with lines



Here is the plot:






Another option is to add contour lines to your plot. You can project them onto
the bottom of the plot and/or place them on the surface plot itself. (NOTE: for
some reason, this particular plot will not yield contours.) The commands are:



gnuplot> set contour base

or

gnuplot> set contour surface

or

gnuplot> set contour both

Tuesday, April 8, 2008

CUDA Programming Part 3

Extended Hello World to execute on N blocks.



======================

Main Line code

======================

/* HelloWorld Project
* This project demonstrates the basics on how to setup
* an example GPU Computing application.
*
* This file contains the CPU (host) code.
*/

// Host defines
#define NUM_BLOCKS 32
#define NUM_THREADS 32
#define STR_SIZE 50

// Includes
#include <stdlib.h>
#include <stdio.h>
#include <string.h>
#include <math.h>

// CUDA includes
#include <cutil.h> // CUDA Utility Tools

// GPU Kernels declarations - declare as include
#include <HelloWorld_kernel.cu>

//////////////////////
// Program main
//////////////////////
int main( int argc, char** argv)
{

// Host variables
int i,nBytes;
unsigned int timer;
unsigned int num_threads,num_blocks;
char *cpu_odata;
char *string;

// GPU variables
char *gpu_odata;
int str_size;

/////////////////////////////////////////////////////////////////////
// This routine gets the number of GPUs existing in the computer
// For each GPU (device) found, it checks to see if there is a GPU
// that supports CUDA. If no GPU that supports CUDA is found,
// the routine will exit
/////////////////////////////////////////////////////////////////////
CUT_DEVICE_INIT();

/////////////////////////////////////////////////////////////////////
// Create and start a timer called "timer"
// Calls to create and start timers are enveloped in the CUT_SAFE_CALL
// This CUDA Utility Tool checks for errors upon return.
// If an error is found, it prints out an error message, file name,
// and line number in file where the error can be found
/////////////////////////////////////////////////////////////////////
timer = 0;
CUT_SAFE_CALL(cutCreateTimer(&timer));
CUT_SAFE_CALL(cutStartTimer(timer));

// Initialize CPU variables and allocate required memory
num_threads = (unsigned int) NUM_THREADS;
num_blocks = (unsigned int) NUM_BLOCKS;
nBytes = num_blocks*num_threads*STR_SIZE*sizeof(char);

// Allocate and initialize CPU output vector
string = (char *) malloc(STR_SIZE);
if(!string) {
printf("Cannot allocate string memory on CPU\n");
exit(-1);
}
cpu_odata = (char *) malloc(nBytes);
if(!cpu_odata) {
printf("Cannot allocate cpu_odata memory on CPU\n");
exit(-1);
}

// Allocate GPU (device) memory and variables
str_size = (int) STR_SIZE;
CUDA_SAFE_CALL(cudaMalloc( (void**) &gpu_odata, nBytes));

// Setup kernel execution parameters
dim3 grid(num_blocks,1,1);
dim3 threads(num_threads,1,1);

// Execute the kernel on the GPU
HelloWorld_kernel<<< grid, threads >>>(str_size,num_threads, gpu_odata);
// HelloWorld_kernel<<<num_blocks,num_threads>>>(str_size, gpu_odata);

// Check if kernel execution generated an error
CUT_CHECK_ERROR("Kernel execution failed");

// Copy result from GPU to CPU
CUDA_SAFE_CALL(cudaMemcpy(cpu_odata,gpu_odata,nBytes,cudaMemcpyDeviceToHost));

// Stop the timer
CUT_SAFE_CALL(cutStopTimer(timer));
printf( "Processing time: %f (ms)\n", cutGetTimerValue(timer));

// Delete the timer
CUT_SAFE_CALL(cutDeleteTimer(timer));

// Print each thread's string. The kernel pads with spaces and writes no
// terminating NUL, so terminate the copy explicitly before printing.
for(i=0;i<(int)(num_threads*num_blocks);i++) {
strncpy(string,&cpu_odata[i*STR_SIZE],STR_SIZE-1);
string[STR_SIZE-1] = '\0';
printf("From thread %d: %s\n",i,string);
}


//////////////////////////////////////////
// All done - clean up and exit
//////////////////////////////////////////
// Free up CPU memory
free(cpu_odata);
free(string);

// Free up GPU memory
CUDA_SAFE_CALL(cudaFree(gpu_odata));

// Use CUDA Utility Tool to exit cleanly
CUT_EXIT(argc, argv);
}







======================
Kernel
======================


#ifndef _HELLOWORLD_KERNEL_H_
#define _HELLOWORLD_KERNEL_H_

#include <stdio.h>

///////////////////////////////////////////////////////////
// Simple Hello World kernel
// @param gpu_odata output data in global memory
///////////////////////////////////////////////////////////
__global__ void HelloWorld_kernel(int size,int num_threads, char *gpu_odata)
{
int i,k,x,n,last_num;
int idx,not_done;

// Access thread id and block id
const unsigned int thread_idx = threadIdx.x;
const unsigned int block_idx = blockIdx.x;

// Write data to global memory
idx = 0;
gpu_odata[block_idx*num_threads*size + thread_idx*size + idx++] = 'H';
gpu_odata[block_idx*num_threads*size + thread_idx*size + idx++] = 'e';
gpu_odata[block_idx*num_threads*size + thread_idx*size + idx++] = 'l';
gpu_odata[block_idx*num_threads*size + thread_idx*size + idx++] = 'l';
gpu_odata[block_idx*num_threads*size + thread_idx*size + idx++] = 'o';
gpu_odata[block_idx*num_threads*size + thread_idx*size + idx++] = ' ';
gpu_odata[block_idx*num_threads*size + thread_idx*size + idx++] = 'W';
gpu_odata[block_idx*num_threads*size + thread_idx*size + idx++] = 'o';
gpu_odata[block_idx*num_threads*size + thread_idx*size + idx++] = 'r';
gpu_odata[block_idx*num_threads*size + thread_idx*size + idx++] = 'l';
gpu_odata[block_idx*num_threads*size + thread_idx*size + idx++] = 'd';
gpu_odata[block_idx*num_threads*size + thread_idx*size + idx++] = ' ';
gpu_odata[block_idx*num_threads*size + thread_idx*size + idx++] = 'F';
gpu_odata[block_idx*num_threads*size + thread_idx*size + idx++] = 'r';
gpu_odata[block_idx*num_threads*size + thread_idx*size + idx++] = 'o';
gpu_odata[block_idx*num_threads*size + thread_idx*size + idx++] = 'm';
gpu_odata[block_idx*num_threads*size + thread_idx*size + idx++] = ' ';
gpu_odata[block_idx*num_threads*size + thread_idx*size + idx++] = 'T';
gpu_odata[block_idx*num_threads*size + thread_idx*size + idx++] = 'h';
gpu_odata[block_idx*num_threads*size + thread_idx*size + idx++] = 'r';
gpu_odata[block_idx*num_threads*size + thread_idx*size + idx++] = 'e';
gpu_odata[block_idx*num_threads*size + thread_idx*size + idx++] = 'a';
gpu_odata[block_idx*num_threads*size + thread_idx*size + idx++] = 'd';
gpu_odata[block_idx*num_threads*size + thread_idx*size + idx++] = ' ';

// Convert thread id to chars
// Determine number of places in thread idx
not_done = 1;
k = 10;
n = 1;
while(not_done == 1) {
x = thread_idx/k;
if (x>0) {
k = k*10;
n +=1;
}
else
not_done = 0;
}

// Parse out the thread index and convert to chars
k = k/10;
last_num = 0;
for(i=n;i>0;i--) {
x = thread_idx/k-last_num;
gpu_odata[block_idx*num_threads*size + thread_idx*size + idx++] = '0' + x;
last_num = (thread_idx/k)*10;
k = k/10;
}

gpu_odata[block_idx*num_threads*size + thread_idx*size + idx++] = ' ';
gpu_odata[block_idx*num_threads*size + thread_idx*size + idx++] = 'i';
gpu_odata[block_idx*num_threads*size + thread_idx*size + idx++] = 'n';
gpu_odata[block_idx*num_threads*size + thread_idx*size + idx++] = ' ';
gpu_odata[block_idx*num_threads*size + thread_idx*size + idx++] = 'B';
gpu_odata[block_idx*num_threads*size + thread_idx*size + idx++] = 'l';
gpu_odata[block_idx*num_threads*size + thread_idx*size + idx++] = 'o';
gpu_odata[block_idx*num_threads*size + thread_idx*size + idx++] = 'c';
gpu_odata[block_idx*num_threads*size + thread_idx*size + idx++] = 'k';
gpu_odata[block_idx*num_threads*size + thread_idx*size + idx++] = ' ';

// Convert block id to chars
// Determine number of places in block idx
not_done = 1;
k = 10;
n = 1;
while(not_done == 1) {
x = block_idx/k;
if (x>0) {
k = k*10;
n +=1;
}
else
not_done = 0;
}

// Parse out the block index and convert to chars
k = k/10;
last_num = 0;
for(i=n;i>0;i--) {
x = block_idx/k-last_num;
gpu_odata[block_idx*num_threads*size + thread_idx*size + idx++] = '0' + x;
last_num = (block_idx/k)*10;
k = k/10;
}

// Fill out rest of string
for(i=idx;i<size;i++)
gpu_odata[block_idx*num_threads*size + thread_idx*size + idx++] = ' ';
}

#endif

http://developer.nvidia.com/dev_content/cg/cg_examples/images/sine_wave_perturbation_ogl.jpg

Second Assignment is to generate a 2d sine wave for the RK4 method.

Friday, April 4, 2008

Take Launchy beyond application launching


Most people know Launchy as a simple and lightning-fast application launcher for Windows, but it's actually capable of much more than that. Using Launchy, you can append text to files, schedule appointments, add to your to-do list, set reminders, and more—it just takes a little setting up. Today I'll detail how I use Launchy for more than just app launching.

The real brains behind most of what I describe below involves using Windows batch files in conjunction with cURL (an open source command line tool for transferring data with URL syntax) and Twitter.* So if you don't already have it, go download cURL (find the Windows section), then extract the curl.exe file to C:\WINDOWS.

Now download this zipped folder. Inside you'll see several batch files I've prepared for you. Basically, we're going to use these batch files like plugins by running terminal commands with Launchy. First, extract them to C:\Program Files\Launchy\Utilities. Then invoke Launchy, right-click the Launchy command window and select Directories. You should already have the Utilities\ folder included in your list of Directories to scan, but you need to add the .bat filetype to the files you want Launchy to scan for in that folder, so under Directory Information -> File Types, type in .bat and click Add (see screenshot above).

Now we'll walk through how to set up the Launchy batch tools one-by-one below.

Remember the Milk with Launchy

  1. Open the rtm.bat file in your favorite text editor.
  2. On the third line, replace username:password with your Twitter username and password and save the file.*
  3. Now set up RTM to work with Twitter here (remember to add 'rtm' as a friend in Twitter).

Any time you want to add an item to RTM on-the-fly, just invoke Launchy, type rtm, hit Tab, and then type the task you want to add. See the command list here for how you can get specific with RTM syntax.

Post to Google Calendar with Launchy


We're going to follow most of the same steps as we did with the rtm.bat file:

  1. Open the gcal.bat file in your favorite text editor.
  2. On the third line, replace username:password with your Twitter username and password and save the file.*
  3. Now set up Gcal to work with Twitter with Twittercal (remember to add 'gcal' as a friend in Twitter).

You can add anything to Google Calendar using Launchy by typing gcal, hitting Tab, and typing in your event using the natural language of Gcal's quick add (e.g., lunch 1pm tomorrow at Sally's).


Set SMS reminders with Launchy


Again we're using a batch file and Twitter. This time it's a handy Twitter app called timer that lets you set SMS reminders. To set it up:

  1. Open the timer.bat file in your favorite text editor.
  2. On the third line, replace username:password with your Twitter username and password and save the file.*
  3. Now just add timer as a friend on Twitter and you're ready to go.

Any time you want to set up a timed SMS reminder for yourself, just invoke Launchy, type timer,
hit Tab, and enter something like 'call mom in 45.' In 45 minutes
you'll get an SMS reminder on the phone you've set up with Twitter.


Append text to files with Launchy



Being able to append text to a plain text file on-the-fly with Quicksilver gets me through my day, and now you can do the same with Launchy. Appending text to a file doesn't require Twitter (ahh....), but it does use another batch file. This method comes from the Making the Ordinary weblog, and I've included it in the download above. To use it, you need to have a place you want to capture a list in text. The default in the add.bat file is C:\todo\todo.txt. If you want to keep your list elsewhere, just change the first (and only) line of the batch file to match the path to your file.

Whenever you want to add a line to your text file, type add, hit Tab, and type your to-do. When you hit enter, the text will automagically be appended to your todo.txt file.


Twit with Launchy

Finally, if you're actually a Twitter user, the twit.bat file should take care of you. Just open it up and change the username and password to match your own like above. Then you can update your Twitter status by invoking Launchy, typing twit, entering your status, and hitting Enter.


More uses for cURL and batch?


There's a lot of room for improvement for how these batch files work (for example, your Twitter password is now being kept in an unencrypted batch file), but there's also a lot of room for expanding on this idea with other web applications beyond Twitter that you could interact with using the command line and cURL via Launchy. I'd love to hear your ideas and suggestions in the comments.

For more on advanced Launchy usage, see how I tweak Launchy to index my music collection, search my Firefox keyword bookmarks, navigate my file system, and more.

Huge thanks to reader Don Whitford for sending in the cURL innovations and reader Trent for his ideas!

* That's right, many of these "plug-ins" use Twitter. If you don't already have a Twitter account, you'll need to grab one now. Twitter is a beloved social app to some and a hated time waster to others, but it's neither here. It's merely a means to an end. The benefit to setting up Launchy to work with Twitter in this way is that any of these Twitter apps are also accessible on-the-go from your cell phone, providing you with a universal access point for a local and mobile command line of sorts.

Ref: Adam Pash is a senior editor for Lifehacker who loves the rebirth of the command line. His special feature Hack Attack appears every Tuesday on Lifehacker. Subscribe to the Hack Attack RSS feed to get new installments in your newsreader.

Catch all those spelling errors +eclipse

Spelling errors. I think we all make them - I know I do anyway.
After typing these tips on a near daily basis I've begun to recognize
that no matter how good you think you are at spelling, stupid mistakes
show up, and a lot of times you get a bad spelling habit stuck in your
head and continually put too many c's in necessary or something. I
would be surprised if a spelling error didn't find its way in to this
very tip, as I'm usually too lazy to put a lot of effort into spell
checking these :).


Spelling errors in documentation can mean the difference between life and
death. Really?! No... not really - but spelling is
important. Bugs are opened in large frameworks when spelling and
grammatical errors are in the documentation. Really, it happens! In
addition, people immediately take you more seriously when you spend the
time to ensure that your spelling is accurate. Finally, it is
definitely possible that spelling errors can convey a different meaning
than intended. (Deprecate and Depreciate anyone?)

Thankfully, Eclipse has a little hidden gem that makes quick
work of spelling errors. The spelling support in Eclipse has actually
been around for a while now (if I'm not mistaken, it was enabled in
3.0), and recently (3.1 M4) it was enabled in the properties file
editor in addition to the already supported Java editor. So, how do you
enable it?

Well, the first step is to find the preference for it. That's a
relatively simple process as I already know, and I am going to tell you
;) - just go to
Window->Preferences->Editors[+]->Text Editors[+]->Spelling
and check Enable Spelling.




Oops, wait a minute... it has a section to provide a 'User defined
dictionary'. Is this mandatory? What format does it need to be in? Do
we have to type all the words by hand? This could take a while! For
spell checking to work, it is currently mandatory for you to provide a
dictionary. Thankfully it's not that hard to find a word-list Eclipse
can use. Eclipse supports a fairly standard one-word-per-line format
for the 'dictionary' file, so we just have to find one of those. The
best branching off point for that is probably Kevin's Word List on
Sourceforge.net. Seem too hard? Ok, fair enough. I've compiled my own word list
off of an amalgam of lists (I honestly can't remember which ones,
otherwise I would be sure to give credit where credit is due), and I
have uploaded it here. Fair warning! This file is substantial in size - so substantial
that I couldn't attach it to this message in the conventional Javalobby
'Attach File' way (around 1.5 megabytes).

Ok, so now let's enable our spell-checking. I like to drop the
dictionary file in the folder with my Eclipse installation, and then
reference it in the preferences, as seen here:



Then all you have to do is use it. Spell checking shows up in
one of two ways - first as squiggly warnings in your documentation and
strings, and second as a preemptive auto-complete for content assist
(assuming you have enabled that option in the advanced section seen in
the screenshot above). Here is what the spelling error marker looks
like (and what happens when I press Ctrl+1 with my cursor adjacent to
the error):




Here is what it looks like using the auto-complete feature (Ctrl+Space):



Unfortunately, spelling support still needs a little work in
some areas; it is no panacea. So, what would I like to see in the
future? For one thing, there is no way yet to control what elements are
spell-checked, and which ones aren't. What I mean by that is whether
javadocs, regular comments, string literals, and/or type/method/field
names are spell-checked. Currently all documentation and string
literals are checked, but that's it. Frankly, my string literals need
checking less often than my types and fields and methods - and in my
string literals, it is more often the spell-checker would be wrong. In
addition, the algorithm used for finding words of a closest match
doesn't always seem the most appropriate - when using the word-list for
auto-complete, if I were to type 'spell' and then press Ctrl+Space, I
would like 'spelling' to come up in the list before 'spew' - since that is
the main benefit of an auto-complete feature - completing the word!

Error: tag OUTPUT_DIRECTORY: Output directory +Doxygen

The problem here is due to the Linux file style, which does not allow you to have a directory name with spaces.

If you set your output directory to c:\test, it should work.

-Mihir

Error opening map file +doxygen

Doxygen having problems running DOT


Just to rule out too long name issues, can you retry with SHORT_NAMES set to YES in the config file?

- Mihir Patel

Thursday, April 3, 2008

First Hello World program from CUDA

HELLO WORLD (CUDA)


Step 1:

cp -r template HelloWorld

NOTE: CUDA documentation is under common/cutil_readme.txt

step 2:

compile using make (for testing)

Step 3:

Replace template with HelloWorld in the Makefile.

Open HelloWorld.cu and change the include of template_kernel.cu to

// includes, kernels
#include <HelloWorld_kernel.cu>

NOTES:-

CUT_DEVICE_INIT does all the CUDA initialization

CUDA_SAFE_CALL is the wrapper used to call CUDA routines

Limitation is up to 32 threads


(HelloWorld_kernel.cu) Kernel is going to execute on GPU.

(HelloWorld.cu) and other will run on CPU.


Step 4:

Compile using make (should come up without any error)

NOTE : - in this default example we will use the global memory area

Using cuda malloc it makes sure the data transfer is optimized


Step 5:

remove the reference code from the HelloWorld.cu file

NOTE:

* h_odata - CPU data
* d_odata - GPU data

------------------------------------------------
Final Working program
------------------------------------------------

==========================
HelloWorld_Kernel
==========================

/*
* Copyright 2008, Karen Hains, UWA (University of Western Australia).
* All rights reserved.
*
* NOTICE TO USER:
*
* This source code is subject to NVIDIA ownership rights under U.S. and
* international Copyright laws. Users and possessors of this source code
* are hereby granted a nonexclusive, royalty-free license to use this code
* in individual and commercial software.
*
* WE MAKES NO REPRESENTATION ABOUT THE SUITABILITY OF THIS SOURCE
* CODE FOR ANY PURPOSE. IT IS PROVIDED "AS IS" WITHOUT EXPRESS OR
* IMPLIED WARRANTY OF ANY KIND.
*/

#ifndef _HELLOWORLD_KERNEL_H_
#define _HELLOWORLD_KERNEL_H_

#include <stdio.h>


///////////////////////////////////////////////////////////
// Simple Hello World kernel
// @param gpu_odata output data in global memory
///////////////////////////////////////////////////////////
__global__ void HelloWorld_kernel(int size, char *gpu_odata)
{
int i,k,x,n,last_num;
int idx,not_done;

// Access thread id and block id
const unsigned int thread_idx = threadIdx.x;
const unsigned int block_idx = blockIdx.x;

// Write data to global memory
idx = 0;
gpu_odata[thread_idx*size + idx++] = 'H';
gpu_odata[thread_idx*size + idx++] = 'e';
gpu_odata[thread_idx*size + idx++] = 'l';
gpu_odata[thread_idx*size + idx++] = 'l';
gpu_odata[thread_idx*size + idx++] = 'o';
gpu_odata[thread_idx*size + idx++] = ' ';
gpu_odata[thread_idx*size + idx++] = 'W';
gpu_odata[thread_idx*size + idx++] = 'o';
gpu_odata[thread_idx*size + idx++] = 'r';
gpu_odata[thread_idx*size + idx++] = 'l';
gpu_odata[thread_idx*size + idx++] = 'd';
gpu_odata[thread_idx*size + idx++] = ' ';
gpu_odata[thread_idx*size + idx++] = 'F';
gpu_odata[thread_idx*size + idx++] = 'r';
gpu_odata[thread_idx*size + idx++] = 'o';
gpu_odata[thread_idx*size + idx++] = 'm';
gpu_odata[thread_idx*size + idx++] = ' ';
gpu_odata[thread_idx*size + idx++] = 'T';
gpu_odata[thread_idx*size + idx++] = 'h';
gpu_odata[thread_idx*size + idx++] = 'r';
gpu_odata[thread_idx*size + idx++] = 'e';
gpu_odata[thread_idx*size + idx++] = 'a';
gpu_odata[thread_idx*size + idx++] = 'd';
gpu_odata[thread_idx*size + idx++] = ' ';

// Convert thread id to chars
// Determine number of places in thread idx
not_done = 1;
k = 10;
n = 1;
while(not_done == 1) {
x = thread_idx/k;
if (x>0) {
k = k*10;
n +=1;
}
else
not_done = 0;
}

// Parse out the thread index and convert to chars
k = k/10;
last_num = 0;
for(i=n;i>0;i--) {
x = thread_idx/k-last_num;
gpu_odata[thread_idx*size + idx++] = '0' + x;
last_num = (thread_idx/k)*10;
k = k/10;
}

gpu_odata[thread_idx*size + idx++] = ' ';
gpu_odata[thread_idx*size + idx++] = 'i';
gpu_odata[thread_idx*size + idx++] = 'n';
gpu_odata[thread_idx*size + idx++] = ' ';
gpu_odata[thread_idx*size + idx++] = 'B';
gpu_odata[thread_idx*size + idx++] = 'l';
gpu_odata[thread_idx*size + idx++] = 'o';
gpu_odata[thread_idx*size + idx++] = 'c';
gpu_odata[thread_idx*size + idx++] = 'k';
gpu_odata[thread_idx*size + idx++] = ' ';

// Convert block id to chars
// Determine number of places in block idx
not_done = 1;
k = 10;
n = 1;
while(not_done == 1) {
x = block_idx/k;
if (x>0) {
k = k*10;
n +=1;
}
else
not_done = 0;
}

// Parse out the block index and convert to chars
k = k/10;
last_num = 0;
for(i=n;i>0;i--) {
x = block_idx/k-last_num;
gpu_odata[thread_idx*size + idx++] = '0' + x;
last_num = (block_idx/k)*10;
k = k/10;
}

// Fill out rest of string
for(i=idx;i<size;i++)
gpu_odata[thread_idx*size + idx++] = ' ';
}

#endif // #ifndef _HELLOWORLD_KERNEL_H_


==========================

HelloWorld.cu

==========================

/*
* Copyright 2008, Karen Hains, UWA. All rights reserved.
*
* NOTICE TO USER:
*
* This source code is subject to NVIDIA ownership rights under U.S. and
* international Copyright laws. Users and possessors of this source code
* are hereby granted a nonexclusive, royalty-free license to use this code
* in individual and commercial software.
*
* WE MAKES NO REPRESENTATION ABOUT THE SUITABILITY OF THIS SOURCE
* CODE FOR ANY PURPOSE. IT IS PROVIDED "AS IS" WITHOUT EXPRESS OR
* IMPLIED WARRANTY OF ANY KIND.
*/

/* HelloWorld Project
* This project demonstrates the basics on how to setup
* an example GPU Computing application.
*
* This file contains the CPU (host) code.
*/

// Host defines
#define NUM_THREADS 32
#define STR_SIZE 50

// Includes
#include <stdlib.h>
#include <stdio.h>
#include <string.h>
#include <math.h>

// CUDA includes
#include <cutil.h> // CUDA Utility Tools

// GPU Kernels declarations - declare as include
#include <HelloWorld_kernel.cu>

//////////////////////
// Program main
//////////////////////
int main( int argc, char** argv)
{

// Host variables
int i,nBytes;
unsigned int timer;
unsigned int num_threads;
char *cpu_odata;
char *string;

// GPU variables
char *gpu_odata;
int str_size;

/////////////////////////////////////////////////////////////////////
// This routine gets the number of GPUs existing in the computer
// For each GPU (device) found, it checks to see if there is a GPU
// that supports CUDA. If no GPU that supports CUDA is found,
// the routine will exit
/////////////////////////////////////////////////////////////////////
CUT_DEVICE_INIT();

/////////////////////////////////////////////////////////////////////
// Create and start a timer called "timer"
// Calls to create and start timers are enveloped in the CUT_SAFE_CALL
// This CUDA Utility Tool checks for errors upon return.
// If an error is found, it prints out an error message, file name,
// and line number in file where the error can be found
/////////////////////////////////////////////////////////////////////
timer = 0;
CUT_SAFE_CALL(cutCreateTimer(&timer));
CUT_SAFE_CALL(cutStartTimer(timer));

// Initialize CPU variables and allocate required memory
num_threads = (unsigned int) NUM_THREADS;
nBytes = num_threads*STR_SIZE*sizeof(char);

// Allocate and initialize CPU output vector
string = (char *) malloc(STR_SIZE);
if(!string) {
printf("Cannot allocate string memory on CPU\n");
exit(-1);
}
cpu_odata = (char *) malloc(nBytes);
if(!cpu_odata) {
printf("Cannot allocate cpu_odata memory on CPU\n");
exit(-1);
}

// Allocate GPU (device) memory and variables
str_size = (int) STR_SIZE;
CUDA_SAFE_CALL(cudaMalloc( (void**) &gpu_odata, nBytes));

// Setup kernel execution parameters
dim3 grid(1,1,1);
dim3 threads(num_threads,1,1);

// Execute the kernel on the GPU
HelloWorld_kernel<<< grid, threads >>>(str_size, gpu_odata);

// Check if kernel execution generated an error
CUT_CHECK_ERROR("Kernel execution failed");

// Copy result from GPU to CPU
CUDA_SAFE_CALL(cudaMemcpy(cpu_odata,gpu_odata,nBytes,cudaMemcpyDeviceToHost));

// Stop the timer
CUT_SAFE_CALL(cutStopTimer(timer));
printf( "Processing time: %f (ms)\n", cutGetTimerValue(timer));

// Delete the timer
CUT_SAFE_CALL(cutDeleteTimer(timer));

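// Print each thread's string. The kernel pads with spaces and writes no
// terminating NUL, so terminate the copy explicitly before printing.
for(i=0;i<(int)num_threads;i++) {
strncpy(string,&cpu_odata[i*STR_SIZE],STR_SIZE-1);
string[STR_SIZE-1] = '\0';
printf("From thread %d: %s\n",i,string);
}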


//////////////////////////////////////////
// All done - clean up and exit
//////////////////////////////////////////
// Free up CPU memory
free(cpu_odata);
free(string);

// Free up GPU memory
CUDA_SAFE_CALL(cudaFree(gpu_odata));

// Use CUDA Utility Tool to exit cleanly
CUT_EXIT(argc, argv);
}

==========================
Makefile

==========================

################################################################################
#
# Build script for project
#
################################################################################

# Add source files here
EXECUTABLE := HelloWorld
# CUDA source files (compiled with cudacc)
CUFILES := HelloWorld.cu
# CUDA dependency files
CU_DEPS := \
HelloWorld_kernel.cu \

# C/C++ source files (compiled with gcc / c++)
CCFILES := \



################################################################################
# Rules and targets

include ../../common/common.mk


Ref: Western Australian Supercomputer Program, University of Western Australia, April 1-3, 2008, 12-2 pm.