CUDA - Simple particle System

PARTICLE SYSTEM IN CUDA.


This project showcases Nvidia CUDA, the parallel computing programming model implemented on Nvidia GPUs. It takes a simple particle emitter (provided by Richard Davison, Newcastle University) and reimplements it fully in CUDA.

On the CPU, the maximum number of particles that could be simulated at a normal real-time frame rate (60 FPS) was in the range of 100-200.
With the help of CUDA, a particle system with 500,000 (half a million) particles runs at a comfortable 60 FPS. This is a very basic demonstration which could be expanded further.

Particle Emitter system in Nvidia CUDA.

Structure of the Program:

I use the particle emitter provided by Richard Davison, which is primarily a list of textured meshes that are reused again and again. I have reimplemented it to work in CUDA. The CUDA particle manager functions are declared extern "C" so that the definitions compiled by the CUDA compiler can be linked from the rest of the C++ program under unmangled names.

extern "C" {
	struct cudaParticle;
	/*Constructor & Destructor*/
	void cudaPS_initPS(int size,GLuint in_vbo,GLuint in_cbo);
	void cudaPS_destoryPS();
	/*Cuda Function*/
	void cudaPS_update(float msec,Vector3 pos);
	/*Final Draw Function*/
	void cudaPS_bufferData(Vector3* vertices,Vector4* colours);
}

                        

Particle System Model:

The particle emitter is designed to create a single mesh for the particles. Each particle is a white dot with an alpha fade on it. I define defaults for some attributes of the emitter, namely rate, lifetime, size, speed, position, etc., and assign a texture to it while loading the mesh. Position, colour and direction are stored in a VBO (Vertex Buffer Object) which both CUDA and OpenGL use for calculation and update. Because the buffer is shared, it is quite fast. The following steps are taken:

  1. Position is calculated.
  2. Position is rebuffered.
  3. Mesh is drawn every cycle.

Basic Outline of the Program:

The following steps are taken to run the system in CUDA.

  1. Init

    void cudaPS_initPS(int size,GLuint in_vbo,GLuint in_cbo){
    	//Init, reserve certain amount of particle attribute
    	d_particle.reserve(size);
    	h_particle.reserve(size);
    	d_maxParticle = size;

    	//Setup Buffer Object
    	cudaPS_setupAllBuffers(in_vbo,in_cbo);
    	cudaPS_initBuffer();
    	init_Variable();
    }

  2. Update

    //Host function that launches the kernel
    void cudaPS_update(float msec,Vector3 pos){
    	//Bind Buffer to Write
    	cudaPS_bindAllBuffers();

    	//Core Particle Update
    	calLaunchThread(num_block,num_thread,d_maxParticle);
    	particleUpdate <<<num_block,num_thread>>>(cudaPs_CboPtr,cudaPs_VboPtr,particlePtr,d_maxParticle,msec,particleSpeed,6000,pos);
    	//Wait for the kernel to finish (cudaThreadSynchronize is deprecated)
    	cudaDeviceSynchronize();

    	cudaPS_releaseAllBuffers();
    }

    • Bind all buffers.

      __host__ void cudaPS_bindAllBuffers(){
      	cudaPS_bindBuffer((void **)&cudaPs_VboPtr,&cudaPs_VboResource);
      	cudaPS_bindBuffer((void **)&cudaPs_CboPtr,&cudaPs_CboResource);
      	//num_bytes receives the size of each mapped buffer
      	size_t num_bytes;
      	cudaGraphicsResourceGetMappedPointer((void **)&cudaPs_VboPtr, &num_bytes, cudaPs_VboResource);
      	cudaGraphicsResourceGetMappedPointer((void **)&cudaPs_CboPtr, &num_bytes, cudaPs_CboResource);
      }

    • Launch threads.
    • Release buffers.

      __host__ void cudaPS_releaseAllBuffers(){
      	cudaPS_releaseBuffer(&cudaPs_VboResource);
      	cudaPS_releaseBuffer(&cudaPs_CboResource);
      }

  3. Destroy

    __host__ void cudaPS_unregisterAllBuffers(){
    	cudaGraphicsUnregisterResource(cudaPs_VboResource);
    	cudaGraphicsUnregisterResource(cudaPs_CboResource);
    }
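
The launch in `cudaPS_update` passes `d_maxParticle` to the kernel, which suggests that each thread checks its index against the particle count before touching the buffers. The `particleUpdate` kernel itself is not shown here, so the following is only a host-side C++ simulation of that common indexing pattern; `demoSimulateLaunch` and its parameters are hypothetical names. It walks the block and thread indices that a `<<<numBlocks, threadsPerBlock>>>` launch would generate and counts how often each particle index is visited.

```cpp
#include <vector>

// Host-side stand-in for a 1D kernel launch. Each (block, thread) pair
// computes the global index a CUDA thread would obtain from
// blockIdx.x * blockDim.x + threadIdx.x.
inline std::vector<int> demoSimulateLaunch(int numBlocks, int threadsPerBlock,
                                           int maxParticle) {
    std::vector<int> visits(maxParticle, 0);
    for (int block = 0; block < numBlocks; ++block) {
        for (int thread = 0; thread < threadsPerBlock; ++thread) {
            int idx = block * threadsPerBlock + thread;
            // Bounds check: the last block usually has spare threads
            // whose index lies past the final particle.
            if (idx >= maxParticle) continue;
            ++visits[idx];
        }
    }
    return visits;
}
```

With 1000 particles and 256 threads per block, four blocks supply 1024 threads; the last 24 fail the bounds check, and every particle is visited exactly once. Without the check, those spare threads would index past the end of the buffers.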
                                        

Thread Configuration:

This function defines how many threads and how many blocks are assigned and launched on the GPU side.

__host__ void calLaunchThread(int &num_block,int &num_thread, int num_objects){
	num_block = (num_objects/NUM_OF_THREAD)+1;
	num_thread = (num_objects < NUM_OF_THREAD)?num_objects:NUM_OF_THREAD;
}
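
To see what `calLaunchThread` produces, the arithmetic can be run on the host. The sketch below mirrors the function above in plain C++, with the block size assumed to be 256 (the actual value of `NUM_OF_THREAD` is not given here), and adds a hypothetical `demoCeilBlocks` that uses ceiling division instead. The two agree except when the particle count is an exact multiple of the block size, where `num_objects/NUM_OF_THREAD + 1` launches one extra block whose thread indices all lie past the last particle.

```cpp
// Assumed block size for this example; the project's NUM_OF_THREAD
// value is not shown.
constexpr int DEMO_NUM_OF_THREAD = 256;

// Same arithmetic as calLaunchThread, minus the __host__ qualifier so
// that it compiles as ordinary C++.
inline void demoCalLaunchThread(int& numBlock, int& numThread, int numObjects) {
    numBlock  = (numObjects / DEMO_NUM_OF_THREAD) + 1;
    numThread = (numObjects < DEMO_NUM_OF_THREAD) ? numObjects
                                                  : DEMO_NUM_OF_THREAD;
}

// Alternative block count using ceiling division: the smallest number
// of blocks whose threads cover every object.
inline int demoCeilBlocks(int numObjects) {
    return (numObjects + DEMO_NUM_OF_THREAD - 1) / DEMO_NUM_OF_THREAD;
}
```

Under the 256-thread assumption, the 500,000 particles mentioned earlier need 1954 blocks with either formula. For 512 particles the original arithmetic gives 3 blocks while ceiling division gives 2. The extra block is harmless as long as the kernel bounds-checks its index.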