Wednesday, March 31, 2010

Problems with ARToolKit on Windows

I would recommend that you do ARToolKit development on Linux. Building ARToolKit on Windows involves many hassles during compilation. I couldn't find a V4L driver for my cheap camera, and hence I was stuck with no option but to develop on Windows.

For those of you who are stuck with Windows like me, I suggest you use MS VC++ 2005 or later; VC++ 6.0 has some known bugs with class templates. Once you build and try to run your first AR application, you are most likely to get an error like "Could not find DsRenderer runtime" or something similar.

ARToolKit depends on DSVideoLib 0.0.8 or higher for its video-related functions, which internally call the platform's video driver. To solve this problem you have to register the missing DirectShow component with the system. Just open a command prompt as administrator and type

regsvr32 "path\filename"

where path is the absolute path to the folder containing the file named in the error message. If you can't find the file anywhere, you are most likely to find it in the "bin" folder of ARToolKit version 2.65.

Once the component is successfully registered, your application should run smoothly.

Discovering ARToolKit

During my initial days at IIIT-H, I had no definite goals apart from wanting to spend my time here productively. Though I had some ideas in mind, my guide suggested I put them aside for later, as six months was too little time to complete them. He gave me a week off to think of some good things that I could do.

I wanted to work on CUDA. CUDA (Compute Unified Device Architecture) is a revolutionary concept from nVidia. The principle of the CUDA architecture is fairly simple: instead of devoting almost half the transistors in the processor IC to cache and registers, CUDA devices (all the recent nVidia GPUs and the nVidia Tesla line) devote most of their transistors to computing. This means less cache (which matters little on GPUs, given the copious amount of equally fast video memory they come with these days) and more raw computing power. In a nutshell, CUDA devices are essentially an array of multiprocessors with shared device memory, and they can outperform any CPU in the market today by huge margins! Anyway, I was at IIIT-H to do some constructive research on CUDA, and my guide seemed to think that this would take more time than I had.

During the course of my week off, I spent most of the time learning about programmable shaders and writing a few myself. I even learnt how to program CUDA devices, and I wrote a few image processing routines that harnessed CUDA's multiprocessing power (I have an nVidia GeForce 8600 GTS, a CUDA device with compute capability 1.1). All this gave me no new ideas other than the ones I came here with.

Towards the end of the week I received an email from my guide. In it he described a concept called Augmented Reality and sent me a few links to a toolkit called ARToolKit, developed by Dr. Hirokazu Kato and supported by the Human Interface Technology Laboratory at the University of Washington. As I browsed through the sites and googled a few terms, I found myself increasingly fascinated by this new technology. Though it looked complex and enthralling at first, after looking through the sources and reading a few books on computer vision (Multiple View Geometry in Computer Vision by R. Hartley and A. Zisserman) I realized that the principle involved is quite simple.

Those of you familiar with the fundamental graphics pipeline will know that at every stage the vertices are transformed from one coordinate system (CS) to another simply by multiplying with an appropriate transformation matrix. Similarly, ARToolKit finds the transformation matrix that maps coordinates in the real-world CS to coordinates in the camera projection plane CS, where the image is formed. Once this matrix is obtained, any virtual object can be placed anywhere in the scene, and its corresponding camera plane coordinates can be obtained by multiplying its coordinates with that matrix. This matrix is called the camera matrix, or in ARToolKit terms the transformation matrix, and it is a 3 x 4 matrix.
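To make the idea concrete, here is a tiny sketch (my own illustration, not ARToolKit code) of how a 3 x 4 matrix maps a homogeneous world point to camera plane coordinates. The function name and matrix values are purely for demonstration:

```c
#include <math.h>

/* Multiply a 3x4 camera (transformation) matrix with the homogeneous
 * world point [x y z 1], giving homogeneous image coordinates [u v w],
 * then divide by w to get the camera plane coordinates.
 * Illustrative sketch only, not ARToolKit's actual implementation. */
static void project_point(const double P[3][4], const double world[3],
                          double image[2])
{
    double h[3]; /* homogeneous image coordinates [u v w] */
    int i;

    for (i = 0; i < 3; i++) {
        h[i] = P[i][0] * world[0] + P[i][1] * world[1]
             + P[i][2] * world[2] + P[i][3] * 1.0;
    }
    /* perspective divide */
    image[0] = h[0] / h[2];
    image[1] = h[1] / h[2];
}
```

For example, with the trivial matrix [I | 0], the world point (2, 4, 2) lands on (1, 2) after the perspective divide.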

ARToolKit uses markers of known dimensions and shapes to achieve this. The program first looks for a square box with a darkened border in the scene. Once it finds one, it matches the pattern inside the box against the pattern templates it has, which can be achieved quite trivially with common image processing routines. Once a marker of known dimensions is detected, it is easy to obtain the camera matrix from the marker's known dimensions and the camera's focal length.

A typical ARToolKit-style marker, with a dark bounding box and a pattern inside.

Though it sounds simple, this technique is difficult to perfect. Most cameras have glitches in their output called distortions. These occur due to imperfections in lens manufacturing: radial distortion (of which barrel distortion is a common form), for instance, arises when the lens curves too much. It is impossible to manufacture a perfect lens. However, these distortions can be corrected by certain complex methods that I won't get my hands dirty with, because thankfully ARToolKit does all that hard work.
Once the camera matrix is known, all the hard work is done as far as the augmented reality part is concerned. All that is left for us to do is place our objects wherever we want them and use this camera matrix to transform them to the camera plane CS.
The code that follows is an example shipped with ARToolKit. It draws a cube on the detected pattern.

#ifdef _WIN32
#include <windows.h>
#endif
#include <stdio.h>
#include <stdlib.h>
#ifndef __APPLE__
#include <GL/gl.h>
#include <GL/glut.h>
#else
#include <OpenGL/gl.h>
#include <GLUT/glut.h>
#endif
#include <AR/gsub.h>
#include <AR/video.h>
#include <AR/param.h>
#include <AR/ar.h>

// Camera configuration.
#ifdef _WIN32
char *vconf = "Data\\WDM_camera_flipV.xml";
#else
char *vconf = "";
#endif

int xsize, ysize;
int thresh = 100;
int count = 0;

char *cparam_name = "Data/camera_para.dat";
ARParam cparam;

char *patt_name = "Data/patt.hiro";
int patt_id;
double patt_width = 80.0;
double patt_center[2] = {0.0, 0.0};
double patt_trans[3][4];

static void init(void);
static void cleanup(void);
static void keyEvent( unsigned char key, int x, int y);
static void mainLoop(void);
static void draw( void );

int main(int argc, char **argv)
{
    glutInit(&argc, argv);
    init();

    arVideoCapStart();
    argMainLoop( NULL, keyEvent, mainLoop );
    return (0);
}

static void keyEvent( unsigned char key, int x, int y)
{
    /* quit if the ESC key is pressed */
    if( key == 0x1b ) {
        printf("*** %f (frame/sec)\n", (double)count/arUtilTimer());
        cleanup();
        exit(0);
    }
}

/* main loop */
static void mainLoop(void)
{
    ARUint8      *dataPtr;
    ARMarkerInfo *marker_info;
    int          marker_num;
    int          j, k;

    /* grab a video frame */
    if( (dataPtr = (ARUint8 *)arVideoGetImage()) == NULL ) {
        arUtilSleep(2);
        return;
    }
    if( count == 0 ) arUtilTimerReset();
    count++;

    argDrawMode2D();
    argDispImage( dataPtr, 0, 0 );

    /* detect the markers in the video frame */
    if( arDetectMarker(dataPtr, thresh, &marker_info, &marker_num) < 0 ) {
        cleanup();
        exit(0);
    }

    arVideoCapNext();

    /* check for object visibility */
    k = -1;
    for( j = 0; j < marker_num; j++ ) {
        if( patt_id == marker_info[j].id ) {
            if( k == -1 ) k = j;
            else if( marker_info[k].cf < marker_info[j].cf ) k = j;
        }
    }
    if( k == -1 ) {
        argSwapBuffers();
        return;
    }

    /* get the transformation between the marker and the real camera */
    arGetTransMat(&marker_info[k], patt_center, patt_width, patt_trans);

    draw();

    argSwapBuffers();
}

static void init( void )
{
    ARParam wparam;

    /* open the video path */
    if( arVideoOpen( vconf ) < 0 ) exit(0);
    /* find the size of the window */
    if( arVideoInqSize(&xsize, &ysize) < 0 ) exit(0);
    printf("Image size (x,y) = (%d,%d)\n", xsize, ysize);

    /* set the initial camera parameters */
    if( arParamLoad(cparam_name, 1, &wparam) < 0 ) {
        printf("Camera parameter load error !!\n");
        exit(0);
    }
    arParamChangeSize( &wparam, xsize, ysize, &cparam );
    arInitCparam( &cparam );
    printf("*** Camera Parameter ***\n");
    arParamDisp( &cparam );

    if( (patt_id=arLoadPatt(patt_name)) < 0 ) {
        printf("pattern load error !!\n");
        exit(0);
    }

    /* open the graphics window */
    argInit( &cparam, 1.0, 0, 0, 0, 0 );
}

/* cleanup function called when program exits */
static void cleanup(void)
{
    arVideoCapStop();
    arVideoClose();
    argCleanup();
}

static void draw( void )
{
    double gl_para[16];
    GLfloat mat_ambient[] = {0.0, 0.0, 1.0, 1.0};
    GLfloat mat_flash[] = {0.0, 0.0, 1.0, 1.0};
    GLfloat mat_flash_shiny[] = {50.0};
    GLfloat light_position[] = {100.0,-200.0,200.0,0.0};
    GLfloat ambi[] = {0.1, 0.1, 0.1, 0.1};
    GLfloat lightZeroColor[] = {0.9, 0.9, 0.9, 0.1};

    argDrawMode3D();
    argDraw3dCamera( 0, 0 );
    glClearDepth( 1.0 );
    glClear(GL_DEPTH_BUFFER_BIT);
    glEnable(GL_DEPTH_TEST);
    glDepthFunc(GL_LEQUAL);

    /* load the camera transformation matrix */
    argConvGlpara(patt_trans, gl_para);
    glMatrixMode(GL_MODELVIEW);
    glLoadMatrixd( gl_para );

    glEnable(GL_LIGHTING);
    glEnable(GL_LIGHT0);
    glLightfv(GL_LIGHT0, GL_POSITION, light_position);
    glLightfv(GL_LIGHT0, GL_AMBIENT, ambi);
    glLightfv(GL_LIGHT0, GL_DIFFUSE, lightZeroColor);
    glMaterialfv(GL_FRONT, GL_SPECULAR, mat_flash);
    glMaterialfv(GL_FRONT, GL_SHININESS, mat_flash_shiny);
    glMaterialfv(GL_FRONT, GL_AMBIENT, mat_ambient);
    glTranslatef( 0.0, 0.0, 25.0 );
    glutSolidCube(50.0);
    glDisable( GL_LIGHTING );

    glDisable( GL_DEPTH_TEST );
}

If you want to play around with this you should first print out all the markers provided with the toolkit.

The output of this program can be seen once you hold your "Hiro" marker in front of the camera. A blue cube appears on the pattern, and it turns in accordance with your movements.

Most of the code is self-explanatory, and the rest will become clear when you write a few programs yourself, so I won't waste time elaborating on it.

So this was my first encounter with ARToolKit.

The next day, after trying out ARToolKit, I met my guide and discussed it with him. We decided that I should develop a demo in which humans can interactively control these virtual objects. What exactly the demo should be never surfaced until a little later, but until then I was tinkering with ARToolKit. In between, I tried to recreate the functionality of ARToolKit using OpenCV, which I left midway (I hope to resume it soon) as soon as I had a solid idea of what I was going to do about the demo.

Augmented Reality

I am currently at IIIT-Hyderabad, immersed in my project on augmented reality. I was introduced to it by a professor here a few months ago. For those of you who haven't heard of augmented reality, it is an upcoming field that will soon play a huge part in revolutionizing human-computer interaction. Augmented reality, as the term suggests, means adding virtual elements to a real environment. In other words, it can be described as a merger of reality and synthesized virtual realms. The results are truly a feast for the eyes! For instance, in the picture shown here, the two people viewing through their head-mounted displays, fed by head-mounted cameras, can see a virtual city placed on a real table! The city is rendered by the program and placed in the correct position and orientation as per certain calculations which I will describe in detail further ahead.

Augmented reality can be seen as a progeny of existing computer vision concepts. The basic principle is simple:
  • Firstly it is assumed that the camera sees what the user sees, which is almost true given its close proximity to the user's eye and after methodically eliminating radial and axial distortions.
  • Since we wish to place our virtual objects in the scene and make it look convincingly real, we need to know the mapping from the scene to the image plane of the camera. To make it even simpler - The virtual objects need to be aligned with the perspective of the scene. For example, a user looking at a real teapot and a virtual teapot, placed at the same distance from the camera should be convinced that the scene is real. The virtual teapot should not look "out of place" in the scene. There are several novel methods that can be used to achieve this which I will go through in much more detail as I describe my project ahead.
  • The scene should be rendered efficiently. Since the frame rate of the video should be maintained at a reasonable and constant value, we need to finish our frame by frame processing quickly and efficiently.
Now my project here at IIIT-Hyderabad is to find ways to add human interaction to this. So I set out to make a demo, a game rather. My game is quite simple. The user holds a plane (like a sheet of paper) in front of the camera. A virtual maze is created on this plane, along with a virtual ball. The objective of the game is to guide the ball through the maze by tilting the plane in your hand. Remember that the ball should be under the influence of gravity to make this possible! :)
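To give a flavour of how such ball physics can work, here is a minimal sketch of one simulation step (the names and structure are mine, purely illustrative, not my actual game code): gravity is projected onto the tilted plane, and the ball is advanced with semi-implicit Euler integration.

```c
#include <math.h>

/* A point-mass ball with position and velocity in 3D. */
typedef struct { double pos[3]; double vel[3]; } Ball;

/* One integration step: project gravity onto the plane given by its
 * unit normal n, then advance velocity and position with semi-implicit
 * Euler. Illustrative sketch under simple point-mass assumptions. */
static void step_ball(Ball *b, const double n[3], double dt)
{
    const double g[3] = {0.0, -9.81, 0.0};  /* gravity, m/s^2, y is up */
    double gdotn = g[0]*n[0] + g[1]*n[1] + g[2]*n[2];
    int i;

    for (i = 0; i < 3; i++) {
        double gt = g[i] - gdotn * n[i];  /* tangential part of gravity */
        b->vel[i] += gt * dt;             /* update velocity first ...  */
        b->pos[i] += b->vel[i] * dt;      /* ... then position          */
    }
}
```

With a level plane (normal pointing straight up) the projected gravity vanishes and the ball stays put; tilt the plane and the tangential component makes it slide, exactly the behaviour the game needs.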

It has been almost two months since I started brooding over this project, and I now have a working prototype: a four-sided boundary region in which the user can play with the ball, with collision handling, gravity and a fully functional force accumulator model included. I regret not blogging my progress all this while, but now I have some catching up to do. I will devote any free time I have to recounting the progress I made over the past two months and other proceedings in my project. I will soon upload my game so that you can play it too!

Back to Blogging!

I have been off blogging for long and now wish to make up for the lost time.
Blogging is a healthy habit. It enables one to maintain a log of one's thoughts and actions, and once you look back after years of blogging, you can see your life since then as a smooth slide show. This thought excites me. I often have lots to share and feel a surge of enthusiasm to log my work here, but on every such occasion my laziness has successfully taken over. Now I am determined to get past that! It helps when I think of my blog as a means of torturing people with my rantings ;) (just kidding.. hehe). Anyway, I am looking forward to some productive blog activity and urge you all to motivate me by posting your queries and arguments.