Jaap Stolk
2010 Update
- The hardware below (including the NV15 card) stopped working.
The links to http://jwstolk.xs4all.nl/ are also down, but I can e-mail Octave plots used to find the spotlight parameters.
- I included the C code for calculating the 7 hardware registers from the GL parameters below. Measurements produced a strange function with many non-linearities, which could very well be the result of the hardware using a look-up table with linear interpolation on that table. We worked out the relations between the parameters, so only two tables were needed to calculate the 7 hardware registers. The table values were trained using a trial-and-error tool until they fitted the measurements very well. (Note that some global variable could affect the needed calculations; we only tried (many) combinations of spotlight parameters.)
Hardware
- AMD Athlon 1200MHz, 640MB, Gentoo, kernel 2.6.17.6
nv15 GeForce2 GTS/Pro rev a3
lspci:
01:00.0 VGA compatible controller: nVidia Corporation NV15 [GeForce2 GTS/Pro] (rev a3) (prog-if 00 [VGA])
Subsystem: ASUSTeK Computer Inc. Unknown device 400e
Flags: bus master, 66MHz, medium devsel, latency 32, IRQ 10
Memory at dc000000 (32-bit, non-prefetchable) [size=16M]
Memory at d0000000 (32-bit, prefetchable) [size=128M]
[virtual] Expansion ROM at dd000000 [disabled] [size=64K]
Capabilities: [60] Power Management version 1
Capabilities: [44] AGP version 2.0
Links
my Nouveau related stuff: http://jwstolk.xs4all.nl/nouveau/
spotlight parameters
plots used for decoding the spotlight parameters: http://jwstolk.xs4all.nl/nouveau/plots.htm
latest spotlight code: http://jwstolk.xs4all.nl/nouveau/code/spotlight_parameters.c
other code (used to test and adjust the Look-Up-Tables): http://jwstolk.xs4all.nl/nouveau/code/
mmio-trace file format converter:
mmio-trace macro search program:
- many ideas, not a lot of code.
Testing in-kernel-tree mmio-trace module with kernel 2.6.24 and Nouveau:
- (Note: Tracing nouveau is only academic, since we could just look at the source. It may be useful to compare with a real trace.)
- Get 2.6.24 kernel (Gentoo: git-sources)
- Apply pq's page fault hook patch and mmio-trace module patch to the kernel.
- Enable DebugFS (CONFIG_DEBUG_FS), RelayFS (CONFIG_RELAY), CONFIG_PAGE_FAULT_HANDLERS and CONFIG_MMIOTRACE.
- Compile and install kernel.
- Follow InstallNouveau, but don't insmod anything and don't startx. (Use the ./configure --prefix=/usr/ hint for drm on Gentoo.)
- Get latest mmio-trace: git clone git://people.freedesktop.org/~pq/mmio-trace
- Build mmio-trace: cd mmio-trace; make userspace (we use the in-kernel-tree module instead)
- Make a hooked copy of the drm and nouveau module:
./hook-module <...>/drm/linux-core/drm.ko drm-hook.ko
./hook-module <...>/drm/linux-core/nouveau.ko nouveau-hook.ko
- Mount debugfs to /debug: mkdir -p /debug; mount -t debugfs debugfs /debug
- Load the mmio module: insmod /lib/modules/2.6.24-rc8-git7/kernel/arch/x86/kernel/mmiotrace/mmiotrace.ko
- Start ./mmio-trace; it will create the file cpu0 in the current directory.
- Load the hooked drm and nouveau modules: insmod drm-hook.ko; insmod nouveau-hook.ko
- Start Xorg, do some simple things for a few seconds, and quit Xorg. Now you have a trace, and all that is left is to shut down tracing.
- Unload the hooked modules: rmmod nouveau; rmmod drm
- Kill the mmio-trace process, for instance with Ctrl-C.
- Unload the mmiotrace module: rmmod mmiotrace
- Get the lxml package (if needed)
Get the rules-ng register name database: cvs -z3 -d:pserver:anonymous@nouveau.cvs.sourceforge.net:/cvsroot/nouveau co -P rules-ng
- (fix: chmod +x rules-ng/parsers/*.py)
- Compile the static database as a lib: cd rules-ng/staticdb; make; cp libnvidia-mmio.so <...>/mmio-trace
- Parse the trace: ./mmio-parse -m NN -s ./libnvidia-mmio.so < cpu0 > cpu0_parsed (NN = 15 for a NV15 card, etc.)
irc: freenode, #nouveau and #mmio-trace, nick: jwstolk
mail: jwstolk_at_gmail.
// spotlight_parameters.c
//
// Copyright (C) 2007 J.W. Stolk, jwstolk@gmail.com, GPL v2
//
// -----------------------------------------------------------------------
// Description: spotlight functions for the Nouveau 3D driver
// see: http://nouveau.cvs.sourceforge.net/nouveau/doc/nv20_light
// http://jwstolk.xs4all.nl/nouveau/?M=A
// http://jwstolk.xs4all.nl/nouveau/plots.htm
// http://jwstolk.xs4all.nl/nouveau/plots_old.htm
// http://jwstolk.xs4all.nl/nouveau/code/
// -----------------------------------------------------------------------
// compile: gcc spotlight_parameters.c -lm -Wall -ospotlight_parameters
// -----------------------------------------------------------------------
// version: Sun Feb 4 21:24:50 CET 2007
#include <math.h>
#include <stdio.h>
// ,----------------------------------------------------------------------,
// | Input: GL parameters: exponent, cutoff, dirX, dirY, dirZ |
// | Output: hardware registers: hw_reg_a, hw_reg_b, hw_reg_c, |
// | hw_reg_d, hw_reg_e, hw_reg_f, hw_reg_g |
// '----------------------------------------------------------------------'
void get_spotlight_parameters(double exponent, double cutoff, double dirX, double dirY, double dirZ,
double* hw_reg_a, double* hw_reg_b, double* hw_reg_c,
double* hw_reg_d, double* hw_reg_e, double* hw_reg_f, double* hw_reg_g){
// The hardware registers are 32-bit floating point, so using double is more than sufficient.
double cutoff_rad;
double lut_index; // (renamed from step_nr)
int lut_index_int;
double lut_value;
double left_factor;
double right_factor;
double spotlight_length;
double tmp;
// Look-Up-Tables:
const double spotlight_LUT[51]={
-0.00004313671710,-0.00001664488452,-0.00001692090548,-0.00001692090548,-0.00001692090548,
-0.00001692090548,-0.00001692090547,-0.00475650768720,-0.00949630692880,-0.01423604510371,
-0.01897580409635,-0.02371554958185,-0.02995794597311,-0.03521003198499,-0.03967819537527,
-0.04331929838488,-0.04649309566914,-0.04914397523183,-0.05136181729088,-0.05328106166245,
-0.05492743778197,-0.05612668233260,-0.05696293343000,-0.05703870874157,-0.05605074400074,
-0.05367270272221,-0.04924597960194,-0.04221648841730,-0.03196529891062,-0.01801446831990,
-0.00001176230050, 0.03771627483479, 0.09906659099714, 0.17749723974919, 0.26660459589762,
0.36026175679746, 0.45244459111994, 0.53970871355655, 0.61833349227586, 0.68679650633229,
0.74600077835005, 0.79561688860929, 0.83640888233911, 0.87019888840919, 0.89667101785942,
0.91862913345432, 0.93576909519174, 0.94930296515212, 0.96040906843445, 0.96858874831011,
0.97572753616790};
const double exponent_LUT[51] ={
0.99999434105398, 0.99999302624596, 0.99999606609345, 0.99999606609344, 0.99999606609345,
0.99999606609344, 0.99999606609345, 0.99961048683387, 0.99922497235411, 0.99883938661131,
0.99845380095804, 0.99806823281838, 0.99708892684478, 0.99581822547003, 0.99414524850779,
0.99195363947614, 0.98902742170359, 0.98511916671572, 0.97986437860412, 0.97270389416853,
0.96285086481830, 0.94925040677666, 0.93017427428989, 0.90341705531002, 0.86563085422000,
0.81217929697770, 0.73670241935000, 0.63063384884101, 0.48277542009796, 0.27857488415982,
-0.00003948562476,-0.31875989523469,-0.64839014406671,-0.98119795202123,-1.30545553528001,
-1.60888090900001,-1.88661310700404,-2.12615570648801,-2.33539034461990,-2.52149001443213,
-2.66658392901470,-2.78644657501522,-2.89098264098132,-2.96975242355677,-3.04852169688304,
-3.08190421360393,-3.13682821311390,-3.19654141784206,-3.24449362878119,-3.29244580044634,
-3.35922785872215};
const double exp_conv_factor = 0.2482711787;
// calculate the index for the Look-Up-Table:
// note: inverse of: exponent = exp((lut_index-30)*exp_conv_factor);
lut_index = log(exponent)/exp_conv_factor + 30;
// clamp exponent to range of Look-Up-Table. valid exponent range: 0.0005825 ... 143.369264
// (clamp before converting to int: exponent 0 gives log(0) = -inf, which has no valid int value)
if( !(lut_index >= 0.0) ){
// below the table (this test also catches NaN from a negative exponent)
lut_index = 0;
lut_index_int= 0;
right_factor = 0;
left_factor = 1;
}else if( lut_index >= 50.0 ){
// (should not be possible. max exponent=128)
lut_index = 50;
lut_index_int= 49;
right_factor = 1;
left_factor = 0;
}else{
// inside the table: split into integer index and interpolation weights
lut_index_int = (int) floor(lut_index);
right_factor = lut_index - lut_index_int;
left_factor = 1.0 - right_factor;
}
// hw_reg_b is a direct lookup based on exponent, it's not affected by other parameters:
*hw_reg_b = exponent_LUT[lut_index_int]*left_factor + exponent_LUT[lut_index_int+1]*right_factor;
// lookup the initial hw_reg_a and hw_reg_g, based on exponent:
lut_value = spotlight_LUT[lut_index_int]*left_factor + spotlight_LUT[lut_index_int+1]*right_factor;
// adjust for cutoff:
cutoff_rad = cutoff*M_PI/180.0;
*hw_reg_a = (cos(cutoff_rad)-lut_value) / (1-cos(cutoff_rad));
// all hw_reg's (except hw_reg_b) change at the hw_reg_a == 0 point:
if( *hw_reg_a<0.0 ){
*hw_reg_a = 0;
}else{
// if hw_reg_a > 0, then hw_reg_g should be calculated for the exponent at hw_reg_a==0
// solve: hw_reg_a == (cos(cutoff_rad)-lut_value) / (1-cos(cutoff_rad)) == 0
lut_value = cos(cutoff_rad);
// i = 23; this is the lowest point in the LUT. start search from here;
// while( spotlight_LUT[i] < lut_value_a0 && i<50 ) i++;
// reverse-interpolate:
// lut_index_a0 = i-1 + (lut_value_a0-spotlight_LUT[i-1]) / (spotlight_LUT[i] - spotlight_LUT[i-1]);
// exponent_a0 = exp( (lut_index_a0-30)*exp_conv_factor );
// but we don't need the exponent, because hw_reg_g uses the same look-up-table
// so lut_value can directly be used to calculate hw_reg_g
}
// hw_reg_c always is: hw_reg_a - hw_reg_b + 1:
*hw_reg_c = *hw_reg_a - *hw_reg_b + 1.0;
// hw_reg_g:
tmp = 1/(lut_value-1);
*hw_reg_g = tmp + 1; // hw_reg_g = 1/(lut_value-1) + 1;
// hw_reg_d, hw_reg_e, hw_reg_f:
spotlight_length = sqrt( dirX*dirX + dirY*dirY + dirZ*dirZ );
tmp /= spotlight_length;
*hw_reg_d = tmp * dirX; // hw_reg_d = 1/(lut_value-1) * dirX/spotlight_length;
*hw_reg_e = tmp * dirY; // hw_reg_e = 1/(lut_value-1) * dirY/spotlight_length;
*hw_reg_f = tmp * dirZ; // hw_reg_f = 1/(lut_value-1) * dirZ/spotlight_length;
}
// ,----------------------------------------------------------------------,
// | Main: |
// '----------------------------------------------------------------------'
int main(int argc, char **argv){
double exponent;
double cutoff;
double dirX;
double dirY;
double dirZ;
double hw_reg_a;
double hw_reg_b;
double hw_reg_c;
double hw_reg_d;
double hw_reg_e;
double hw_reg_f;
double hw_reg_g;
exponent = 1.52;
cutoff = 20;
dirX = 0.50;
dirY = 0.51;
dirZ = 0.52;
get_spotlight_parameters(exponent, cutoff, dirX, dirY, dirZ, &hw_reg_a, &hw_reg_b, &hw_reg_c,
&hw_reg_d, &hw_reg_e, &hw_reg_f, &hw_reg_g);
printf("hw_reg_a = %15.10f\n",hw_reg_a);
printf("hw_reg_b = %15.10f\n",hw_reg_b);
printf("hw_reg_c = %15.10f\n",hw_reg_c);
printf("hw_reg_d = %15.10f\n",hw_reg_d);
printf("hw_reg_e = %15.10f\n",hw_reg_e);
printf("hw_reg_f = %15.10f\n",hw_reg_f);
printf("hw_reg_g = %15.10f\n",hw_reg_g);
return(0);
}
