This page is about the shader optimization stage in nouveau: what can be done for it, and how.
- instruction combining (scalar+vector)
- support for things like MAD
- postscaling support
- conditionals to avoid expensive texture access
    TEX t0, blah
    TEX t1, blah2
    MUL t0, t0, t1
- turns into
    TEXC t0, blah
    TEX (NE.xyzw) t1, blah2
    MUL t0, t0, t1
- loop unrolling (NVIDIA unrolls small loops: fewer than 256 instructions executed, iirc)
- try to see if we can make use of the unknown opcode 1 in vertex programs for cool stuff (it seems to do a lot)
- try to detect values that vary linearly in the fragment program, and move their computation into the vertex program
- merge code like this:
    ADD vec1.x, vec2.x
    ADD vec3.x, vec4.x
- into:
    ADD vec1.xy, vec2.xy
- after putting vec3.x in vec1.y, and vec4.x in vec2.y
- NV30/NV40 fragment program outputs overlap temporary registers. The details aren't clear yet, but "temp" 0 is result.color[0]. We can also write to result.color[1-3] and result.depth (not sure yet which "temps" those overlap). These temps can (and should?) be used as ordinary temporaries up until the point where the result register is written.
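
The scalar-merging idea from the list above (two independent `ADD`s on `.x` folded into one `ADD` on `.xy`) can be sketched roughly as follows. This is only an illustration in Python, not nouveau code: the `(opcode, operand_a, operand_b)` tuple form, the `pack_scalar_pair` name and the explicit `MOV`s are all assumptions made for the example. It also assumes the `.y` lanes of the first instruction's registers are free, which a real pass would have to check, and a real pass would preferably place the values in those lanes during register allocation rather than emit extra moves.

```python
from typing import List, Optional, Tuple

# Hypothetical instruction form, used only in this sketch:
# (opcode, operand_a, operand_b), each operand written "register.component".
Instr = Tuple[str, str, str]


def pack_scalar_pair(
    first: Instr, second: Instr
) -> Optional[Tuple[List[Instr], Instr]]:
    """Try to merge two independent scalar ops into one two-component op.

    Returns (moves, merged), where `moves` put the second op's operands into
    the .y lanes of the first op's registers and `merged` is the .xy op.
    Returns None when the pair does not fit this sketch's simple rules.
    Assumption: the .y lanes of the first op's registers are unused.
    """
    op1, a1, b1 = first
    op2, a2, b2 = second

    # Only identical opcodes can share one vector instruction.
    if op1 != op2:
        return None

    reg_a1, comp_a1 = a1.split(".")
    reg_b1, comp_b1 = b1.split(".")

    # Keep to the case from the text: the first op works on .x only,
    # and on two distinct registers.
    if comp_a1 != "x" or comp_b1 != "x" or reg_a1 == reg_b1:
        return None

    # The two ops must not touch the same registers, otherwise they may
    # depend on each other and cannot safely run as one instruction.
    regs_first = {reg_a1, reg_b1}
    regs_second = {a2.split(".")[0], b2.split(".")[0]}
    if regs_first & regs_second:
        return None

    # "Putting vec3.x in vec1.y, and vec4.x in vec2.y".
    moves: List[Instr] = [
        ("MOV", f"{reg_a1}.y", a2),
        ("MOV", f"{reg_b1}.y", b2),
    ]
    merged: Instr = (op1, f"{reg_a1}.xy", f"{reg_b1}.xy")
    return moves, merged
```

Calling `pack_scalar_pair(("ADD", "vec1.x", "vec2.x"), ("ADD", "vec3.x", "vec4.x"))` gives the two moves plus the merged `("ADD", "vec1.xy", "vec2.xy")`. The merge only pays off when the moves can be avoided or are cheaper than the instruction saved, which is why doing the placement at register allocation time is probably the better design.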

