Efficient Soft-Edged Shadows Using Percentage Closer Filtering and Pixel Shader Branching.

Project Description:

Shadow map techniques are employed in computer graphics to improve visual realism. The shadow map algorithm requires rendering the scene in two passes: once from the light’s perspective and once from the camera’s perspective. The render from the light’s perspective records the depth from the light source to the nearest surface at each location in the scene. The depth values are written to a pbuffer and then stored in a texture, which is later passed to the fragment shader as a uniform sampler. When rendering the scene from the camera’s view, each screen pixel is transformed to light coordinates, where its z-value is compared to the value stored in the shadow map. This comparison is performed in the fragment shader using the shadow2DProj function. If the pixel’s z-value is higher than the depth value stored in the shadow map, the pixel is in shadow. If the pixel’s z-value is lower than the stored depth value, the pixel is not in shadow.

One inherent side effect of shadow maps is a hard, aliased edge around the shadow. Building soft-edged shadows using percentage closer filtering and pixel shader branching is a visual enhancement to the shadow map: it computes a soft penumbra region around a shadow where there was previously a hard, aliased edge, and it employs optimizations to keep that computation efficient. Below are two example scenes that illustrate the visual enhancement of soft-edged shadows.

Tparty Scene with Aliased Edge.

Tparty Scene with Soft Edge.

Abstract Shapes Scene with Aliased Edge.

Abstract Shapes Scene with Soft Edge.

To create the soft edge, we employ William Reeves’s algorithm, called percentage closer filtering. With this technique we take a large number of samples, 32 to 64, from the region of the depth map surrounding the location of a given screen pixel. Based on the results of the sampling, we determine the percentage of samples in shadow, which is used to attenuate the intensity of the shadow. Performing this calculation for every pixel in the scene creates the appearance of a soft edge as we move from 100% in shadow to 0% in shadow. To illustrate this algorithm:

Pixel z-value: 35

Depth map region values:
40 40 40
30 40 40
30 30 40

Binary values (1 = in shadow, 0 = not in shadow):
0 0 0
1 0 0
1 1 0

Percentage in shadow: 33%

Consider a pixel with a z-value of 35 and a 3x3 depth map region in which six depth measurements are 40 and three are 30. At the locations where the stored depth is 40, our pixel is closer to the light, so those locations are not in shadow. However, the three depth measurements of 30 are closer to the light than our pixel, so these three locations are in shadow. With three locations in shadow and six not in shadow, the total shaded contribution is 3/9 = 33%.

Pseudocode:
 
Let numberInShadow = 0
Let numberOfSamples = 9

For each value in depth map region
    Do depth comparison with pixel z-value
    If pixel z-value < depth map value
        Then numberInShadow += 0
    Else if pixel z-value > depth map value
        Then numberInShadow += 1
Let percentInShadow = numberInShadow / numberOfSamples
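
The percentage closer filtering pseudocode above can be sketched on the CPU in C. This is only an illustration of the counting step, not the project's shader code; the function name pcf_fraction_in_shadow and its parameters are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: returns the fraction of depth-map samples that
   shadow the pixel, i.e. samples whose stored depth is closer to the
   light than the pixel's own z-value. */
float pcf_fraction_in_shadow(float pixel_z, const float *region, size_t count)
{
    assert(region != NULL && count > 0);
    size_t in_shadow = 0;
    for (size_t i = 0; i < count; i++) {
        /* The pixel lies behind this sample, so the sample shadows it. */
        if (pixel_z > region[i]) {
            in_shadow++;
        }
    }
    return (float)in_shadow / (float)count;
}
```

Called with a pixel z-value of 35 and the nine region values from the example (six 40s and three 30s), the function counts three samples in shadow out of nine.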


SoftEdgedShadow.frag:
 
uniform sampler2DShadow texture0; // shadowMap
uniform sampler3D texture2;       // jitterMap

// Fragment shader excerpt: the statements below belong inside a function such as main().
int totalSamples = 64;
vec4 shadowMapCoord = gl_TexCoord[0];
float filterSize = .48;
vec2 scaleJitterxy = vec2(.5, .5);
vec3 jitterCoord = vec3(gl_FragCoord.xy * scaleJitterxy, 0.0);

float shadow = 0.0;
for (int i = 0; i < totalSamples/2; i++)
{
    // Each jitter texel packs two offsets (xy and zw); remap from [0,1] to [-1,1].
    vec4 offset = (texture3D(texture2, jitterCoord) * 2.0) - 1.0;
    jitterCoord.z += 1.0/32.0; // step to the next slice of the jitter map
    shadowMapCoord.xy = offset.xy * filterSize + gl_TexCoord[0].xy;
    shadow += shadow2DProj(texture0, shadowMapCoord).r; // shadow2DProj returns a vec4
    shadowMapCoord.xy = offset.zw * filterSize + gl_TexCoord[0].xy;
    shadow += shadow2DProj(texture0, shadowMapCoord).r;
}
shadow /= float(totalSamples); // Percentage in shadow

According to Yury Uralsky’s chapter in the book GPU Gems 2, sampling the depth map at regularly spaced locations produces a banding effect at the shadow edges. This is an undesirable effect and can be corrected by offsetting each sample location with controlled randomness, or jittering. The idea is that we divide the sample region into sub-regions and randomly sample one location within each sub-region. The depth value recorded at that location is then compared to the pixel z-value. Employing jittering results in a smooth, soft penumbra whose residual error resembles noise rather than banding.

Regular Sample Box:
 o   o   o
 o   o   o
 o   o   o

Jittered Sample Box:
[Figure: the same nine samples, each displaced to a random position within its own sub-region.]

Pseudocode:
 
// Jitter the sample box
box[0] += ((float)(ran1()*2)-1) * (0.5f/uSamp);
box[1] += ((float)(ran1()*2)-1) * (0.5f/vSamp);
box[2] += ((float)(ran1()*2)-1) * (0.5f/uSamp);
box[3] += ((float)(ran1()*2)-1) * (0.5f/vSamp);
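
The jitter step above can be expanded into a sketch that produces one jittered sample per sub-region. It assumes a unit square divided into a u_samp by v_samp grid and uses the standard rand() in place of ran1(); the function name jittered_grid and the output layout are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical helper: writes one jittered (u, v) sample per cell of a
   u_samp x v_samp grid over the unit square. Sample k = j * u_samp + i
   is stored in offsets[2*k] (u) and offsets[2*k + 1] (v). */
void jittered_grid(float *offsets, int u_samp, int v_samp)
{
    assert(offsets != NULL && u_samp > 0 && v_samp > 0);
    for (int j = 0; j < v_samp; j++) {
        for (int i = 0; i < u_samp; i++) {
            /* Start at the centre of sub-region (i, j). */
            float u = ((float)i + 0.5f) / (float)u_samp;
            float v = ((float)j + 0.5f) / (float)v_samp;
            /* Random displacement in [-1, 1], scaled to half a cell so
               the sample stays inside its own sub-region. */
            float ru = (float)rand() / (float)RAND_MAX * 2.0f - 1.0f;
            float rv = (float)rand() / (float)RAND_MAX * 2.0f - 1.0f;
            u += ru * (0.5f / (float)u_samp);
            v += rv * (0.5f / (float)v_samp);
            int k = j * u_samp + i;
            offsets[2 * k] = u;
            offsets[2 * k + 1] = v;
        }
    }
}
```

Because each displacement is limited to half a cell, every sub-region contributes exactly one sample, which keeps the samples spread evenly while breaking up the regular pattern.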

Implementing this algorithm directly would require 32 to 64 offset calculations for each pixel, one for every z-value comparison. To avoid repeating these calculations for every pixel, the offsets are precalculated and stored in a 3D texture map.

glTexImage3D(GL_TEXTURE_3D, 0, GL_RGBA, size, size, uSamp * vSamp / 2, 0, GL_RGBA, GL_BYTE, img);

The tradeoff here is that we save computation time but consume more memory. However, we reduce the memory needed for the 3D jitter map by storing a pair of offsets in each texel. Each offset has an x and a y component, so we store the x and y values of one offset in the r and g channels of the texel, and the x and y values of a second offset in the b and a channels. Storing the offset data this way cuts the required texture map size in half.
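
The packing described above can be sketched as follows. The sketch assumes offsets in [-1, 1] encoded as unsigned 8-bit values, chosen so that the shader's `2.0 * t - 1.0` remapping recovers them; this corresponds to uploading with GL_UNSIGNED_BYTE and may differ from the GL_BYTE data the project actually uploads. The names encode_offset and pack_offset_pairs are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed encoding: map an offset in [-1, 1] to an unsigned byte in
   [0, 255], rounding to the nearest value. */
static unsigned char encode_offset(float o)
{
    assert(o >= -1.0f && o <= 1.0f);
    return (unsigned char)((o * 0.5f + 0.5f) * 255.0f + 0.5f);
}

/* Hypothetical helper: packs two (x, y) offsets into each RGBA texel.
   offsets holds 4 floats per texel: x0, y0, x1, y1. */
void pack_offset_pairs(const float *offsets, size_t texel_count,
                       unsigned char *rgba)
{
    assert(offsets != NULL && rgba != NULL);
    for (size_t t = 0; t < texel_count; t++) {
        const float *src = offsets + 4 * t;
        rgba[4 * t + 0] = encode_offset(src[0]); /* r: first offset x  */
        rgba[4 * t + 1] = encode_offset(src[1]); /* g: first offset y  */
        rgba[4 * t + 2] = encode_offset(src[2]); /* b: second offset x */
        rgba[4 * t + 3] = encode_offset(src[3]); /* a: second offset y */
    }
}
```

With two offsets per texel, a filter with uSamp * vSamp samples needs only uSamp * vSamp / 2 slices, which matches the depth passed to glTexImage3D above.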

Taking 32 to 64 samples for each screen pixel still requires an extensive amount of processing in the fragment shader. We use another optimization technique, pixel shader branching, to improve the efficiency of the percentage closer filtering algorithm. The key idea is that most pixels in the scene are either entirely in shadow or entirely out of shadow; only a relatively small number of pixels actually lie in the penumbra region of the shadow. So, rather than taking 32 to 64 samples blindly, we first take a small number of samples, 8 to 16. If all of these depth test samples agree, returning 100% in shadow or 100% out of shadow (all 1’s or all 0’s), we assume that the pixel does not lie in the penumbra and skip the remaining samples. On the other hand, if some of these depth test samples return in shadow and some return not in shadow (some 0’s and some 1’s), we assume that the pixel does lie in the penumbra and continue until the entire 32 to 64 samples have been taken. This algorithm has been implemented in the project’s fragment program in a manner very similar to that shown in the book GPU Gems 2, page 279. The pseudocode for this algorithm is as follows:

Repeat 4 to 8 times (two samples per iteration)
    Lookup offset
    Get shadow map coordinates with offset.xy values
    Do depth comparison and store result
    Get shadow map coordinates with offset.zw values
    Do depth comparison and store result
If stored results are neither all in shadow nor all out of shadow then
    Continue sampling up to the full 32 to 64 samples (two per iteration)
        Lookup offset
        Get shadow map coordinates with offset.xy values
        Do depth comparison and store result
        Get shadow map coordinates with offset.zw values
        Do depth comparison and store result
Else
    Not in the penumbra, no further samples required
Add shadow percentage to final fragment color
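
The branching logic in the pseudocode above can be illustrated with a small CPU-side sketch that works on precomputed depth comparison results. It assumes the test samples are reused as the first part of the full sample set, so only the remaining samples are examined when the test is inconclusive. The function name shadow_with_early_out and its parameters are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: results[i] is 1 if sample i is in shadow and 0
   otherwise. Examines the first `test` samples and looks at the rest
   only when they disagree. Returns the fraction in shadow and writes
   the number of samples actually examined to *used. */
float shadow_with_early_out(const int *results, size_t total, size_t test,
                            size_t *used)
{
    assert(results != NULL && used != NULL);
    assert(test > 0 && test <= total);
    size_t hits = 0;
    for (size_t i = 0; i < test; i++) {
        hits += (results[i] != 0);
    }
    /* All test samples agree: assume the pixel is outside the penumbra. */
    if (hits == 0 || hits == test) {
        *used = test;
        return hits == 0 ? 0.0f : 1.0f;
    }
    /* Mixed results: likely penumbra, so finish the full sample set. */
    for (size_t i = test; i < total; i++) {
        hits += (results[i] != 0);
    }
    *used = total;
    return (float)hits / (float)total;
}
```

For a pixel whose first 8 results are all 0, the function stops after 8 samples; for a pixel with mixed test results, it examines all 64 samples and averages over the full set.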


SoftEdgedShadowPSB.frag:
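// Fragment shader excerpt: texture0 and texture2 are the same shadow map and
// jitter map samplers as in SoftEdgedShadow.frag; the statements belong
// inside a function such as main().
int totalSamples = 64;
int testSamples = 8;
vec4 shadowMapCoord = gl_TexCoord[0];
float filterSize = .48;
vec2 scaleJitterxy = vec2(.5, .5);
vec3 jitterCoord = vec3(gl_FragCoord.xy * scaleJitterxy, 0.0);
float shadow = 0.0;

// Cheap test pass: take testSamples samples (two per iteration).
for (int i = 0; i < testSamples/2; i++)
{
    vec4 offset = (2.0 * texture3D(texture2, jitterCoord)) - 1.0;
    jitterCoord.z += 1.0/32.0;
    shadowMapCoord.xy = offset.xy * filterSize + gl_TexCoord[0].xy;
    shadow += shadow2DProj(texture0, shadowMapCoord).r;
    shadowMapCoord.xy = offset.zw * filterSize + gl_TexCoord[0].xy;
    shadow += shadow2DProj(texture0, shadowMapCoord).r;
}
shadow /= float(testSamples);

// A value strictly between 0 and 1 means the test samples disagree: likely penumbra.
if ((shadow - 1.0) * shadow != 0.0)
{
    shadow *= float(testSamples); // back to a running sum of the test samples
    // Take only the remaining samples so that the total is totalSamples.
    for (int i = 0; i < (totalSamples - testSamples)/2; i++)
    {
        vec4 offset = (2.0 * texture3D(texture2, jitterCoord)) - 1.0;
        jitterCoord.z += 1.0/32.0;
        shadowMapCoord.xy = offset.xy * filterSize + gl_TexCoord[0].xy;
        shadow += shadow2DProj(texture0, shadowMapCoord).r;
        shadowMapCoord.xy = offset.zw * filterSize + gl_TexCoord[0].xy;
        shadow += shadow2DProj(texture0, shadowMapCoord).r;
    }
    shadow /= float(totalSamples);
}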

Performance of pixel shader branching seems to depend on the complexity of the geometry and the amount of penumbra within a given scene. Rendering a scene once with pixel shader branching and once without yielded the following results:

Draw Times:

PSB Not Implemented: 0.098
PSB Implemented: 0.065

Draw Times:

PSB Not Implemented: 0.184
PSB Implemented: 0.061
