Re: Realistic drum programming/recording for songs/Human perception

There's a tricky balance on response curves and where to put precision.

Using an imaging example (because I happen to know more about that):

If one does computations with images, those computations often (though not 
always) work better when using a linear measure of photon energy — i.e., 
twice as much photon energy is represented by twice as big a number. 
Exposure adjustments are a prime example of where this is the case.
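
As a rough sketch of why exposure is easy in linear terms: assuming pixel 
values are linear light, one stop of exposure is just a factor of two. The 
function name and sample values below are made up for illustration.

```python
def adjust_exposure(linear_pixels, stops):
    """Scale linear-light values by 2**stops (+1 stop doubles the light)."""
    gain = 2.0 ** stops
    return [value * gain for value in linear_pixels]


# Hypothetical linear-light samples: a shadow, a middle gray, a highlight.
pixels = [0.05, 0.18, 0.40]
one_stop_brighter = adjust_exposure(pixels, 1.0)
one_stop_darker = adjust_exposure(pixels, -1.0)
```

With a non-linear encoding the same adjustment would first need the values 
converted back to linear light, which is why this kind of math tends to be 
done on linear data.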

On the other hand, people's perception of light is decidedly non-linear. 
Twice as much photon energy does not necessarily look twice as bright. 
This makes linear encodings perceptually inefficient: with evenly spaced 
linear steps, the shadows get far fewer levels than people can distinguish 
there, while the highlights get far more levels than anyone can see.

I may muff the exact numbers here, but what people perceive as middle gray 
is something like 18% gray on a linear scale with black at 0% and white at 
100%. So, if we just stored values ranging from 0 to 100, the darker half 
of the perceived range would get only 18 values while the brighter half 
would get 82.
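
To put rough numbers on that, here is a small sketch comparing where an 18% 
middle gray lands among 256 codes under a purely linear encoding versus a 
non-linear one. The sRGB transfer curve is my assumption here, picked as one 
common example of a non-linear encoding; the 18% figure is the approximate 
one from above.

```python
def srgb_encode(linear):
    """sRGB transfer curve: linear light in [0, 1] -> encoded value in [0, 1]."""
    if linear <= 0.0031308:
        return 12.92 * linear
    return 1.055 * linear ** (1 / 2.4) - 0.055


MIDDLE_GRAY = 0.18  # approximate figure, as hedged in the text
MAX_CODE = 255      # 8-bit encoding

# Code value that middle gray lands on under each encoding.
linear_code = round(MIDDLE_GRAY * MAX_CODE)
srgb_code = round(srgb_encode(MIDDLE_GRAY) * MAX_CODE)

print(f"linear: {linear_code} codes below middle gray, {MAX_CODE - linear_code} above")
print(f"sRGB:   {srgb_code} codes below middle gray, {MAX_CODE - srgb_code} above")
```

Under these assumptions the linear encoding leaves fewer than a fifth of its 
codes for everything darker than middle gray, while the non-linear curve 
splits them much more evenly.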

Imaging professionals bash on JPEG for being an 8-bit format (only 256 
distinct levels), but those 256 levels are distributed in a way that is 
more perceptually uniform, so often it is more than enough. (Where it does 
get challenged is when editing. Almost every image correction tends to 
lose levels of data when things get remapped: open up the shadows and you 
lose levels in the highlights. So, 256 well-distributed levels are good 
representationally but a bit challenged when it comes to large tonal 
adjustments.)
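
The loss of levels during editing can be sketched by pushing all 256 codes 
through a tone curve and counting what survives. The power curve with 
gamma 0.5 below is a hypothetical stand-in for "opening up the shadows", not 
any particular editor's adjustment.

```python
def open_shadows(code, gamma=0.5):
    """Brighten an 8-bit code with a simple power curve (hypothetical edit)."""
    return round(255 * (code / 255) ** gamma)


# Remap every possible 8-bit input and count the distinct outputs left over.
remapped = [open_shadows(code) for code in range(256)]
distinct_levels = len(set(remapped))

# Compare how the 64 darkest and 64 brightest input codes fare.
shadow_levels = len({open_shadows(code) for code in range(0, 64)})
highlight_levels = len({open_shadows(code) for code in range(192, 256)})

print(f"distinct output levels: {distinct_levels} of 256")
print(f"darkest 64 inputs   -> {shadow_levels} distinct outputs")
print(f"brightest 64 inputs -> {highlight_levels} distinct outputs")
```

The shadows get stretched across more of the output range, and the highlight 
inputs are squeezed onto fewer output codes, so some of them collapse into 
the same value.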

Turning back to audio: if MIDI velocities are encoded on a linear scale, 
that may make sense in some ways, but since we perceive loudness on a 
roughly logarithmic scale, it probably puts the precision in the wrong 
places. Or, considering the middle gray case, does a velocity value of 64 
sound half as loud as 127? Does it do so in an interesting way? Is the 
precision where you want it when programming a drum loop?
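
One way to poke at the velocity 64 question is to compute levels in decibels 
under a couple of assumed velocity-to-amplitude mappings. Both mappings below 
are hypothetical; how a given synth or sampler actually responds to velocity 
is up to that instrument. As a rule of thumb, "half as loud" is often taken 
to be a drop of about 10 dB.

```python
import math


def level_db(velocity, exponent):
    """Level in dB relative to velocity 127, assuming amplitude = (v/127)**exponent."""
    amplitude = (velocity / 127) ** exponent
    return 20 * math.log10(amplitude)


# Two hypothetical velocity curves: amplitude proportional to velocity,
# and amplitude proportional to velocity squared.
for name, exponent in [("linear", 1.0), ("squared", 2.0)]:
    print(f"{name:8s} velocity 64 -> {level_db(64, exponent):6.1f} dB")
```

Under these assumptions the linear mapping puts velocity 64 about 6 dB below 
127 and the squared mapping about 12 dB below, so neither lands exactly on 
the 10 dB rule of thumb. The answer depends heavily on the curve the 
instrument applies.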