(c) 2005, Klaus Post v1.0 beta
Download SoxFilter 1.0
Download SoxFilter 1.0 Source
Any number of effects can be entered, and they will be executed in the order they are specified. Effect syntax is just like SOX. See Effect Overview and Filter Reference below.
A simple filter could look like this:
AviSource("movie.avi")
SoxFilter("bandpass 500 100")
convertAudioTo16Bit()
This will keep a 100Hz band around 500Hz. SoxFilter converts audio to 32 bit integers. This allows to keep the additional dynamic range of float point samples, but it requires a convertion to 16 bit audio before output, since most codecs only support 16 bit.
Multiple effects can be stacked like this:
AviSource("movie.avi")
SoxFilter("bandpass 2000 1000", "vol 2.0", "reverb 1.0 600.0 180.0 200.0 220.0 240.0")
ConvertAudioTo16Bit()
Which is a faster version of:
AviSource("movie.avi")
SoxFilter("bandpass 2000 1000")
SoxFilter("vol 2.0")
SoxFilter("reverb 1.0 600.0 180.0 200.0 220.0 240.0")
ConvertAudioTo16Bit()
Effects:
band [ -n ] center [ width ]
bandpass frequency bandwidth
bandreject frequency bandwidth
chorus gain-in gain out delay decay speed depth
-s | -t [ delay decay speed depth -s | -t ]
compand attack1,decay1[,attack2,decay2...]
in-dB1,out-dB1[,in-dB2,out-dB2...]
[ gain [ initial-volume [ delay ] ] ]
dcshift shift [ limitergain ]
deemph
earwax
echo gain-in gain-out delay decay [ delay decay ... ]
echos gain-in gain-out delay decay [ delay decay ... ]
fade [ type ] fade-in-length
[ stop-time [ fade-out-length ] ]
filter [ low ]-[ high ] [ window-len [ beta ]]
flanger gain-in gain-out delay decay speed < -s | -t >
highp frequency
highpass frequency
lowp frequency
lowpass frequency
mask
mcompand "attack1,decay1[,attack2,decay2...]
in-dB1,out-dB1[,in-dB2,out-dB2...]
[ gain [ initial-volume [ delay ] ] ]" xover_freq
noiseprof [profile-file]
noisered profile-file [threshold]
phaser gain-in gain-out delay decay speed < -s | -t >
pitch shift [ width interpole fade ]
reverb gain-out reverb-time delay [ delay ... ]
silence above_periods [ duration threshold[ d | % ]
[ below_periods duration
threshold[ d | % ]]
speed [ -c ] factor
stretch [ factor [ window fade shift fading ]
swap [ 1 2 | 1 2 3 4 ]
synth [ length ] type mix [ freq [ -freq2 ]
[ off ] [ ph ] [ p1 ] [ p2 ] [ p3 ]
vibro speed [ depth ]
vol gain [ type [ limitergain ] ]
band [ -n ] center [ width ]
Apply a band-pass filter. The frequency response drops loga-
rithmically around the center frequency. The width gives the
slope of the drop. The frequencies at center + width and
center - width will be half of their original amplitudes.
Band defaults to a mode oriented to pitched signals, i.e.
voice, singing, or instrumental music. The -n (for noise)
option uses the alternate mode for un-pitched signals. Warn-
ing: -n introduces a power-gain of about 11dB in the filter,
so beware of output clipping. Band introduces noise in the
shape of the filter, i.e. peaking at the center frequency and
settling around it. See filter for a bandpass effect with
steeper shoulders.
bandpass frequency bandwidth
Butterworth bandpass filter. Description coming soon!
bandreject frequency bandwidth
Butterworth bandreject filter. Description coming soon!
chorus gain-in gain-out delay decay speed depth
-s | -t [ delay decay speed depth -s | -t ... ]
which the absolute value of the input signal is integrated to
determine its volume; attacks refer to increases in volume
and decays refer to decreases. Where more than one pair of
attack/decay parameters are specified, each channel is
treated separately and the number of pairs must agree with
the number of input channels. The second parameter is a list
of points on the compander's transfer function specified in
dB relative to the maximum possible signal amplitude. The
input values must be in a strictly increasing order but the
transfer function does not have to be monotonically rising.
The special value -inf may be used to indicate that the input
volume should be associated output volume. The points
-inf,-inf and 0,0 are assumed; the latter may be overridden,
but the former may not.
The third (optional) parameter is a post-processing gain in
dB which is applied after the compression has taken place;
the fourth (optional) parameter is an initial volume to be
assumed for each channel when the effect starts. This per-
mits the user to supply a nominal level initially, so that,
for example, a very large gain is not applied to initial sig-
nal levels before the companding action has begun to operate:
it is quite probable that in such an event, the output would
be severely clipped while the compander gain properly adjusts
itself.
The fifth (optional) parameter is a delay in seconds. The
input signal is analyzed immediately to control the compan-
der, but it is delayed before being fed to the volume
adjuster. Specifying a delay approximately equal to the
attack/decay times allows the compander to effectively oper-
ate in a "predictive" rather than a reactive mode.
dcshift shift [ limitergain ]
DC Shift the audio data, with basic linear amplitude formula.
This is most useful if your audio data tends to not be cen-
tered around a value of 0. Shifting it back will allow you
to get the most volume adjustments without clipping audio
data.
The first option is the dcshift value. It is a floating
point number that indicates the amount to shift.
An option limtergain value can be specified as well. It
should have a value much less then 1.0 and is used only on
peaks to prevent clipping.
deemph Apply a treble attenuation shelving filter to samples in
audio cd format. The frequency response of pre-emphasized
recordings is rectified. The filtering is defined in the
standard document ISO 908.
Add a sequence of echos to a sound sample. Each delay/decay
part gives the delay in milliseconds and the decay (relative
to gain-in) of that echo. Gain-out is the volume of the out-
put.
earwax Makes sound easier to listen to on headphones. Adds audio-
cues to samples in audio cd format so that when listened to
on headphones the stereo image is moved from inside your head
(standard for headphones) to outside and in front of the lis-
tener (standard for speakers). See
www.geocities.com/beinges for a full explanation.
echo gain-in gain-out delay decay [ delay decay ... ]
Add echoing to a sound sample. Each delay/decay part gives
the delay in milliseconds and the decay (relative to gain-in)
of that echo. Gain-out is the volume of the output.
echos gain-in gain-out delay decay [ delay decay ... ]
Add a sequence of echos to a sound sample. Each delay/decay
part gives the delay in milliseconds and the decay (relative
to gain-in) of that echo. Gain-out is the volume of the out-
put.
fade [ type ] fade-in-length
[ stop-time [ fade-out-length ] ]
Add a fade effect to the beginning, end, or both of the audio
data.
For fade-ins, this starts from the first sample and ramps the
volume of the audio from 0 to full volume over fade-in-length
seconds. Specify 0 seconds if no fade-in is wanted.
For fade-outs, the audio data will be truncated at the stop-
time and the volume will be ramped from full volume down to 0
starting at fade-out-length seconds before the stop-time. No
fade-out is performed if these options are not specified.
All times can be specified in either periods of time or sam-
ple counts. To specify time periods use the format
hh:mm:ss.frac format. To specify using sample counts, spec-
ify the number of samples and append the letter 's' to the
sample count (for example 8000s).
An optional type can be specified to change the type of enve-
lope. Choices are q for quarter of a sinewave, h for half a
sinewave, t for linear slope, l for logarithmic, and p for
inverted parabola. The default is a linear slope.
filter [ low ]-[ high ] [ window-len [ beta ] ]
Apply a Sinc-windowed lowpass, highpass, or bandpass filter
of given window length to the signal. low refers to the fre-
quency of the lower 6dB corner of the filter. high refers to
the frequency of the upper 6dB corner of the filter.
A lowpass filter is obtained by leaving low unspecified, or
0. A highpass filter is obtained by leaving high unspeci-
fied, or 0, or greater than or equal to the Nyquist fre-
quency.
The window-len, if unspecified, defaults to 128. Longer win-
dows give a sharper cutoff, smaller windows a more gradual
cutoff.
The beta, if unspecified, defaults to 16. This selects a
Kaiser window. You can select a Nuttall window by specifying
anything <= 2.0 here. For more discussion of beta, look
under the resample effect.
flanger gain-in gain-out delay decay speed < -s | -t >
Add a flanger to a sound sample. Each triple
delay/decay/speed gives the delay in milliseconds and the
decay (relative to gain-in) with a modulation speed in Hz.
The modulation is either sinodial (-s) or triangular (-t).
Gain-out is the volume of the output.
highp frequency
Apply a single pole recursive high-pass filter. The fre-
quency response drops logarithmically with I frequency in the
middle of the drop. The slope of the filter is quite gentle.
See filter for a highpass effect with sharper cutoff.
highpass frequency
Butterworth highpass filter. Description coming soon!
lowp frequency
Apply a single pole recursive low-pass filter. The frequency
response drops logarithmically with frequency in the middle
of the drop. The slope of the filter is quite gentle. See
filter for a lowpass effect with sharper cutoff.
lowpass frequency
Butterworth lowpass filter. Description coming soon!
mask Add "masking noise" to signal. This effect deliberately adds
white noise to a sound in order to mask quantization effects,
created by the process of playing a sound digitally. It
tends to mask buzzing voices, for example. It adds 1/2 bit
of noise to the sound file at the output bit depth.
mcompand "attack1,decay1[,attack2,decay2...]
in-dB1,out-dB1[,in-dB2,out-dB2...]
[gain [initial-volume [delay ] ] ]" xover_freq
Multi-band compander is similar to the single band compander
but the audio file is first divided up into bands and then
the compander is ran on each band. See the compand effect
for definition of its options. Compand options are specified
between double quotes and the crossover frequency for that
band is specefied seperately with xover_fre. This can be
repeated multiple times to create multiple bands.
noiseprof [profile-file]
noisered profile-file [threshold]
Noise reduction filter with profiling. This filter is moder-
ately effective at removing consistent background noise such
as hiss or hum. To use it, first run the noiseprof effect on
a section of silence (that is, a section which contains noth-
ing but noise). The noiseprof effect will print a noise pro-
file to profile-file, or to stdout if no profile-file is
specified. If there is sound output on stdout then the pro-
file will instead be directed to stderr.
To actually remove the noise, run SoX again with the noisered
filter. The filter needs one argument, profile-file, which
contains the noise profile from noiseprof. thershold speci-
fies how much noise should be removed, and may be between 0
and 1 with a default of 0.5. Higher values will remove more
noise but present a greater possibility of distorting the
desired audio signal. Experiment with different threshold
values to find the optimal one for your sample.
phaser gain-in gain-out delay decay speed < -s | -t >
Add a phaser to a sound sample. Each triple
delay/decay/speed gives the delay in milliseconds and the
decay (relative to gain-in) with a modulation speed in Hz.
The modulation is either sinodial (-s) or triangular (-t).
The decay should be less than 0.5 to avoid feedback. Gain-
out is the volume of the output.
pitch shift [ width interpole fade ]
Change the pitch of file without affecting its duration by
cross-fading shifted samples. shift is given in cents. Use a
positive value to shift to treble, negative value to shift to
bass. Default shift is 0. width of window is in ms. Default
width is 20ms. Try 30ms to lower pitch, and 10ms to raise
-w < nut / ham > : select either a Nuttal (~90 dB stopband)
or Hamming (~43 dB stopband) window. Default is nut.
-width long / short / # : specify the (approximate) width of
the filter. long is 1024 samples; short is 128 samples.
Alternatively, an exact number can be used. Default is long.
The short option is not recommended, as it produces poor
quality results.
-cutoff # : specify the filter cutoff frequency in terms of
fraction of frequency bandwidth, also know as the Nyquist
frequency. Please see the resample effect for further infor-
mation on Nyquist frequency. If upsampling, then this is the
fraction of the original signal that should go through. If
downsampling, this is the fraction of the signal left after
downsampling. Default is 0.95. Remember that this is a
float.
reverb gain-out reverbe-time delay [ delay ... ]
Add reverberation to a sound sample. Each delay is given in
milliseconds and its feedback is depending on the reverb-time
in milliseconds. Each delay should be in the range of half
to quarter of reverb-time to get a realistic reverberation.
Gain-out is the volume of the output.
silence above_periods [ duration threshold[ d | % ]
[ below_periods duration threshold[ d | % ]]
Removes silence from the beginning or end of a sound file.
Silence is anything below a specified threshold.
When trimming silence from the beginning of a sound file, you
specify a duration of audio that is above a given silence
threshold before audio data is processed. You can also spec-
ify the count of periods of none silence you want to detect
before processing audio data. Specify a period of 0 if you
do not want to trim data from the front of the sound file.
When optionally trimming silence form the end of a sound
file, you specify the duration of audio that must be below a
given threshold before stopping to process audio data. A
count of periods that occur below the threshold may also be
specified. If this options are not specified then data is
not trimmed from the end of the audio file.
Duration counts may be in the format of time, hh:mm:ss.frac,
or in the exact count of samples.
Threshold may be suffixed with d, or % to indicated the value
is in decibels or a percentage of max value of the sample
value. A value of '0%' will look for total silence.
speed [ -c ] factor
Speed up or down the sound, as a magnetic tape with a speed
control. It affects both pitch and time. A factor of 1.0
means no change, and is the default. 2.0 doubles speed, thus
time length is cut by a half and pitch is one octave higher.
0.5 halves speed thus time length doubles and pitch is one
octave lower. If the optional -c parameter is used then the
factor is specified in "cents".
stretch factor [window fade shift fading]
Time stretch file by a given factor. Change duration without
affecting the pitch. factor of stretching: >1.0 lengthen,
<1.0 shorten duration. window size is in ms. Default is
20ms. The fade option, can be "lin". shift ratio, in [0.0
1.0]. Default depends on stretch factor. 1.0 to shorten, 0.8
to lengthen. The fading ratio, in [0.0 0.5]. The amount of a
fade's default depends on factor and shift.
synth [ length ] type mix [ freq [ -freq2 ]
[ off ] [ ph ] [ p1 ] [ p2 ] [ p3 ]
The synth effect will generate various types of audio data.
Although this effect is used to generate audio data, an input
file must be specified. The length of the input audio file
determines the length of the output audio file.
<length> length in sec or hh:mm:ss.frac, 0=inputlength,
default=0
<type> is sine, square, triangle, sawtooth, trapetz, exp,
whitenoise, pinknoise, brownnoise, default=sine
<mix> is create, mix, amod, default=create
<freq> frequency at beginning in Hz, not used for noise..
<freq2> frequency at end in Hz, not used for noise..
<freq/2> can be given as %%n, where 'n' is the number of half
notes in respect to A (440Hz)
<off> Bias (DC-offset) of signal in percent, default=0
<ph> phase shift 0..100 shift phase 0..2*Pi, not used for
noise..
<p1> square: Ton/Toff, triangle+trapetz: rising slope time
(0..100)
<p2> trapetz: ON time (0..100)
the audio data. The format for specifying sample counts is
the number of samples with the letter 's' appended to it. A
value of 8000s will wait until 8000 samples are read before
starting to process audio data.
vibro speed [ depth ]
Add the world-famous Fender Vibro-Champ sound effect to a
sound sample by using a sine wave as the volume knob. Speed
gives the Hertz value of the wave. This must be under 30.
Depth gives the amount the volume is cut into by the sine
wave, ranging 0.0 to 1.0 and defaulting to 0.5.
vol gain [ type [ limitergain ] ]
The vol effect is much like the command line option -v. It
allows you to adjust the volume of an input file and allows
you to specify the adjustment in relation to amplitude,
power, or dB. If type is not specified then it defaults to
amplitude.
When type is amplitude then a linear change of the amplitude
is performed based on the gain. Therefore, a value of 1.0
will keep the volume the same, 0.0 to < 1.0 will cause the
volume to decrease and values of > 1.0 will cause the volume
to increase. Beware of clipping audio data when the gain is
greater then 1.0. A negative value performs the same adjust-
ment while also changing the phase.
When type is power then a value of 1.0 also means no change
in volume.
When type is dB the amplitude is changed logarithmically.
0.0 is constant while +6 doubles the amplitude.
An optional limitergain value can be specified and should be
a value much less then 1.0 (ie 0.05 or 0.02) and is used only
on peaks to prevent clipping. Not specifying this parameter
will cause no limiter to be used. In verbose mode, this
effect will display the percentage of audio data that needed
to be limited.
Echo
An echo(1,3x,1 builtins) effect can be naturally found in(1,8) the mountains, standing some-
where on a mountain and shouting a single word will result in(1,8) one or
more repetitions of the word (if(3,n) not, turn a bit around and try again,
or climb to the next mountain).
However, the time(1,2,n) difference between shouting and repeating is the
delay (time(1,2,n)), its loudness is the decay. Multiple echos can have dif-
ferent delays and decays.
It is very popular to use echos to play an instrument with itself
together, like some guitar players (Brain May from Queen) or vocalists
are doing. For music samples of more than one instrument, echo(1,3x,1 builtins) can be
used to add a second sample shortly after the original one.
This will sound as if(3,n) you are doubling the number of instruments play-
ing in(1,8) the same sample:
SoxFilter("echo 0.8 0.88 60.0 0.4")
If the delay is very short, then it sound like a (metallic) robot play-
ing music:
SoxFilter("echo 0.8 0.88 6.0 0.4")
Longer delay will sound like an open(2,3,n) air concert in(1,8) the mountains:
SoxFilter("echo 0.8 0.9 1000.0 0.3")
One mountain more, and:
SoxFilter("echo 0.8 0.9 1000.0 0.3 1800.0 0.25")
Echos
Like the echo effect, echos stand for "ECHO in Sequel", that is the
first echos takes the input, the second the input and the first echos,
the third the input and the first and the second echos, ... and so on.
Care should be taken using many echos (see introduction); a single
echos has the same effect as a single echo.
The sample will be bounced twice in(1,8) symmetric echos:
SoxFilter("echos 0.8 0.7 700.0 0.25 700.0 0.3")
The sample will be bounced twice in(1,8) asymmetric echos:
SoxFilter("echos 0.8 0.7 700.0 0.25 900.0 0.3")
The sample will sound as if(3,n) played in(1,8) a garage:
SoxFilter("echos 0.8 0.7 40.0 0.25 63.0 0.3")
Chorus
The chorus effect has its name because it will often be used to make a
single vocal sound like a chorus. But it can be applied to other
instrument samples too.
It works like the echo(1,3x,1 builtins) effect with a short delay, but the delay isn't
constant. The delay is varied using a sinusoidal or triangular modula-
tion. The modulation depth defines the range the modulated delay is
played before or after the delay. Hence the delayed sound will sound
slower or faster, that is the delayed sound tuned around the original
one, like in(1,8) a chorus where some vocals are a bit out of tune.
The typical delay is around 40ms to 60ms, the speed of the modulation
is best near 0.25Hz and the modulation depth around 2ms.
A single delay will make the sample more overloaded:
SoxFilter("chorus 0.7 0.9 55.0 0.4 0.25 2.0 -t")
Two delays of the original samples sound like this:
SoxFilter("chorus 0.6 0.9 50.0 0.4 0.25 2.0 -t 60.0 0.32 0.4 1.3 -s")
A big chorus of the sample is (three additional samples):
SoxFilter("chorus 0.5 0.9 50.0 0.4 0.25 2.0 -t 60.0 0.32 0.4 2.3 -t 40.0 0.3 0.3 1.3 -s")
Flanger
The flanger effect is like the chorus effect, but the delay varies
between 0ms and maximal 5ms. It sound like wind blowing, sometimes
faster or slower including changes of the speed.
The flanger effect is widely used in(1,8) funk and soul music, where the
guitar sound varies frequently slow or a bit faster.
The typical delay is around 3ms to 5ms, the speed of the modulation is
best near 0.5Hz.
Now, let's groove the sample:
SoxFilter("flanger 0.6 0.87 3.0 0.9 0.5 -s")
listen carefully between the difference of sinusoidal and triangular
modulation:
SoxFilter("flanger 0.6 0.87 3.0 0.9 0.5 -t")
If the decay is a bit lower, than the effect sounds more popular:
SoxFilter("flanger 0.8 0.88 3.0 0.4 0.5 -t")
The drunken loudspeaker system:
SoxFilter("flanger 0.9 0.9 4.0 0.23 1.3 -s")
Reverb
The reverb effect is often used in(1,8) audience hall which are to small or
contain too many many visitors which disturb (dampen) the reflection of
sound at the walls. Reverb will make the sound be perceived as if(3,n) it
were in(1,8) a large hall. You can try the reverb effect in(1,8) your bathroom
or garage or sport halls by shouting loud some words. You'll hear the
words reflected from the walls.
The biggest problem in(1,8) using the reverb effect is the correct setting
of the (wall) delays such that the sound is realistic and doesn't sound
like music playing in(1,8) a tin(1,5) can or has overloaded feedback which
destroys any illusion of playing in(1,8) a big hall. To help you obtain
realistic reverb effects, you should decide first how long the reverb
should take place until it is not loud enough to be registered by your
ears. This is be done by varying the reverb time(1,2,n) "t". To simulate
small halls, use 200ms. To simulate large halls, use 1000ms. Clearly,
the walls of such a hall aren't far away, so you should define its set-
ting be given every wall its delay time. However, if(3,n) the wall is to
far away for the reverb time(1,2,n), you won't hear the reverb, so the nearest
wall will be best at "t/4" delay and the farthest at "t/2". You can try
other distances as well, but it won't sound very realistic. The walls
shouldn't stand to close(2,7,n) to each other and not in(1,8) a multiple integer
distance to each other ( so avoid wall like: 200.0 and 202.0, or some-
thing like 100.0 and 200.0 ).
Since audience halls do have a lot of walls, we will start designing
one beginning with one wall:
SoxFilter("reverb 1.0 600.0 180.0")
One wall more:
SoxFilter("reverb 1.0 600.0 180.0 200.0")
Next two walls:
SoxFilter("reverb 1.0 600.0 180.0 200.0 220.0 240.0")
Now, why not a futuristic hall with six walls:
SoxFilter("reverb 1.0 600.0 180.0 200.0 220.0 240.0 280.0 300.0")
If you run out of machine power or memory, then stop as many applica-
tions as possible (every interrupt will consume a lot of CPU time(1,2,n) which
for bigger halls is absolutely necessary).
Phaser
The phaser effect is like the flanger effect, but it uses a reverb
instead of an echo(1,3x,1 builtins) and does phase shifting. You'll hear the difference
in(1,8) the examples comparing both effects (simply change the effect name).
The delay modulation can be sinusoidal or triangular, preferable is the
later for multiple instruments. For single instrument sounds, the sinu-
soidal phaser effect will give a sharper phasing effect. The decay
shouldn't be to close(2,7,n) to 1.0 which will cause dramatic feedback. A
good range is about 0.5 to 0.1 for the decay.
We will take a parameter setting as for the flanger before (gain-out is
lower since feedback can raise(3,n) the output dramatically):
SoxFilter("phaser 0.8 0.74 3.0 0.4 0.5 -t")
The drunken loudspeaker system (now less(1,3) alcohol):
SoxFilter("phaser 0.9 0.85 4.0 0.23 1.3 -s")
A popular sound of the sample is as follows:
SoxFilter("phaser 0.89 0.85 1.0 0.24 2.0 -t")
The sample sounds if(3,n) ten springs are in(1,8) your ears:
SoxFilter("phaser 0.6 0.66 3.0 0.6 2.0 -t")
Compander
The compander effect allows the dynamic range of a signal(2,7) to be com-
pressed or expanded. For most situations, the attack time(1,2,n) (response to
the music getting louder) should be shorter than the decay time(1,2,n) because
our ears are more sensitive to suddenly loud music than to suddenly
soft music.
For example, suppose you are listening to Strauss' "Also Sprach
Zarathustra" in(1,8) a noisy environment such as a car. If you turn up the
volume enough to hear the soft passages over the road noise, the loud
sections will be too loud. You could try this:
SoxFilter("compand 0.3,1 -90,-90,-70,-70,-60,-20,0,0 -5 0 0.2")
The transfer function ("-90,...") says that very soft sounds between
-90 and -70 decibels (-90 is about the limit of 16-bit encoding(3,n)) will
remain unchanged. That keeps the compander from boosting the volume on
"silent" passages such as between movements. However, sounds in(1,8) the
range -60 decibels to 0 decibels (maximum volume) will be boosted so
that the 60-dB dynamic range of the original music will be compressed
3-to-1 into a 20-dB range, which is wide enough to enjoy the music but
narrow enough to get around the road noise. The -5 dB output gain is
needed to avoid clipping (the number is inexact, and was derived by
experimentation). The 0 for the initial volume will work fine for a
clip that starts with a bit of silence, and the delay of 0.2 has the
effect of causing the compander to react a bit more quickly to sudden
volume changes.