Home > freetb4matlab > signal > specgram.m



% error: [S [, f [, t]]] = specgram(x [, n [, Fs [, window [, overlap]]]])


function [S_r, f_r, t_r] = specgram(x, n, Fs, window, overlap)


% error: [S [, f [, t]]] = specgram(x [, n [, Fs [, window [, overlap]]]])
% Generate a spectrogram for the signal. This chops the signal into
% overlapping slices, windows each slice and applies a Fourier
% transform to determine the frequency components at that slice.
% x: vector of samples
% n: size of fourier transform window, or [] for default=256
% Fs: sample rate, or [] for default=2 Hz
% window: shape of the fourier transform window, or [] for default=hanning(n)
%    Note: window length can be specified instead, in which case
%    window=hanning(length)
% overlap: overlap with previous window, or [] for default=length(window)/2
% Return values
%    S is complex output of the FFT, one row per slice
%    f is the frequency indices corresponding to the rows of S.
%    t is the time indices corresponding to the columns of S.
%    If no return value is requested, the spectrogram is displayed instead.
% Example
%    x = chirp([0:0.001:2],0,2,500);  % freq. sweep from 0-500 over 2 sec.
%    Fs=1000;                  % sampled every 0.001 sec so rate is 1 kHz
%    step=ceil(20*Fs/1000);    % one spectral slice every 20 ms
%    window=ceil(100*Fs/1000); % 100 ms data window
%    specgram(x, 2^nextpow2(window), Fs, window, window-step);
%    %% Speech spectrogram
%    [x, Fs] = auload(file_in_loadpath('sample.wav')); % audio file
%    step = fix(5*Fs/1000);     % one spectral slice every 5 ms
%    window = fix(40*Fs/1000);  % 40 ms data window
%    fftn = 2^nextpow2(window); % next highest power of 2
%    [S, f, t] = specgram(x, fftn, Fs, window, window-step);
%    S = abs(S(2:fftn*4000/Fs,:)); % magnitude in range 0<f<=4000 Hz.
%    S = S/max(S(:));           % normalize magnitude so that max is 0 dB.
%    S = max(S, 10^(-40/10));   % clip below -40 dB.
%    S = min(S, 10^(-3/10));    % clip above -3 dB.
%    imagesc(t, f, flipud(log(S)));   % display in log scale
% The choice of window defines the time-frequency resolution.  In
% speech for example, a wide window shows more harmonic detail while a
% narrow window averages over the harmonic detail and shows more
% formant structure. The shape of the window is not so critical so long
% as it goes gradually to zero on the ends.
% Step size (which is window length minus overlap) controls the
% horizontal scale of the spectrogram. Decrease it to stretch, or
% increase it to compress. Increasing step size will reduce time
% resolution, but decreasing it will not improve it much beyond the
% limits imposed by the window size (you do gain a little bit,
% depending on the shape of your window, as the peak of the window
% slides over peaks in the signal energy).  The range 1-5 msec is good
% for speech.
% FFT length controls the vertical scale.  Selecting an FFT length
% greater than the window length does not add any information to the
% spectrum, but it is a good way to interpolate between frequency
% points which can make for prettier spectrograms.
% After you have generated the spectral slices, there are a number of
% decisions for displaying them.  First the phase information is
% discarded and the energy normalized:
%     S = abs(S); S = S/max(S(:));
% Then the dynamic range of the signal is chosen.  Since information in
% speech is well above the noise floor, it makes sense to eliminate any
% dynamic range at the bottom end.  This is done by taking the max of
% the magnitude and some minimum energy such as minE=-40dB. Similarly,
% there is not much information in the very top of the range, so
% clipping to a maximum energy such as maxE=-3dB makes sense:
%     S = max(S, 10^(minE/10)); S = min(S, 10^(maxE/10));
% The frequency range of the FFT is from 0 to the Nyquist frequency of
% one half the sampling rate.  If the signal of interest is band
% limited, you do not need to display the entire frequency range. In
% speech for example, most of the signal is below 4 kHz, so there is no
% reason to display up to the Nyquist frequency of 10 kHz for a 20 kHz
% sampling rate.  In this case you will want to keep only the first 40%
% of the rows of the returned S and f.  More generally, to display the
% frequency range [minF, maxF], you could use the following row index:
%     idx = (f >= minF & f <= maxF);
% Then there is the choice of colormap.  A brightness varying colormap
% such as copper or bone gives good shape to the ridges and valleys. A
% hue varying colormap such as jet or hsv gives an indication of the
% steepness of the slopes.  The final spectrogram is displayed in log
% energy scale and by convention has low frequencies on the bottom of
% the image:
%     imagesc(t, f, flipud(log(S(idx,:))));


This function calls: This function is called by:
Generated on Sat 16-May-2009 00:04:49 by m2html © 2003