Activities, Laboratoire de Phonologie, ULB

 

VTI (Alain Soquet)

VTI is a program that builds synthetic sound from a sequence of frames. Each frame describes a speech segment in terms of duration, energy, fundamental frequency, and spectral properties, the latter specified either by formant frequencies and bandwidths or by the area function of a vocal tract.
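
As a rough illustration of the frame description above, a frame can be pictured as a small record. The sketch below is not VTI's file format: the class name `Frame`, the field names, and the units are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Frame:
    """One speech segment. Field names and units are illustrative assumptions."""
    duration_ms: float   # duration of the segment
    energy_db: float     # energy of the segment
    f0_hz: float         # fundamental frequency
    # Spectral properties, given in at least one of two ways:
    formants: Optional[List[Tuple[float, float]]] = None  # (frequency, bandwidth) in Hz
    area_function: Optional[List[float]] = None           # areas in cm^2, glottis to lips

    def __post_init__(self) -> None:
        # Reject a frame that carries no spectral description at all.
        if self.formants is None and self.area_function is None:
            raise ValueError("a frame needs formants or an area function")


def total_duration_ms(frames: List[Frame]) -> float:
    """Total duration of a sequence of frames."""
    return sum(frame.duration_ms for frame in frames)


frames = [
    Frame(100.0, 60.0, 120.0,
          formants=[(700.0, 60.0), (1200.0, 80.0), (2500.0, 120.0)]),
    Frame(80.0, 58.0, 115.0,
          area_function=[2.0, 1.5, 0.8, 1.2, 3.0, 4.0, 3.5, 2.5]),
]
print(total_duration_ms(frames))
```

Keeping both spectral descriptions optional mirrors the "either ... or" wording above; whether VTI stores them this way is not stated.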

One of VTI's main features is the ability to go from the acoustic space (defined by the formant frequencies) to the articulatory space (defined by the control parameters of the Distinctive Regions and Modes (DRM) model [4]). This process is called acoustic-to-articulatory inversion [5]. Three different inversion methods are implemented:

  • Inversion by a connectionist network [2][3][6][7]: a connectionist network can be trained to output the control parameters of a vocal tract model when given formant frequencies as input. In VTI, three formant frequencies are used.
  • Inversion by table lookup [1][2][5]: a table contains pairs of acoustic and articulatory data. The table is searched for the entry whose acoustic data best match the target data, and the corresponding articulatory controls are output. In VTI, three formant frequencies are used as acoustic data, and the controls of the DRM model plus the total length of the tract are used as articulatory data.
  • Optimization [2][5]: gradient descent can be used to modify the control parameters of the vocal tract so as to decrease the error between the target data and the formant frequencies obtained with the current vocal tract configuration.
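
The table lookup method can be sketched in a few lines. This is only an illustration under stated assumptions: the table entries are made-up placeholders rather than real DRM controls, and the distance measure (Euclidean distance between log-frequencies) is an assumed choice, since the metric VTI uses is not specified here.

```python
import math
from typing import List, Sequence, Tuple

# Each entry pairs acoustic data (F1, F2, F3 in Hz) with articulatory data
# (three control values plus a tract length in cm). All values are
# made-up placeholders for illustration, not real DRM controls.
Entry = Tuple[Tuple[float, float, float], Tuple[float, ...]]

TABLE: List[Entry] = [
    ((300.0, 2300.0, 3000.0), (0.2, 1.8, 0.5, 17.0)),
    ((700.0, 1200.0, 2500.0), (1.5, 0.4, 1.1, 17.5)),
    ((350.0, 800.0, 2400.0), (0.9, 0.7, 0.3, 18.5)),
]


def acoustic_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between log-frequencies (an assumed metric)."""
    return math.sqrt(sum(math.log(x / y) ** 2 for x, y in zip(a, b)))


def invert_by_lookup(target: Sequence[float],
                     table: List[Entry] = TABLE) -> Tuple[float, ...]:
    """Return the articulatory data of the entry closest to the target formants."""
    best = min(table, key=lambda entry: acoustic_distance(entry[0], target))
    return best[1]


print(invert_by_lookup((680.0, 1250.0, 2450.0)))
```

A linear scan is enough for a toy table; a real table would be much larger and might call for a faster search structure.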

VTI can run the optimization either from a starting vocal tract configuration set manually by direct control of the vocal tract parameters, or from the result of one of the other inversion methods, i.e. inversion by connectionist network or by table lookup.
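
Optimization from a starting configuration can be sketched as follows. The forward model here is a toy stand-in, NOT the DRM model: `toy_formants`, the `SENSITIVITY` matrix, the error measure, and the step size are all assumptions chosen so that the example runs on its own. Only the overall scheme (descend the gradient of the formant error from a given start) follows the description above.

```python
import math
from typing import List, Sequence

# Approximate formants of a uniform 17.5 cm tube (sound speed 350 m/s).
NEUTRAL_HZ = (500.0, 1500.0, 2500.0)
# Hypothetical sensitivity of each log-formant to each control parameter.
SENSITIVITY = (
    (0.3, -0.2, 0.1),
    (-0.2, 0.3, 0.2),
    (0.1, 0.2, -0.3),
)


def toy_formants(controls: Sequence[float]) -> List[float]:
    """Toy forward model (not DRM): control parameters -> three formants in Hz."""
    return [
        neutral * math.exp(sum(s * c for s, c in zip(row, controls)))
        for neutral, row in zip(NEUTRAL_HZ, SENSITIVITY)
    ]


def formant_error(controls: Sequence[float], target: Sequence[float]) -> float:
    """Sum of squared log-ratios between model formants and target formants."""
    return sum(math.log(f / t) ** 2
               for f, t in zip(toy_formants(controls), target))


def optimize(start: Sequence[float], target: Sequence[float],
             step: float = 1.5, iterations: int = 500,
             h: float = 1e-4) -> List[float]:
    """Plain gradient descent using a central-difference gradient."""
    controls = list(start)
    for _ in range(iterations):
        gradient = []
        for i in range(len(controls)):
            up = controls[:i] + [controls[i] + h] + controls[i + 1:]
            down = controls[:i] + [controls[i] - h] + controls[i + 1:]
            gradient.append(
                (formant_error(up, target) - formant_error(down, target)) / (2 * h)
            )
        controls = [c - step * g for c, g in zip(controls, gradient)]
    return controls


target = (700.0, 1200.0, 2500.0)
start = [0.0, 0.0, 0.0]  # e.g. a manual setting or the output of a table lookup
result = optimize(start, target)
print(formant_error(start, target), formant_error(result, target))
```

The fixed step size suits this toy model only; with a realistic vocal tract model the error surface is not this well behaved, which is one reason a good starting configuration matters.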

VTI files

Installation of VTI is simple: just copy the folder "VTI f" to your hard disk. Do not move any file out of this folder, or you may have problems launching the application. The folder "VTI f" contains the following:

The "Tables" folder contains the tables available for inversion by table lookup, and the "Networks" folder contains the networks available for inversion by connectionist network.

Using VTI

VTI implements balloon help. Feel free to work with balloon help turned on to get direct information about the program's user interface.

Each VTI document consists of five windows. The main window displays information about the selected frame and provides ways to select a different frame. The four remaining windows display, respectively, the spectral envelope of the current frame, the area function of the vocal tract of the current frame (if any), the formant frequencies and bandwidths for every frame, and the synthesized signal.

  • Main window: Any control can be modified by typing the new value or by moving the corresponding slider (if any).

  • Acoustic window: The acoustic window displays the spectral envelope computed from the formant frequencies and bandwidths in blue, and the spectral envelope computed from the vocal tract area function (if any) in red.

  • Articulatory window: The articulatory window displays the area function (area in cm² vs. distance from the glottis in cm).

  • Formants window: The formants window displays formant frequencies and bandwidths for all frames. Values obtained with the vocal tract (if any) are shown in red, and values obtained with the formant description in blue.

  • Sound window: The sound window displays the synthesized signal.
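
To give an idea of how a spectral envelope can be computed from formant frequencies and bandwidths, as in the acoustic window, here is a minimal sketch. It assumes a cascade of two-pole resonators, one per formant, each normalized to unity gain at 0 Hz. This is a common textbook formulation, not necessarily the computation VTI performs, and the function names are hypothetical.

```python
import math
from typing import Sequence, Tuple


def resonator_gain(freq_hz: float, formant_hz: float, bandwidth_hz: float) -> float:
    """Magnitude response of a two-pole resonator with unity gain at 0 Hz."""
    a = math.pi * bandwidth_hz        # distance of the poles from the frequency axis
    b = 2.0 * math.pi * formant_hz    # pole frequency in rad/s
    w = 2.0 * math.pi * freq_hz       # evaluation frequency in rad/s
    # Poles at -a + jb and -a - jb; the gain is the product of the pole
    # distances at 0 Hz divided by the product of the distances at w.
    return (a * a + b * b) / (math.hypot(a, w - b) * math.hypot(a, w + b))


def envelope_db(freq_hz: float, formants: Sequence[Tuple[float, float]]) -> float:
    """Envelope in dB at one frequency for a cascade of formant resonators."""
    return sum(20.0 * math.log10(resonator_gain(freq_hz, f, bw))
               for f, bw in formants)


# (frequency, bandwidth) pairs in Hz; the values are illustrative only.
formants = [(500.0, 60.0), (1500.0, 90.0), (2500.0, 120.0)]
for f in range(0, 4001, 500):
    print(f, round(envelope_db(float(f), formants), 1))
```

Narrower bandwidths give sharper and higher peaks at the formant frequencies. The sketch leaves out the voice source and lip radiation, so it shows the shape of the vocal tract response only.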

What you need to use VTI

To use VTI, you need these pieces of hardware and software:
  • a Power Macintosh computer
  • system software version 7.5 or later

A color screen is highly recommended to make the graphics easier to read (256 colors is a good choice).

Feedback

This beta version of VTI has been sent to several laboratories. Comments and bug reports are welcome. Please send them to me by email; I will try to reply and to fix the bugs.

[1] B. S. Atal, J. J. Chang, M. V. Mathews, and J. W. Tukey (1978). Inversion of articulatory-to-acoustic transformation in the vocal-tract by a computer-sorting technique. J. Acoust. Soc. Am. vol. 63. pages 1535-1555.

[2] P. Jospa and A. Soquet (1994). The acoustic-articulatory mapping and the variational method. In Proceedings of ICSLP-94. Yokohama, Japan. pages 595-598.

[3] P. Jospa, A. Soquet, and M. Saerens (1994). Variational formulation of the acoustico-articulatory link and the inverse mapping by means of a neural network. In Levels in Speech Communication Relations and Interactions. Amsterdam: Elsevier. pages 103-113.

[4] M. Mrayati, R. Carré, and B. Guérin (1988). Distinctive regions and modes: a new theory of speech production. Speech Communication. vol. 7. pages 257-286.

[5] J. Schroeter and M. M. Sondhi (1994). Techniques for estimating vocal-tract shapes from the speech signal. IEEE Transactions on Speech and Audio Processing. vol. 2, no. 1, part II. pages 133-150.

[6] K. Shirai and T. Kobayashi (1991). Estimation of articulatory motion using neural networks. Journal of Phonetics. vol. 19. pages 379-385.

[7] A. Soquet (1999). Etude comparée de représentations acoustiques et articulatoires du signal de parole pour le décodage acoustico-phonétique. Application à la classification de voyelles et à la détermination du lieu d'articulation des occlusives. PhD Thesis. Université Libre de Bruxelles.

 
