Differences

Below are the differences between two revisions of the page.

ressource:logiciel:vosk:start [2021/12/29 23:26]
gweltaz [Creating the files in the 'data/train' folder]
ressource:logiciel:vosk:start [2022/03/06 23:06] (current version)
gweltaz
Line 169: Line 169:
 The installation instructions are in the file ''tools/INSTALL''.
  
-Clone the Kaldi repo:
-https://github.com/kaldi-asr/kaldi
+Clone the Kaldi repo: https://github.com/kaldi-asr/kaldi
+  $ git clone https://github.com/kaldi-asr/kaldi
  
 Check the dependencies:
Line 181: Line 181:
 Installing the Intel Math Kernel Library (optimized linear algebra operations):
   $ sudo ./tools/extras/install_mkl.sh
 +
 +Installing SRILM (a toolkit for building language models):
 +  $ ./tools/install_srilm.sh
  
 Installing Kaldi:
Line 203: Line 206:
  
 ==== Processing audio files ====
 +
 +Convert to 16-bit mono WAV with a sampling rate of 16000 Hz:
 +
 +  $ ffmpeg -i in.mp3 -acodec pcm_s16le -ac 1 -ar 16000 out.wav
 +
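The target format above (mono, 16-bit, 16000 Hz) can be checked after conversion. The following is a minimal sketch using only the Python standard library; the function name `check_wav_format` and its parameters are hypothetical and not part of the original page.

```python
import wave


def check_wav_format(path, channels=1, sample_width_bytes=2, sample_rate=16000):
    """Return True if the WAV file at `path` has the expected PCM layout.

    Defaults correspond to mono, 16-bit (2 bytes per sample), 16000 Hz.
    """
    with wave.open(path, "rb") as wav:
        return (
            wav.getnchannels() == channels
            and wav.getsampwidth() == sample_width_bytes
            and wav.getframerate() == sample_rate
        )
```

For example, `check_wav_format("out.wav")` should return `True` for a file produced by the ffmpeg command above, assuming the conversion succeeded.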
  
 Detecting silent and non-silent segments with Python
Line 328: Line 336:
 Once the files have been created, run the command:
   $ utils/prepare_lang.sh data/local/dict '<UNK>' data/local/lang data/lang
 +
 +==== About unknown words ====
 +This is an explanation of how Kaldi deals with unknown words (words not in the vocabulary); we are putting it on the "data preparation" page for lack of a more obvious location.
 +
 +In many setups, <unk> or something similar will be present in the LM as long as the data that you used to train the LM had words that were not in the vocabulary you used to train the LM, because language modeling toolkits tend to map those all to a single special word, usually called <unk> or <UNK>. You can look at the ARPA file to figure out what it's called; it will usually be one of those two.
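Looking up the unknown-word symbol in an ARPA file can be sketched as follows. This is an illustration only: the function name `find_unknown_word` is hypothetical, and the code assumes the usual ARPA layout in which the `\1-grams:` section lists one entry per line as log-probability, word, and an optional backoff weight.

```python
def find_unknown_word(arpa_lines, candidates=("<unk>", "<UNK>")):
    """Return the first candidate spelling found among the unigrams, or None.

    `arpa_lines` is any iterable of text lines, e.g. an open ARPA file.
    """
    in_unigrams = False
    for line in arpa_lines:
        line = line.strip()
        if line == "\\1-grams:":
            in_unigrams = True
            continue
        if in_unigrams:
            # Any other backslash-prefixed marker ends the unigram section.
            if line.startswith("\\"):
                break
            fields = line.split()
            # Assumed entry layout: <log10 prob> <word> [<backoff weight>]
            if len(fields) >= 2 and fields[1] in candidates:
                return fields[1]
    return None
```

A typical call would be `find_unknown_word(open("lm.arpa", encoding="utf-8"))`, where `lm.arpa` is a placeholder file name.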
 +
 +During training, if there are words in the text file in your data directory that are not in the words.txt in the lang directory that you are using, Kaldi will map them to a special word that's specified in the lang directory in the file data/lang/oov.txt; it will usually be either <unk>, <UNK> or maybe <SPOKEN_NOISE>. This word will have been chosen by the user (i.e., you), and supplied to prepare_lang.sh as a command-line argument. If this word has nonzero probability in the language model (which you can test by looking at the ARPA file), then it will be possible for Kaldi to recognize this word at test time. This will often be the case if you call this word <unk>, because as we mentioned above, language modeling toolkits will often use this spelling for "unknown word" (which is a special word that all out-of-vocabulary words get mapped to). Decoding output will always be limited to the intersection of the words in the language model with the words in the lexicon.txt (or whatever file format you supplied the lexicon in, e.g. lexiconp.txt); these words will all be present in the words.txt in your lang directory. So if Kaldi's "unknown word" doesn't match the LM's "unknown word", you will simply never decode this word. In any case, even when allowed to be decoded, this word typically won't be output very often, and in practice it doesn't tend to have much impact on WERs.
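The two rules described above (out-of-vocabulary training words are mapped to the OOV word, and decoding is limited to the intersection of LM and lexicon words) can be illustrated with a small sketch. This is not Kaldi's own code; the function names `map_oov` and `decodable_words` and the sample words are hypothetical.

```python
def map_oov(transcript, vocabulary, oov_word="<UNK>"):
    """Replace every word of `transcript` that is absent from `vocabulary`
    with `oov_word` (the role played by data/lang/oov.txt)."""
    return " ".join(
        word if word in vocabulary else oov_word
        for word in transcript.split()
    )


def decodable_words(lm_words, lexicon_words):
    """Words that can appear in decoding output: those present in both
    the language model and the lexicon."""
    return set(lm_words) & set(lexicon_words)
```

With an LM that spells its unknown word `<unk>` and a lexicon that uses `<UNK>`, `decodable_words` contains neither spelling, which matches the case above where the unknown word is never decoded.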
 +
 +Of course, a single phone (the usual lexicon pronunciation of the unknown word) isn't a very good or accurate model of OOV words. Some Kaldi setups include example scripts named local/run_unk_model.sh; e.g., see the file tedlium/s5_r2/local/run_unk_model.sh. These scripts replace the unk phone with a phone-level LM, which makes it possible to recover the sequence of phones in a hypothesized unknown word. Note: unknown words should be considered an "advanced topic" in speech recognition, and we discourage beginners from looking into this topic too closely.
  
 ==== Models based on a deep neural network ====
  • ressource/logiciel/vosk/start.1640816787.txt.gz
  • Last modified: 2021/12/29 23:26
  • by gweltaz