Mini Graduation Project
0x00 Speech Enhancement
Speech Enhancement is a signal processing task that improves the quality of speech signals captured under noisy or degraded conditions. The goal is to make speech clearer, more intelligible, and more pleasant to listen to, which benefits applications such as voice recognition, teleconferencing, and hearing aids.
0x01 Preparation
🔨CUDA & cuDNN
①What are CUDA and cuDNN?
②Installing CUDA and cuDNN
🔨PyTorch
①What is PyTorch?
②Installing PyTorch: what you need
③Making good use of PyTorch
✏VSCode & Graphics Card
①Connect the graphics card to VSCode
②How to use it
💻Coding and Debugging
①Datasets
②Data Generation
③Necessary parameters
④Model
⑤Train and Evaluate
0x02 CUDA and cuDNN
CUDA is a parallel computing platform and programming model developed by NVIDIA for general computing on graphical processing units (GPUs). With CUDA, developers are able to dramatically speed up computing applications by harnessing the power of GPUs.
For example, in our engineering project (Speech Enhancement Based on GAN), running on a GPU is roughly 60 times faster than running on a CPU.
👉Attention: Not all computers can install CUDA
How can we tell?
Open the Windows Device Manager, expand "Display adapters", and check whether an NVIDIA graphics card is listed.
As for mine, it is RTX 2060.😥
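Once the driver and CUDA toolkit are installed, you can confirm that the GPU is visible from Python. A minimal check, assuming PyTorch (see 0x03) is already installed:

```python
import torch

# True only when an NVIDIA GPU and a matching CUDA runtime are found
print(torch.cuda.is_available())

if torch.cuda.is_available():
    # prints the card model, e.g. "NVIDIA GeForce RTX 2060"
    print(torch.cuda.get_device_name(0))
```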
0x03 Pytorch
PyTorch is a machine learning framework based on the Torch library, used for applications such as computer vision and natural language processing. It was originally developed by Meta AI and is now under the Linux Foundation umbrella.
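A minimal taste of PyTorch: the same tensor code runs on CPU or GPU depending on a single device switch.

```python
import torch

# pick the GPU when CUDA is available, otherwise fall back to the CPU
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

x = torch.randn(3, 4, device=device)  # random 3x4 tensor on the chosen device
y = torch.randn(4, 2, device=device)
z = x @ y                             # matrix product computed on that device
print(z.shape, z.device)
```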
0x04 GAN
Strongly Related Papers:
SEGAN: enhancement performed directly on the time-domain waveform
0x05 Main Structure of Our Work
```
/mnt
```
0x06 Coding Analysis
coding_test.py
```python
from pesq import pesq
```
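For reference, this is how the pesq package is typically called: it scores a degraded (or enhanced) signal against the clean reference. The file names below are placeholders, and soundfile is just one way to read the wavs.

```python
import soundfile as sf
from pesq import pesq

# placeholder paths; both signals must share one sampling rate (8 or 16 kHz)
ref, fs = sf.read("clean.wav")      # clean reference
deg, _ = sf.read("enhanced.wav")    # signal to score

# "wb" = wide-band mode for 16 kHz audio, "nb" = narrow-band for 8 kHz
score = pesq(fs, ref, deg, "wb")
print(f"PESQ: {score:.2f}")         # range is roughly -0.5 to 4.5
```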
Conclusion:
```
Trained for 10 epochs
```
dataset.py
🔨Necessary modules
```python
import os
```
🔨Pre-emphasis and De-emphasis
```python
def emphasis(signal, emph_coeff=0.95, pre=True):
```
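Only the signature survives above, so here is a sketch of how such a function is usually written: pre-emphasis applies the high-pass filter y[n] = x[n] - a·x[n-1], and de-emphasis inverts it.

```python
import numpy as np

def emphasis(signal, emph_coeff=0.95, pre=True):
    """Pre-emphasis (pre=True) or de-emphasis (pre=False) of a 1-D signal."""
    if pre:
        # y[n] = x[n] - a * x[n-1]  (boosts high frequencies)
        return np.append(signal[0], signal[1:] - emph_coeff * signal[:-1])
    # inverse filter: x[n] = y[n] + a * x[n-1]
    out = np.zeros_like(signal)
    out[0] = signal[0]
    for n in range(1, len(signal)):
        out[n] = signal[n] + emph_coeff * out[n - 1]
    return out
```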
🔨SEGAN_Dataset
```python
class SEGAN_Dataset(Dataset):
```
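Again only the class header is shown, so the following is a sketch of a typical implementation. It assumes data_generation.py stores each training pair as one .npy file of shape (2, n_samples) with the clean row first, and that the emphasis function above lives in the same module; both are assumptions, not the project's confirmed layout.

```python
import os
import numpy as np
import torch
from torch.utils.data import Dataset

class SEGAN_Dataset(Dataset):
    """Yields (clean, noisy) waveform segments as (1, n_samples) tensors."""

    def __init__(self, data_dir, emph_coeff=0.95):
        self.paths = sorted(
            os.path.join(data_dir, f)
            for f in os.listdir(data_dir) if f.endswith(".npy")
        )
        self.emph_coeff = emph_coeff

    def __len__(self):
        return len(self.paths)

    def __getitem__(self, idx):
        pair = np.load(self.paths[idx])              # shape (2, n_samples)
        clean = emphasis(pair[0], self.emph_coeff)   # emphasis() defined above
        noisy = emphasis(pair[1], self.emph_coeff)
        to_tensor = lambda x: torch.from_numpy(x).float().unsqueeze(0)
        return to_tensor(clean), to_tensor(noisy)
```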
data_generation.py
```python
import numpy as np
```
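Only the import survived above. Conceptually, this script cuts each aligned clean/noisy wav pair into fixed-length windows and saves every window pair as one .npy file, the format the SEGAN_Dataset sketch assumes. The window length, stride, use of librosa, and naming scheme below are all placeholder choices:

```python
import os
import numpy as np
import librosa

WIN_LEN = 16384   # samples per training segment (~1 s at 16 kHz)
STRIDE = 8192     # 50% overlap between consecutive windows

def slice_pair(clean_path, noisy_path, out_dir, sr=16000):
    """Cut one aligned (clean, noisy) wav pair into (2, WIN_LEN) windows."""
    clean, _ = librosa.load(clean_path, sr=sr)
    noisy, _ = librosa.load(noisy_path, sr=sr)
    n = min(len(clean), len(noisy))
    base = os.path.splitext(os.path.basename(clean_path))[0]
    for i, start in enumerate(range(0, n - WIN_LEN + 1, STRIDE)):
        pair = np.stack([clean[start:start + WIN_LEN],
                         noisy[start:start + WIN_LEN]])
        np.save(os.path.join(out_dir, f"{base}_{i}.npy"), pair)
```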
What is a "batch"?
During training, batches shorten training time: multiple examples are processed at once, and the network is updated from the combined loss over all of them. Seeing several inputs at the same time also tends to give the network a more stable, representative gradient for each update. Batches larger than one enable a further technique, batch normalization (batchnorm for short), which helps coordinate the updates of the many layers in the model.
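In PyTorch, batching is one line with the DataLoader; a short example assuming the SEGAN_Dataset sketched above (the path and batch size 32 are arbitrary):

```python
from torch.utils.data import DataLoader

dataset = SEGAN_Dataset("data/train")   # placeholder path; class sketched above
loader = DataLoader(dataset, batch_size=32, shuffle=True)

for clean_batch, noisy_batch in loader:
    # 32 segments are processed per step, each of shape (1, n_samples)
    print(clean_batch.shape)            # e.g. torch.Size([32, 1, 16384])
    break
```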
train.py
🔨Necessary modules
```python
import torch
```
🔨if __name__ == "__main__":
```python
# define the device
```
During training, the function optimizes the generator and discriminator with the RMSprop optimizer, and trains the model with L2 and L1 loss functions. Finally, it saves the trained generator and discriminator models.
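A compact, runnable sketch of that loop on random data. The two nn.Sequential stacks are toy stand-ins for the real SEGAN generator and discriminator (which are far deeper, and whose generator also consumes a latent vector z); what the sketch does reproduce is the loss structure, least-squares (L2) adversarial terms plus a weighted L1 term, and the RMSprop optimizers.

```python
import torch
import torch.nn as nn

# placeholder 1-D conv models standing in for the real SEGAN G and D
G = nn.Sequential(nn.Conv1d(1, 16, 31, padding=15), nn.PReLU(),
                  nn.Conv1d(16, 1, 31, padding=15), nn.Tanh())
D = nn.Sequential(nn.Conv1d(2, 16, 31, stride=2, padding=15), nn.LeakyReLU(0.2),
                  nn.Conv1d(16, 1, 31, stride=2, padding=15),
                  nn.AdaptiveAvgPool1d(1), nn.Flatten())

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
G, D = G.to(device), D.to(device)
g_opt = torch.optim.RMSprop(G.parameters(), lr=1e-4)
d_opt = torch.optim.RMSprop(D.parameters(), lr=1e-4)
l1_loss = nn.L1Loss()
lambda_l1 = 100.0  # L1 weight (the SEGAN paper uses 100)

for step in range(10):  # toy loop on random data in place of a DataLoader
    clean = torch.rand(8, 1, 16384, device=device) * 2 - 1
    noisy = (clean + 0.3 * torch.randn_like(clean)).clamp(-1, 1)

    # --- discriminator: least-squares (L2) adversarial loss ---
    d_opt.zero_grad()
    d_real = D(torch.cat([clean, noisy], dim=1))
    d_fake = D(torch.cat([G(noisy).detach(), noisy], dim=1))
    d_loss = 0.5 * ((d_real - 1) ** 2).mean() + 0.5 * (d_fake ** 2).mean()
    d_loss.backward()
    d_opt.step()

    # --- generator: L2 adversarial term + L1 reconstruction term ---
    g_opt.zero_grad()
    enhanced = G(noisy)
    g_adv = 0.5 * ((D(torch.cat([enhanced, noisy], dim=1)) - 1) ** 2).mean()
    g_loss = g_adv + lambda_l1 * l1_loss(enhanced, clean)
    g_loss.backward()
    g_opt.step()

# save the trained models
torch.save(G.state_dict(), "generator.pth")
torch.save(D.state_dict(), "discriminator.pth")
```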
eval.py
🔨Necessary modules
```python
import torch
```
```python
def enh_segan(model, noisy, para):
```
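A sketch of what this function plausibly does: pre-emphasize the noisy waveform, enhance it window by window with the trained generator, then undo the pre-emphasis. The para.win_len attribute, CPU inference, and reuse of the emphasis helper from dataset.py are assumptions.

```python
import numpy as np
import torch
from dataset import emphasis  # the helper sketched earlier; assumed location

def enh_segan(model, noisy, para):
    """Enhance a 1-D noisy waveform with a trained generator (sketch)."""
    win_len = para.win_len
    # zero-pad so the signal splits evenly into windows
    n_win = int(np.ceil(len(noisy) / win_len))
    padded = np.zeros(n_win * win_len, dtype=np.float32)
    padded[: len(noisy)] = noisy

    # pre-emphasize, then reshape to the model's (n_win, 1, win_len) input
    segments = emphasis(padded).reshape(n_win, 1, win_len)

    model.eval()
    with torch.no_grad():
        enhanced = model(torch.from_numpy(segments).float()).numpy()

    # undo the pre-emphasis and trim the padding back off
    out = emphasis(enhanced.reshape(-1), pre=False)
    return out[: len(noisy)]
```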
0x07 Loss Function
Understanding GANs — Deriving the Adversarial loss from scratch
Loss function: a measure of how far the model's predictions fall from the targets.
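Concretely, SEGAN replaces the classic cross-entropy GAN objective with the least-squares (LSGAN) one and adds an L1 term that pulls the generator's output toward the clean reference. With $x$ the clean speech, $\tilde{x}$ the noisy speech, and $z$ the latent vector:

$$
\min_D \; \tfrac{1}{2}\,\mathbb{E}_{x,\tilde{x}}\!\left[\left(D(x,\tilde{x})-1\right)^2\right]
        + \tfrac{1}{2}\,\mathbb{E}_{z,\tilde{x}}\!\left[D\big(G(z,\tilde{x}),\tilde{x}\big)^2\right]
$$

$$
\min_G \; \tfrac{1}{2}\,\mathbb{E}_{z,\tilde{x}}\!\left[\left(D\big(G(z,\tilde{x}),\tilde{x}\big)-1\right)^2\right]
        + \lambda\,\big\lVert G(z,\tilde{x})-x\big\rVert_1
$$

The discriminator pushes real pairs toward 1 and enhanced pairs toward 0; the generator chases 1 while $\lambda$ (100 in the original paper) keeps its output close to the clean waveform.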
0x08 Future?
Will Speech Enhancement dream of LLMs?