vs-dpir update,...

Selur

Hi,

Just wanted to let you know that HolyWu has released vs-dpir 2.0 (https://github.com/HolyWu/vs-dpir).

But be aware that the interface has changed:

Code

def DPIR(
    clip: vs.VideoNode,
    strength: Optional[float] = None,
    task: str = 'denoise',
    tile_w: int = 0,
    tile_h: int = 0,
    tile_pad: int = 8,
    provider: int = 1,
    device_id: int = 0,
    trt_fp16: bool = False,
    trt_engine_cache: bool = True,
    trt_engine_cache_path: str = dir_name,
    log_level: int = 2,
) -> vs.VideoNode:
    '''
    DPIR: Deep Plug-and-Play Image Restoration

    Parameters:
        clip: Clip to process. Only RGB and GRAY formats with float sample type of 32 bit depth are supported.

        strength: Strength for deblocking or denoising. Must be greater than 0.
            Defaults to 50.0 for 'deblock' task, 5.0 for 'denoise' task.

        task: Task to perform. Must be 'deblock' or 'denoise'.

        tile_w, tile_h: Tile width and height, respectively. As too large images result in the out of GPU memory issue,
            so this tile option will first crop input images into tiles, and then process each of them.
            Finally, they will be merged into one image. 0 denotes for do not use tile.

        tile_pad: The pad size for each tile, to remove border artifacts.

        provider: The hardware platform to execute on.
            0 = Default CPU
            1 = NVIDIA CUDA (https://onnxruntime.ai/docs/execution-providers/CUDA-ExecutionProvider.html#requirements)
            2 = NVIDIA TensorRT (https://onnxruntime.ai/docs/execution-providers/TensorRT-ExecutionProvider.html#requirements)
            3 = DirectML (https://onnxruntime.ai/docs/execution-providers/DirectML-ExecutionProvider.html#requirements)

        device_id: The device ID.

        trt_fp16: Enable FP16 mode in TensorRT.

        trt_engine_cache: Enable TensorRT engine caching. The purpose of using engine caching is to save engine build
            time in the case that TensorRT may take long time to optimize and build engine. Engine will be cached when
            it's built for the first time so next time when new inference session is created the engine can be loaded
            directly from cache. In order to validate that the loaded engine is usable for current inference, engine
            profile is also cached and loaded along with engine. If current input shapes are in the range of the engine
            profile, the loaded engine can be safely used. Otherwise if input shapes are out of range, profile cache
            will be updated to cover the new shape and engine will be recreated based on the new profile (and also
            refreshed in the engine cache). Note each engine is created for specific settings such as model path/name,
            precision, workspace, profiles etc, and specific GPUs and it's not portable, so it's essential to make sure
            those settings are not changing, otherwise the engine needs to be rebuilt and cached again.

            Warning: Please clean up any old engine and profile cache files (.engine and .profile) if any of the
            following changes:
                Model changes (if there are any changes to the model topology, opset version, operators etc.)
                ORT version changes (i.e. moving from ORT version 1.8 to 1.9)
                TensorRT version changes (i.e. moving from TensorRT 7.0 to 8.0)
                Hardware changes (Engine and profile files are not portable and optimized for specific NVIDIA hardware)

        trt_engine_cache_path: Specify path for TensorRT engine and profile files if trt_engine_cache is true.

        log_level: Log severity level. Applies to session load, initialization, etc.
            0 = Verbose
            1 = Info
            2 = Warning
            3 = Error
            4 = Fatal
    '''


Source: https://github.com/HolyWu/vs-dpir…pir/__init__.py
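For anyone wondering what tile_w, tile_h and tile_pad do in practice, here is a minimal sketch of the idea described in the docstring: split the frame into tiles, grow each tile by the pad size (clamped to the frame), process, then merge. This is plain Python for illustration only and not vs-dpir's actual code; the helper name `tile_boxes` and the exact clamping are my assumptions.

```python
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (left, top, right, bottom), right/bottom exclusive


def tile_boxes(width: int, height: int, tile_w: int = 0, tile_h: int = 0,
               tile_pad: int = 8) -> List[Box]:
    """Return the padded tile rectangles that together cover a width x height frame.

    Hypothetical helper: a value of 0 for tile_w/tile_h means "no tiling"
    along that axis, mirroring the docstring's description.
    """
    if tile_w <= 0:
        tile_w = width
    if tile_h <= 0:
        tile_h = height

    boxes: List[Box] = []
    for top in range(0, height, tile_h):
        for left in range(0, width, tile_w):
            right = min(left + tile_w, width)
            bottom = min(top + tile_h, height)
            # Grow the tile by tile_pad on every side, but never past the frame edge.
            boxes.append((
                max(left - tile_pad, 0),
                max(top - tile_pad, 0),
                min(right + tile_pad, width),
                min(bottom + tile_pad, height),
            ))
    return boxes
```

With a 1920x1080 frame and tile_w=960, tile_h=540 this gives four overlapping tiles; the overlap is what gets thrown away again after processing so the seams do not show.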

With provider=3 and onnxruntime-directml it is suddenly quite a bit faster.

(If DirectML also gives BasicVSR++ and VSGAN that kind of speed boost, that would be really cool. :))
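If you want a script to choose the provider value automatically, something along these lines could work. The execution provider names are the ones ONNX Runtime reports; that vs-dpir maps its provider numbers to exactly these names is my assumption based on the docstring, and the helper `pick_provider` and its preference order are made up for illustration.

```python
from typing import Iterable, Sequence

# vs-dpir provider number -> ONNX Runtime execution provider name
# (numbers taken from the docstring, mapping assumed).
PROVIDERS = {
    0: "CPUExecutionProvider",
    1: "CUDAExecutionProvider",
    2: "TensorrtExecutionProvider",
    3: "DmlExecutionProvider",
}


def pick_provider(available: Iterable[str],
                  preferred: Sequence[int] = (2, 1, 3, 0)) -> int:
    """Return the first preferred provider number that is available, else 0 (CPU)."""
    available = set(available)
    for index in preferred:
        if PROVIDERS[index] in available:
            return index
    return 0


# Typical use (requires onnxruntime or onnxruntime-directml to be installed):
#   import onnxruntime
#   provider = pick_provider(onnxruntime.get_available_providers())
```

The preference order is just a guess at what is usually fastest; as the numbers in this thread show, it is worth benchmarking on your own card.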

Cu Selur

FatFaster

Did you actually manage to get vsdpir and co. working as a portable setup too?

Selur

Yes. For Hybrid I packed the whole thing into an add-on. (You can simply extract it into the Hybrid/64bit folder.)

Selur

btw. HolyWu has just released vs-dpir-ncnn:

Code

python -m pip install -U vsdpir_ncnn
python -m pip install --upgrade https://github.com/HolyWu/ncnn/releases/download/1.0.20220910/ncnn-1.0.20220910-cp310-cp310-win_amd64.whl
python -m vsdpir_ncnn

usage:

Python

from vsdpir_ncnn import dpir


ret = dpir(clip)

options:

Code

def dpir(
    clip: vs.VideoNode,
    strength: float | vs.VideoNode | None = None,
    task: str = 'denoise',
    tile_w: int = 0,
    tile_h: int = 0,
    tile_pad: int = 8,
    gpu_id: int | None = None,
    fp16: bool = True,
) -> vs.VideoNode:
    """
    DPIR: Deep Plug-and-Play Image Restoration

    Parameters:
        clip: Clip to process. Only RGB and GRAY formats with float sample type of 32 bit depth are supported.

        strength: Strength for deblocking/denoising. Defaults to 50.0 for 'deblock', 5.0 for 'denoise'.
            Also accepts a GRAY8/GRAYS clip for varying strength.

        task: Task to perform. Must be 'deblock' or 'denoise'.

        tile_w, tile_h: Tile width and height, respectively. As too large images result in the out of GPU memory issue,
            so this tile option will first crop input images into tiles, and then process each of them.
            Finally, they will be merged into one image. 0 denotes for do not use tile.

        tile_pad: The pad size for each tile, to remove border artifacts.

        gpu_id: The GPU ID.

        fp16: Enable FP16 mode.
    """


On my GeForce GTX 1070 Ti the non-ncnn version is faster, so for now I probably won't use the vs-dpir-ncnn version or integrate it into Hybrid.

vs-dpir:

Code

clip = DPIR(clip=clip, strength=5.000, task="denoise", provider=1, device_id=0, dual=True)

Output 429 frames in 96.04 seconds (4.47 fps)

vs-dpir-ncnn:

Code

clip = dpir(clip=clip, strength=5.000, task="denoise", gpu_id=0)

Output 429 frames in 117.23 seconds (3.66 fps)
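To put the two results side by side, the arithmetic is just frames divided by seconds. A quick check using only the numbers posted above (the variable names are mine):

```python
def fps(frames: int, seconds: float) -> float:
    """Frames per second for a finished encode."""
    return frames / seconds


onnx_fps = fps(429, 96.04)    # vs-dpir run from this post
ncnn_fps = fps(429, 117.23)   # vs-dpir-ncnn run from this post
speedup = onnx_fps / ncnn_fps

print(f"vs-dpir:      {onnx_fps:.2f} fps")
print(f"vs-dpir-ncnn: {ncnn_fps:.2f} fps")
print(f"vs-dpir is about {100 * (speedup - 1):.0f}% faster on this clip")
```

That works out to roughly a 22% advantage for vs-dpir on this one clip and card; other GPUs may behave differently.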

Selur

Update for vs-dpir-ncnn:

Quote

v2.0.0
Repository: HolyWu/vs-dpir-ncnn · Tag: v2.0.0 · Commit: 0938c18 · Released by: HolyWu
Turn dpir function into a class so as to separate model initialization and inference. It's more memory friendly when you run the same task more than once (such as on different clips or with different strengths) since the same model will be initialized only once.
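To illustrate why that change helps, here is a toy sketch of the pattern the release note describes: the expensive model setup happens once in the constructor, and each call afterwards only does inference. This is not the vsdpir_ncnn API; the class `Denoiser`, its `run` method and the fake "smoothing" are purely hypothetical stand-ins.

```python
from typing import List, Optional


class Denoiser:
    """Hypothetical stand-in showing init-once, run-many; not the real vsdpir_ncnn class."""

    models_loaded = 0  # counts how often the (pretend) expensive setup ran

    def __init__(self, task: str = "denoise") -> None:
        if task not in ("deblock", "denoise"):
            raise ValueError("task must be 'deblock' or 'denoise'")
        self.task = task
        # Defaults taken from the docstring quoted earlier in the thread.
        self.default_strength = 50.0 if task == "deblock" else 5.0
        Denoiser.models_loaded += 1  # stands in for loading the network

    def run(self, frame: List[float], strength: Optional[float] = None) -> List[float]:
        """Toy inference: pull each sample toward the frame mean, more so at higher strength."""
        if not frame:
            return []
        s = self.default_strength if strength is None else strength
        weight = min(s / 100.0, 1.0)
        mean = sum(frame) / len(frame)
        return [v + (mean - v) * weight for v in frame]


# One model, several strengths: the setup cost is paid only once.
denoiser = Denoiser(task="denoise")
soft = denoiser.run([0.0, 1.0], strength=5.0)
hard = denoiser.run([0.0, 1.0], strength=50.0)
```

With the old function-style interface, each call with a different strength or clip would have gone through model setup again, which is the memory cost the release note refers to.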

Cu Selur
