Just wanted to mention that HolyWu has released vs-dpir 2.0 (https://github.com/HolyWu/vs-dpir).
But beware, the interface has changed:
- def DPIR(
- clip: vs.VideoNode,
- strength: Optional[float] = None,
- task: str = 'denoise',
- tile_w: int = 0,
- tile_h: int = 0,
- tile_pad: int = 8,
- provider: int = 1,
- device_id: int = 0,
- trt_fp16: bool = False,
- trt_engine_cache: bool = True,
- trt_engine_cache_path: str = dir_name,
- log_level: int = 2,
- ) -> vs.VideoNode:
- DPIR: Deep Plug-and-Play Image Restoration
- clip: Clip to process. Only RGB and GRAY formats with float sample type of 32 bit depth are supported.
- strength: Strength for deblocking or denoising. Must be greater than 0. Defaults to 50.0 for 'deblock' task, 5.0 for 'denoise' task.
- task: Task to perform. Must be 'deblock' or 'denoise'.
- tile_w, tile_h: Tile width and height, respectively. Since overly large images can cause out-of-GPU-memory issues, this tile option first crops the input image into tiles, processes each tile separately, and finally merges them back into one image. 0 means tiling is not used.
- tile_pad: The pad size for each tile, to remove border artifacts.
- provider: The hardware platform to execute on.
- 0 = Default CPU
- 1 = NVIDIA CUDA (https://onnxruntime.ai/docs/execution-providers/CUDA-ExecutionProvider.html#requirements)
- 2 = NVIDIA TensorRT (https://onnxruntime.ai/docs/execution-providers/TensorRT-ExecutionProvider.html#requirements)
- 3 = DirectML (https://onnxruntime.ai/docs/execution-providers/DirectML-ExecutionProvider.html#requirements)
- device_id: The device ID.
- trt_fp16: Enable FP16 mode in TensorRT.
- trt_engine_cache: Enable TensorRT engine caching. The purpose of engine caching is to save engine build time, since TensorRT can take a long time to optimize and build an engine. The engine is cached when it is built for the first time, so the next time a new inference session is created it can be loaded directly from the cache. To validate that the loaded engine is usable for the current inference, the engine profile is also cached and loaded along with the engine. If the current input shapes are within the range of the engine profile, the loaded engine can be safely used. Otherwise, if the input shapes are out of range, the profile cache is updated to cover the new shapes and the engine is recreated based on the new profile (and also refreshed in the engine cache). Note that each engine is created for specific settings such as model path/name, precision, workspace, profiles etc. and for a specific GPU, and is not portable, so make sure those settings do not change, otherwise the engine needs to be rebuilt and cached again.
- Warning: Please clean up any old engine and profile cache files (.engine and .profile) if any of the following changes:
- Model changes (if there are any changes to the model topology, opset version, operators etc.)
- ORT version changes (i.e. moving from ORT version 1.8 to 1.9)
- TensorRT version changes (i.e. moving from TensorRT 7.0 to 8.0)
- Hardware changes (Engine and profile files are not portable and optimized for specific NVIDIA hardware)
- trt_engine_cache_path: Specify path for TensorRT engine and profile files if trt_engine_cache is true.
- log_level: Log severity level. Applies to session load, initialization, etc.
- 0 = Verbose
- 1 = Info
- 2 = Warning
- 3 = Error
- 4 = Fatal
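For reference, here is a minimal sketch of how a call looks with the new interface (assuming the module is imported as vsdpir and using an example source filter; the clip has to be 32-bit float RGB as stated above, and the tile/strength values are just illustrative):

import vapoursynth as vs
from vsdpir import DPIR  # module/function name assumed from vs-dpir 2.x

core = vs.core

clip = core.lsmas.LWLibavSource("input.mkv")                           # any source filter works here
clip = core.resize.Bicubic(clip, format=vs.RGBS, matrix_in_s='709')    # DPIR only accepts RGB/GRAY, 32-bit float
clip = DPIR(clip, strength=5.0, task='denoise',
            tile_w=1280, tile_h=720, tile_pad=8)                       # tiling keeps GPU memory usage in check
clip = core.resize.Bicubic(clip, format=vs.YUV420P8, matrix_s='709')   # back to YUV for encoding
clip.set_output()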
With provider=3 and onnxruntime-directml this is suddenly quite a bit faster.
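In case anyone wants to try it, the DirectML path is just a matter of setting provider=3 (same assumptions as in the sketch above, with the onnxruntime-directml package installed; strength/device_id are example values):

clip = DPIR(clip, strength=5.0, task='denoise',
            provider=3, device_id=0)  # 3 = DirectML, see the provider list above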
(If DirectML gives a similar speed boost for BasicVSR++ and VSGAN as well, that would be really cool. :))