Anaconda在安装tensorflow-gpu后自动安装了cuda
、cudnn
等依赖库,运行tf.config.get_visible_devices()
提示libcuda.so.1
缺失。
import tensorflow as tf
print(tf.config.get_visible_devices())
2020-08-21 18:40:57.436195: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libcuda.so.1'; dlerror: libcuda.so.1: cannot open shared object file: No such file or directory
2020-08-21 18:40:57.436278: E tensorflow/stream_executor/cuda/cuda_driver.cc:313] failed call to cuInit: UNKNOWN ERROR (303)
2020-08-21 18:40:57.436342: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:169] retrieving CUDA diagnostic information for host: kyobian.localhost
2020-08-21 18:40:57.436371: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:176] hostname: kyobian.localhost
2020-08-21 18:40:57.436496: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:200] libcuda reported version is: Not found: was unable to find libcuda.so DSO loaded into this program
2020-08-21 18:40:57.436609: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:204] kernel reported version is: 418.152.0
[PhysicalDevice(name='/physical_device:CPU:0', device_type='CPU')]
查询虚拟环境的lib目录,发现libcuda.so.1
确实没有,同时cuda所下载的cudatoolkit、cudnn、cupti包中均没有libcuda.so.1
文件。
$ ls ~/anaconda3/envs/py37-sci-202008/lib/ | grep cuda
cudatoolkit_config.yaml
libcudart.so
libcudart.so.10.1
libcudart.so.10.1.243
猜测libcuda.so.1
文件可能在Nvidia的相关驱动包中,通过apt-file
命令查询libcuda.so.1
文件。
$ apt-file search -x '/libcuda.so.1$'
libcuda1: /usr/lib/x86_64-linux-gnu/nvidia/current/libcuda.so.1
libnvidia-legacy-340xx-cuda1: /usr/lib/x86_64-linux-gnu/nvidia/legacy-340xx/libcuda.so.1
libnvidia-legacy-390xx-cuda1: /usr/lib/x86_64-linux-gnu/nvidia/legacy-390xx/libcuda.so.1
libnvidia-tesla-418-cuda1: /usr/lib/x86_64-linux-gnu/nvidia/tesla-418/libcuda.so.1
查看libcuda1详细信息,可以看到,这个包提供了CUDA驱动接口,即libcuda.so.1
。
$ apt show libcuda1
Package: libcuda1
Version: 418.152.00-1
...
Provides: libcuda-10.0-1, libcuda-10.1-1, libcuda-5.0-1, libcuda-5.5-1, libcuda-6.0-1, libcuda-6.5-1, libcuda-7.0-1, libcuda-7.5-1, libcuda-8.0-1, libcuda-9.0-1, libcuda-9.1-1, libcuda-9.2-1, libcuda.so.1 (= 418.152.00), libcuda1-any
...
Description: NVIDIA CUDA Driver Library
The Compute Unified Device Architecture (CUDA) enables NVIDIA
graphics processing units (GPUs) to be used for massively parallel
general purpose computation.
.
This package contains the CUDA Driver API library for low-level CUDA
programming.
.
Supported NVIDIA devices include GPUs starting from GeForce 8 and Quadro FX
series, as well as the Tesla computing processors.
.
Please see the nvidia-kernel-dkms or
nvidia-kernel-source packages
for building the kernel module required by this package.
This will provide nvidia-kernel-418.152.00.
安装libcuda1
后,libcuda.so.1
出现在/usr/lib/x86_64-linux-gnu
中。
$ sudo apt install libcuda1
$ ls /usr/lib/x86_64-linux-gnu | grep cuda.so
libcuda.so
libcuda.so.1
运行tf.config.get_visible_devices()
可以看到GPU被正常加载。
$ python -c "import tensorflow as tf;print(tf.config.get_visible_devices())"
2020-08-21 21:39:58.646031: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcuda.so.1
2020-08-21 21:39:58.661044: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-08-21 21:39:58.661860: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1561] Found device 0 with properties:
pciBusID: 0000:01:00.0 name: Quadro P2000 computeCapability: 6.1
coreClock: 1.607GHz coreCount: 6 deviceMemorySize: 3.95GiB deviceMemoryBandwidth: 89.53GiB/s
2020-08-21 21:39:58.661989: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.1
2020-08-21 21:39:58.663116: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10
2020-08-21 21:39:58.664140: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcufft.so.10
2020-08-21 21:39:58.664302: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcurand.so.10
2020-08-21 21:39:58.665461: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusolver.so.10
2020-08-21 21:39:58.666440: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusparse.so.10
2020-08-21 21:39:58.669539: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2020-08-21 21:39:58.669664: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-08-21 21:39:58.670704: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-08-21 21:39:58.671454: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1703] Adding visible gpu devices: 0
[PhysicalDevice(name='/physical_device:CPU:0', device_type='CPU'), PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')]