Anaconda安装tensorflow-gpu后,提示libcuda.so.1缺失

wkyo python
python - anaconda - cuda

Anaconda在安装tensorflow-gpu后自动安装了cudacudnn等依赖库,运行tf.config.get_visible_devices()提示libcuda.so.1缺失。

import tensorflow as tf
print(tf.config.get_visible_devices())
2020-08-21 18:40:57.436195: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libcuda.so.1'; dlerror: libcuda.so.1: cannot open shared object file: No such file or directory
2020-08-21 18:40:57.436278: E tensorflow/stream_executor/cuda/cuda_driver.cc:313] failed call to cuInit: UNKNOWN ERROR (303)
2020-08-21 18:40:57.436342: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:169] retrieving CUDA diagnostic information for host: kyobian.localhost
2020-08-21 18:40:57.436371: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:176] hostname: kyobian.localhost
2020-08-21 18:40:57.436496: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:200] libcuda reported version is: Not found: was unable to find libcuda.so DSO loaded into this program
2020-08-21 18:40:57.436609: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:204] kernel reported version is: 418.152.0

[PhysicalDevice(name='/physical_device:CPU:0', device_type='CPU')]

查询虚拟环境的lib目录,发现libcuda.so.1确实没有,同时cuda所下载的cudatoolkit、cudnn、cupti包中均没有libcuda.so.1文件。

$ ls ~/anaconda3/envs/py37-sci-202008/lib/ | grep cuda

cudatoolkit_config.yaml
libcudart.so
libcudart.so.10.1
libcudart.so.10.1.243

猜测libcuda.so.1文件可能在Nvidia的相关驱动包中,通过apt-file命令查询libcuda.so.1文件。

$ apt-file search -x '/libcuda.so.1$'
libcuda1: /usr/lib/x86_64-linux-gnu/nvidia/current/libcuda.so.1
libnvidia-legacy-340xx-cuda1: /usr/lib/x86_64-linux-gnu/nvidia/legacy-340xx/libcuda.so.1
libnvidia-legacy-390xx-cuda1: /usr/lib/x86_64-linux-gnu/nvidia/legacy-390xx/libcuda.so.1
libnvidia-tesla-418-cuda1: /usr/lib/x86_64-linux-gnu/nvidia/tesla-418/libcuda.so.1

查看libcuda1详细信息,可以看到,这个包提供了CUDA驱动接口,即libcuda.so.1

$ apt show libcuda1

Package: libcuda1
Version: 418.152.00-1
...
Provides: libcuda-10.0-1, libcuda-10.1-1, libcuda-5.0-1, libcuda-5.5-1, libcuda-6.0-1, libcuda-6.5-1, libcuda-7.0-1, libcuda-7.5-1, libcuda-8.0-1, libcuda-9.0-1, libcuda-9.1-1, libcuda-9.2-1, libcuda.so.1 (= 418.152.00), libcuda1-any
...
Description: NVIDIA CUDA Driver Library
 The Compute Unified Device Architecture (CUDA) enables NVIDIA
 graphics processing units (GPUs) to be used for massively parallel
 general purpose computation.
 .
 This package contains the CUDA Driver API library for low-level CUDA
 programming.
 .
 Supported NVIDIA devices include GPUs starting from GeForce 8 and Quadro FX
 series, as well as the Tesla computing processors.
 .
 Please see the nvidia-kernel-dkms or
 nvidia-kernel-source packages
 for building the kernel module required by this package.
 This will provide nvidia-kernel-418.152.00.

安装libcuda1后,libcuda.so.1出现在/usr/lib/x86_64-linux-gnu中。

$ sudo apt install libcuda1

$ ls /usr/lib/x86_64-linux-gnu | grep cuda.so
libcuda.so
libcuda.so.1

运行tf.config.get_visible_devices()可以看到GPU被正常加载。

$ python -c "import tensorflow as tf;print(tf.config.get_visible_devices())"

2020-08-21 21:39:58.646031: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcuda.so.1
2020-08-21 21:39:58.661044: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-08-21 21:39:58.661860: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1561] Found device 0 with properties: 
pciBusID: 0000:01:00.0 name: Quadro P2000 computeCapability: 6.1
coreClock: 1.607GHz coreCount: 6 deviceMemorySize: 3.95GiB deviceMemoryBandwidth: 89.53GiB/s
2020-08-21 21:39:58.661989: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.1
2020-08-21 21:39:58.663116: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10
2020-08-21 21:39:58.664140: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcufft.so.10
2020-08-21 21:39:58.664302: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcurand.so.10
2020-08-21 21:39:58.665461: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusolver.so.10
2020-08-21 21:39:58.666440: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusparse.so.10
2020-08-21 21:39:58.669539: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2020-08-21 21:39:58.669664: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-08-21 21:39:58.670704: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-08-21 21:39:58.671454: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1703] Adding visible gpu devices: 0
[PhysicalDevice(name='/physical_device:CPU:0', device_type='CPU'), PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')]