Installing the GPU driver and using TensorFlow-GPU on an Azure GPU-series virtual machine running Ubuntu 16.04

宋兴柱 · February 19, 2019

1. source activate python36
2. source activate tensorflow-gpu
3. pip install tensorflow-gpu (pip reports that it installs this version: tensorflow_gpu-1.12.0-cp36-cp36m-m)

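4. List the GPUs
from tensorflow.python.client import device_lib


def get_available_gpus():
    """
    Print the devices TensorFlow can see.
    Command to inspect the GPUs: nvidia-smi
    Command to see which process is using a GPU: ps aux | grep PID
    :return: number of GPUs
    """
    local_device_protos = device_lib.list_local_devices()
    gpus = [x.name for x in local_device_protos if x.device_type == 'GPU']
    # print() is a function in the Python 3.6 environment used here.
    print("all: %s" % [x.name for x in local_device_protos])
    print("gpu: %s" % gpus)
    return len(gpus)


get_available_gpus()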

This fails with ImportError: libcublas.so.9.0: cannot open shared object file: No such file or directory, so CUDA 9.0 needs to be installed.
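
To see whether the dynamic loader can find a given CUDA library without importing TensorFlow, one option is to try opening it directly. This is a minimal sketch, not part of the original procedure: the helper name can_load is hypothetical, and it assumes that an OSError from ctypes.CDLL corresponds to the "cannot open shared object file" condition reported above.

```python
import ctypes


def can_load(library_name):
    """Return True if the dynamic loader can open the named shared library."""
    try:
        ctypes.CDLL(library_name)
        return True
    except OSError:
        # Raised when the loader cannot find or open the library.
        return False


if __name__ == "__main__":
    # libcublas.so.9.0 is the library named in the ImportError above.
    print("libcublas.so.9.0 loadable:", can_load("libcublas.so.9.0"))
```

If this prints False both before and after the CUDA install, the library directory is probably still missing from LD_LIBRARY_PATH.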

5. Download CUDA 9.0 from https://developer.nvidia.com/cuda-90-download-archive?target_os=Linux.
The commands are:
cd /opt
wget https://developer.nvidia.com/compute/cuda/9.0/Prod/local_installers/cuda_9.0.176_384.81_linux-run
sudo sh cuda_9.0.176_384.81_linux-run

Install location: /usr/local/cuda-9.0
Installer output:
Linux platform:

/usr/local/cuda-#.#
Do you accept the previously read EULA?
accept/decline/quit: accept

Install NVIDIA Accelerated Graphics Driver for Linux-x86_64 384.81?
(y)es/(n)o/(q)uit: n

Install the CUDA 9.0 Toolkit?
(y)es/(n)o/(q)uit: y

Enter Toolkit Location
[ default is /usr/local/cuda-9.0 ]:

Do you want to install a symbolic link at /usr/local/cuda?
(y)es/(n)o/(q)uit: y

Install the CUDA 9.0 Samples?
(y)es/(n)o/(q)uit: y

Enter CUDA Samples Location
[ default is /home/adai ]:

Installing the CUDA Toolkit in /usr/local/cuda-9.0 …
Installing the CUDA Samples in /home/adai …
Copying samples to /home/adai/NVIDIA_CUDA-9.0_Samples now…
Finished copying samples.

===========
= Summary =
===========

Driver: Not Selected
Toolkit: Installed in /usr/local/cuda-9.0
Samples: Installed in /home/adai

Please make sure that
- PATH includes /usr/local/cuda-9.0/bin
- LD_LIBRARY_PATH includes /usr/local/cuda-9.0/lib64, or, add /usr/local/cuda-9.0/lib64 to /etc/ld.so.conf and run ldconfig as root

To uninstall the CUDA Toolkit, run the uninstall script in /usr/local/cuda-9.0/bin

Please see CUDA_Installation_Guide_Linux.pdf in /usr/local/cuda-9.0/doc/pdf for detailed information on setting up CUDA.

***WARNING: Incomplete installation! This installation did not install the CUDA Driver. A driver of version at least 384.00 is required for CUDA 9.0 functionality to work.
To install the driver using this installer, run the following command, replacing <CudaInstaller> with the name of this run file:
sudo <CudaInstaller>.run -silent -driver

Logfile is /tmp/cuda_install_32689.log
Signal caught, cleaning up
(tensorflow-gpu) root@adailearninggpu:/opt#
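
The warning in the installer output says CUDA 9.0 needs a driver of version 384.00 or later. A small sketch of that comparison is below. The function names parse_version and meets_minimum are hypothetical, and the installed version string is assumed to come from somewhere else, for example the output of nvidia-smi.

```python
def parse_version(text):
    """Turn a dotted version such as '384.81' into a tuple of ints."""
    return tuple(int(part) for part in text.strip().split("."))


def meets_minimum(installed, minimum="384.00"):
    """True if the installed driver version is at least the minimum."""
    return parse_version(installed) >= parse_version(minimum)


if __name__ == "__main__":
    # 415.27 is the driver version installed later in this post.
    print(meets_minimum("415.27"))
```

Comparing integer tuples rather than floats keeps a version such as 384.130 ordered after 384.81.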

6. Run the GPU listing from step 4 again. This time it reports:
libnvidia-fatbinaryloader.so.415.27: cannot open shared object file: No such file or directory
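
The name of the missing library ends in a version number, which hints at the driver version the loader is looking for. The sketch below pulls that suffix out of the message. It is illustrative only: the function name driver_version_from_error is hypothetical, and treating the suffix as the required driver version is an assumption, although it matches the 415.27 driver used in this post.

```python
import re


def driver_version_from_error(message):
    """Extract the version suffix of a libnvidia-*.so.<version> name, or None."""
    match = re.search(r"libnvidia-[\w-]+\.so\.(\d+(?:\.\d+)+)", message)
    return match.group(1) if match else None


if __name__ == "__main__":
    error = ("libnvidia-fatbinaryloader.so.415.27: cannot open shared "
             "object file: No such file or directory")
    print(driver_version_from_error(error))
```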

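7. Fix: the missing library carries the version 415.27, so install the matching 415.27 driver. Download it from https://www.nvidia.com/content/DriverDownload-March2009/confirmation.php?url=/XFree86/Linux-x86_64/415.27/NVIDIA-Linux-x86_64-415.27.run&lang=us&type=TITAN
Run: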
cd /opt/
wget http://us.download.nvidia.com/XFree86/Linux-x86_64/415.27/NVIDIA-Linux-x86_64-415.27.run
chmod 777 NVIDIA-Linux-x86_64-415.27.run
./NVIDIA-Linux-x86_64-415.27.run
If the installation fails, remove the existing NVIDIA driver with sudo apt-get --purge remove nvidia-* and run the installer again.

8. Edit /etc/profile and append the following lines at the end, then run: source /etc/profile
export PATH=/usr/local/cuda-9.0/bin${PATH:+:${PATH}}
export LD_LIBRARY_PATH=/usr/local/cuda-9.0/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}:/usr/lib/nvidia-415/
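
The ${PATH:+:${PATH}} form in these exports adds the ":" separator only when the variable already has a value, so no empty entry is created when it is unset. Here is a rough Python equivalent of that expansion, written only to show the behaviour. The function name prepend_path is hypothetical.

```python
import os


def prepend_path(new_dir, current):
    """Mimic new_dir${VAR:+:${VAR}}: add ':' only when current is non-empty."""
    return new_dir + (":" + current if current else "")


if __name__ == "__main__":
    # Shows what the first export line would produce for the current PATH.
    print(prepend_path("/usr/local/cuda-9.0/bin", os.environ.get("PATH", "")))
```

Avoiding the stray colon matters because an empty entry in PATH or LD_LIBRARY_PATH is interpreted as the current directory.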

9. Run the test from step 4 again. On success it lists both the CPU and GPU devices.
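
For a scripted pass/fail check instead of reading the printed lists, the device list can be grouped by type. This is a sketch under assumptions: names_by_type and gpu_is_visible are hypothetical helpers, Device is a stand-in for the objects returned by device_lib.list_local_devices(), and the device names in the sample are illustrative rather than captured output.

```python
from collections import namedtuple

# Hypothetical stand-in for TensorFlow's device objects, which expose
# the same two attributes used in step 4: name and device_type.
Device = namedtuple("Device", ["name", "device_type"])


def names_by_type(devices):
    """Group device names by their device_type attribute."""
    grouped = {}
    for device in devices:
        grouped.setdefault(device.device_type, []).append(device.name)
    return grouped


def gpu_is_visible(devices):
    """True when at least one device reports device_type == 'GPU'."""
    return bool(names_by_type(devices).get("GPU"))


# Illustrative sample only; real names come from TensorFlow.
sample = [Device("/device:CPU:0", "CPU"), Device("/device:GPU:0", "GPU")]
print(names_by_type(sample))
```

On the VM itself, the same helpers could be called as gpu_is_visible(device_lib.list_local_devices()) after the import shown in step 4.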
