Hi guys,
Do you have a CUDA compatibility issue and are trying to resolve it? Well, maybe this post is right for you.
Today I'm going to share my experience of how I resolved my problem installing CUDA/GPU + TensorFlow + Python on my Linux machine, Ubuntu 20.04, after plenty of trying and searching on the Internet.
I already have the NVIDIA driver installed on my machine. So if I check my VGA controller with the command "lspci", I can see that my NVIDIA card is detected correctly.
abdusy@troiz:~$ lspci | grep VGA
06:00.0 VGA compatible controller: NVIDIA Corporation GP104 [GeForce GTX 1070] (rev a1)
And to verify everything related to NVIDIA and CUDA, I check with the command "nvidia-smi".
abdusy@troiz:~$ nvidia-smi
Thu Jul  6 13:36:45 2023
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 530.30.02              Driver Version: 530.30.02    CUDA Version: 12.1     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                  Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf            Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA GeForce GTX 1070         On | 00000000:06:00.0 Off |                  N/A |
| 31%   50C    P0               35W / 180W |    264MiB /  8192MiB |      6%     Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+

+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|    0   N/A  N/A     23222      G   /usr/lib/xorg/Xorg                          143MiB |
|    0   N/A  N/A     23780      G   gala                                         48MiB |
|    0   N/A  N/A     23914      G   io.elementary.tasks                           2MiB |
|    0   N/A  N/A     26452      G   /home/abdusy/anaconda3/bin/python             2MiB |
|    0   N/A  N/A     28834      G   ...8933804,10335128977712337494,262144       61MiB |
+---------------------------------------------------------------------------------------+
But when I check my TensorFlow with the script below:
import tensorflow as tf

print("Tensorflow version:", tf.__version__)

# Check if a GPU is available
if tf.test.is_gpu_available():
    print('CUDA is available! Using GPU for TensorFlow.')
else:
    print('CUDA is not available. Using CPU for TensorFlow.')
it gives "CUDA is not available":
Tensorflow version: 2.10.0
CUDA is not available. Using CPU for TensorFlow.
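By the way, tf.test.is_gpu_available() is deprecated in recent TensorFlow releases. Here is a minimal sketch of the same check using the newer tf.config API (same result, no deprecation warning):

import tensorflow as tf

# List the GPUs TensorFlow can actually see (an empty list means CPU only)
gpus = tf.config.list_physical_devices('GPU')
if gpus:
    print('CUDA is available! Found GPUs:', gpus)
else:
    print('CUDA is not available. Using CPU for TensorFlow.')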
However, when I use another library, PyTorch, and check with this script:
import torch

print("Torch version:", torch.__version__)

# Check if CUDA is available
if torch.cuda.is_available():
    # Set the default device to GPU
    device = torch.device('cuda')
    print('CUDA is available! Using GPU for computations.')
else:
    # Set the default device to CPU
    device = torch.device('cpu')
    print('CUDA is not available. Using CPU for computations.')
it shows that CUDA is available:
Torch version: 2.0.1+cu117
CUDA is available! Using GPU for computations.
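A side note on why PyTorch works: the pip wheel ships its own CUDA runtime (that is the "+cu117" in the version string), so you can ask PyTorch directly which CUDA it was built with. A small sketch, assuming a CUDA build of PyTorch (on a CPU-only wheel, torch.version.cuda is None):

import torch

# CUDA runtime bundled inside the PyTorch wheel (here: 11.7)
print('Built with CUDA:', torch.version.cuda)
# Name of the GPU that PyTorch sees through the driver
print('GPU:', torch.cuda.get_device_name(0))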
So it means that I have a CUDA compatibility issue on my machine: the PyTorch wheel brings its own CUDA runtime, while TensorFlow relies on the CUDA/cuDNN libraries installed on the system, and mine do not match what TensorFlow 2.10 expects.
If you read the CUDA compatibility documentation (https://docs.nvidia.com/deploy/cuda-compatibility/index.html) carefully, you will find how driver versions, CUDA runtime versions, and applications are supposed to fit together.
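The short version: the driver's CUDA version reported by nvidia-smi (12.1 here) only has to be at least as new as the CUDA version your framework was built against. You can inspect what your TensorFlow wheel was built against; a minimal sketch (the cuda_version/cudnn_version keys are reported by GPU builds, so I use .get() in case they are missing):

import tensorflow as tf

# Build info of the installed TensorFlow wheel; GPU builds report
# the CUDA and cuDNN versions they were compiled against
info = tf.sysconfig.get_build_info()
print('Built for CUDA :', info.get('cuda_version'))
print('Built for cuDNN:', info.get('cudnn_version'))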
So, how to resolve this issue? Well, my first thought was to use Docker, where everything comes pre-built. Just to make it simple, because I love everything that is simple :).
Then I found this website (https://catalog.ngc.nvidia.com/orgs/nvidia/containers/tensorflow) that provides TensorFlow containers with comprehensive tools and libraries in a flexible architecture allowing easy deployment across a variety of platforms and devices.
So, just go run this command:
nvidia-docker run -it --rm nvcr.io/nvidia/tensorflow:xx.xx-tfx-py3
In my case I use "nvidia-docker run -it --rm nvcr.io/nvidia/tensorflow:22.01-tf2-py3", because I'm using TensorFlow v2 with Python 3.
After the image has finished pulling, you can run it with this command:
docker run --gpus all -it --rm -v local_dir:container_dir nvcr.io/nvidia/tensorflow:xx.xx-tfx-py3
In my case, I run a command like this: "docker run --network host --gpus all -it --rm -v /home/abdusy:/workspace/my_ml nvcr.io/nvidia/tensorflow:22.01-tf2-py3". This is because I want to use my host network, my GPU, and my local files from inside this container.
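Once you are inside the container, you can re-run the check from earlier to confirm that TensorFlow now sees the GPU:

import tensorflow as tf

# Inside the NGC container this should list the GTX 1070
print('Tensorflow version:', tf.__version__)
print('GPUs:', tf.config.list_physical_devices('GPU'))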
Now you have CUDA/GPU available on your machine…!!!
Voila….!
Colmar, 06 July 2023 (Summer Time)