
[SOLVED] – CUDA Compatibility with Tensorflow and Python

Hi guys,
Do you have a CUDA compatibility issue that you are trying to resolve? Then this post may be right for you.
Today I’m going to share how I resolved my problem installing CUDA/GPU + TensorFlow + Python on my Linux machine, Ubuntu 20.04.
I spent a while trying things and searching the Internet.
I have an NVIDIA GPU installed in my machine.
If I check the VGA controller with the command “lspci”, the card is detected correctly.

abdusy@troiz:~$ lspci | grep VGA
06:00.0 VGA compatible controller: NVIDIA Corporation GP104 [GeForce GTX 1070] (rev a1)

To verify everything related to the NVIDIA driver and CUDA, I check with the command “nvidia-smi”.

abdusy@troiz:~$ nvidia-smi 
Thu Jul  6 13:36:45 2023       
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 530.30.02              Driver Version: 530.30.02    CUDA Version: 12.1     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                  Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf            Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA GeForce GTX 1070         On | 00000000:06:00.0 Off |                  N/A |
| 31%   50C    P0               35W / 180W|    264MiB /  8192MiB |      6%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
                                                                                         
+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|    0   N/A  N/A     23222      G   /usr/lib/xorg/Xorg                          143MiB |
|    0   N/A  N/A     23780      G   gala                                         48MiB |
|    0   N/A  N/A     23914      G   io.elementary.tasks                           2MiB |
|    0   N/A  N/A     26452      G   /home/abdusy/anaconda3/bin/python             2MiB |
|    0   N/A  N/A     28834      G   ...8933804,10335128977712337494,262144       61MiB |
+---------------------------------------------------------------------------------------+
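The two numbers that matter in this header are the driver version and the CUDA version. The CUDA version shown here is the highest CUDA runtime this driver supports, not necessarily a toolkit that is installed. As a minimal sketch, both can be pulled out of the text with a regular expression; the helper name `parse_smi_header` is my own invention, and the sample line is copied from the output above.

```python
import re

def parse_smi_header(text):
    """Return (driver_version, cuda_version) strings found in nvidia-smi output, or None."""
    match = re.search(r"Driver Version:\s*([\d.]+)\s+CUDA Version:\s*([\d.]+)", text)
    if match is None:
        return None
    return match.group(1), match.group(2)

# Header line copied from the nvidia-smi output above.
sample = "| NVIDIA-SMI 530.30.02              Driver Version: 530.30.02    CUDA Version: 12.1     |"
print(parse_smi_header(sample))
```

To use it on live output, pass in the text returned by `subprocess.run(["nvidia-smi"], capture_output=True, text=True).stdout`.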

But when I check TensorFlow with the script below:

import tensorflow as tf

print("Tensorflow version:", tf.__version__)
# Check if a GPU is visible (tf.test.is_gpu_available() is deprecated in TF 2.x)
if tf.config.list_physical_devices('GPU'):
    print('CUDA is available! Using GPU for TensorFlow.')
else:
    print('CUDA is not available. Using CPU for TensorFlow.')

It gives “CUDA is not available”.

Tensorflow version: 2.10.0
CUDA is not available. Using CPU for TensorFlow.
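To see why TensorFlow falls back to the CPU, it helps to ask which CUDA and cuDNN versions the installed build expects. The sketch below assumes TensorFlow 2.3 or newer, where `tf.sysconfig.get_build_info()` is available. The wrapper `tensorflow_gpu_report` is a hypothetical helper of mine, and it returns `None` when TensorFlow is not installed.

```python
def tensorflow_gpu_report():
    """Collect the TensorFlow version, its CUDA build info, and the visible GPUs."""
    try:
        import tensorflow as tf
    except ImportError:
        return None  # TensorFlow is not installed in this environment
    build = tf.sysconfig.get_build_info()
    return {
        "tensorflow": tf.__version__,
        # These keys can be absent on CPU-only builds, hence .get().
        "built_for_cuda": build.get("cuda_version"),
        "built_for_cudnn": build.get("cudnn_version"),
        "visible_gpus": [d.name for d in tf.config.list_physical_devices("GPU")],
    }

print(tensorflow_gpu_report())
```

If `built_for_cuda` is set but `visible_gpus` is empty, TensorFlow most likely could not load the CUDA libraries it was built against.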


However, when I use another library, PyTorch, and check with this script:

import torch
print("Torch version:",torch.__version__)
# Check if CUDA is available
if torch.cuda.is_available():
    # Set the default device to GPU
    device = torch.device('cuda')
    print('CUDA is available! Using GPU for computations.')
else:
    # Set the default device to CPU
    device = torch.device('cpu')
    print('CUDA is not available. Using CPU for computations.')

it shows that CUDA is available.

Torch version: 2.0.1+cu117
CUDA is available! Using GPU for computations.


So the GPU and the driver are fine. The problem is a CUDA compatibility issue between my TensorFlow installation and the CUDA libraries on my machine.
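The PyTorch result also hints at the cause. The `+cu117` suffix says that wheel was built for CUDA 11.7, which is not newer than the 12.1 reported by the driver, so it runs. Here is a rough sketch of that comparison. It uses the simplified rule that a driver supports runtimes up to the CUDA version it reports, and it ignores the minor-version compatibility cases described in NVIDIA's guide. The helper names are hypothetical, and the suffix parsing assumes single-digit minor versions such as `cu117` or `cu118`.

```python
import re

def cuda_from_torch_version(version):
    """Extract (major, minor) from a PyTorch version suffix such as '+cu117'."""
    match = re.search(r"\+cu(\d+)(\d)$", version)
    if match is None:
        return None  # CPU-only build or unknown suffix
    return int(match.group(1)), int(match.group(2))

def driver_supports(driver_cuda, runtime_cuda):
    """True when the runtime's CUDA version does not exceed what the driver reports."""
    return runtime_cuda <= driver_cuda

driver_cuda = (12, 1)  # "CUDA Version: 12.1" from nvidia-smi
torch_cuda = cuda_from_torch_version("2.0.1+cu117")
print(torch_cuda, driver_supports(driver_cuda, torch_cuda))
```

The same check can be applied to the CUDA version a TensorFlow build expects, once you know it.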

SOLUTION

How to solve this problem

If you read NVIDIA’s page on CUDA compatibility (https://docs.nvidia.com/deploy/cuda-compatibility/index.html) carefully, you will find how driver versions and CUDA toolkit versions have to match.

So, how to resolve this issue? Well, first of all, I thought about using a prebuilt Docker image, just to keep it simple. Because I love everything that is simple :).

Then I found this website (https://catalog.ngc.nvidia.com/orgs/nvidia/containers/tensorflow) that provides TensorFlow containers with comprehensive tools and libraries in a flexible architecture allowing easy deployment across a variety of platforms and devices.

So, you guys, just go run this command:

nvidia-docker run -it --rm nvcr.io/nvidia/tensorflow:xx.xx-tfx-py3

In my case I use “nvidia-docker run -it --rm nvcr.io/nvidia/tensorflow:22.01-tf2-py3”, because I’m using TensorFlow v2 and Python 3.9.

Once the image has been pulled, you can start the container with this command:

docker run --gpus all -it --rm -v local_dir:container_dir nvcr.io/nvidia/tensorflow:xx.xx-tfx-py3

In my case, I run the command like this: “docker run --network host --gpus all -it --rm -v /home/abdusy:/workspace/my_ml nvcr.io/nvidia/tensorflow:22.01-tf2-py3”. This is because I want to use the host network, the GPU, and my documents from inside the container.
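If you start the container often, the command can be assembled in a small script. This is only a sketch: `build_docker_command` is a helper name I made up, and it uses nothing beyond the flags already shown above.

```python
import shlex

def build_docker_command(tag, host_dir, container_dir, host_network=True):
    """Build the argv list for `docker run` using the same flags as the command above."""
    cmd = ["docker", "run"]
    if host_network:
        cmd += ["--network", "host"]
    cmd += ["--gpus", "all", "-it", "--rm", "-v", f"{host_dir}:{container_dir}"]
    cmd.append(f"nvcr.io/nvidia/tensorflow:{tag}")
    return cmd

cmd = build_docker_command("22.01-tf2-py3", "/home/abdusy", "/workspace/my_ml")
print(shlex.join(cmd))
```

Pass the list to `subprocess.run(cmd)` to actually launch the container.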

Now you have CUDA/GPU available for TensorFlow inside the container!

Voila….!

Colmar, 06 July 2023 (Summer Time)
