In one of our recent blog posts we compared NVIDIA’s Jetson Nano to the Coral USB Accelerator from Google. Recently, NVIDIA released a new member of its Jetson family – the NVIDIA Jetson Xavier NX. This promising product extends the high-end range of AI computers for edge applications. Priced at the same point as the Jetson TX2, the Xavier NX also seems to make its older brethren practically obsolete. In this post we take a closer look at the Jetson Xavier NX, compare it to its predecessor – the Jetson TX2 – and benchmark both to see how capable the new Jetson is.
NVIDIA’s Jetson Xavier NX System-on-Module (SoM) is currently available in two flavours: as a Developer Kit (945-83518-0005-000) for around 375€ and as module-only (900-83668-0000-000) for around 433€, which can be integrated into custom products. The Developer Kit provides all necessary ports, like HDMI, DisplayPort, USB 3.1 and Gigabit Ethernet. It also features two MIPI CSI-2 ports for cameras and two M.2 interfaces for wireless cards and fast NVME storage expansion.
There are also two variants of the SoM: on the Devkit version, storage is provided by installing a microSD card of your choice, while the production module comes with 16 GB of integrated eMMC and no SD card slot. The module can also be used with the newer B01 revision of the Jetson Nano Developer Kit (945-13450-0000-100) – which is what we used for this review, since the official Devkit was not available at the time of testing.
Similarly, the Jetson TX2 is available both as a Developer Kit and as a module for around 432€. The Developer Kit’s ports are similar to those of the Xavier NX, but the carrier board is much larger than the one used for the Jetson Xavier NX. Storage can be provided by installing a full-sized SD card. The production module comes with 32 GB of eMMC.
| | Jetson Nano | Jetson Xavier NX | Jetson TX2 |
|---|---|---|---|
| CPU Complex | Quad-Core ARM® Cortex®-A57 MPCore, 2 MB L2, Maximum Operating Frequency: 1.43 GHz | 6-core NVIDIA Carmel ARM®v8.2 64-bit CPU, 6 MB L2 + 4 MB L3, Maximum Operating Frequency: 1.9 GHz | Quad-Core ARM® Cortex®-A57 MPCore, 2 MB L2 + Dual-Core NVIDIA Denver 2 64-bit CPU, 2 MB L2, Maximum Operating Frequency: 2.0 GHz |
| GPU | 128-core Maxwell GPU, 512 GFLOPS (FP16), Maximum Operating Frequency: 921 MHz | 384 CUDA® cores + 48 Tensor cores Volta GPU, 21 TOPS, Maximum Operating Frequency: 1100 MHz | 256-core Pascal GPU, 1.3 TFLOPS, Maximum Operating Frequency: 1.12 GHz |
| RAM | 4 GB 64-bit LPDDR4 @ 1600 MHz, 25.6 GB/s | 8 GB 128-bit LPDDR4x @ 1600 MHz, 51.2 GB/s | 8 GB 128-bit LPDDR4 @ 1866 MHz, 58.3 GB/s |
| On-Module Storage | 16 GB eMMC 5.1 Flash Storage (production module) | 16 GB eMMC 5.1 Flash Storage, Bus Width: 8-bit, Maximum Bus Frequency: 200 MHz (HS400) | 32 GB eMMC 5.1 Flash Storage, Bus Width: 8-bit, Maximum Bus Frequency: 200 MHz (HS400) |
| Camera | 12 lanes MIPI CSI-2, 1.5 Gbps per lane | 14 lanes MIPI CSI-2, 2.5 Gbps per lane | 12 lanes MIPI CSI-2, 2.5 Gbps per lane |
| Voltage | 5 V | – | 5.5 V – 19.6 V |
| TDP | 5 W – 10 W | 10 W – 15 W | 7.5 W – 15 W |
| Temp. Range | -25 °C – 97 °C | -25 °C – 90 °C | -25 °C – 80 °C |
| Module Size | 69.6 mm x 45.0 mm | 69.6 mm x 45.0 mm | 87.0 mm x 50.0 mm |
| Price | 140.58 EUR | 433.07 EUR | 432.12 EUR |
Running MobileNet SSD v2 on NVIDIA Jetson
[Skip to the following section if you are just interested in the results.]
In order to compare the performance of the Jetson modules, we use the MobileNet SSD v2 object detector from the official TensorFlow model zoo as a benchmark. We optimized it with TensorRT, as described in our previous blog post. But first, we have to get our devices ready for object detection.
Note: This guide applies to JetPack version 4.4.
To set up our Jetson for the first time, we start by following the setup guide from NVIDIA. After completing the guide, we can install the tools we need to run MobileNet.
$ sudo apt-get install python3-pip libhdf5-serial-dev hdf5-tools
$ pip3 install --extra-index-url https://developer.download.nvidia.com/compute/redist/jp/v44 tensorflow==1.15.2+nv20.4 --user
$ pip3 install numpy pycuda --user
To optimize the MobileNet SSD v2 TensorFlow model with TensorRT, we need a TensorRT plugin called “FlattenConcat”. First, go to /usr/src/tensorrt/samples/python/uff_ssd. Unfortunately, the Developer Preview of JetPack 4.4 is missing some files here. If you find the CMakeLists.txt and the plugin folder containing FlattenConcat.cpp, you are good to go. If not, you can clone the complete Python samples from here. Next, we execute the following commands to generate the FlattenConcat library.
$ mkdir build
$ cd build
$ cmake ..
$ make
The libflattenconcat.so is now in our build folder; we will need it after the next step.
Now we clone the object detection repository of NVIDIA employee AastaNV to our Jetson and download the MobileNet SSD v2 model from the TensorFlow model zoo.
$ git clone https://github.com/AastaNV/TRT_object_detection.git
$ cd TRT_object_detection/
$ mkdir model && cd model/
$ wget http://download.tensorflow.org/models/object_detection/ssd_mobilenet_v2_coco_2018_03_29.tar.gz
$ tar zxvf ssd_mobilenet_v2_coco_2018_03_29.tar.gz
Next, we copy the previously generated libflattenconcat.so to TRT_object_detection/lib/ and overwrite the existing library, which was not compiled for our system.
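In case you are wondering how the library is actually picked up: TensorRT plugin libraries are plain shared objects that must be loaded into the process before the parser encounters their ops, and main.py does this with ctypes.CDLL("lib/libflattenconcat.so"). A minimal sketch of the mechanism – using the C math library as a stand-in so the snippet is runnable without a Jetson:

```python
import ctypes
import ctypes.util

# Loading a TensorRT plugin library with ctypes.CDLL registers its plugin
# creators with TensorRT. main.py does essentially:
#     ctypes.CDLL("lib/libflattenconcat.so")
# Stand-in: load the C math library the same way and call into it.
libm = ctypes.CDLL(ctypes.util.find_library("m"))
libm.cos.restype = ctypes.c_double
libm.cos.argtypes = [ctypes.c_double]
result = libm.cos(0.0)  # exercising the loaded library
```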
In addition, we have to apply a small fix to the graphsurgeon converter so we can parse our model. Open node_manipulation.py, which should be in /usr/lib/python3.6/dist-packages/graphsurgeon/, with an editor of your choice and add the line node.attr["dtype"].type = 1 to the update_node function as shown below:
node.name = name or node.name
node.op = op or node.op or node.name
node.attr["dtype"].type = 1
for key, val in kwargs.items():
    ...
Now we can open main.py and add the following line at the top of the script, so it knows which model we want to use:
from config import model_ssd_mobilenet_v2_coco_2018_03_29 as model
Finally we can run the object detection as follows:
$ python3 main.py [image]
Running the script for the first time may take a couple of minutes, because the model has to be optimized and converted into the TensorRT format; after that, it should finish in a few seconds.
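The one-time delay comes from TensorRT building an optimized engine from the model. If you want to avoid paying that cost on every fresh run, the usual pattern is to serialize the result to disk and reload it later. A library-agnostic sketch of that cache pattern – the file name and the build function are hypothetical stand-ins, not part of the repository:

```python
import os
import tempfile

def load_or_build(cache_path, build_fn, serialize, deserialize):
    """Return the cached artifact if present; otherwise build and cache it."""
    if os.path.exists(cache_path):
        with open(cache_path, "rb") as f:
            return deserialize(f.read())   # fast path: skip optimization
    artifact = build_fn()                  # slow path: e.g. TensorRT engine build
    with open(cache_path, "wb") as f:
        f.write(serialize(artifact))
    return artifact

# Demo with a plain string standing in for a serialized engine:
path = os.path.join(tempfile.mkdtemp(), "demo.engine")
engine = load_or_build(path, lambda: "optimized-engine", str.encode, bytes.decode)
cached = load_or_build(path, lambda: "never-called", str.encode, bytes.decode)
```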
If you encounter the error “[TensorRT] ERROR: Could not register plugin creator: FlattenConcat_TRT in namespace” when executing the script, you can apply a workaround to fix it. According to NVIDIA, this is a known issue and should be fixed in a future version of TensorRT.
Since the Jetson Xavier NX and the Jetson TX2 are priced very similarly, we are keen to see how their performance compares when running object detection on images.
We used the “2017 Val images” split of the COCO dataset, which consists of 5000 images of “common objects in context”, for our benchmark. We compared the two devices in terms of speed and accuracy. First, we measured the time to perform object detection on each of our 5000 images, then we calculated the average time per image to get the frames per second (FPS). To get the accuracy of our object detection, we calculated the “mean average precision” (mAP) over all 80 classes in the COCO dataset.
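The FPS figure is derived exactly as described: total wall-clock time over all images, divided by the image count, then inverted. A minimal sketch of that bookkeeping, where detect is a hypothetical stand-in for a single inference call:

```python
import time

def measure_fps(detect, images):
    # Average per-image latency over the whole set, then invert to get FPS.
    start = time.perf_counter()
    for image in images:
        detect(image)
    seconds_per_image = (time.perf_counter() - start) / len(images)
    return 1.0 / seconds_per_image

# Dummy detector on a dummy "dataset" of 5000 items:
fps = measure_fps(lambda image: None, list(range(5000)))
```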
In the chart below, the FPS values are shown as blue bars and the mAP values as green bars. The two values for the low-priced Jetson Nano are taken from our last benchmark to put the more expensive Jetson TX2 and Jetson Xavier NX into perspective.
As expected, the mAP is nearly the same on all three devices, since we ran the same object detector under equal conditions. Looking at the FPS numbers, the Xavier NX runs at 74 FPS, which is 57% faster than the TX2.
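For reference, the 57% figure implies a TX2 frame rate of roughly 74 / 1.57 ≈ 47 FPS. A quick sanity check of that arithmetic:

```python
xavier_nx_fps = 74
speedup = 0.57                           # "57% faster" relative to the TX2
tx2_fps = xavier_nx_fps / (1 + speedup)  # implied TX2 frame rate, roughly 47 FPS
```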
Even though this result looks promising, it probably does not represent the peak performance of the Xavier NX, since we ran the object detection with FP16 precision. In contrast to the TX2, the Xavier NX supports INT8 quantization, which should improve performance even further. That, however, will be the subject of a future update to our Jetson benchmark series.
The new Jetson Xavier NX is definitely a worthy successor to the Jetson TX2. With much more computational power at the same price, you get much more bang for your buck, practically making the TX2 obsolete. However, as adoption of new platforms takes time, you will see many products featuring the older Jetson TX2 for a while – especially since it is officially supported by NVIDIA until 2025.
If you are building a new product and looking for a small device to run state-of-the-art AI applications, you should definitely give the new Jetson Xavier NX a try. The newer platform provides major gains in speed and efficiency as well as some minor I/O upgrades, like 10 Gbit/s USB 3.1 and PCIe Gen 4. A major difference for some is the omission of the integrated wireless module on the newer platform. While this forces those who need WLAN to add a separate module to their design, it also greatly reduces the effort and cost of the conformity requirements that were an issue when building TX2-based products.