NVIDIA GeForce GTX 780

Research: GeForce GTX780 to Tesla K20

In 2011, ijsf posted the results of his research on turning an NVIDIA GeForce GTX480 into a much more expensive NVIDIA Tesla C2050. In short, the NVIDIA driver checks the PCI Express device ID of the card, which denotes what type of card it is. The device ID is determined by a set of soft straps stored in the firmware of the board. NVIDIA learned from this and changed how cards are detected for the GeForce 600 and 700 series by partially hard strapping the device ID. This means that it is no longer possible to modify the entire device ID by changing the firmware, and hardware modifications are now a necessity. User gnif posted how he turned a GeForce GTX690 into a Quadro K5000 by modifying, adding, and removing several resistors on the board. In that topic, other people also managed to convert Fermi GF1xx and Kepler GK10x based graphics cards into more expensive professional cards, but no one has managed to convert the GeForce GTX780 or GTX Titan, which are based on the Kepler GK11x GPU.

This sparked my interest in whether it is possible to modify the newest NVIDIA cards, so I bought a GTX780 and started researching it. To change the device ID, hardware modifications have to be made by adding, removing, or changing resistors. On the previous cards, these resistors are known to be located around the EEPROM chip that holds the firmware. Usually there are alternative positions for the resistors on the board, but since the resistors are so small (0402 package), it isn’t very wise to randomly remove and add resistors without a plan.

The GTX 780 has device ID 0x1004, and the goal is to turn it into a Tesla K20X/K20c, which have device IDs 0x1020 and 0x1022 respectively. As with the previous generation of cards, the device ID of this card is partially soft strapped: the five least significant bits of the device ID can be set in the firmware, meaning that the device ID can be set to any value between 0x1000 and 0x101F without modifying any hardware. The targets 0x1020 and 0x1022 fall just outside that range because they need bit 5 (0x20) set, and that bit is hard strapped.
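The arithmetic behind this limit can be sketched in a few lines of Python. This is a simplified model, assuming the hard-strapped upper bits and the soft-strapped lower five bits are simply combined with a bitmask; the real strap logic inside the GPU is undocumented, and the helper names `device_id` and `reachable_ids` are hypothetical.

```python
SOFT_MASK = 0x1F  # the five least significant bits come from the firmware soft straps


def device_id(hard_id: int, soft_bits: int) -> int:
    """Combine hard-strapped upper bits with soft-strapped low bits (simplified model)."""
    return (hard_id & ~SOFT_MASK & 0xFFFF) | (soft_bits & SOFT_MASK)


def reachable_ids(hard_id: int) -> range:
    """All device IDs reachable by changing only the soft straps in the firmware."""
    base = hard_id & ~SOFT_MASK & 0xFFFF
    return range(base, base + SOFT_MASK + 1)


ids = reachable_ids(0x1004)  # GTX 780
print(f"{ids[0]:#06x} .. {ids[-1]:#06x}")  # 0x1000 .. 0x101f
print(0x1020 in ids, 0x1022 in ids)        # False False: bit 5 is hard strapped
```

In this model no choice of soft-strap bits gets a stock GTX 780 to 0x1020 or 0x1022, which is why a hardware change is needed first.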


The board around the EEPROM. You can see the first attempt to adjust the value of a replaced resistor on the left.

The first step was to analyze the hardware around the EEPROM, and in particular, the alternate positions for the resistors. By simply using a multimeter, the connections between the EEPROM and the resistors can be easily determined, resulting in the following schematic.

Schematic of the board around the EEPROM.


The 25K, 30K, and 45K resistors have alternate positions on the board, so they are the candidates for modification. The global labels TP1, TP2, and TP3 correspond to the three test points to the lower left of the EEPROM; the traces from these test points go directly into the GPU. Hooking up an oscilloscope to the EEPROM reveals that it is not accessed immediately when the computer boots. It is not until after the first boot screen (the one showing the processor type and the amount of memory) that the EEPROM is read. This leads to the theory that these resistors determine the device ID (hard strapping) and that the EEPROM can partially override it (the five least significant bits).

But before modifying the hardware, I wanted to see what would change when I modified the other bits of the soft straps in the firmware. During this process I managed to corrupt the firmware, and the card was no longer detected by my computer. I tried to ‘remove’ the EEPROM from the circuit (by desoldering its Vcc pin), but this didn’t help at all, so I dusted off my Raspberry Pi and hooked the EEPROM up to it. Since the EEPROM communicates over SPI, connecting it only takes six lines (Vcc, GND, MOSI, MISO, CS, and SCLK), and a few Python scripts using spidev are enough to communicate with it. Special care should be taken when using spidev with EEPROM chips, because some chips (for example the Pm25LV512) read and write commands and data on the falling edge of the clock, whereas spidev in its default SPI mode 0 expects the rising edge. Luckily the GigaDevice GD25Q20B reads and writes on the rising edge of the clock, so with just a few lines of Python I could dump the contents of the EEPROM and write new firmware to it.
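The original scripts are not reproduced here, so what follows is only a minimal sketch of what a spidev-based dumper and flasher could look like. It assumes the EEPROM is wired to the Raspberry Pi's SPI0 with chip select CE0 (bus 0, device 0), that the GD25Q20B accepts the common SPI NOR opcodes (0x03 read, 0x06 write enable, 0x05 read status, 0x02 page program, 0xC7 chip erase), and that spidev's default 4096-byte per-transfer buffer applies. Check all of these against the datasheet and your wiring; the helper names are hypothetical.

```python
import time

FLASH_SIZE = 256 * 1024  # GD25Q20B: 2 Mbit = 256 KiB
PAGE_SIZE = 256          # bytes per page-program command
MAX_XFER = 4096          # assumed default spidev buffer size per transfer

# Common SPI NOR opcodes (assumed; verify against the GD25Q20B datasheet).
CMD_READ = 0x03
CMD_WRITE_ENABLE = 0x06
CMD_READ_STATUS = 0x05
CMD_PAGE_PROGRAM = 0x02
CMD_CHIP_ERASE = 0xC7
STATUS_WIP = 0x01        # "write in progress" bit of the status register


def addr_bytes(addr):
    """Split a 24-bit address into three big-endian bytes."""
    return [(addr >> 16) & 0xFF, (addr >> 8) & 0xFF, addr & 0xFF]


def read_frame(addr, length):
    """READ frame: opcode and address, then dummy bytes clocked out to receive data."""
    return [CMD_READ] + addr_bytes(addr) + [0x00] * length


def open_flash(bus=0, device=0, speed_hz=1_000_000):
    """Open the SPI port; spidev is imported here so the pure helpers work without hardware."""
    import spidev
    spi = spidev.SpiDev()
    spi.open(bus, device)
    spi.max_speed_hz = speed_hz
    spi.mode = 0  # CPOL=0, CPHA=0: data is sampled on the rising clock edge
    return spi


def dump(spi, size=FLASH_SIZE):
    """Read the whole chip in chunks that fit in a single spidev transfer."""
    chunk = MAX_XFER - 4  # leave room for the opcode and three address bytes
    data = bytearray()
    for addr in range(0, size, chunk):
        n = min(chunk, size - addr)
        rx = spi.xfer2(read_frame(addr, n))
        data += bytes(rx[4:])  # drop the bytes clocked in during opcode/address
    return bytes(data)


def wait_ready(spi):
    """Poll the status register until the erase or program cycle has finished."""
    while spi.xfer2([CMD_READ_STATUS, 0x00])[1] & STATUS_WIP:
        time.sleep(0.001)


def write_image(spi, image):
    """Erase the whole chip, then program the image one 256-byte page at a time."""
    spi.xfer2([CMD_WRITE_ENABLE])
    spi.xfer2([CMD_CHIP_ERASE])
    wait_ready(spi)
    for addr in range(0, len(image), PAGE_SIZE):
        page = list(image[addr:addr + PAGE_SIZE])
        spi.xfer2([CMD_WRITE_ENABLE])  # write enable is cleared after every program
        spi.xfer2([CMD_PAGE_PROGRAM] + addr_bytes(addr) + page)
        wait_ready(spi)
```

Typical use would be `spi = open_flash()` followed by `backup = dump(spi)`, saving `backup` to a file before touching anything; `write_image(spi, new_firmware)` only comes after that, followed by a second `dump` to verify the result.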

While I had the Raspberry Pi connected to the EEPROM chip, I also decided to use my trusty Saleae logic analyzer to sniff the SPI bus while the computer booted. With the logic analyzer attached, the computer would not boot past the first screen. Aha Sherlock, a clue! It turns out that with the logic analyzer's power disconnected, the computer boots up just fine, and the device ID is…

0x1024

Now this is getting interesting! Nvflash now detects the GPU on the card as an Atlas C-series, which is the next-in-line, as yet unreleased Tesla K40. I hooked up my multimeter to my logic analyzer and measured the resistance between its pins. By process of elimination, I found which resistor had to be changed (or, in my case, added in parallel): a ~30K Ohm resistor between SCLK and Vcc changes the device ID to 0x1024. Of course this isn’t the 0x1020 or 0x1022 we are aiming for, but it is close enough. Combined with the fact that the device ID is partially soft strapped, it is just a matter of changing the soft straps to override the fourth digit (the 4 in 0x1024) and set it to ‘0’ in firmware. After flashing the EEPROM and rebooting the computer, nvflash revealed:


The card is now recognized as a Tesla K20X.

By using the same process, it is also possible to change the device ID to 0x1022.


Did anyone say Tesla K20c?
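Why a single resistor was enough can be checked with a little bit arithmetic. This sketch assumes, as described earlier, that only the five least significant bits of the device ID are soft strapped; `soft_value_for` is a hypothetical helper, not part of nvflash or any other tool.

```python
SOFT_MASK = 0x1F        # five least significant bits: soft strapped in firmware
ORIGINAL_ID = 0x1004    # GTX 780 as shipped
WITH_RESISTOR = 0x1024  # after adding the ~30K resistor between SCLK and Vcc

# XOR shows exactly which bits the resistor changed.
flipped = ORIGINAL_ID ^ WITH_RESISTOR
print(f"changed bits: {flipped:#06x} (bit {flipped.bit_length() - 1})")  # changed bits: 0x0020 (bit 5)


def soft_value_for(target_id, hard_id):
    """Soft-strap value that yields target_id, or None if the hard-strapped bits differ."""
    if (target_id & ~SOFT_MASK) != (hard_id & ~SOFT_MASK):
        return None
    return target_id & SOFT_MASK


for target in (0x1020, 0x1022):
    print(f"{target:#06x}: without resistor -> {soft_value_for(target, ORIGINAL_ID)}, "
          f"with resistor -> {soft_value_for(target, WITH_RESISTOR)}")
```

The resistor flips bit 5, just above the soft-strapped range, so once it is in place the soft straps only need to supply 0x00 for 0x1020 or 0x02 for 0x1022.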

So this card is finally a K20, right? Well, no. It turns out that CUDA programs hang whenever any CUDA function is called, and that it is not possible to disable TCC mode, which is enabled by default on the Tesla K20. The card ran CUDA programs just fine before its metamorphosis, so it’s probably not a hardware fault (the added resistor aside).

Luckily, the good folks at TechPowerUp have a list of VGA BIOSes, and there is a Tesla K20c BIOS among them. After changing the soft straps in the Tesla BIOS and flashing it to the EEPROM, it turned out that this does the trick. This card is now a Tesla K20 that is faster and six times cheaper than a real Tesla K20. And there is no reason to believe this hack would not also work on the GTX Titan (which has the exact same hardware specs as the Tesla K20X). The only problem in the case of the GTX780 is that the nvidia-smi tool now reports that the card has 6GiB of RAM, while in reality it has 3GiB.

This is where ijsf (the guy who did the original GTX480 to Tesla hack) comes into play. Together we figured out that the amount of reported memory, or the way it is determined, has to be in the BIOS somewhere. Since NVIDIA does not disclose any information about their GPUs, the place to look for any kind of information on NVIDIA GPUs is the nouveau project, an open source driver for NVIDIA GPUs. It turns out that the BIOS contains custom instructions that the GPU runs to initialize and configure itself, and a list of these opcodes can be found in the nouveau source code. By decoding the opcodes in the Tesla BIOS and diffing them against the opcodes in the GTX780 BIOS, the list of possible modifications can be narrowed down significantly. Once the correct arguments for the opcodes were found, the card reported the correct amount of memory.
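The diffing step can be illustrated with Python's standard difflib. This is a sketch under strong assumptions: it presumes the init scripts of both BIOS images have already been decoded (for example with a decoder built from nouveau's opcode table) into lists of `(opcode, arguments)` tuples. The opcode and argument values below are made-up placeholders, not real GK110 init script contents, and `opcode_diff` is a hypothetical helper.

```python
import difflib

# Made-up decoded init scripts: (opcode, arguments) tuples with placeholder values.
tesla_bios = [
    (0x01, (0x000100, 0x00000001)),
    (0x01, (0x000200, 0x00000006)),
    (0x02, ()),
]
gtx780_bios = [
    (0x01, (0x000100, 0x00000001)),
    (0x01, (0x000200, 0x00000003)),
    (0x02, ()),
]


def opcode_diff(a, b):
    """Yield (tag, a_slice, b_slice) for every region where two opcode lists differ."""
    matcher = difflib.SequenceMatcher(a=a, b=b, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            yield tag, a[i1:i2], b[j1:j2]


for tag, old, new in opcode_diff(tesla_bios, gtx780_bios):
    print(tag, old, "->", new)
```

SequenceMatcher aligns the two sequences first, so an instruction that exists in only one BIOS shows up as a single insert or delete region instead of shifting every later comparison, leaving only a short list of candidate instructions to experiment with.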

nvidia-smi showing the Tesla K20c with 3GiB.

Although the card runs CUDA like a champ, there are still a few things that could be improved:

  • The GTX780 and Titan support PCIe 3.0, whereas the Tesla K20(X) supports PCIe 2.0. This can most likely be changed in the BIOS as well.
  • The Tesla K20(X) has two copy engines, which means that it can perform memory transfers in both directions at the same time. Perhaps this is configured in the BIOS as well.
  • Full double precision floating point support, like the Tesla K20(X) and GTX Titan have.
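Checking the result of each experiment is easier from a script than from screenshots. The sketch below assumes nvidia-smi's `--query-gpu` interface with the fields `name`, `pci.device_id`, `memory.total`, and `pcie.link.gen.max`; field availability varies between driver versions, so treat the list as an assumption. It covers the device ID, the reported memory, and the PCIe generation, but not the copy engines or double precision throughput. `parse_row` and `query_gpus` are hypothetical helpers.

```python
import subprocess

# Fields requested from nvidia-smi (assumed to be supported by the installed driver).
FIELDS = ["name", "pci.device_id", "memory.total", "pcie.link.gen.max"]


def int_or_none(text):
    """Convert a numeric field; values such as '[Not Supported]' become None."""
    text = text.strip()
    return int(text) if text.isdigit() else None


def parse_row(line):
    """Parse one row of --format=csv,noheader,nounits output."""
    name, pci_id, mem_mib, pcie_gen = (field.strip() for field in line.split(","))
    combined = int(pci_id, 16)  # e.g. 0x102210DE: device ID in the upper 16 bits, vendor ID below
    return {
        "name": name,
        "device_id": combined >> 16,
        "vendor_id": combined & 0xFFFF,
        "memory_mib": int_or_none(mem_mib),
        "pcie_gen_max": int_or_none(pcie_gen),
    }


def query_gpus():
    """Run nvidia-smi once and return one dict per GPU."""
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=" + ",".join(FIELDS), "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    )
    return [parse_row(line) for line in result.stdout.splitlines() if line.strip()]
```

For example, `for gpu in query_gpus(): print(hex(gpu["device_id"]), gpu["memory_mib"], gpu["pcie_gen_max"])` would show at a glance whether a modified BIOS reports device ID 0x1022, roughly 3GiB of memory, and which PCIe generation the link supports.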

So these are the next topics of research.
