Kiarash
Kiarash

Reputation: 7998

Parallel processing on FPGA. How to start with?

I have a computational intensive task which I used CUDA to implement it and now I want to make it even faster with FPGAs (if possible)

The system I want to implement is a series of computations each similar to matrix multiplication in sense of being parallel. It also has some non-parallel parts in between. It works with big amounts of data.

Although I want it as fast as possible, I have enough time to learn and explore with FPGAs.

here I'm asking for suggestions on how I start my path? Which FPGA to choose and where to learn about it. any website or online class or books? I've decided to do this anyway but your idea of whether this will be faster on FPGA or not would be helpful too.

Upvotes: 3

Views: 8359

Answers (2)

Martin Thompson
Martin Thompson

Reputation: 16802

The big wins from an FPGA over using a GPU come from:

  • Using non-standard word widths optimised to your application. This allows denser logic, which allows more parallel processing blocks
  • using your knowledge of the required accesses to external RAM to schedule them in hardware more efficiently than a general purpose memory controller can.

The downside is getting data to and from the FPGA. Draw a data-transfer diagram before you start. Even if the FPGA provides infinite speedup, you might still find it's not worth the effort if there's loads of data to be shuffled to and fro!

It's likely you'll be wanting a PCI express based board. Which is (I imagine) a whole new learning-curve before you get to do anything with the FPGA - but if you're up for it, it'll be a very interesting task!

In terms of choosing FPGAs, have a play with the software tools from the various vendors - at the learning stage that's much more important than the chips themselves. You won't find (at this early learning-stage) a show-stopper feature in any of the various chips. Also take into account the availability of boards with your required interfaces on, and any IP-core you might need to do the high-speed interfacing (eg PCIe)

Upvotes: 3

Ben Voigt
Ben Voigt

Reputation: 283674

You can get a substantial speedup on most parallel problems with an FPGA.

However, in addition to implementing your computation on the FPGA, there's a lot of work involved in getting the data back and forth from the CPU/main memory. This will require implementation of (for example) a PCI Express endpoint in the FPGA logic (bus mastering for maximum speed) and custom drivers on the software side. Most operating systems will require those drivers to be developed in kernel mode.

And you can't just use the most straightforward approach for FPGA programming either. You're going to need to worry about pipelining and clock synchronization in order to maximize throughput.

In other words, it's a substantially difficult task even for engineers with years of FPGA experience. I strongly suggest you find someone to work with on this. Depending on how proprietary your project is, you might find skilled academics willing to work with you as long as you provide them with all materials and publication rights.

If you're determined to go it alone, you'll need some hardware. Many different companies offer FPGA wired up as accelerators, for example http://www.nallatech.com/pci-express-cards.html

Depending on whether you choose a Xilinx or Altera FPGA, you'll find considerable sample code and tutorials for getting PCI express working.

Upvotes: 1

Related Questions