FFT choice and implementations

Choice of the FFT code

The DDSCAT code is invoking 3 dimensional Fast Fourier Transform during each iteration of the solution procedure. One change in going from DDSCAT.4b to 4c was modification of the code to permit use of the GPFA FFT algorithm developed by Dr. Clive Temperton (1992) [2]. In going from DDSCAT 5a10 to 6.0 we introduced a new FFT option: the FFTW package of Frigo and Johnson (http://www.fftw.org/). In version DDSCAT 7.x we continue to offer Temperton’s GPFA code and we wrote interface to 3.2 release of the FFTW. FFTW is a C subroutine library for computing the discrete Fourier transform and is available from http://www.fftw.org/. We tested FFTW3.2 within DDSCAT on Windows under both Cygwin and/or MinGW/MSYS as well as on LINUX computers.

For target dimensions which are factorizable as $2^i 3^j 5^k$ (for integer i, j, k), the GPFA and FFTW codes have the same memory requirements. For targets with extents $N_x, N_y, N_z$ which are not factorizable as $2^i 3^j 5^k$, the GPFA code needs to extend the computational volume to have values of $N_x, N_y$, and$N_z$ which are factorizable by 2, 3, and 5. For these cases, GPFA requires more memory than FFTW.

The DDSCAT user conscious about speed performance of the code could also examine the FFT implementation available for the computer/compiler. Such code may be faster from what we are offering. For that purpose the TSTFFT.f code can be modified and tested.

FFTW instalation

The source code of FFTW, one of the FFT options in DDSCAT, is not prowided by us. Instead it has to be installed separately by the user. The code is availale for download from the FFTW home page http://www.fftw.org/. The FFTW code is written in C but once the code is compiled and libraries are created it is straightforward to call FFTW from FORTRAN. Standard distribution of FFTW compiles to double precision version. Many computer centers, and several compiler vendors provide FFTW precompiled. The name of the FFTW library may be libfftw3.a but there may be sevaral variants - single or double precision, MIPS version, etc. The FFTW library has to be linked with the DDSCAT. One typical syntax may be

g95 -O3 -ffast_math TSTFFT.f cxfft3n.f gpfa.f … -L/usr/local/lib -lfftw3

which assumes that FFTW is located in /usr/local/lib/libfftw3.a. The interface code which we provide cxfftw32.f incudes one more file fftw.f which provides definitions of various parameter variables used by FFTW. This file is typically located in /usr/local/include.

To compile single precision version of FFTW

configure —enable-single
make
make install

This will install /usr/local/lib/libfftw3f.a.

Some slight warning about the Windows environment. One can compile FFTW under UNIX emulation Cygwin. In such case the DDSCAT code will have to be compiled and linked also under Cygwin. However, another option under Windows is to use MinGW/MSYS which produces native Windows code.

Testing FFT: The TSTFFT.f code

The DDSCAT 7.x test code is written in FORTRAN90. The test code has simple CPP preprocessor statements which tailors the main program to test a particular FFT implementation. In addition the executable reads several parameters from the command line. Because time the FFT is executed limits overall execution time of the DDSCAT the user may want to examine the speed of FFT for a particular geometry. Often, close choices of target discretization, produce drastically slower/faster numerical performance.

This test code is called TSTFFT.f and can be compiled and run to test speed of FFT. The Makefile for this code has to be modified to indicate which version of the FFT (e.g. GPFA, FFTW) is being tested. In addition one has to define location of the FFTW libraries (for example, /usr/local/lib/libfftw3.a). The test program should be invoked with dimensions of the problem to be tested. For example, command

bench 10 12 86

will test speed of setting the problem, speed of one complex, forward FFT of the size 10x12x86, and write single line of output with additional information of Mflops.

Current version of the test source code is available here tstfft.zip.

Notes

  • Current interface to FFTW3.2, the cxfftw32.f subroutine, is written in double precision.
  • The GPFA interface - the cxfft3n.f subroutine is in single precision.
Bibliography
1. Temperton, C.J. 1983, Self-sorting mixed-radix Fast Fourier Transforms, J. Comp. Phys., 52, 1-23.
2. Temperton, C. 1992, A Generalized Prime Factor FFT Algorithm for Any N = $2^p 3^q 5^r$, SIAM Journal of Scientific and Statistical Computing, 13, 676-686.
Unless otherwise stated, the content of this page is licensed under Creative Commons Attribution-ShareAlike 3.0 License