JavaScript Image Preloader

http://www.webreference.com/programming/javascript/gr/column3/index.html


 

One of JavaScript’s greatest strengths is the generation of HTML code on the fly, enabling all kinds of effects not possible with HTML alone. One of the hurdles to overcome when generating HTML is to ensure that any images referenced using <IMG> tags are properly loaded. This has been known to cause problems in some circumstances, especially when many <IMG> tags are added at once (even when many of them are referencing the same image). If not done properly (depending on the browser), the images might not be the correct size or load properly or even load at all.

There are a number of mechanisms used to generate HTML with JavaScript:

  • During the document load/parse phase, HTML may be written as text using document.write().
  • After the document has loaded, HTML may be added using DOM functions such as document.createElement() and Node.appendChild().
  • After the document has loaded, HTML may be inserted textually using the innerHTML property of HTML elements.

Images referenced using the first method will generally load without any difficulty as the browsers are designed to optimize the downloading of images during the document load/parse phase. All <IMG> tags that reference the same image will be grouped together so that the image is only downloaded once.

This is not true, however, for the other methods. While most images will load, some browsers (notably Internet Explorer) will render the images incorrectly or sometimes not at all. When <IMG> tags reference the same image they are not optimized; instead, the image is downloaded separately for each <IMG> tag. This behavior gets worse as the bandwidth gets narrower. For some reason (unknown to the author), image caching doesn’t seem to affect this.

So why would anyone want to use these methods at all? There are many potential reasons:

1. Maintenance on JavaScript that has already been written this way.

2. HTML that must be added dynamically after the completion of the document load.

When preloading an image, a common mistake is to simply create an Image object and assign its “src” attribute without waiting for the image to load:

 

 

 

 

 

var myImage = new Image;

myImage.src = "images/myImage.png";

This works if the image loads quickly enough. However, if the image is used before it has finished loading, the preload step will have had no effect at all.

To make preloading work properly, the code must wait for the image to complete loading before it is used. In this article, I’ve created an ImagePreloader class that will preload an array of images and call a call-back function when all the images have been loaded into the browser.

The constructor for the ImagePreloader takes an array of image URLs and a call-back function as arguments.

 

 

 

function ImagePreloader(images, callback)

{

   // store the callback

   this.callback = callback;

 

 

 

   // initialize internal state.

   this.nLoaded = 0;

   this.nProcessed = 0;

   this.aImages = new Array;

 

 

 

   // record the number of images.

   this.nImages = images.length;

 

 

 

   // for each image, call preload()

   for ( var i = 0; i < images.length; i++ )

      this.preload(images[i]);

}

The call-back function is stored for later use, then each image URL is passed into the preload() method.

 

 

 

ImagePreloader.prototype.preload = function(image)

{

   // create new Image object and add to array

   var oImage = new Image;

   this.aImages.push(oImage);


   // set up event handlers for the Image object

   oImage.onload = ImagePreloader.prototype.onload;

   oImage.onerror = ImagePreloader.prototype.onerror;

   oImage.onabort = ImagePreloader.prototype.onabort;


   // assign pointer back to this.

   oImage.oImagePreloader = this;

   oImage.bLoaded = false;


   // assign the .src property of the Image object

   oImage.src = image;

}

The preload function creates an Image object and assigns handlers for the three Image events: onload, onerror and onabort. The onload event is raised when the image has been loaded into memory, the onerror event is raised when an error occurs while loading the image, and the onabort event is raised if the user cancels the load by clicking the Stop button on the browser.

A pointer to the ImagePreloader object is stored in each Image object to facilitate the call-back mechanism. An optional boolean flag can be added here to indicate whether the image loads properly or not.

Finally, the “src” attribute is assigned to start the loading of the image.

 

 

 

ImagePreloader.prototype.onComplete = function()

{

   this.nProcessed++;

   if ( this.nProcessed == this.nImages )

   {

      this.callback(this.aImages, this.nLoaded);

   }

}

ImagePreloader.prototype.onload = function()

{

   this.bLoaded = true;

   this.oImagePreloader.nLoaded++;

   this.oImagePreloader.onComplete();

}

ImagePreloader.prototype.onerror = function()

{

   this.bError = true;

   this.oImagePreloader.onComplete();

}

ImagePreloader.prototype.onabort = function()

{

   this.bAbort = true;

   this.oImagePreloader.onComplete();

}


Recursive Functions

http://140.127.40.1/~jwu/c/cpg.htm

http://140.127.40.1/~jwu/c/cpgch6.htm

 


Python Patterns – An Optimization Anecdote

http://www.python.org/doc/essays/list2str.html


The other day, a friend asked me a seemingly simple question: what’s the best way to convert a list of integers into a string, presuming that the integers are ASCII values. For instance, the list [97, 98, 99] should be converted to the string ‘abc’. Let’s assume we want to write a function to do this.

The first version I came up with was totally straightforward:

    def f1(list):
        string = ""
        for item in list:
            string = string + chr(item)
        return string

That can’t be the fastest way to do it, said my friend. How about this one:

    def f2(list):
        return reduce(lambda string, item: string + chr(item), list, "")

This version performs exactly the same set of string operations as the first one, but gets rid of the for loop overhead in favor of the faster, implied loop of the reduce() function.

Sure, I replied, but it does so at the cost of a function call (the lambda function) per list item. I betcha it’s slower, since function call overhead in Python is bigger than for loop overhead.

(OK, so I had already done the comparisons. f2() took 60% more time than f1(). So there 🙂

Hmm, said my friend. I need this to be faster. OK, I said, how about this version:

    def f3(list):
        string = ""
        for character in map(chr, list):
            string = string + character
        return string

To both our surprise, f3() clocked twice as fast as f1()! The reason this surprised us was twofold: first, it uses more storage (the result of map(chr, list) is another list of the same length); second, it contains two loops instead of one (the one implied by the map() function, and the for loop).

Of course, space versus time is a well-known trade-off, so the first one shouldn’t have surprised us. However, how come two loops are faster than one? Two reasons.

First, in f1(), the built-in function chr() is looked up on every iteration, while in f3() it is only looked up once (as the argument to map()). This look-up is relatively expensive, I told my friend, since Python’s dynamic scope rules mean that it is first looked up (unsuccessfully) in the current module’s global dictionary, and then in the dictionary of built-in functions (where it is found). Worse, unsuccessful dictionary lookups are (on average) a bit slower than successful ones, because of the way the hash chaining works.

The second reason why f3() is faster than f1() is that the call to chr(item), as executed by the bytecode interpreter, is probably a bit slower than when executed by the map() function – the bytecode interpreter must execute three bytecode instructions for each call (load ‘chr’, load ‘item’, call), while the map() function does it all in C.

This led us to consider a compromise, which wouldn’t waste extra space, but which would speed up the lookup for the chr() function:

    def f4(list):
        string = ""
        lchr = chr
        for item in list:
            string = string + lchr(item)
        return string

As expected, f4() was slower than f3(), but only by 25%; it was still about 40% faster than f1(). This is because local variable lookups are much faster than global or built-in variable lookups: the Python "compiler" optimizes most function bodies so that for local variables, no dictionary lookup is necessary, but a simple array indexing operation is sufficient. The relative speed of f4() compared to f1() and f3() suggests that both reasons why f3() is faster contribute, but that the first reason (fewer lookups) is a bit more important. (To get more precise data on this, we would have to instrument the interpreter.)

Still, our best version, f3(), was only twice as fast as the most straightforward version, f1(). Could we do better?

I was worried that the quadratic behavior of the algorithm was killing us. So far, we had been using a list of 256 integers as test data, since that was what my friend needed the function for. But what if it were applied to a list of two thousand characters? We’d be concatenating longer and longer strings, one character at a time. It is easy to see that, apart from overhead, to create a list of length N in this way, there are 1 + 2 + 3 + … + (N-1) characters to be copied in total, or N*(N-1)/2, or 0.5*N**2 – 0.5*N. In addition to this, there are N string allocation operations, but for sufficiently large N, the term containing N**2 will take over. Indeed, for a list that’s 8 times as long (2048 items), these functions all take much more than 8 times as long; close to 16 times as long, in fact. I didn’t dare try a list of 64 times as long.
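The copy count above is easy to check with a short Python 3 sketch (mine, not the essay's):

```python
def chars_copied(n):
    # appending one character at a time to a growing string copies
    # 1 + 2 + ... + (n-1) characters in total
    return sum(range(n))

assert chars_copied(256) == 256 * 255 // 2            # N*(N-1)/2
assert chars_copied(2048) // chars_copied(256) == 64  # 8x the length, ~64x the copying
```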

There’s a general technique to avoid quadratic behavior in algorithms like this. I coded it as follows for strings of exactly 256 items:

    def f5(list):
        string = ""
        for i in range(0, 256, 16): # 0, 16, 32, 48, 64, ...
            s = ""
            for character in map(chr, list[i:i+16]):
                s = s + character
            string = string + s
        return string

Unfortunately, for a list of 256 items, this version ran a bit slower than f3() (though within 20%). Since writing a general version would only slow it down more, we didn’t bother to pursue this path any further (except that we also compared it with a variant that didn’t use map(), which of course was slower again).

Finally, I tried a radically different approach: use only implied loops. Notice that the whole operation can be described as follows: apply chr() to each list item; then concatenate the resulting characters. We were already using an implied loop for the first part: map(). Fortunately, there are some string concatenation functions in the string module that are implemented in C. In particular, string.joinfields(list_of_strings, delimiter) concatenates a list of strings, placing a delimiter of choice between each two strings. Nothing stops us from concatenating a list of characters (which are just strings of length one in Python), using the empty string as delimiter. Lo and behold:

    import string
    def f6(list):
        return string.joinfields(map(chr, list), "")

This function ran four to five times as fast as our fastest contender, f3(). Moreover, it doesn’t have the quadratic behavior of the other versions.
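string.joinfields no longer exists in modern Python, but the same implied-loop idea survives as str.join. A Python 3 adaptation of f6() (mine, not the essay's code):

```python
def f6_modern(lst):
    # join the one-character strings produced by map(chr, ...) with an
    # empty delimiter, exactly the joinfields(map(chr, list), "") trick
    return "".join(map(chr, lst))

assert f6_modern([97, 98, 99]) == "abc"
```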

 

And The Winner Is…

The next day, I remembered an odd corner of Python: the array module. This happens to have an operation to create an array of 1-byte wide integers from a list of Python integers, and every array can be written to a file or converted to a string as a binary data structure. Here’s our function implemented using these operations:

    import array
    def f7(list):
        return array.array('B', list).tostring()

This is about three times as fast as f6(), or 12 to 15 times as fast as f3()! It also uses less intermediate storage – it only allocates 2 objects of N bytes (plus fixed overhead), while f6() begins by allocating a list of N items, which usually costs 4N bytes (8N bytes on a 64-bit machine) – assuming the character objects are shared with similar objects elsewhere in the program (like small integers, Python caches strings of length one in most cases).
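The array trick still works in Python 3, where tostring() has been renamed tobytes(), and the built-in bytes type now does the same packing directly. A sketch of both (my adaptations, not the original f7()):

```python
import array

def f7_modern(lst):
    # 'B' = unsigned 1-byte integers; latin-1 maps byte value k to chr(k)
    return array.array('B', lst).tobytes().decode('latin-1')

def f7_bytes(lst):
    # bytes() accepts an iterable of ints 0..255 directly
    return bytes(lst).decode('latin-1')

assert f7_modern([97, 98, 99]) == "abc"
assert f7_bytes([97, 98, 99]) == "abc"
```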

Stop, said my friend, before you get into negative times – this is fast enough for my program. I agreed, though I had wanted to try one more approach: write the whole function in C. This could have minimal storage requirements (it would allocate a string of length N right away) and save a few instructions in the C code that I knew were there in the array module, because of its genericity (it supports integer widths of 1, 2, and 4 bytes). However, it wouldn’t be able to avoid having to extract the items from the list one at a time, and to extract the C integer from them, both of which are fairly costly operations in the Python-C API, so I expected at most a modest speed-up compared to f7(). Given the effort of writing and testing an extension (compared to whipping up those Python one-liners), as well as the dependency on a non-standard Python extension, I decided not to pursue this option…

 

Conclusion

If you feel the need for speed, go for built-in functions – you can’t beat a loop written in C. Check the library manual for a built-in function that does what you want. If there isn’t one, here are some guidelines for loop optimization:

  • Rule number one: only optimize when there is a proven speed bottleneck. Only optimize the innermost loop. (This rule is independent of Python, but it doesn’t hurt repeating it, since it can save a lot of work. 🙂
  • Small is beautiful. Given Python’s hefty charges for bytecode instructions and variable look-up, it rarely pays off to add extra tests to save a little bit of work.
  • Use intrinsic operations. An implied loop in map() is faster than an explicit for loop; a while loop with an explicit loop counter is even slower.
  • Avoid calling functions written in Python in your inner loop. This includes lambdas. In-lining the inner loop can save a lot of time.
  • Local variables are faster than globals; if you use a global constant in a loop, copy it to a local variable before the loop. And in Python, function names (global or built-in) are also global constants!
  • Try to use map(), filter() or reduce() to replace an explicit for loop, but only if you can use a built-in function: map with a built-in function beats for loop, but a for loop with in-line code beats map with a lambda function!
  • Check your algorithms for quadratic behavior. But notice that a more complex algorithm only pays off for large N – for small N, the complexity doesn’t pay off. In our case, 256 turned out to be small enough that the simpler version was still a tad faster. Your mileage may vary – this is worth investigating.
  • And last but not least: collect data. Python’s excellent profile module can quickly show the bottleneck in your code. If you’re considering different versions of an algorithm, test each in a tight loop using the time.clock() function.

By the way, here’s the timing function that I used. It calls a function f n*10 times with argument a, and prints the function name followed by the time it took, rounded to milliseconds. The 10 repeated calls are done to minimize the loop overhead of the timing function itself. You could go even further and make 100 calls… Also note that the expression range(n) is calculated outside the timing brackets – another trick to minimize the overhead caused by the timing function. If you are worried about this overhead, you can calibrate it by calling the timing function with a do-nothing function.

    import time
    def timing(f, n, a):
        print f.__name__,
        r = range(n)
        t1 = time.clock()
        for i in r:
            f(a); f(a); f(a); f(a); f(a); f(a); f(a); f(a); f(a); f(a)
        t2 = time.clock()
        print round(t2-t1, 3)

Epilogue

A few days later, my friend was back with the question: how do you do the reverse operation? I.e. create a list of integer ASCII values from a string. Oh no, here we go again, it flashed through my mind…

But this time, it was relatively painless. There are two candidates, the obvious:

    def g1(string):
        return map(ord, string)

and the somewhat less obvious:

    import array
    def g2(string):
        return array.array('b', string).tolist()

Timing these reveals that g2() is about five times as fast as g1(). There’s a catch though: g2() returns integers in the range -128..127, while g1() returns integers in the range 0..255. If you need the positive integers, g1() is going to be faster than any postprocessing you could do on the result of g2(). (Note: since this essay was written, the ‘B’ typecode was added to the array module, which stores unsigned bytes, so there’s no reason to prefer g1() any more.)
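In Python 3 the reverse direction is equally direct, since iterating a bytes object yields unsigned integers. A quick sketch (my adaptation, not the essay's code):

```python
def g_modern(s):
    # map each character to its code point, like map(ord, string)
    return [ord(c) for c in s]

assert g_modern("abc") == [97, 98, 99]
assert list(b"abc") == [97, 98, 99]  # bytes already iterate as ints in 0..255
```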



3:2 pulldown

http://www.doom9.org/index.html?/synch.htm

1. 24 FRAMES becomes 48 FIELDS

A B C D
Atop Abottom Btop Bbottom Ctop Cbottom Dtop Dbottom

Frame A becomes 2 fields: Atopfield + Abottomfield. Thus, 4 Frames become 8 Fields, and 24 Frames become 48 Fields. This "field-based" material is then TELECINED into an NTSC Video signal. As TELECINE is a STANDARDIZED conversion, we have to follow the rules of engagement ;). The rule is to do a REPEAT_FIRST_FIELD in a 2:3 sequence.

2. 4 FRAMES (8 FIELDS) becomes 5 FRAMES (10 FIELDS)

A B C D
Atop Abottom Atop Bbottom Btop Cbottom Ctop Cbottom Dtop Dbottom

If we look closely, we can see a sequence of At Ab At followed by Bb Bt, then Cb Ct Cb, then Dt Db. But, since 1 FRAME consists of 2 FIELDS, the sequence becomes AA AB BC CC DD. What we have now is a conversion from 4 SOLID FRAMES into 5 FRAMES consisting of 3 SOLID FRAMES and 2 INTERLACED FRAMES. By INTERLACED I am referring to a FRAME that is made up of 2 FIELDS from DIFFERENT FRAME sources. The AB frame is the example.

So, 4 FRAMES become 5 FRAMES, thus 24 becomes….. 30. DONE! Done? Nope, not by a long shot. NTSC Video is 29.97fps, so PLAYBACK of the 30fps stream must be slowed down to 29.97fps, which brings us to the term DROP_FRAME.

Don’t get the wrong concept of DROP_FRAME as "FRAMES being REMOVED or DROPPED". In a 30fps video sequence, a DROP_FRAME time code counts video frames accurately in relationship to real time. DROP_FRAME time code counts each video frame, but when that .03 finally adds up to a video frame, it skips (or drops) a number. It does not drop a film or video frame; it merely skips a number and continues counting. This allows it to keep accurate time. So if you’re cutting a scene using drop-frame time code, and the duration reads as, say, 30 minutes and 0 frames, then you can be assured the duration is really 30 minutes. Confusing? Well, to put it in simple terms, DROP_FRAME here is in essence EQUAL to a SLOWED_DOWN playback from a pure 30fps to the correct NTSC 29.97fps SPEED. In the MPEG-2 domain, this means that the 00 and 01 frames are dropped or SKIPPED from the time code at the start of each minute, except for minutes which are even multiples of 10.

NOW, it is DONE.


One Meaning of the Dot Product

int speed = 5;
int Theta = 30;
BulletX[0] = BulletX[1] = BulletX[2] =100;
BulletY[0] = BulletY[1] = BulletY[2] =500;
BulletVx[0] = speed  * cos(M_PI/180*Theta );
BulletVy[0] = speed  * sin(M_PI/180*Theta );

[Bullet[0] rotated 30° clockwise]

BulletVx[1] = BulletVx[0]*cos(M_PI/180*30) - BulletVy[0]*sin(M_PI/180*30);
BulletVy[1] = BulletVx[0]*sin(M_PI/180*30) + BulletVy[0]*cos(M_PI/180*30);

[Bullet[0] rotated 30° counterclockwise]

BulletVx[2] = BulletVx[0]*cos(M_PI/180*30) + BulletVy[0]*sin(M_PI/180*30);
BulletVy[2] = -BulletVx[0]*sin(M_PI/180*30) + BulletVy[0]*cos(M_PI/180*30);

[Vector from the original bullet's starting point to the enemy plane]

int PX, PY;

PX = e.left - BulletX[0];
PY = e.top - BulletY[0];

int V1P = BulletVx[1]*PX + BulletVy[1]*PY; // a large negative value
int V2P = BulletVx[2]*PX + BulletVy[2]*PY; // a small positive value

Suppose we want to fire one bullet at the enemy plane, and the only choices are Bullet[0]'s original direction turned 30° to the left or to the right. Should the new bullet turn left or right?

V1P being negative means the angle between Bullet[1] and the enemy vector is between 90° and 180°.

V2P being positive means the angle between Bullet[2] and the enemy vector is between 0° and 90°.

So the answer is obviously to use Bullet[2] for the new bullet.

But what if both values are positive? Which one should we pick then?

That is also simple: pick the larger one. A larger dot product means a larger projection, which means a direction closer to the enemy's position.
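The decision rule above (rotate the velocity both ways, dot each result with the bullet-to-enemy vector, keep the direction with the larger dot product) can be sketched in Python. The original is C-like game code; the function and variable names below are my own:

```python
import math

def rotate(vx, vy, degrees):
    # standard 2-D rotation matrix; positive angles are counterclockwise
    # in math coordinates
    a = math.radians(degrees)
    return (vx * math.cos(a) - vy * math.sin(a),
            vx * math.sin(a) + vy * math.cos(a))

def pick_direction(bullet_v, bullet_pos, enemy_pos):
    # dot the two candidate directions with the bullet-to-enemy vector
    # and keep the one with the larger projection onto that vector
    px = enemy_pos[0] - bullet_pos[0]
    py = enemy_pos[1] - bullet_pos[1]
    left = rotate(bullet_v[0], bullet_v[1], 30)
    right = rotate(bullet_v[0], bullet_v[1], -30)
    dot_l = left[0] * px + left[1] * py
    dot_r = right[0] * px + right[1] * py
    return left if dot_l >= dot_r else right
```

With the bullet flying along +x and the enemy up and to the right, the +30° rotation has the larger dot product and wins, matching the rule described above.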


Fastest way to uniqify a list in Python


http://www.peterbe.com/plog/uniqifiers-benchmark

"Order-preserving: f8 is fastest."

"Not order-preserving: f7 is fastest."

from sets import Set

def f1(seq): # Raymond Hettinger # not order preserving
    set = {}
    map(set.__setitem__, seq, [])
    return set.keys()

def f2(seq): # ********* # order preserving
    checked = []
    for e in seq:
        if e not in checked:
            checked.append(e)
    return checked

def f3(seq): # Not order preserving
    keys = {}
    for e in seq:
        keys[e] = 1
    return keys.keys()

def f4(seq): # ********** order preserving
    noDupes = []
    [noDupes.append(i) for i in seq if not noDupes.count(i)]
    return noDupes

def f5(seq, idfun=None): # Alex Martelli ******* order preserving
    if idfun is None:
        def idfun(x): return x
    seen = {}
    result = []
    for item in seq:
        marker = idfun(item)
        # in old Python versions:
        # if seen.has_key(marker)
        # but in new ones:
        if marker in seen: continue
        seen[marker] = 1
        result.append(item)
    return result

def f5b(seq, idfun=None): # Alex Martelli ******* order preserving
    if idfun is None:
        def idfun(x): return x
    seen = {}
    result = []
    for item in seq:
        marker = idfun(item)
        if marker not in seen:
            seen[marker] = 1
            result.append(item)
    return result

def f6(seq): # Not order preserving
    return list(Set(seq))

def f7(seq): # Not order preserving
    return list(set(seq))

def f8(seq): # Dave Kirby # Order preserving
    seen = set()
    return [x for x in seq if x not in seen and not seen.add(x)]

def f9(seq): # Not order preserving
    return {}.fromkeys(seq).keys()

def f10(seq, idfun=None): # Andrew Dalke # Order preserving
    return list(_f10(seq, idfun))

def _f10(seq, idfun=None):
    seen = set()
    if idfun is None:
        for x in seq:
            if x in seen:
                continue
            seen.add(x)
            yield x
    else:
        for x in seq:
            x = idfun(x)
            if x in seen:
                continue
            seen.add(x)
            yield x

def f11(seq): # f10 but simpler # Order preserving
    return list(_f11(seq))

def _f11(seq):
    seen = set()
    for x in seq:
        if x in seen:
            continue
        seen.add(x)
        yield x
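Of these, f8() deserves a closer look, since it leans on a side effect inside a list comprehension. A small Python 3 demonstration (my example, not part of the benchmark):

```python
def f8(seq):  # Dave Kirby's order-preserving version, as above
    seen = set()
    # set.add() returns None (falsy), so "not seen.add(x)" is always true;
    # its only job is to record x the first time it appears
    return [x for x in seq if x not in seen and not seen.add(x)]

assert f8([1, 2, 1, 3, 2, 4]) == [1, 2, 3, 4]  # first occurrences, in order
```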


Minimum Cost Spanning Tree

http://210.240.194.97/course/2007/Algorithms/poko/prim.htm


If every edge of a graph is assigned a value, that value is called a weight, and the graph is called a weighted graph. If the weights represent cost or distance, the graph is called a network. By the definition of a spanning tree, a graph can have many different spanning trees; a spanning tree of a network with the minimum total cost is called a minimum cost spanning tree.

Prim's Algorithm

Given a network G = (V, E), where V = {0, 1, 2, 3, …, n}, initially set U = {0} (U and V are vertex sets). Then, from the set V - U, find a vertex v that forms the minimum-cost edge with some vertex in U, and add that vertex v to U. Repeat this step until U equals V.


1. First choose vertex 0.

2. Choose vertex 5, since it is closest to 0.

3. Choose vertex 4, since it is closest to {0, 5}.

4. Choose vertex 3, since it is closest to {0, 5, 4}.

5. Choose vertex 2, since it is closest to {0, 5, 4, 3}.

6. Choose vertex 1, since it is closest to {0, 5, 4, 3, 2}.

7. Choose vertex 6, since it is closest to {0, 5, 4, 3, 2, 1}.
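The steps above can be sketched directly in Python. This is a minimal O(n^2) adjacency-matrix version of Prim's algorithm; the weight matrix used in the test is my own example, not the graph from the original page:

```python
import math

def prim(weights):
    # weights[i][j] = cost of edge (i, j), math.inf where no edge exists;
    # returns (total cost, list of chosen edges)
    n = len(weights)
    in_tree = [False] * n
    in_tree[0] = True            # start from vertex 0, i.e. U = {0}
    total, edges = 0, []
    for _ in range(n - 1):
        # find the cheapest edge from U to V - U
        best = (math.inf, None, None)
        for u in range(n):
            if in_tree[u]:
                for v in range(n):
                    if not in_tree[v] and weights[u][v] < best[0]:
                        best = (weights[u][v], u, v)
        cost, u, v = best
        in_tree[v] = True        # move v into U
        total += cost
        edges.append((u, v))
    return total, edges
```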


Notes on Fourier Transform Algorithms

Notes on Fourier Transform Algorithms – for dragon    2004/11/9 9:52:27 PM

Section 1: The Definition of the Fourier Transform

What is the Fourier Transform? In its early definition, a Fourier Series was viewed as a superposition of waves, with roughly the following formula (assuming period T):


F(u) = Σ_{k=-∞..∞} (a(k)*cos(ku*2π/T) + b(k)*sin(ku*2π/T))

Because the component waves of a Fourier series have useful high- and low-frequency characteristics, the series has many applications in signal processing. The domain of the Fourier series is therefore commonly called the frequency domain, and the domain before the transform is called the time domain. Many functions are not periodic, yet we can still write them in a form similar to the formula above. Consider any function f(x), and let:

A(u) = (1/π) ∫_{-∞..∞} f(x) cos(ux) dx

B(u) = (1/π) ∫_{-∞..∞} f(x) sin(ux) dx

Then f(x) can be rewritten in the following form, similar to a Fourier series, called the Fourier Integral:


f(x) = ∫_{-∞..∞} (A(u)*cos(ux) + B(u)*sin(ux)) du

Converting this to complex form gives:

f(x) = (1/2π) ∫_{-∞..∞} [ ∫_{-∞..∞} f(y) e^(iuy) dy ] e^(-ixu) du

From this, the Fourier transform and its inverse can be defined as:

F(u) = ∫_{-∞..∞} f(x) e^(iux) dx

f(x) = (1/2π) ∫_{-∞..∞} F(u) e^(-ixu) du

Some books instead define the transform pair as:

F(u) = (1/sqrt(2π)) ∫_{-∞..∞} f(x) e^(iux) dx

f(x) = (1/sqrt(2π)) ∫_{-∞..∞} F(u) e^(-ixu) du

or:

F(u) = ∫_{-∞..∞} f(x) e^(i2πux) dx

f(x) = ∫_{-∞..∞} F(u) e^(-i2πxu) du

In this article, however, the first definition is used throughout. Note that i = sqrt(-1) above denotes the imaginary unit, not a variable. By Euler's formula, the factor e^(iux) expands to:

e^(iux) = cos(ux) + i sin(ux)

Working through this kind of algebra is rather tedious, so it is not all spelled out here; interested readers can derive it themselves or consult other books.

Since data inside a computer is discrete, the continuous integral formulas above are not well suited to computation, and a discrete Fourier transform (DFT) formula is needed. How is it derived? Suppose we are given a vector A of length N to be transformed into a vector B in the Fourier spectrum. We treat A as periodic with period N and want to write A as:

A(k) = (1/N) Σ_{m=0..N-1} B(m) e^(-i2πkm/N)  ,k = 0~N-1

from which we can compute:

B(m) = Σ_{k=0..N-1} A(k) e^(i2πmk/N)  ,m = 0~N-1

This is the discrete Fourier transform formula, and the previous equation is its inverse. Since e^(i2π/N) is inconvenient to read and write, we define

ω = e^(i2π/N)

so that the transform pair can be written as:

B(m) = Σ_{k=0..N-1} A(k) ω^(mk)  ,m = 0~N-1

A(k) = (1/N) Σ_{m=0..N-1} B(m) ω^(-km)  ,k = 0~N-1

This ω is the principal Nth root of the multiplicative identity (the principal Nth root of unity), and it has the following properties:


ω^0 = 1;  ω^1 ≠ 1, ω^2 ≠ 1, …, ω^(N-1) ≠ 1;  ω^N = 1

1 + ω + ω^2 + … + ω^(N-1) = 0

ω^(N/2) = -1  ,when N = 2k
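These properties can be checked numerically. A short Python sketch (mine, not from the article) for N = 8:

```python
import cmath

N = 8
w = cmath.exp(2j * cmath.pi / N)  # the principal Nth root of unity

# ω^0 = 1 and ω^N = 1, but ω^k ≠ 1 for 0 < k < N
assert abs(w**0 - 1) < 1e-9 and abs(w**N - 1) < 1e-9
assert all(abs(w**k - 1) > 1e-9 for k in range(1, N))

# 1 + ω + ω^2 + ... + ω^(N-1) = 0
assert abs(sum(w**k for k in range(N))) < 1e-9

# ω^(N/2) = -1 when N is even
assert abs(w**(N // 2) + 1) < 1e-9
```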

These properties are essential to the fast Fourier transform (FFT) algorithm described later, so keep them in mind. The formulas can in fact be generalized to any commutative ring (R, +, *, 0, 1), but for ease of exposition we restrict ourselves to the complex numbers. A commutative ring is defined by:

1. Associativity: (a+b)+c = a+(b+c), (a*b)*c = a*(b*c)
2. Distributivity: (a+b)*c = a*c + b*c, a*(b+c) = a*b + a*c
3. 0 is the additive identity: a + 0 = 0 + a = a
4. 1 is the multiplicative identity: a * 1 = 1 * a = a
5. Addition is commutative: a + b = b + a
6. Every element has an additive inverse: for each a there is a b with a + b = 0
7. Multiplication is commutative: a * b = b * a

Dropping property 7 gives a ring; adding multiplicative inverses gives a field. A Fourier-transformed function is usually written as:

F(u) = |F(u)| e^iφ(u)

|F(u)| is called the Fourier spectrum, φ(u) is the phase angle, and |F(u)|^2 is usually called the power spectrum or spectral density. They are defined as:

|F(u)| = sqrt(F(u).Real^2 + F(u).Image^2)

φ(u) = tan^(-1)(F(u).Image / F(u).Real)

Real and Image above denote the real and imaginary parts of the value. These definitions are commonly used when displaying Fourier spectra. For a two-dimensional image, the phase angle carries most of the image's edge information, which is very useful in image processing.

What properties does the Fourier transform have? Several properties of the one-dimensional transform are listed below.

1. Linearity: for any two time-domain vectors A(x) and B(x),

FT(c1*A(x)+c2*B(x)) = c1*FT(A(x)) + c2*FT(B(x))

where FT denotes the Fourier transform and c1, c2 are any two complex values.

2. Shifting and modulation: for any time-domain vector A(x) whose transform is B(x),

FT(A(x-k)) = B(x) * ω^(xk)
FT(A(x)*ω^(-xk)) = B(x-k)

3. Conjugate symmetry: for any real time-domain vector A(x) whose transform is B(y), B(y) and B(N-y) are complex conjugates, where N is the vector length. Because this is so useful in Fourier computations, it deserves a fuller explanation. Suppose two real vectors f(n) and g(n) have transforms F(n) and G(n). If we combine f and g into a single complex vector x(n) = f(n) + i·g(n), whose transform is X(n), then by linearity:

X(n) = F(n) + i G(n)

that is,

X(n).Real = F(n).Real - G(n).Image
X(n).Image = F(n).Image + G(n).Real

Since f(n) and g(n) are real, F(n), F(N-n) and G(n), G(N-n) are conjugate pairs, which gives:

X(N-n).Real = F(n).Real + G(n).Image
X(N-n).Image = -F(n).Image + G(n).Real

Solving these two sets of equations yields:

F(n).Real = ( X(n).Real + X(N-n).Real ) / 2
F(n).Image = ( X(n).Image - X(N-n).Image) / 2
G(n).Real = ( X(n).Image + X(N-n).Image) / 2
G(n).Image = (-X(n).Real + X(N-n).Real ) / 2

This result shows that one Fourier transform computation can in fact produce the transforms of two real vectors (and since F and G are conjugate-symmetric, only half of the values above actually need computing).
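The bookkeeping above is easy to get wrong, so here is a numeric check in Python (my sketch, using this article's sign convention ω = e^(i2π/N)):

```python
import cmath

def dft(x):
    # direct O(N^2) DFT: B(m) = Σ A(k) ω^(mk) with ω = e^(i2π/N)
    n = len(x)
    w = cmath.exp(2j * cmath.pi / n)
    return [sum(x[k] * w**(m * k) for k in range(n)) for m in range(n)]

def two_real_dfts(f, g):
    # recover F = DFT(f) and G = DFT(g) from a single DFT of x = f + i*g,
    # using the four recovery formulas above; X[-m % n] is X(N-m)
    n = len(f)
    X = dft([a + 1j * b for a, b in zip(f, g)])
    F = [complex((X[m].real + X[-m % n].real) / 2,
                 (X[m].imag - X[-m % n].imag) / 2) for m in range(n)]
    G = [complex((X[m].imag + X[-m % n].imag) / 2,
                 (-X[m].real + X[-m % n].real) / 2) for m in range(n)]
    return F, G

f, g = [1.0, 2.0, 3.0, 4.0], [5.0, 6.0, 7.0, 8.0]
F, G = two_real_dfts(f, g)
assert max(abs(a - b) for a, b in zip(F, dft(f))) < 1e-9
assert max(abs(a - b) for a, b in zip(G, dft(g))) < 1e-9
```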


4. Product and convolution: for any two time-domain vectors a(x) and b(x),

FT(a(x)‧b(x)) = FT(a(x)) × FT(b(x)) / N
FT(a(x)×b(x)) = FT(a(x)) ‧ FT(b(x))

where ‧ denotes the pointwise product and × denotes circular convolution.

5. Lagged products: the lagged product is the core of correlation. For any two time-domain vectors f(n) and g(n) with transforms F(n) and G(n),

Σ_{n=0..N-1} f(n)g(n-k) = F(n)‧G*(n)

where G*(n) denotes the conjugate of G(n).

6. Cyclic difference: for any time-domain vector A(x), if B(x) = A(x) - A((x-1) mod N), then FT(B(x)) = FT(A(x)) * (1 - ω^x).

The Fourier transform has other properties as well, such as rotation, scaling, integration, differentiation, and Parseval's theorem; only the most basic ones are listed here.

What is the Fourier transform actually good for? Besides processing digital signals or images in the Fourier spectrum (sharpening, smoothing, feature extraction, data compression, filtering, and so on), it can also be used to solve otherwise difficult mathematical problems. For example, a linear differential equation becomes an ordinary algebraic equation after a Fourier transform, and such an equation is much easier to solve. The transform can also speed up computations such as polynomial convolution, matrix multiplication, algebraic operations, and high-precision multiplication. There are many other applications, which readers can explore through books and their own experiments; some practical examples are listed at the end of this article for reference.

Section 2: The Fast Fourier Transform Algorithm

Before turning to the fast Fourier transform algorithm, we first work out the time complexity of the conventional Fourier transform, both for comparison with the FFT and for verification. The cost of the direct computation is easy to derive: for a vector of length n there are n element values to compute, and each requires cn multiplications and cn additions for some constant c, so the time is O(n^2). The conventional computation looks like this:

#include <math.h>

#define PI 3.141592653589793
typedef struct { float real, image; } complex;

void DFT(int data_no, complex *in_data, complex *out_data)
{
/* ————————————————————
   Purpose: direct discrete Fourier transform
   Parameters: in_data = input data
               data_no = number of items
   Output: out_data = output data
———————————————————— */
   int m, k;
   float w_cos, w_sin, angle_step, angle;

   angle_step = 2*PI/data_no;  /* angle step between powers of the root of unity ω */
   for (m=0; m<data_no; m++)   /* compute B(m) */
   {
      out_data[m].real = in_data[0].real;
      out_data[m].image = in_data[0].image;
      for (k=1; k<data_no; k++) /* accumulate A(k)*ω^(mk) */
      {
         angle = (float) m*k*angle_step;
         w_cos = cos(angle);
         w_sin = sin(angle);
         out_data[m].real += in_data[k].real * w_cos - in_data[k].image * w_sin;
         out_data[m].image += in_data[k].real * w_sin + in_data[k].image * w_cos;
      }
   }
}
How should the inverse transform be written? We could of course write it directly from the definition, but the forward and inverse transforms are quite similar, so here are two approaches:

1. Negate the angle step 2π/N between powers of the root of unity ω, run the same transform, and finally divide all returned values by N (dividing all input values by N beforehand has the same effect). The inverse routine for this approach:

void IDFT(int data_no, complex *in_data, complex *out_data)
{
/* ————————————————————
   Purpose: direct inverse discrete Fourier transform
   Parameters: in_data = input data
               data_no = number of items
   Output: out_data = output data
———————————————————— */
   int m, k;
   float w_cos, w_sin, angle_step, angle;

   angle_step = -2*PI/data_no; /* angle step of ω, negated */
   for (m=0; m<data_no; m++)  /* compute B(m) */
   {
      out_data[m].real = in_data[0].real;
      out_data[m].image = in_data[0].image;
      for (k=1; k<data_no; k++) /* accumulate A(k)*ω^(-mk) */
      {
         angle = (float) m*k*angle_step;
         w_cos = cos(angle);
         w_sin = sin(angle);
         out_data[m].real += in_data[k].real * w_cos - in_data[k].image * w_sin;
         out_data[m].image += in_data[k].real * w_sin + in_data[k].image * w_cos;
      }
   }
   /* divide all returned values by N */
   for (m=0; m<data_no; m++)
   {
      out_data[m].real /= data_no;
      out_data[m].image /= data_no;
   }
}

2. Conjugate the input data, call the forward transform routine, then divide all output data by N and conjugate it. The inverse routine for this approach:

void IDFT(int data_no, complex *in_data, complex *out_data)
{
/* ————————————————————
   Purpose: direct inverse discrete Fourier transform
   Parameters: in_data = input data
               data_no = number of items
   Output: out_data = output data
———————————————————— */
   int m;

   /* conjugate all input data */
   for (m=0; m<data_no; m++) in_data[m].image = -in_data[m].image;
   /* run the forward Fourier transform */
   DFT(data_no, in_data, out_data);
   /* restore the input data */
   for (m=0; m<data_no; m++) in_data[m].image = -in_data[m].image;
   /* divide all returned values by N and conjugate them */
   for (m=0; m<data_no; m++)
   {
      out_data[m].real /= data_no;
      out_data[m].image = -out_data[m].image/data_no;
   }
}
Of these two approaches the first is faster. If the values ω^k are precomputed and divided by N in advance, the final division by N can be skipped entirely, making it faster still, at the cost of more duplicated code; there is no way to have both. The second approach has the advantage of not duplicating code. To work with the various transform algorithms below without repeating the forward-transform code inside each inverse routine, the inverse routines in the rest of this article all use the second approach. Readers who really care about execution speed can rewrite them in the first style from the corresponding forward-transform code.

Compared with the O(n^2) time of direct evaluation, the fast Fourier transform needs only O(n log n). Where does the fast method come from? Initially, the transform formula was treated as a matrix operation:

B(m) = Σ_{k=0..N-1} A(k) ω^(mk)  ,m = 0~N-1

i.e.  B = W * A

where W is a two-dimensional matrix with W[m,k] = ω^(mk). When N = 2^L, the properties of ω allow W to be factored into L sparse matrices, i.e. matrices in which almost all entries are 0. Multiplying a vector by such a sparse matrix takes only O(n) operations, and there are log n factors, so the total running time is O(n log n). However, factoring W into sparse matrices is not easy to turn into a program, so the divide-and-conquer approach was later used instead (see the author's earlier article on algorithm analysis). First write the transform in the following polynomial form (note that we assume N = 2^k throughout; if there are too few terms, pad with zero coefficients):

B(x) = Σ_{k=0..N-1} A(k) x^k  ,x = ω^0 ~ ω^(N-1)

Putting the odd-index terms in one pile and the even-index terms in another gives:

B(x) = C(y)*x + D(y)  ,y = x^2

where

C(y) = A(N-1)*y^(N/2-1) + A(N-3)*y^(N/2-2) + … + A(1)
D(y) = A(N-2)*y^(N/2-1) + A(N-4)*y^(N/2-2) + … + A(0)

Since ω^2 is itself the principal (N/2)th root of unity, C(y) and D(y) can be viewed as the transforms of the odd-index and even-index halves of A. And since ω is the principal Nth root of unity, its properties give ω^(N/2) = -1, so the previous equation condenses to:

B(ω^j) = C(ω^(2j)) ω^j + D(ω^(2j))

B(ω^(j+N/2)) = -C(ω^(2j)) ω^j + D(ω^(2j))  ,j = 0~N/2-1

Let T(n) be the time to compute the Fourier transform of a vector of length n; the formulas above give the recurrence:

T(n) = 2*T(n/2) + cn   ,where c is a constant

which solves to T(n) = O(n log n). Divide and conquer thus yields a much more efficient algorithm. The recursive version is written as follows:

void RecursiveFFT(int data_no, complex *data)
{
/* ————————————————————
   Purpose: recursive fast discrete Fourier transform
   Parameters: data_no = number of items
               data = input data
   Output: data = output data
   Constraint: data_no must be a power of 2
———————————————————— */
   int i, j, new_pos;
   complex temp;
   float angle_step, angle, w_cos, w_sin;

   if (data_no <= 2)
   {
      temp = data[1];
      data[1].real = data[0].real - temp.real;
      data[1].image = data[0].image - temp.image;
      data[0].real += temp.real;
      data[0].image += temp.image;
      return;
   }
   angle_step = 2*PI/data_no;  /* angle step between powers of the root of unity ω */

   /* split into even/odd piles: even indices on the left, odd on the right */
   for (i=2; i<data_no; i+=2)
   {
      /* move the item at position i to position i/2 */
      new_pos = i/2;
      temp = data[i];
      for (j=i-1; j>=new_pos; j--) data[j+1] = data[j];
      data[new_pos] = temp;
   }
   /* recurse on the two half-length subvectors */
   RecursiveFFT(data_no/=2, data);
   RecursiveFFT(data_no, data+data_no);
   /* merge the results */
   for (i=0; i<data_no; i++)
   {
      angle = (float) i*angle_step;
      w_cos = cos(angle);
      w_sin = sin(angle);
      new_pos = i+data_no;
      temp.real = data[new_pos].real * w_cos - data[new_pos].image * w_sin;
      temp.image = data[new_pos].real * w_sin + data[new_pos].image * w_cos;
      data[new_pos].real = data[i].real - temp.real;
      data[new_pos].image = data[i].image - temp.image;
      data[i].real += temp.real;
      data[i].image += temp.image;
   }
}

void RecursiveIFFT(int data_no,complex *data)
{
/* ------------------------------------------------------------
Purpose: recursive fast inverse discrete Fourier transform
Input:   data_no = number of data points
         data    = input data
Output:  data    = output data
Note:    data_no must be a power of 2
------------------------------------------------------------ */
    int i, j, new_pos;
    complex temp;
    float angle_step, angle, w_cos, w_sin;

    if (data_no <= 2)
    {
        temp = data[1];
        data[1].real = (data[0].real - temp.real) / 2;
        data[1].image = (data[0].image - temp.image) / 2;
        data[0].real = (data[0].real + temp.real) / 2;
        data[0].image = (data[0].image + temp.image) / 2;
        return;
    }
    angle_step = -2*PI/data_no;  /* angular spacing between the roots of unity ω (negated for the inverse) */
    /* split into even/odd halves: even terms left, odd terms right */
    for (i=2; i<data_no; i+=2)
    {
        /* move element i to position i/2 */
        new_pos = i/2;
        temp = data[i];
        for (j=i-1; j>=new_pos; j--) data[j+1] = data[j];
        data[new_pos] = temp;
    }
    /* recurse on the two sub-vectors */
    data_no /= 2;
    RecursiveIFFT(data_no,data);
    RecursiveIFFT(data_no,data+data_no);

    /* combine the results */
    for (i=0; i<data_no; i++)
    {
        angle = (float) i*angle_step;
        w_cos = cos(angle);
        w_sin = sin(angle);
        new_pos = i+data_no;
        temp.real = data[new_pos].real * w_cos - data[new_pos].image * w_sin;
        temp.image = data[new_pos].real * w_sin + data[new_pos].image * w_cos;
        data[new_pos].real = (data[i].real - temp.real) / 2;
        data[new_pos].image = (data[i].image - temp.image) / 2;
        data[i].real = (data[i].real + temp.real) / 2;
        data[i].image = (data[i].image + temp.image) / 2;
    }
}

We now have a fast Fourier transform program, and tests confirm it is far faster than direct computation, but its recursive structure keeps it short of peak efficiency. The even/odd shuffling pass before each recursive call also takes considerable time, which becomes more and more apparent as the data count grows. We therefore want a non-recursive version that eliminates the shuffling pass by folding it into the computation itself, raising the speed further. That requires unrolling the recursion and observing its regularity. Before doing that unrolling analysis, however, we first describe another way to arrive at a non-recursive form, based on polynomial division; it will make the later unrolling analysis easier to follow.

Recall the polynomial form given earlier:

B(x) = Σ (k=0 to N-1) A(k) x^k   , x = ω^0 ~ ω^(N-1)

Evaluating B(ω^m) amounts to taking the remainder of B(x) divided by (x-ω^m); but with m = 0~N-1, performing all N divisions is time-consuming. Since much of the work is repeated, we can instead take a suitable product of factors:

(x-ω^(m1)) (x-ω^(m2)) … (x-ω^(mk))

compute the remainder R(x) of B(x) divided by that product, and then divide R(x) by the individual factors to obtain the remainders we want; this is faster. When N = 2^k (padding with zero coefficients if necessary, as noted above), the product of all the factors (x-ω^m) is (x^N - 1), so we can keep halving the divisor products in a binary fashion (much like divide and conquer). For N = 8, for example, the divisors used at each level are:

                          x^8-1
             ┌──────────────┴──────────────┐
           x^4-1                        x^4-ω^4
      ┌──────┴──────┐              ┌──────┴──────┐
   x^2-1         x^2-ω^4        x^2-ω^2        x^2-ω^6
  ┌──┴──┐       ┌──┴──┐        ┌──┴──┐        ┌──┴──┐
x-ω^0  x-ω^4  x-ω^2  x-ω^6  x-ω^1  x-ω^5  x-ω^3  x-ω^7

Notice that with this binary factor-division scheme, no matter how the factors are permuted, the final results never come out in the order we want — but we can arrange them in bit-reversed order. Bit reversal means reversing the order of the bits of a number: for example 3 = 011b bit-reversed is 110b = 6. The result in the example above is:

B[0] B[4] B[2] B[6] B[1] B[5] B[3] B[7]

where B[m] stands for B(ω^m). The reader can check that this is indeed bit-reversed order. Using this relationship we can easily permute the results back into the desired order; for an efficient permutation, use the algorithm proposed by Cooley, Lewis and Welch:

j = 0;
for (i=1; i<N-1; i++)  /* N = number of data points */
{
    k = N/2;
    while (k <= j)
    {
        j -= k;
        k /= 2;
    }
    j += k;
    if (i < j) { temp = data[i]; data[i] = data[j]; data[j] = temp; }  /* swap elements i and j */
}
Next we explain how to compute these factor divisions. The remainder of a degree-(2r-1) polynomial divided by (x^r - c) is simply:

A(r-1)x^(r-1) + … + A(0) + c * ( A(2r-1)x^(r-1) + … + A(r) )

For the example above, if the coefficients are [A7 A6 A5 A4 A3 A2 A1 A0], the remainder after dividing by (x^4-ω^4) is:

  [A3 A2 A1 A0] + ω^4 [A7 A6 A5 A4]
= [A3 A2 A1 A0] - [A7 A6 A5 A4]          (since ω^4 = -1 when N = 8)
= [A3-A7 A2-A6 A1-A5 A0-A4]

What value should k take in the multiplier ω^k at each level? As noted above, the factors at every level remain in bit-reversed order, so when a factor sits at position i within its level (counting from 0), k is the bit reversal of i. In practice the factors are processed in pairs: besides saving storage, the two members of a pair can share the same ω^k value, which speeds things up.
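As a tiny standalone illustration (not from the text), the folding rule above can be written directly as code: since x^r ≡ c modulo (x^r - c), the high half of the coefficient array folds into the low half. The function name is ours, and real coefficients are used for simplicity; the FFT programs below apply the same rule to complex data in place.

```c
/* A minimal sketch, assuming coefficients stored low-to-high in a[0..2r-1]:
   remainder of the polynomial divided by (x^r - c). */
void PolyRemainder(int r, double c, const double *a, double *rem)
{
    int i;
    for (i = 0; i < r; i++)
        rem[i] = a[i] + c * a[i + r];   /* low half + c * high half */
}
```

For example, with r = 2 and c = -1 (dividing by x^2+1), the polynomial 1 + 2x + 3x^2 + 4x^3 folds to (1-3) + (2-4)x.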

With these properties we obtain a non-recursive fast Fourier transform algorithm. What is its time order? The binary splitting above has log n levels, each needing only cn operations, so the time is again O(n log n); interested readers may wish to work out a more formal proof. The following program implements the polynomial-division idea:

void PolynomialFFT(int data_no,complex *data)
{
/* ------------------------------------------------------------
Purpose: fast discrete Fourier transform by polynomial division
Input:   data_no = number of data points
         data    = input data
Output:  data    = output data
Notes:   1. data_no must be a power of 2
         2. data_no must not exceed 16384
------------------------------------------------------------ */
    int i, j, k, level, depth, depart_no, sub_data_no, windex;
    int pre_index, post_index;
    float w_cos, w_sin, angle_step, angle;
    complex temp;

    if (data_no < 2) return;
    angle_step = 2*PI/data_no;  /* angular spacing between the roots of unity ω */
    /* compute the number of levels */
    depth = 15;
    sub_data_no = data_no;
    while ((sub_data_no & (int)0x8000) == 0)
    {
        sub_data_no <<= 1;
        depth--;
    }
    data_no = 1 << depth;
    /* start the transform */
    depart_no = sub_data_no = data_no;
    for (level=0; level<depth; level++)
    {
        depart_no /= 2;
        windex = 0;
        /* process each polynomial division */
        for (j=0; j<data_no; j+=sub_data_no)
        {
            /* compute the ω^k factor for this polynomial (k = bit reversal of its position) */
            i = 0;
            for (k=0; k<depth; k++)
                i = (i << 1) | ((windex >> k) & 1);
            angle = i*angle_step;
            w_cos = cos(angle);
            w_sin = sin(angle);
            windex += 2;

            /* perform the division for this polynomial */
            for (i=0; i<depart_no; i++)
            {
                pre_index = i + j;
                post_index = pre_index + depart_no;
                temp.real = data[post_index].real * w_cos - data[post_index].image * w_sin;
                temp.image = data[post_index].real * w_sin + data[post_index].image * w_cos;
                data[post_index].real = data[pre_index].real - temp.real;
                data[post_index].image = data[pre_index].image - temp.image;
                data[pre_index].real += temp.real;
                data[pre_index].image += temp.image;
            }
        }
        sub_data_no = depart_no;
    }
    /* reorder the results by bit reversal */
    j = 0;
    for (i=1; i<data_no-1; i++)
    {
        k = data_no/2;
        while (k <= j)
        {
            j -= k;
            k /= 2;
        }
        j += k;
        if (i < j)
        {
            temp = data[j];
            data[j] = data[i];
            data[i] = temp;
        }
    }
}

void PolynomialIFFT(int data_no,complex *data)
{
/* ------------------------------------------------------------
Purpose: fast inverse discrete Fourier transform by polynomial
         division
Input:   data_no = number of data points
         data    = input data
Output:  data    = output data
Notes:   1. data_no must be a power of 2
         2. data_no must not exceed 16384
------------------------------------------------------------ */
    int m;

    /* conjugate all input data */
    for (m=0; m<data_no; m++) data[m].image = -data[m].image;
    /* run the forward transform */
    PolynomialFFT(data_no,data);
    /* divide all results by N and conjugate them back */
    for (m=0; m<data_no; m++)
    {
        data[m].real /= data_no;
        data[m].image = -data[m].image/data_no;
    }
}

As the timing comparison table at the end of this section shows, this program is again considerably faster than the recursive version, although computing each ω^k requires a bit reversal, which costs a little extra time.

Now let us return to the recursive algorithm. Its recursion stems mainly from the following formulas (listed earlier):

B(ω^j)       =  C(ω^(2j)) ω^j + D(ω^(2j))

B(ω^(j+N/2)) = -C(ω^(2j)) ω^j + D(ω^(2j))   , j = 0~N/2-1

Because each recursive step reads its inputs from positions different from those where it writes its outputs, people looked for a way to reorder the input so that read and write positions coincide; then the transform can be computed level by level with plain loops, with no recursion and no data shuffling. Unrolling the recursion shows that the input should be placed in the bit-reversed order described above, and the formulas become (at level k, counting k from 0):

B[i]     = B[i] + B[i+2^k] ω^(i*N/2^(k+1))
B[i+2^k] = B[i] - B[i+2^k] ω^(i*N/2^(k+1))   , i = 0~2^k-1

Because of this crosswise read-and-write pattern the method is commonly called the butterfly method, and since the formula comes from analyzing the data in the time domain it is called the DIT (Decimation in Time) butterfly method (also the Cooley-Tukey method). Here is the Fourier transform program for this method:

void DITFFT(int data_no,complex *data)
{
/* ------------------------------------------------------------
Purpose: DIT-style fast discrete Fourier transform
Input:   data_no = number of data points
         data    = input data
Output:  data    = output data
Notes:   1. data_no must be a power of 2
         2. data_no must not exceed 16384
------------------------------------------------------------ */
    int i, j, k, level, depth, depart_no, sub_data_no, wstep;
    int pre_index, post_index;
    float w_cos, w_sin, angle_step, angle;
    complex temp;

    if (data_no < 2) return;
    angle_step = 2*PI/data_no;  /* angular spacing between the roots of unity ω */
    /* compute the number of levels */
    depth = 15;
    sub_data_no = data_no;
    while ((sub_data_no & (int)0x8000) == 0)
    {
        sub_data_no <<= 1;
        depth--;
    }
    data_no = 1 << depth;
    /* reorder the input by bit reversal */
    j = 0;
    for (i=1; i<data_no-1; i++)
    {
        k = data_no/2;
        while (k <= j)
        {
            j -= k;
            k /= 2;
        }
        j += k;
        if (i < j)
        {
            temp = data[j];
            data[j] = data[i];
            data[i] = temp;
        }
    }
    /* start the transform */
    depart_no = 1;
    wstep = data_no;
    for (level=0; level<depth; level++)
    {
        sub_data_no = depart_no * 2;
        wstep /= 2;
        for (i=0; i<depart_no; i++)
        {
            /* compute the ω^k factor for this butterfly */
            angle = i*wstep*angle_step;
            w_cos = cos(angle);
            w_sin = sin(angle);
            for (j=0; j<data_no; j+=sub_data_no)
            {
                pre_index = i + j;
                post_index = pre_index + depart_no;
                temp.real = data[post_index].real * w_cos - data[post_index].image * w_sin;
                temp.image = data[post_index].real * w_sin + data[post_index].image * w_cos;
                data[post_index].real = data[pre_index].real - temp.real;
                data[post_index].image = data[pre_index].image - temp.image;
                data[pre_index].real += temp.real;
                data[pre_index].image += temp.image;
            }
        }
        depart_no *= 2;
    }
}

void DITIFFT(int data_no,complex *data)
{
/* ------------------------------------------------------------
Purpose: DIT-style fast inverse discrete Fourier transform
Input:   data_no = number of data points
         data    = input data
Output:  data    = output data
Notes:   1. data_no must be a power of 2
         2. data_no must not exceed 16384
------------------------------------------------------------ */
    int m;

    /* conjugate all input data */
    for (m=0; m<data_no; m++) data[m].image = -data[m].image;
    /* run the forward transform */
    DITFFT(data_no,data);
    /* divide all results by N and conjugate them back */
    for (m=0; m<data_no; m++)
    {
        data[m].real /= data_no;
        data[m].image = -data[m].image/data_no;
    }
}

This program is only slightly faster than the polynomial-division version, mainly because computing ω^k no longer requires a bit reversal each time.

Besides analyzing the time-domain input, one can also work directly on the frequency-domain output, which yields a similar algorithm known as the DIF (Decimation in Frequency) butterfly method (or Sande-Tukey method). The analysis goes as follows. Take the discrete Fourier transform formula:

B(m) = Σ (k=0 to N-1) A(k) ω^(mk)   , m = 0~N-1

and split it as follows (again assuming N = 2^k):

B(m) = Σ (k=0 to N/2-1) A(k) ω^(mk) + Σ (k=N/2 to N-1) A(k) ω^(mk)

     = Σ (k=0 to N/2-1) [ A(k) + A(k+N/2) ω^(mN/2) ] ω^(mk)

Hence

B(2m)   = Σ (k=0 to N/2-1) [ A(k) + A(k+N/2) ω^(2mN/2) ] ω^(2mk)

B(2m+1) = Σ (k=0 to N/2-1) [ A(k) + A(k+N/2) ω^((2m+1)N/2) ] ω^((2m+1)k)

Since

ω^(2mN/2) = 1 ,  ω^((2m+1)N/2) = -1

we get

B(2m)   = Σ (k=0 to N/2-1) [ A(k) + A(k+N/2) ] ω^(2mk)

B(2m+1) = Σ (k=0 to N/2-1) [ (A(k) - A(k+N/2)) ω^k ] ω^(2mk)

This formula is the origin of the DIF butterfly method. Writing out the computation step by step shows that it runs exactly opposite to the DIT butterfly (at level k, counting k from 0):

B[i]           = B[i] + B[i+N/2^(k+1)]
B[i+N/2^(k+1)] = ( B[i] - B[i+N/2^(k+1)] ) ω^(i*2^k)   , i = 0~N/2^(k+1)-1

and the results come out in bit-reversed order. Here is the DIF butterfly program:

void DIFFFT(int data_no,complex *data)
{
/* ------------------------------------------------------------
Purpose: DIF-style fast discrete Fourier transform
Input:   data_no = number of data points
         data    = input data
Output:  data    = output data
Notes:   1. data_no must be a power of 2
         2. data_no must not exceed 16384
------------------------------------------------------------ */
    int i, j, k, level, depth, depart_no, sub_data_no, wstep;
    int pre_index, post_index;
    float w_cos, w_sin, angle_step, angle;
    complex temp, temp2;

    if (data_no < 2) return;
    angle_step = 2*PI/data_no;  /* angular spacing between the roots of unity ω */
    /* compute the number of levels */
    depth = 15;
    sub_data_no = data_no;
    while ((sub_data_no & (int)0x8000) == 0)
    {
        sub_data_no <<= 1;
        depth--;
    }
    data_no = 1 << depth;
    /* start the transform */
    depart_no = sub_data_no = data_no;
    wstep = 1;
    for (level=0; level<depth; level++)
    {
        depart_no /= 2;
        for (i=0; i<depart_no; i++)
        {
            /* compute the ω^k factor for this butterfly */
            angle = i*wstep*angle_step;
            w_cos = cos(angle);
            w_sin = sin(angle);
            for (j=0; j<data_no; j+=sub_data_no)
            {
                pre_index = i + j;
                post_index = pre_index + depart_no;
                temp.real = data[pre_index].real + data[post_index].real;
                temp.image = data[pre_index].image + data[post_index].image;
                temp2.real = data[pre_index].real - data[post_index].real;
                temp2.image = data[pre_index].image - data[post_index].image;
                data[post_index].real = temp2.real*w_cos - temp2.image*w_sin;
                data[post_index].image = temp2.real*w_sin + temp2.image*w_cos;
                data[pre_index] = temp;
            }
        }

        sub_data_no = depart_no;
        wstep *= 2;
    }
    /* reorder the results by bit reversal */
    j = 0;
    for (i=1; i<data_no-1; i++)
    {
        k = data_no/2;
        while (k <= j)
        {
            j -= k;
            k /= 2;
        }
        j += k;
        if (i < j)
        {
            temp = data[j];
            data[j] = data[i];
            data[i] = temp;
        }
    }
}

void DIFIFFT(int data_no,complex *data)
{
/* ------------------------------------------------------------
Purpose: DIF-style fast inverse discrete Fourier transform
Input:   data_no = number of data points
         data    = input data
Output:  data    = output data
Notes:   1. data_no must be a power of 2
         2. data_no must not exceed 16384
------------------------------------------------------------ */
    int m;

    /* conjugate all input data */
    for (m=0; m<data_no; m++) data[m].image = -data[m].image;
    /* run the forward transform */
    DIFFFT(data_no,data);
    /* divide all results by N and conjugate them back */
    for (m=0; m<data_no; m++)
    {
        data[m].real /= data_no;
        data[m].image = -data[m].image/data_no;
    }
}

Since its computation closely resembles the DIT butterfly, the two methods run in nearly the same time (the DIF butterfly is a hair faster). Why bother finding the DIF butterfly at all, then? For ordinary transforms the two methods are equivalent, but under certain conditions each has its own advantages. Consider FFT pruning: when only part of the input or output signal is nonzero, some operations can be skipped. With few nonzero inputs, a more efficient algorithm can be derived from the DIT butterfly; with few nonzero outputs, from the DIF butterfly. So both methods retain their own application value. The author does not want to go further into these special-case transforms (mainly because there are too many); interested readers can consult books on the subject — with the foundation laid by this article, they should not be hard going.

The DIT and DIF butterfly methods actually come in many further forms, differing mainly in how the computation is ordered: for example, arranging the paired read/write positions to be identical at every level, which suits hardware implementation; or using extra scratch storage so that neither the input nor the output needs bit-reversal reordering; and so on. None of these changes improves program efficiency much. One more point worth mentioning: the bit-reversal permutation may be applied either to the input or to the output, with identical results, so for both the DIT and DIF butterflies the permutation code can be moved to the front or to the back.

Although we have now listed the various existing algorithms and programs, in actual coding there are many further tweaks for speed — improving the pointer handling, or hoisting the cos/sin computation of ω^k out of the loops, for example — though the time saved is quite limited. For the latter we modify the DIF butterfly as an example below, for the reader's reference; the variables recording the ω^k values are made shared so that they can later be reused by the two-dimensional Fourier transform. In fact, once the cos/sin computations are hoisted out of the loops, the DIT butterfly, polynomial-division, and DIF versions all run at nearly the same speed; it just takes some extra memory. We also handle the out-of-memory case. Whether spending N/2 extra memory to save only N/2 cos/sin computations is worthwhile is up to the reader. Unlike the one-dimensional case, two-dimensional transforms normally always precompute and store the ω^k values, because spending O(n) memory to save O(n^2) time is an excellent bargain.


static int Keep_No = 0;
static complex *W_Root;

void FFT(int data_no,complex *data)
{
/* ------------------------------------------------------------
Purpose: enhanced fast discrete Fourier transform (DIF butterfly)
Input:   data_no = number of data points; 0 means free the
                   function's internal memory
         data    = input data
Output:  data    = output data
Notes:   1. data_no must be a power of 2
         2. data_no must not exceed 16384
------------------------------------------------------------ */
    int i, j, k, level, depth, depart_no, sub_data_no, wstep;
    int pre_index, post_index;
    float w_cos, w_sin, angle_step, angle;
    complex temp, temp2;

    if (data_no == 0)
    {
        if (Keep_No != 0) free(W_Root);
        Keep_No = 0;
    }
    if (data_no < 2) return;
    angle_step = 2*PI/data_no;  /* angular spacing between the roots of unity ω */
    /* compute the number of levels */
    depth = 15;
    sub_data_no = data_no;
    while ((sub_data_no & (int)0x8000) == 0)
    {
        sub_data_no <<= 1;
        depth--;
    }
    data_no = 1 << depth;
    /* allocate and precompute the ω^k table */
    if (data_no != Keep_No)
    {
        if (Keep_No != 0) free(W_Root);
        Keep_No = 0;
        /* only half the roots are needed */
        sub_data_no = data_no / 2;
        W_Root = (complex*) malloc(sub_data_no*sizeof(complex));
        if (W_Root != NULL)
        {
            Keep_No = data_no;
            for (i=0; i<sub_data_no; i++)
            {
                angle = i*angle_step;
                W_Root[i].real = cos(angle);
                W_Root[i].image = sin(angle);
            }
        }
    }
    /* start the transform */
    depart_no = sub_data_no = data_no;
    wstep = 1;
    for (level=0; level<depth; level++)
    {
        depart_no /= 2;
        for (i=0; i<depart_no; i++)
        {
            /* look up or compute the ω^k factor for this butterfly */
            j = i*wstep;
            if (Keep_No != 0)
            {
                w_cos = W_Root[j].real;
                w_sin = W_Root[j].image;
            }
            else
            {
                angle = j*angle_step;
                w_cos = cos(angle);
                w_sin = sin(angle);
            }

            for (j=0; j<data_no; j+=sub_data_no)
            {
                pre_index = i + j;
                post_index = pre_index + depart_no;
                temp.real = data[pre_index].real + data[post_index].real;
                temp.image = data[pre_index].image + data[post_index].image;
                temp2.real = data[pre_index].real - data[post_index].real;
                temp2.image = data[pre_index].image - data[post_index].image;
                data[post_index].real = temp2.real*w_cos - temp2.image*w_sin;
                data[post_index].image = temp2.real*w_sin + temp2.image*w_cos;
                data[pre_index] = temp;
            }
        }
        sub_data_no = depart_no;
        wstep *= 2;
    }
    /* reorder the results by bit reversal */
    j = 0;
    for (i=1; i<data_no-1; i++)
    {
        k = data_no/2;
        while (k <= j)
        {
            j -= k;
            k /= 2;
        }
        j += k;
        if (i < j)
        {
            temp = data[j];
            data[j] = data[i];
            data[i] = temp;
        }
    }
}

void IFFT(int data_no,complex *data)
{
/* ------------------------------------------------------------
Purpose: enhanced fast inverse discrete Fourier transform
         (DIF butterfly)
Input:   data_no = number of data points
         data    = input data
Output:  data    = output data
Notes:   1. data_no must be a power of 2
         2. data_no must not exceed 16384
------------------------------------------------------------ */
    int m;

    /* conjugate all input data */
    for (m=0; m<data_no; m++) data[m].image = -data[m].image;
    /* run the forward transform */
    FFT(data_no,data);
    /* divide all results by N and conjugate them back */
    for (m=0; m<data_no; m++)
    {
        data[m].real /= data_no;
        data[m].image = -data[m].image/data_no;
    }
}

The reader may ask: what if the given vector length is not a power of 2 and there is no room to pad with zero coefficients — how can the programs above be used? In that case the fast algorithms cannot be applied, and we must fall back on the original direct computation. In the DFT program listed earlier, the cos/sin computations for ω^k occur so frequently that they seriously hurt execution speed, so below we hoist that part out of the loop. The resulting efficiency is still not great, but that cannot be helped (methods do exist, but they are rather involved, so we will not discuss them here).

void EnhancedDFT(int data_no,complex *in_data,complex *out_data)
{
/* ------------------------------------------------------------
Purpose: enhanced direct discrete Fourier transform
Input:   data_no = number of data points; 0 means free the
                   function's internal memory
         in_data = input data
Output:  out_data = output data
------------------------------------------------------------ */
    static int keep_no = 0;
    static complex *w_root;
    int m, k, i;
    float w_cos, w_sin, angle_step, angle;

    if (data_no == 0)
    {
        if (keep_no != 0) free(w_root);
        keep_no = 0;
    }
    if (data_no < 2) return;
    angle_step = 2*PI/data_no;  /* angular spacing between the roots of unity ω */
    /* allocate and precompute the ω^k table */
    if (data_no != keep_no)
    {
        if (keep_no != 0) free(w_root);
        keep_no = 0;
        w_root = (complex*) malloc(data_no*sizeof(complex));
        if (w_root != NULL)
        {
            keep_no = data_no;
            for (i=0; i<data_no; i++)
            {
                angle = i*angle_step;
                w_root[i].real = cos(angle);
                w_root[i].image = sin(angle);
            }
        }
    }
    for (m=0; m<data_no; m++)   /* compute B(m) */
    {
        out_data[m].real = in_data[0].real;
        out_data[m].image = in_data[0].image;
        for (k=1; k<data_no; k++) /* accumulate A(k)*ω^(mk) */
        {
            if (keep_no != 0)
            {
                i = (int) ((long) m*k % data_no);
                w_cos = w_root[i].real;
                w_sin = w_root[i].image;
            }
            else
            {
                angle = (float) m*k*angle_step;
                w_cos = cos(angle);
                w_sin = sin(angle);
            }
            out_data[m].real += in_data[k].real * w_cos - in_data[k].image * w_sin;
            out_data[m].image += in_data[k].real * w_sin + in_data[k].image * w_cos;
        }
    }
}

void EnhancedIDFT(int data_no,complex *in_data,complex *out_data)
{
/* ------------------------------------------------------------
Purpose: enhanced direct inverse discrete Fourier transform
Input:   data_no = number of data points
         in_data = input data
Output:  out_data = output data
------------------------------------------------------------ */
    int m;

    /* conjugate all input data */
    for (m=0; m<data_no; m++) in_data[m].image = -in_data[m].image;
    /* run the forward transform */
    EnhancedDFT(data_no,in_data,out_data);
    /* restore the input data */
    for (m=0; m<data_no; m++) in_data[m].image = -in_data[m].image;
    /* divide all results by N and conjugate them back */
    for (m=0; m<data_no; m++)
    {
        out_data[m].real /= data_no;
        out_data[m].image = -out_data[m].image/data_no;
    }
}

Now let us actually measure the execution speed of the seven Fourier transform programs above, so the reader can judge their relative merits directly from the numbers. The table below shows the average execution time of each program (averaged over 100 runs), in seconds.

Data count      |   64 |  256 |  1024 |   4096
----------------+------+------+-------+-------
DFT             | 0.13 | 2.07 | 33.62 | 541.43
RecursiveFFT    | 0.01 | 0.09 |  1.13 |  15.54
PolynomialFFT   | 0.00 | 0.02 |  0.09 |   0.43
DITFFT          | 0.00 | 0.02 |  0.09 |   0.41
DIFFFT          | 0.00 | 0.02 |  0.09 |   0.40
FFT             | 0.00 | 0.01 |  0.07 |   0.34
EnhancedDFT     | 0.04 | 0.60 |  9.83 | 160.16

Table 1. Average execution time (in seconds) of the Fourier transform programs

One final point: by property 3 of the Fourier transform given in Section 1, a single FFT call can produce the transforms of two real vectors at once. Since time-domain data are usually real vectors, this trick can save nearly half the time when transforming many vectors. Writing the program is just a little simple arithmetic after calling one of the transform routines above, so we leave it as an exercise. Note, however, that the trick does not apply in the inverse direction, and for two-dimensional transforms it saves only 1/4 of the time.
Section 3  The Two-Dimensional Fourier Transform

The two-dimensional Fourier transform is defined as follows:

F(u,v) = ∫∫ f(x,y) e^(i2π(ux+vy)) dx dy

f(x,y) = ∫∫ F(u,v) e^(-i2π(xu+yv)) du dv     (both integrals over -∞ ~ ∞)

It can also be defined as:

F(u,v) = ∫∫ f(x,y) e^(i(ux+vy)) dx dy

f(x,y) = (1/4π^2) ∫∫ F(u,v) e^(-i(xu+yv)) du dv     (both integrals over -∞ ~ ∞)

The two-dimensional discrete Fourier transform is defined as:

B(s,t) = Σ (m=0 to M-1) Σ (n=0 to N-1) A(m,n) e^(i2π(sm/M+tn/N))   , s = 0~M-1, t = 0~N-1

A(m,n) = (1/MN) Σ (s=0 to M-1) Σ (t=0 to N-1) B(s,t) e^(-i2π(ms/M+nt/N))   , m = 0~M-1, n = 0~N-1

Since the formulas above can be rewritten as:

B(s,t) = Σ (m=0 to M-1) [ Σ (n=0 to N-1) A(m,n) e^(i2πtn/N) ] e^(i2πsm/M)

A(m,n) = (1/M) Σ (s=0 to M-1) [ (1/N) Σ (t=0 to N-1) B(s,t) e^(-i2πnt/N) ] e^(-i2πms/M)

the 2-D Fourier transform can be viewed as a 1-D Fourier transform along one axis followed by a 1-D Fourier transform along the other. Using the fast transform of Section 2, we obtain an O(n^2 log n) algorithm. Here is the program:

void Decomposed2DFFT(int x_no,int y_no,complex **data)
{
/* ------------------------------------------------------------
Purpose: fast 2-D discrete Fourier transform by row-column
         decomposition
Input:   x_no = number of elements per row; 0 means free the
                internal memory
         y_no = number of rows
         data = input data
Output:  data = output data
Notes:   1. both dimensions must be powers of 2
         2. neither dimension may exceed 16384
------------------------------------------------------------ */
    int i, j, k, line, level, depth, depart_no, sub_data_no, wstep;
    int pre_index, post_index;
    float w_cos, w_sin, angle_step, angle;
    complex temp, temp2;

    if (x_no == 0)
    {
        if (Keep_No != 0) free(W_Root);
        Keep_No = 0;
        return;
    }
    /* first run a 1-D transform on every row */
    for (line=0; line<y_no; line++) FFT(x_no,data[line]);
    /* then run a 1-D transform on every column */
    if (y_no < 2) return;
    angle_step = 2*PI/y_no;  /* angular spacing between the roots of unity ω */
    /* compute the number of levels */
    depth = 15;
    sub_data_no = y_no;
    while ((sub_data_no & (int)0x8000) == 0)
    {
        sub_data_no <<= 1;
        depth--;
    }
    y_no = 1 << depth;

    /* allocate and precompute the ω^k table */
    if (y_no != Keep_No)
    {
        if (Keep_No != 0) free(W_Root);
        Keep_No = 0;
        sub_data_no = y_no / 2;
        W_Root = (complex*) malloc(sub_data_no*sizeof(complex));
        if (W_Root != NULL)
        {
            Keep_No = y_no;
            for (i=0; i<sub_data_no; i++)
            {
                angle = i*angle_step;
                W_Root[i].real = cos(angle);
                W_Root[i].image = sin(angle);
            }
        }
    }
    /* transform each column */
    for (line=0; line<x_no; line++)
    {
        depart_no = sub_data_no = y_no;
        wstep = 1;
        for (level=0; level<depth; level++)
        {
            depart_no /= 2;
            for (i=0; i<depart_no; i++)
            {
                /* look up or compute the ω^k factor for this butterfly */
                j = i*wstep;
                if (Keep_No != 0)
                {
                    w_cos = W_Root[j].real;
                    w_sin = W_Root[j].image;
                }
                else
                {
                    angle = j*angle_step;
                    w_cos = cos(angle);
                    w_sin = sin(angle);
                }
                for (j=0; j<y_no; j+=sub_data_no)
                {
                    pre_index = i + j;
                    post_index = pre_index + depart_no;
                    temp.real = data[pre_index][line].real + data[post_index][line].real;
                    temp.image = data[pre_index][line].image + data[post_index][line].image;
                    temp2.real = data[pre_index][line].real - data[post_index][line].real;
                    temp2.image = data[pre_index][line].image - data[post_index][line].image;
                    data[post_index][line].real = temp2.real*w_cos - temp2.image*w_sin;
                    data[post_index][line].image = temp2.real*w_sin + temp2.image*w_cos;
                    data[pre_index][line] = temp;
                }
            }
            sub_data_no = depart_no;
            wstep *= 2;
        }

        /* reorder the column by bit reversal */
        j = 0;
        for (i=1; i<y_no-1; i++)
        {
            k = y_no/2;
            while (k <= j)
            {
                j -= k;
                k /= 2;
            }
            j += k;
            if (i < j)
            {
                temp = data[j][line];
                data[j][line] = data[i][line];
                data[i][line] = temp;
            }
        }
    }
}

void Decomposed2DIFFT(int x_no,int y_no,complex **data)
{
/* ------------------------------------------------------------
Purpose: fast 2-D inverse discrete Fourier transform by
         row-column decomposition
Input:   x_no = number of elements per row
         y_no = number of rows
         data = input data
Output:  data = output data
Notes:   1. both dimensions must be powers of 2
         2. neither dimension may exceed 16384
------------------------------------------------------------ */
    float divide;
    complex *sub_data;
    int x, y;

    /* conjugate all input data */
    for (y=0; y<y_no; y++)
    {
        sub_data = data[y];
        for (x=0; x<x_no; x++) sub_data[x].image = -sub_data[x].image;
    }
    /* run the forward 2-D transform */
    Decomposed2DFFT(x_no,y_no,data);
    /* divide all results by M*N and conjugate them back */
    divide = (float) x_no * y_no;
    for (y=0; y<y_no; y++)
    {
        sub_data = data[y];
        for (x=0; x<x_no; x++)
        {
            sub_data[x].real /= divide;
            sub_data[x].image = -sub_data[x].image/divide;
        }
    }
}

The program above assumes the data live in dynamically allocated memory; with static arrays the function declarations must change slightly, but the function bodies stay the same. For the reader's convenience, the author also wrote a general-purpose pair of functions for allocating and freeing 2-D dynamic arrays:

void** Alloc2DArray(int xsize,int ysize,int item_size)
{
/* ------------------------------------------------------------
Purpose: allocate a 2-D dynamic array
Input:   xsize, ysize = array dimensions
         item_size    = size of one element
Returns: pointer to the array, or NULL if out of memory
Note:    xsize*item_size must be less than 65536
------------------------------------------------------------ */
    int i, j;
    void** ptr;

    ptr = (void**) malloc(ysize*sizeof(void*));
    if (ptr == NULL) return NULL;
    for (i=0; i<ysize; i++)
    {
        ptr[i] = (void*) malloc(xsize*item_size);
        if (ptr[i] == NULL)
        {
            /* out of memory: free what was already allocated */
            for (j=0; j<i; j++) free(ptr[j]);
            free(ptr);
            return NULL;
        }
    }
    return ptr;
}

void Free2DArray(int xsize,int ysize,void **ptr)
{
/* ------------------------------------------------------------
Purpose: free a 2-D dynamic array
Input:   xsize, ysize = array dimensions
         ptr          = pointer to the array
Remark:  xsize is not actually used; it is kept as a parameter
         for readability
------------------------------------------------------------ */
    int i;

    if (ptr == NULL) return;
    for (i=0; i<ysize; i++) free(ptr[i]);
    free(ptr);
}

For example, to allocate a 256×128 two-dimensional array of complex, just call:

data = (complex**) Alloc2DArray(256,128,sizeof(complex));
if (data == NULL) /* out of memory... */ ;

To release it, call:

Free2DArray(256,128,data);

Functions like these make handling memory for 2-D data quite convenient.

We now have a fast 2-D Fourier transform algorithm, but is there a faster one? Yes: the divide-and-conquer method of Section 2 can be applied again to obtain a faster algorithm, known as the vector-radix method. Take the 2-D transform formula:

B(s,t) = Σ (m=0 to M-1) Σ (n=0 to N-1) A(m,n) e^(i2π(sm/M+tn/N))   , s = 0~M-1, t = 0~N-1

....... (unfinished) .......
....... The rest is omitted for now and may never be written. — Qingshan, Mar 29 '97 .......

Section 4  Applications of the Fourier Transform

1. Displaying the Fourier spectrum

2. Terrain elevation measurement error model

3. Matrix multiplication

Exercises

1. Derive the Fourier transform formulas from the Fourier integral.
2. Prove the Fourier transform properties described in Section 1.
3. Prove that the polynomial-division, DIT butterfly, and DIF butterfly methods all compute the Fourier transform in O(n log n) time.
4. Prove that the remainder of a degree-(2r-1) polynomial divided by (x^r - c) is

   A(r-1)x^(r-1) + … + A(0) + c * ( A(2r-1)x^(r-1) + … + A(r) )

   where A(n) is the coefficient of x^n.
5. Taking N = 16 as an example, unroll the DIT butterfly and DIF butterfly computations and verify the formulas listed in Section 2.
6. Write FFT pruning programs (both input pruning and output pruning).
7. Using the conjugate symmetry of the Fourier transform, write a program that computes the Fourier transforms of two real vectors simultaneously.

References

1. "Image Processing - Theory, Algorithms, and Architectures", Maher A. Sid-Ahmed, 1995, ISBN 0-07-113697-5.
2. "Digital Image Processing", Rafael C. Gonzalez, Richard E. Woods, 1992, 開發.
3. "Fractal Image Compression", Progress Report, July 1992.
4. "C Language Algorithms for Digital Signal Processing", Paul M. Embree, Bruce Kimble, 1991, 開發.
5. "Principles of Pictorial Information Systems Design", Shi-Kuo Chang, 1989, ISBN 0-13-710393-X.
6. "The Fast Fourier Transform and Its Applications", E. Oran Brigham, 1988, ISBN 0-13-307547-8.
7. "One-Dimensional Digital Signal Processing", Chi-Tsong Chen, 1985, 歐亞.
8. "The Art of Computer Programming", Vol. 2, Donald E. Knuth, 1985, 台北.
9. "Discrete Fourier Transformation and Its Application to Power Spectra Estimation", Nezih C. Geckinli, Davras Yavusz, 1985, 巨擘.
10. "Two-Dimensional Fast Fourier Transforms in Image Processing", Donald E. Lewis, Systems Research Laboratories, Inc., 1984.
11. "Advanced Engineering Mathematics", Erwin Kreyszig, 1983, 淡江.
12. "Fast Transforms - Algorithms, Analyses, Applications", Douglas F. Elliott, K. Ramamohan Rao, 1982, ISBN 0-12-237080-5.
13. "Fast Fourier Transform and Convolution Algorithms", Henri J. Nussbaumer, 1982, ISBN 0-387-11825-X.
14. "2-D FFTs of Large Images with AP-120B", Richard E. Twogood, Floating Point Systems Users Group Meeting, Feb. 1980.
15. "Computer Image Processing and Recognition", Ernest L. Hall, 1979, 儒林.
16. "Digital Image Processing", William K. Pratt, 1978, 儒林.
17. "The Design and Analysis of Computer Algorithms", Aho, Hopcroft, Ullman, 1974, 開發.
18. "Terrain Correlation Suitability", Wang Tang, Robert L. McClintock, SPIE Vol. 2220.
19. "數位影像處理", 余松煜, 周源華, 吳時光編著, 1993, 儒林.
20. "數位信號處理", 王啟川, 郭弘政編譯, 1992, 全華.
21. "數位影像處理", 衛祖賞編著, 1989, 全華.

Boyer-Moore string search algorithm

http://en.wikipedia.org/wiki/Boyer-Moore_string_search_algorithm

The explanations I found online are not very clear, so here is a supplement.

String searching is an extremely important technique — think of Google and you will see why. The method introduced below can still be tuned to run faster, for instance by improving the table lookups or simply trading space for time, but it already gives us plenty to think about and play with.

First, the naive string comparison (brute force) slides through the text one character at a time, comparing a window of consecutive text characters against the pattern at every position until the end of the text. The obvious improvement is to skip ahead during comparison — shifting by more than one character at a time to save work. Jumping as far as possible is the key.

The method can be viewed from two angles; in both, the comparison runs backwards from the end of the pattern:

  1. good suffix shift
  2. bad character shift

The good suffix shift handles the cases where part of the pattern has already matched.

For example, suppose we want to search for "ANPANMAN". There are eight partial-match cases, as follows.

Case 1: the very last character mismatches. The obvious move is to shift the whole pattern one character right and see whether the next text character matches the final N. Nothing is gained here, of course, since we move only one position.

Case 2: one character matched — the last character (N) matched but the one before it (which would need to be A) did not. Now look leftward through the pattern to decide how far right it can shift before comparing again: we look for another N, since an N could still line up with the matched one, and we know the character before it must not be A. Unfortunately, every other N in ANPANMAN is adjacent to an A, so none helps; the whole pattern can therefore shift right a full eight characters and the comparison restarts, again from the end, against the next eight text characters. Shifting eight positions saves eight rounds of character-by-character comparison — a big win.

Case 3: two characters matched — the tail AN matched, so the character before it is certainly not M. Looking left we find PAN: its AN matches, and the P still has a chance of matching, so shift right three characters and continue comparing MAN against the new text position. Remember, unless MAN matches there is no need to compare PAN. Three positions gained.

Case 4: three characters matched (MAN). Looking left there is no other occurrence of MAN, but the AN at the very front could still start a match: align the pattern's leading AN with the AN of the text's matched MAN. The text could well continue as ...MANPANMAN, so the pattern shifts right six characters — note, not the full eight.

From case 5 on, since the matched suffix always ends in AN, the leading AN can serve as the start of a new match after the shift, so the shift is always six characters.

Of course, this rule can be written out as a table.

i   Pattern tail   Shift   Note
0   N              1       The next letter to the left of the final N in 'ANPANMAN' is A, not N, so shifting the pattern one position may still yield a match.
1   AN             8       No usable reoccurrence of the matched N exists in 'ANPANMAN', so shift by the pattern length = 8.
2   MAN            3       The matched suffix reoccurs in 'ANPANMAN' three positions to the left, so shift = 3.
3   NMAN           6       'NMAN' is not a substring of 'ANPANMAN', but a new match could begin 6 positions to the left ('NMANPANMAN'), so shift = 6.
4   ANMAN          6
5   PANMAN         6
6   NPANMAN        6
7   ANPANMAN       6

Besides this way of skipping characters, there is the simpler idea of the bad character shift: when a mismatch occurs during the right-to-left comparison, think in the other direction — search the pattern for the mismatching text character, and let that decide the jump.

Take the same example, searching for ANPANMAN. Comparing the first eight text characters from the end: if the mismatching text character falls outside {A,N,P,M} — the "all other characters" row of the table below — then obviously shift the pattern right by eight at once. If the last of the next eight text characters is A rather than N, we cannot shift by another eight; instead find where A last occurs in the pattern, which here allows a shift of only one (ANPANMAN), after which we compare the pattern against its new eight text characters. If the last text character is M instead of N, shift the pattern two positions (ANPANMAN) to line the M up first, then recheck the new window. If the character under the pattern's final N turns out to be P, shift five positions (a huge gain), then recheck. If the last text character is N but the eight characters do not all match, we can shift only three, to line up the next N in the pattern (ANPANMAN), then recheck; and so on, always re-examining the last of the new eight text characters. Quite simple, isn't it?

Character              Shift
A                      1
M                      2
N                      3
P                      5
all other characters   8

bmBc is the bad character shift; bmGs is the good suffix shift.

Computing both shifts and jumping by the larger of the two is the strength of this algorithm.

Some implementations use only one of the two rules.

In the best case the number of comparisons is roughly the text length divided by the pattern length — every shift jumps a full pattern length — and the bigger the text, the more pronounced the effect.
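As a hedged sketch of the simpler of the two rules on its own, here is a search using only the bad-character shift (essentially Horspool's simplification of Boyer-Moore); the function name and layout are ours, not from any particular library. For the pattern ANPANMAN its table reproduces exactly the shifts listed above (A:1, M:2, N:3, P:5, all others:8).

```c
#include <string.h>

/* Bad-character-only search: returns the index of the first match,
   or -1 if the pattern does not occur. */
int BadCharSearch(const char *text, const char *pat)
{
    int shift[256];
    int m = (int) strlen(pat), n = (int) strlen(text);
    int i, j;

    if (m == 0 || n < m) return -1;
    for (i = 0; i < 256; i++) shift[i] = m;          /* "all other characters" */
    for (i = 0; i < m - 1; i++)                      /* distance from the right end */
        shift[(unsigned char) pat[i]] = m - 1 - i;
    i = 0;
    while (i <= n - m)
    {
        for (j = m - 1; j >= 0 && pat[j] == text[i + j]; j--)
            ;                                        /* compare right to left */
        if (j < 0) return i;                         /* full match at position i */
        i += shift[(unsigned char) text[i + m - 1]]; /* jump by the bad-character shift */
    }
    return -1;
}
```

A full Boyer-Moore implementation would also build the good-suffix (bmGs) table and take the larger of the two shifts at each step.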


A simple but important problem: intersection of two line segments in the plane (computational geometry)

Q: Given two arbitrary line segments, how do we know whether they intersect?

First, recall the cross product: |P1 × P2| = |P1| * |P2| * sinθ.

The cross product is a vector perpendicular to both P1 and P2, and its length is the absolute value of the determinant:

P1 × P2 = | x1  x2 |
          | y1  y2 | = x1*y2 - x2*y1 = |P1||P2| sinθ   (its magnitude is the area of the parallelogram)

This also tells us that a cross product of 0 means the two segments are collinear (the degenerate parallelogram has zero area).

A positive cross product means counterclockwise, a negative one clockwise; the direction is perpendicular to both vectors.

For two segments joined head to tail, P0->P1->P2, the sign of P0P1 × P0P2 tells us whether, arriving at P1 from P0, we turn left or right to reach P2: positive means P2 lies counterclockwise of P1, i.e. we veer left; negative means we veer right.

From here on we care only about the sign of the cross product — the orientation — not its magnitude.

Now suppose we have two segments, A = P1P2 and B = P3P4. For an intersection, either the two endpoints of one segment must lie on opposite sides of the other, or an endpoint of one segment must lie on the other segment. The cross product helps again here.

[The endpoints of one segment lie on opposite sides of the other]

Pick any point of segment A, say P1, then join P4 to P1 and P3 to P1, forming two extra segments P1P4 and P1P3. If (P1P4 × P1P2) * (P1P3 × P1P2) < 0 (one clockwise, one counterclockwise), then B (P3P4) straddles line A. But straddling alone does not guarantee an intersection, so also pick a point of B, say P3, join P1 and P2 to it, forming P3P1 and P3P2, and check whether (P3P1 × P3P4) * (P3P2 × P3P4) < 0 (one clockwise, one counterclockwise), meaning A straddles line B.

So when both conditions hold — (P1P4 × P1P2) * (P1P3 × P1P2) < 0 and (P3P1 × P3P4) * (P3P2 × P3P4) < 0 — the two segments straddle each other and cross.

[An endpoint lies on the other segment]

If any of the cross products above is 0, an endpoint may lie on the line of the other segment; it then suffices to check whether that endpoint's x and y fall between the other segment's endpoints (x1,y1) and (x2,y2):

min(x1,x2) <= x <= max(x1,x2), min(y1,y2) <= y <= max(y1,y2)

The pseudocode is as follows:

SegmentsIntersect(p1, p2, p3, p4)
 1  L1 <- (p1 - p3) × (p4 - p3)
 2  L2 <- (p2 - p3) × (p4 - p3)
 3  L3 <- (p3 - p1) × (p2 - p1)
 4  L4 <- (p4 - p1) × (p2 - p1)
 5  if L1*L2 < 0 and L3*L4 < 0 then return TRUE
 6  if L1 = 0 and OnSegment(p3, p4, p1) then return TRUE
 7  if L2 = 0 and OnSegment(p3, p4, p2) then return TRUE
 8  if L3 = 0 and OnSegment(p1, p2, p3) then return TRUE
 9  if L4 = 0 and OnSegment(p1, p2, p4) then return TRUE
10  return FALSE

OnSegment(P1, P2, Pi)   // P1 = (x1,y1), P2 = (x2,y2), Pi = (x,y): is Pi on segment P1P2?
 1  if min(x1, x2) <= x <= max(x1, x2)
      and min(y1, y2) <= y <= max(y1, y2)
 2  then return TRUE
 3  else return FALSE
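The pseudocode translates almost line for line into C; the sketch below (the type and function names are ours) uses doubles and treats a shared or touching endpoint as an intersection, matching the endpoint cases in the pseudocode.

```c
typedef struct { double x, y; } Point;

/* Cross product of (b - a) and (c - a): its sign gives the turn direction. */
static double Cross(Point a, Point b, Point c)
{
    return (b.x - a.x) * (c.y - a.y) - (b.y - a.y) * (c.x - a.x);
}

/* Is p inside the bounding box of segment ab? (only used when collinear) */
static int OnSegment(Point a, Point b, Point p)
{
    return p.x >= (a.x < b.x ? a.x : b.x) && p.x <= (a.x > b.x ? a.x : b.x) &&
           p.y >= (a.y < b.y ? a.y : b.y) && p.y <= (a.y > b.y ? a.y : b.y);
}

/* Returns 1 if segments p1p2 and p3p4 intersect, 0 otherwise. */
int SegmentsIntersect(Point p1, Point p2, Point p3, Point p4)
{
    double d1 = Cross(p3, p4, p1);
    double d2 = Cross(p3, p4, p2);
    double d3 = Cross(p1, p2, p3);
    double d4 = Cross(p1, p2, p4);

    if (d1 * d2 < 0 && d3 * d4 < 0) return 1;        /* proper crossing */
    if (d1 == 0 && OnSegment(p3, p4, p1)) return 1;  /* endpoint-on-segment cases */
    if (d2 == 0 && OnSegment(p3, p4, p2)) return 1;
    if (d3 == 0 && OnSegment(p1, p2, p3)) return 1;
    if (d4 == 0 && OnSegment(p1, p2, p4)) return 1;
    return 0;
}
```

With exact integer-like inputs this is robust; with arbitrary floating-point coordinates the zero tests would normally be replaced by a small epsilon comparison.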