테일 콜 최적화 란?
간단히 말해서 꼬리 호출 최적화 란 무엇입니까? 더 구체적으로, 누구든지 그 이유에 대한 설명과 함께 적용 할 수있는 작은 코드 조각을 보여줄 수 있습니까?
테일 호출 최적화는 호출 함수가 단순히 호출 된 함수에서 가져온 값을 반환하기 때문에 함수에 대한 새 스택 프레임 할당을 피할 수있는 곳입니다. 가장 일반적인 용도는 tail-recursion으로, tail-call 최적화를 활용하기 위해 작성된 재귀 함수는 일정한 스택 공간을 사용할 수 있습니다.
Scheme은 사양에서 모든 구현이이 최적화를 제공해야 함을 보장하는 몇 안되는 프로그래밍 언어 중 하나입니다 (JavaScript도 ES6부터 시작 함) . 따라서 다음은 Scheme의 계승 함수에 대한 두 가지 예입니다.
(define (fact x)
(if (= x 0) 1
(* x (fact (- x 1)))))
(define (fact x)
(define (fact-tail x accum)
(if (= x 0) accum
(fact-tail (- x 1) (* x accum))))
(fact-tail x 1))
재귀 호출이 수행 될 때 함수는 호출이 반환 된 후 결과와 관련하여 수행해야하는 곱셈을 추적해야하기 때문에 첫 번째 함수는 꼬리 재귀가 아닙니다. 따라서 스택은 다음과 같습니다.
(fact 3)
(* 3 (fact 2))
(* 3 (* 2 (fact 1)))
(* 3 (* 2 (* 1 (fact 0))))
(* 3 (* 2 (* 1 1)))
(* 3 (* 2 1))
(* 3 2)
6
대조적으로 꼬리 재귀 계승에 대한 스택 추적은 다음과 같습니다.
(fact 3)
(fact-tail 3 1)
(fact-tail 2 3)
(fact-tail 1 6)
(fact-tail 0 6)
6
보시다시피 팩트 테일 호출마다 동일한 양의 데이터 만 추적하면됩니다. 왜냐하면 우리가 얻은 값을 바로 정상으로 반환하기 때문입니다. 즉, (팩트 1000000)을 호출하더라도 (팩트 3)과 같은 양의 공간 만 필요합니다. 이것은 비 꼬리 재귀 적 사실의 경우가 아니며 큰 값으로 인해 스택 오버플로가 발생할 수 있습니다.
간단한 예를 살펴 보겠습니다. C로 구현 된 계승 함수입니다.
우리는 명백한 재귀 적 정의로 시작합니다.
unsigned fac(unsigned n)
{
if (n < 2) return 1;
return n * fac(n - 1);
}
함수가 반환되기 전 마지막 작업이 다른 함수 호출 인 경우 함수는 마무리 호출로 끝납니다. 이 호출이 동일한 함수를 호출하면 꼬리 재귀입니다.
비록 fac()
첫 눈에 보이는 꼬리 재귀 무엇 실제로 일어나는 것은, 그것은 아니다
unsigned fac(unsigned n)
{
if (n < 2) return 1;
unsigned acc = fac(n - 1);
return n * acc;
}
즉, 마지막 연산은 함수 호출이 아니라 곱셈입니다.
그러나 fac()
누적 된 값을 추가 인수로 콜 체인 아래로 전달하고 최종 결과 만 반환 값으로 다시 전달하여 꼬리 재귀 적 으로 다시 작성할 수 있습니다.
unsigned fac(unsigned n)
{
return fac_tailrec(1, n);
}
unsigned fac_tailrec(unsigned acc, unsigned n)
{
if (n < 2) return acc;
return fac_tailrec(n * acc, n - 1);
}
자, 이것이 왜 유용할까요? tail 호출 후 즉시 반환하기 때문에 tail 위치에서 함수를 호출하기 전에 이전 스택 프레임을 버릴 수 있습니다. 또는 재귀 함수의 경우 스택 프레임을 그대로 재사용 할 수 있습니다.
꼬리 호출 최적화는 재귀 코드를
unsigned fac_tailrec(unsigned acc, unsigned n)
{
TOP:
if (n < 2) return acc;
acc = n * acc;
n = n - 1;
goto TOP;
}
이것은 인라인 될 수 있고 fac()
우리는
unsigned fac(unsigned n)
{
unsigned acc = 1;
TOP:
if (n < 2) return acc;
acc = n * acc;
n = n - 1;
goto TOP;
}
이는
unsigned fac(unsigned n)
{
unsigned acc = 1;
for (; n > 1; --n)
acc *= n;
return acc;
}
여기에서 볼 수 있듯이 충분히 고급 최적화 프로그램은 tail-recursion을 반복으로 대체 할 수 있습니다. 이는 함수 호출 오버 헤드를 피하고 일정한 양의 스택 공간 만 사용하므로 훨씬 더 효율적입니다.
TCO (Tail Call Optimization) is the process by which a smart compiler can make a call to a function and take no additional stack space. The only situation in which this happens is if the last instruction executed in a function f is a call to a function g (Note: g can be f). The key here is that f no longer needs stack space - it simply calls g and then returns whatever g would return. In this case the optimization can be made that g just runs and returns whatever value it would have to the thing that called f.
This optimization can make recursive calls take constant stack space, rather than explode.
Example: this factorial function is not TCOptimizable:
def fact(n):
if n == 0:
return 1
return n * fact(n-1)
This function does things besides call another function in its return statement.
This below function is TCOptimizable:
def fact_h(n, acc):
if n == 0:
return acc
return fact_h(n-1, acc*n)
def fact(n):
return fact_h(n, 1)
This is because the last thing to happen in any of these functions is to call another function.
Probably the best high level description I have found for tail calls, recursive tail calls and tail call optimization is the blog post
"What the heck is: A tail call"
by Dan Sugalski. On tail call optimization he writes:
Consider, for a moment, this simple function:
sub foo (int a) { a += 15; return bar(a); }
So, what can you, or rather your language compiler, do? Well, what it can do is turn code of the form
return somefunc();
into the low-level sequencepop stack frame; goto somefunc();
. In our example, that means before we callbar
,foo
cleans itself up and then, rather than callingbar
as a subroutine, we do a low-levelgoto
operation to the start ofbar
.Foo
's already cleaned itself out of the stack, so whenbar
starts it looks like whoever calledfoo
has really calledbar
, and whenbar
returns its value, it returns it directly to whoever calledfoo
, rather than returning it tofoo
which would then return it to its caller.
And on tail recursion:
Tail recursion happens if a function, as its last operation, returns the result of calling itself. Tail recursion is easier to deal with because rather than having to jump to the beginning of some random function somewhere, you just do a goto back to the beginning of yourself, which is a darned simple thing to do.
So that this:
sub foo (int a, int b) { if (b == 1) { return a; } else { return foo(a*a + a, b - 1); }
gets quietly turned into:
sub foo (int a, int b) { label: if (b == 1) { return a; } else { a = a*a + a; b = b - 1; goto label; }
What I like about this description is how succinct and easy it is to grasp for those coming from an imperative language background (C, C++, Java)
Note first of all that not all languages support it.
TCO applys to a special case of recursion. The gist of it is, if the last thing you do in a function is call itself (e.g. it is calling itself from the "tail" position), this can be optimized by the compiler to act like iteration instead of standard recursion.
You see, normally during recursion, the runtime needs to keep track of all the recursive calls, so that when one returns it can resume at the previous call and so on. (Try manually writing out the result of a recursive call to get a visual idea of how this works.) Keeping track of all the calls takes up space, which gets significant when the function calls itself a lot. But with TCO, it can just say "go back to the beginning, only this time change the parameter values to these new ones." It can do that because nothing after the recursive call refers to those values.
GCC minimal runnable example with x86 disassembly analysis
Let's see how GCC can automatically do tail call optimizations for us by looking at the generated assembly.
This will serve as an extremely concrete example of what was mentioned in other answers such as https://stackoverflow.com/a/9814654/895245 that the optimization can convert recursive function calls to a loop.
This in turn saves memory and improves performance, since memory accesses are often the main thing that makes programs slow nowadays.
As an input, we give GCC a non-optimized naive stack based factorial:
tail_call.c
#include <stdio.h>
#include <stdlib.h>
unsigned factorial(unsigned n) {
if (n == 1) {
return 1;
}
return n * factorial(n - 1);
}
int main(int argc, char **argv) {
int input;
if (argc > 1) {
input = strtoul(argv[1], NULL, 0);
} else {
input = 5;
}
printf("%u\n", factorial(input));
return EXIT_SUCCESS;
}
Compile and disassemble:
gcc -O1 -foptimize-sibling-calls -ggdb3 -std=c99 -Wall -Wextra -Wpedantic \
-o tail_call.out tail_call.c
objdump -d tail_call.out
where -foptimize-sibling-calls
is the name of generalization of tail calls according to man gcc
:
-foptimize-sibling-calls
Optimize sibling and tail recursive calls.
Enabled at levels -O2, -O3, -Os.
as mentioned at: How do I check if gcc is performing tail-recursion optimization?
I choose -O1
because:
- the optimization is not done with
-O0
. I suspect that this is because there are required intermediate transformations missing. -O3
produces ungodly efficient code that would not be very educative, although it is also tail call optimized.
Disassembly with -fno-optimize-sibling-calls
:
0000000000001145 <factorial>:
1145: 89 f8 mov %edi,%eax
1147: 83 ff 01 cmp $0x1,%edi
114a: 74 10 je 115c <factorial+0x17>
114c: 53 push %rbx
114d: 89 fb mov %edi,%ebx
114f: 8d 7f ff lea -0x1(%rdi),%edi
1152: e8 ee ff ff ff callq 1145 <factorial>
1157: 0f af c3 imul %ebx,%eax
115a: 5b pop %rbx
115b: c3 retq
115c: c3 retq
With -foptimize-sibling-calls
:
0000000000001145 <factorial>:
1145: b8 01 00 00 00 mov $0x1,%eax
114a: 83 ff 01 cmp $0x1,%edi
114d: 74 0e je 115d <factorial+0x18>
114f: 8d 57 ff lea -0x1(%rdi),%edx
1152: 0f af c7 imul %edi,%eax
1155: 89 d7 mov %edx,%edi
1157: 83 fa 01 cmp $0x1,%edx
115a: 75 f3 jne 114f <factorial+0xa>
115c: c3 retq
115d: 89 f8 mov %edi,%eax
115f: c3 retq
The key difference between the two is that:
the
-fno-optimize-sibling-calls
usescallq
, which is the typical non-optimized function call.This instruction pushes the return address to the stack, therefore increasing it.
Furthermore, this version also does
push %rbx
, which pushes%rbx
to the stack.GCC does this because it stores
edi
, which is the first function argument (n
) intoebx
, then callsfactorial
.GCC needs to do this because it is preparing for another call to
factorial
, which will use the newedi == n-1
.It chooses
ebx
because this register is callee-saved: What registers are preserved through a linux x86-64 function call so the subcall tofactorial
won't change it and losen
.the
-foptimize-sibling-calls
does not use any instructions that push to the stack: it only doesgoto
jumps withinfactorial
with the instructionsje
andjne
.Therefore, this version is equivalent to a while loop, without any function calls. Stack usage is constant.
Tested in Ubuntu 18.10, GCC 8.2.
Look here:
http://tratt.net/laurie/tech_articles/articles/tail_call_optimization
As you probably know, recursive function calls can wreak havoc on a stack; it is easy to quickly run out of stack space. Tail call optimization is way by which you can create a recursive style algorithm that uses constant stack space, therefore it does not grow and grow and you get stack errors.
We should ensure that there are no goto statements in the function itself .. taken care by function call being the last thing in the callee function.
Large scale recursions can use this for optimizations, but in small scale, the instruction overhead for making the function call a tail call reduces the actual purpose.
TCO might cause a forever running function:
void eternity() { eternity(); }
Recursive function approach has a problem. It builds up a call stack of size O(n), which makes our total memory cost O(n). This makes it vulnerable to a stack overflow error, where the call stack gets too big and runs out of space. Tail Cost Optimization (TCO) Scheme. Where it can optimize recursive functions to avoid building up a tall call stack and hence saves the memory cost.
There are many languages who are doing TCO like (Javascript, Ruby and few C ) where as Python and Java do not do TCO.
JavaScript language has confirmed using :) http://2ality.com/2015/06/tail-call-optimization.html
참고URL : https://stackoverflow.com/questions/310974/what-is-tail-call-optimization
'your programing' 카테고리의 다른 글
Ajax 요청이 200 OK를 반환하지만 성공 대신 오류 이벤트가 발생합니다. (0) | 2020.09.29 |
---|---|
venv, pyvenv, pyenv, virtualenv, virtualenvwrapper, pipenv 등의 차이점은 무엇입니까? (0) | 2020.09.29 |
jQuery / JavaScript를 사용하여 모든 CSS 클래스를 제거하는 방법은 무엇입니까? (0) | 2020.09.29 |
파이썬에서 홈 디렉토리를 얻는 방법? (0) | 2020.09.29 |
Git 리포지토리에서 원격 출처를 제거하는 방법 (0) | 2020.09.29 |