1 Basic Idea

Grammar rules are given to DParser through Python function documentation strings. (A string placed on the first line of a Python function's body is that function's documentation string.) To make DParser treat a function's documentation string as part of your grammar, start the function's name with "d_". The function then becomes the action that is executed whenever the production defined in its documentation string is reduced. For example:

def d_action1(t):
    " sentence : noun 'runs' "
    print('found a sentence')
# ...

This function tells DParser about the action, d_action1, and the production, sentence. d_action1 is called when DParser recognizes a sentence. The argument to d_action1, t, is an array. It holds the return values of the elements that make up the production or, for terminal elements, the string the terminal matched. In the example above, t contains the return value of noun's action as its first element and the Python string 'runs' as its second.
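You can see what t holds without running DParser by calling actions by hand with the arrays DParser would build. In this sketch, d_noun and the direct calls at the bottom are illustrative assumptions; in a real parser, DParser performs the reductions and makes these calls itself.

```python
def d_noun(t):
    " noun : 'dog' | 'cat' "
    # t[0] is the string the terminal matched; this action returns it upper-cased
    return t[0].upper()

def d_sentence(t):
    " sentence : noun 'runs' "
    # t[0]: return value of noun's action, t[1]: the matched string 'runs'
    return (t[0], t[1])

# Simulate the two reductions DParser would perform for the input "dog runs"
noun_value = d_noun(['dog'])
result = d_sentence([noun_value, 'runs'])
print(result)  # ('DOG', 'runs')
```

The nonterminal noun contributes whatever its action returned, while the terminal 'runs' contributes the raw matched string.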

Regular expressions are written by enclosing them in double quotes ("):

def d_number(t):
    ' number : "[0-9]+" ' # match a non-negative integer
    return int(t[0]) # turn the matched string into an integer
# ...
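Because a terminal hands the action a string, d_number has to convert it. The sketch below calls d_number directly with the one-element arrays DParser would pass, and uses Python's re module only to preview which inputs the pattern accepts. DParser applies its own regular-expression engine when parsing, and its dialect is not the same as re's; for a simple pattern like [0-9]+ the two are assumed to agree.

```python
import re

def d_number(t):
    ' number : "[0-9]+" '  # match a non-negative integer
    return int(t[0])       # turn the matched string into an integer

# Preview with Python's re which strings "[0-9]+" accepts as a whole token
pattern = re.compile(r"[0-9]+")
samples = ["42", "007", "-5", "3.14"]
accepted = [s for s in samples if pattern.fullmatch(s)]
values = [d_number([s]) for s in accepted]
print(accepted, values)  # ['42', '007'] [42, 7]
```

Note that the pattern accepts leading zeros and no sign, so "-5" would need its own rule.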

If your documentation string contains Python escape sequences, make it a Python raw string (one prefixed with r). For more advanced features of productions, such as priorities and associativity, see the DParser manual.
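The raw-string advice is easy to check: Python processes escape sequences in an ordinary string literal before DParser ever reads the documentation string, whereas a raw string passes the backslash through unchanged. This sketch only inspects __doc__ and does not call DParser; the production names ws_cooked and ws_raw are made up for the comparison.

```python
def d_ws_cooked(t):
    ' ws_cooked : "[ \t]+" '   # ordinary string: Python turns \t into a real tab

def d_ws_raw(t):
    r' ws_raw : "[ \t]+" '     # raw string: backslash and t reach DParser as typed

cooked = d_ws_cooked.__doc__
raw = d_ws_raw.__doc__
print('\t' in cooked, '\\t' in cooked)  # True False
print('\t' in raw, '\\t' in raw)        # False True
```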

For a simple, complete addition example, see the home page.

2 Arguments to Actions

Every action takes at least one argument, the array described above. Other arguments are optional.

The interface recognizes which arguments you want from the names you give them. The possible names are:

spec, spec_only
If an action has a spec argument, it is called for both speculative and final parses (otherwise it is called only for final parses). The value of spec indicates whether the parse is final or speculative (1 is speculative, 0 is final). To reject a speculative parse, return dparser.Reject. If an action takes spec_only instead, it is called only for speculative parses, and the return value of the action for the final parse will be the same Python object that the speculative parse returned. See the complete example.
g
DParser's global state. g is actually an array whose first element is the global state. (Using a one-element array lets an action change the global state.)
s
Contains an array of the strings that make up this reduction. s is very useful if the purpose of your parser is to alter some text while leaving most of it unmodified. See the complete example here.
nodes
An array of Python wrappers around the reduction's D_ParseNodes. They carry information such as line numbers. See here for fields that may be useful.
this
The D_ParseNode of the current production ($$ in DParser). See this example.
parser
Your parser (sometimes useful when you are dealing with multiple files).
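The name-based lookup described above can be pictured with a short sketch. call_action is a hypothetical helper, not DParser's implementation: it reads an action's parameter names with Python's inspect.signature and passes only the extra arguments the action asks for. The sample actions use g as a one-element array, so the update survives the call, and spec to reject a speculative parse. If DParser isn't installed, a placeholder object stands in for dparser.Reject.

```python
import inspect

try:
    from dparser import Reject          # the real sentinel, if DParser is installed
except ImportError:
    Reject = object()                   # placeholder so this sketch runs on its own

def d_count(t, g):
    ' item : "[a-z]+" '
    g[0] = g[0] + 1                     # mutate the one-element array holding the state
    return t[0]

def d_even(t, spec):
    ' even : "[0-9]+" '
    n = int(t[0])
    if spec and n % 2:                  # spec == 1 means this is a speculative parse
        return Reject                   # reject odd numbers during the speculative parse
    return n

def call_action(action, t, **available):
    """Hypothetical helper: pass only the extra arguments the action names."""
    wanted = inspect.signature(action).parameters
    extras = {name: value for name, value in available.items() if name in wanted}
    return action(t, **extras)

g = [0]
call_action(d_count, ['abc'], g=g, spec=0)
call_action(d_count, ['xyz'], g=g, spec=0)
print(g[0])                                           # 2
print(call_action(d_even, ['7'], g=g, spec=1) is Reject)  # True
print(call_action(d_even, ['8'], g=g, spec=0))            # 8
```

Rebinding g inside an action (g = [5]) would only change a local name, which is why the state lives in g[0].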

3 Arguments to dparser.Parser()

All arguments are optional.

modules
An array of modules containing the actions you want to use in your parser. If not specified, the calling module is used.
file_prefix
Prefix for the filename of the parse table cache. Defaults to "d_parser_mach_gen".
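As an illustration of what modules selects, this sketch collects the documentation strings of d_-prefixed functions from the given modules, following the naming rule from section 1. grammar_from_modules is a hypothetical name and not DParser's loading code; the real parser also generates parse tables and caches them under file_prefix. The in-memory module built with types.ModuleType stands in for a real actions module.

```python
import inspect
import types

def grammar_from_modules(modules):
    """Hypothetical helper: collect productions from d_-prefixed functions."""
    productions = []
    for module in modules:
        for name, obj in sorted(vars(module).items()):
            if name.startswith('d_') and inspect.isfunction(obj) and obj.__doc__:
                productions.append(obj.__doc__.strip())
    return productions

def d_sentence(t):
    " sentence : noun 'runs' "

def d_noun(t):
    " noun : 'dog' "

def helper(t):
    ' no d_ prefix, so not part of the grammar '

# Build a throwaway module to stand in for a real module of actions
actions = types.ModuleType('actions')
for f in (d_sentence, d_noun, helper):
    setattr(actions, f.__name__, f)

print(grammar_from_modules([actions]))
# ["noun : 'dog'", "sentence : noun 'runs'"]
```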

4 Arguments to dparser.Parser.parse()

The first argument to dparser.Parser.parse is always the string to be parsed. All other arguments are optional.

start_symbol
The start symbol. Defaults to the first symbol defined.
print_debug_info
If non-zero, prints the list of actions being called. A question mark (?) indicates that the action is speculative.
dont_fixup_internal_productions, dont_merge_epsilon_trees, commit_actions_interval, error_recovery
These correspond to the members of D_Parser (see the DParser manual).
initial_skip_space_fn
Allows user-defined whitespace (as the whitespace production does, and in place of the built-in, C-like whitespace parser). Its argument is a d_loc_t structure. This structure's member s is an index into the string being parsed; modify this index to skip whitespace:

from dparser import Parser

def whitespace(loc): # no d_ prefix
    # make the smiley face ':)' the whitespace; advance loc.s past each one
    while loc.s < len(loc.buf) and loc.buf[loc.s:loc.s+2] == ':)':
        loc.s += 2
#...
Parser().parse('int:)var:)=:)2', initial_skip_space_fn = whitespace)
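The skip function can be tried without DParser by handing it an object with the two members the example reads, buf and s. FakeLoc below is a hypothetical stand-in; the real d_loc_t is supplied by DParser.

```python
class FakeLoc:
    """Hypothetical stand-in for d_loc_t: only the buf and s members used above."""
    def __init__(self, buf, s=0):
        self.buf = buf
        self.s = s

def whitespace(loc):  # same logic as the skip function above
    while loc.s < len(loc.buf) and loc.buf[loc.s:loc.s+2] == ':)':
        loc.s += 2

loc = FakeLoc('int:):)var', s=3)  # positioned just after 'int'
whitespace(loc)
print(loc.s, loc.buf[loc.s:])     # 7 var
```

The loop stops at the first character that does not start a ':)' pair, so ordinary spaces are no longer skipped with this function in place.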

syntax_error_fn
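Called on a syntax error. By default, an exception is raised. It is passed a d_loc_t structure (see initial_skip_space_fn) indicating the location of the error. The function below prints up to 10 characters before the error, then '<--error' on the next line, indented to the error position, followed by up to 10 characters after the error on a third line: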

def syntax_error(loc):
    mn = max(loc.s - 10, 0)             # start of the context window
    mx = min(loc.s + 10, len(loc.buf))  # end of the context window
    begin = loc.buf[mn:loc.s]
    end = loc.buf[loc.s:mx]
    space = ' '*len(begin)
    print(begin + '\n' + space + '<--error' + '\n' + space + end)
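Returning the message instead of printing it makes the same logic testable without a parser. format_syntax_error and FakeLoc are hypothetical names; to use it with DParser you could pass something like syntax_error_fn=lambda loc: print(format_syntax_error(loc)).

```python
class FakeLoc:
    """Hypothetical stand-in for d_loc_t with the buf and s members used above."""
    def __init__(self, buf, s):
        self.buf = buf
        self.s = s

def format_syntax_error(loc, width=10):
    """Build the message syntax_error above prints, instead of printing it."""
    mn = max(loc.s - width, 0)
    mx = min(loc.s + width, len(loc.buf))
    begin = loc.buf[mn:loc.s]       # text leading up to the error
    end = loc.buf[loc.s:mx]         # text starting at the error
    space = ' ' * len(begin)
    return begin + '\n' + space + '<--error' + '\n' + space + end

msg = format_syntax_error(FakeLoc('x = 1 + * 2', 8))  # error at the '*'
print(msg)
```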

ambiguity_fn
Resolves ambiguities. It takes an array of D_ParseNodes (again, see here) and is expected to return one of them. By default, dparser.AmbiguityException is raised.
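Since ambiguity_fn only has to return one element of the array it receives, one approach is to rank the candidates with a scoring function you supply. make_ambiguity_fn is a hypothetical factory, and plain strings stand in for D_ParseNodes here; which node fields a real score function should read is left to you.

```python
def make_ambiguity_fn(score):
    """Hypothetical factory: build an ambiguity_fn that keeps the highest-scoring node.

    score maps one node to a comparable value.
    """
    def ambiguity(nodes):
        # max() returns the first of several equally scored nodes
        return max(nodes, key=score)
    return ambiguity

# Stand-in "nodes" (plain strings) so the sketch runs without DParser
pick_longest = make_ambiguity_fn(len)
print(pick_longest(['ab', 'abcd', 'abc']))  # abcd
```

Breaking ties by position keeps the choice deterministic, which makes ambiguous grammars easier to debug.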

5 Pitfalls and Tips

If you run into a pitfall or know a tip, let me know and I'll add it here.

Debugging grammars
Pass print_debug_info=1 to Parser.parse() to see the list of actions being called (passing 2 shows only the final actions). Also look at the generated grammar file, d_parser_mach_gen.g.
Regular expressions
DParser does not understand every regular expression that Python's re module understands. Stick to regular expressions that DParser understands.
Whitespace
By default, DParser treats tabs, spaces, newlines, and #line commands as whitespace. If you want to control these yourself (be especially careful with the # character), implement initial_skip_space_fn or define the special whitespace production:

def d_whitespace(t):
    r'whitespace : "[ \t\n]*" '    # treat space, tab and newline as whitespace, but treat the # character normally
    print('found whitespace:' + t[0])

DParser specifiers/declarations
DParser specifiers and declarations can be passed through documentation strings. For example:

from dparser import Parser
def d_somefunc(t) : '${declare longest_match}'
#...

Multiple productions per action
You can put several productions (even a whole grammar) in one documentation string. Separate the productions with semicolons.

from dparser import Parser

def d_grammar(t):
    '''sentence : noun verb;
    noun : 'dog' | 'cat';
    verb : 'run'
    '''
    print('this function gets called for every reduction')

Parser().parse("dog run")
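To see how the semicolons divide such a documentation string, this sketch splits it into individual productions. split_productions is a hypothetical helper and not how DParser reads grammars (DParser has its own grammar parser); it is deliberately naive, so a ';' inside a quoted terminal would also split. The example above omits the semicolon after the last production, and the sketch handles that case.

```python
def split_productions(doc):
    """Hypothetical helper: split a docstring into its ';'-separated productions."""
    return [p.strip() for p in doc.split(';') if p.strip()]

def d_grammar(t):
    '''sentence : noun verb;
    noun : 'dog' | 'cat';
    verb : 'run'
    '''

print(split_productions(d_grammar.__doc__))
# ['sentence : noun verb', "noun : 'dog' | 'cat'", "verb : 'run'"]
```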