Dennis Heimbigner |
Software Engineering Research Laboratory |
Computer Science Department |
University of Colorado |
Boulder, Colorado 80309-0430 USA |
dennis.heimbigner@colorado.edu |
Last Updated: | 1 October 2005 |
Latest Version: | Snobol3 Version 1.0 |
Minimum JDK Level: | JDK 1.5 |
1. | Snobol3 Language Reference |
1.1 | Lexical Structure |
1.2 | Grammar |
1.3 | Processing Cycle |
1.3.1 | Subject |
1.3.2 | Pattern Matching |
1.3.3 | Pattern Matching Elements |
1.3.4 | Branching |
1.3.5 | Built-In Functions |
1.3.6 | User Defined Functions |
2. | Usage |
3. | Examples |
4. | Compiler/Interpreter |
4.1 | Architecture |
4.2 | Execution |
4.3 | Virtual Machine Operators |
4.4 | Extending the Interpreter |
5. | Installation |
6. | Change Log |
7. | Point of Contact |
8. | License |
"Snobol 3 Primer: an introduction to the computer programming language." by Allen Forte. M.1.T. Press, 1967.If any reader of this detects an error in interpretation or in some missing feature, please contact the author.
Since there are many similarities, knowledge of Snobol4 as defined in this book
"The SNOBOL 4 programming language" (2nd Ed.) by R. E. Griswold, J. F. Poage, and I. P. Polonsky. Prentice-Hall, 1971.may be helpful.
[*] | Asterisk: rest of the line is a Comment. | ||||||
[.] | Period: this is a continuation of the preceding line. | [a-zA-Z0-9] | Label: the line has a label. | [ \t]' | Whitespace: the line has no label. | [-] | Dash: indicates certain flags. The set of flags specified in the Primer are useless in modern computing environments. Instead, this marker has been modified to allow the insertion of command line option flags (Section ?) into the program. The general form is "-name[=value]"; the value is optional and if missing is presumed be the value "true". |
The lexical elements are defined as follows (using regular expressions).
[a-zA-Z0-9.]+ | Name; Note that in Snobol3, a string of digits is a name. | |
[a-zA-Z0-9][a-zA-Z.]* | Label; Note that labels are a restricted set of names. | |
[/+-=()*,/$] | Delimiters | |
['][^'\n]*['] | String; Note the use of single quotes and that string constants may not cross line boundaries. As an extension, options exist (Section ?) to allow the use of double quotes and to allow the inclusion of escape sequences (e.g., '\n') in strings. | |
[ \t] | Whitespace; This is removed from the token stream before the parser sees it. |
Notes:
program: (statement)+ EOF;
statement: (LABEL)? (body)? EOL;
body: subject (pattern)? (replacement)? (branch)?;
subject: primary;
replacement: EQUAL (concat)?;
concat: (expr)+;
expr: term ((PLUS|MINUS) term)*;
term: primary ((MULT|DIV) primary)*;
primary: atom | LPAREN unary RPAREN ;
unary: concat | (PLUS|MINUS) concat ;
atom: var | STRING | fcncall;
var: reference | NAME;
reference: DOLLAR primary;
fcncall: FCNNAME args ;
args: LPAREN (arglist)? RPAREN ;
arglist: concat (COMMA concat)* ;
pattern: (pattest)+;
pattest: patvar | expr;
patvar: STAR patmatch STAR ;
patmatch: (var)? (SLASH primary | BAR patfcn)? | LPAREN (var)? RPAREN ;
patfcn: PATFCNNAME args ;
branch: SLASH (s (f)? | f (s)? | go);
s: SUCCESS LPAREN dest RPAREN ;
f: FAIL LPAREN dest RPAREN ;
go: LPAREN dest RPAREN ;
dest: label| RETURN | FRETURN | END | reference;
label: LABEL | NAME;
This grammar can be processed by Antlr, although Antlr was not used throughout because of the problem of building the lexer and because it was overkill for such a simple language. In the rest of this document, <...> generally refers to a grammar non-terminal from the grammar above.
As shown in the grammar, an expression is composed of the following operators: addition (+), subtraction (-), multiplication (*), and integer division (/). Concatenation is also allowed and is represented by whitespace between expressions.
Add, subtract, multiply, and divide have the usual precedence. Concat has a lower precedence than any of the arithmetic operators. Parentheses are available to alter the evaluation order in the usual way.
Unary minus and plus are allowed, but only immediately after a left parenthesis; this again avoids an ambiguity in the grammar. Unary operators may be an extension to Snobol3; the primer is unclear.
The atomic elements of an expression are string constants, variable references, and function calls. Note that for function calls, it is illegal to provide too many arguments, but it is ok to provide too few; the remaining arguments are padded with the empty string.
In order to remove any ambiguity, a subject expression must be enclosed in parentheses, unless it is an atom.
1.3.1.1 Variables: Variables are atomic elements in an expression (along with string constants and function calls). Scoping in Snobol3 is two-level: global and local (to function bodies). This is similar to C.
Variables are specified in one of two ways in Snobol3.
Shorthand Name | Syntax | Definition | |||
---|---|---|---|---|---|
StringMatch | <expression> | Initial:
Succeed if the substring
at the cursor matches the specified string that is the result of computing
the expression; fail otherwise.
Retry: fail always. | |||
Arb | *<var>* | Initial: Succeed matching zero characters. If var is specified, then assign the current matched substring to var.||||
Len | *<var>/<expr>* | Initial:
Compute <expr> as an integer, call it length.
It there are at least length characters following the cursor,
then match those characters and succeed, otherwise fail.
If var is specified, then assign the current matched substring to var.
Retry: fail always. | |||
Bal(anced) | *(<var>)* | Initial:
Examine the character under the cursor, and act as follows.
Retry: Same as initial. If var is specified, then assign the current matched substring to var. | |||
Various | *<var>|<fcncall>* | Initial:
Compute the arguments to the function and invoke the
"initial()" entry for the function (see Section 4.1.4.1).
On success, and if the var is specified,
then assign the current matched substring to var.
Retry: Invoke the "retry()" entry for the function. On success, and if the var is specified, then assign the current matched substring to var. |
= ''
If a <pattern> is specified in the statement, then if the match succeeds, then the substring of the subject that is matched by the pattern is replaced by the value of the replacement expression.
f(label1)then transfer of control goes to the statement labeled "label1". If success is signaled, then execution of the statement continues until the branch part is reached. If a success case is signaled (e.g.)
s(label1)then transfer of control goes to the label specified by the success case. It is also possible to specify an unconditional branch
(label1)that will be executed no matter what value of the condition code.
It is unclear what happens if the evaluation of an expression fails and there is no f() branch specified; does it continue to execute, or does it stop the stmt execution and move to the next statement? This interpreter assumes the latter action.
It is also unclear exactly what combinations of branches are allowed. This interpreter assumes the following combinations are allowed.
EQUALS(s,t) | Succeeds if s.equals(t) (in the Java sense); fails otherwise | |||||||||||||
UNEQL(s,t) | Succeeds if !s.equals(t); fails otherwise. | |||||||||||||
SIZE(s) | returns s.length(); always succeeds. | |||||||||||||
EQ(i,j) | Succeeds if i == j (as integers); fails otherwise. | |||||||||||||
GE(i,j) | Succeeds if i >= j (as integers); fails otherwise. | |||||||||||||
GT(i,j) | Succeeds if i > j (as integers); fails otherwise. | |||||||||||||
LE(i,j) | Succeeds if i <= j (as integers); fails otherwise. | |||||||||||||
LT(i,j) | Succeeds if i < j (as integers); fails otherwise. | |||||||||||||
NE(i,j) | Succeeds if i != j (as integers); fails otherwise. | |||||||||||||
NUM(s) | Succeeds if Integer.decode(s) succeeds ; fails otherwise. | |||||||||||||
REMDR(i,j) | Computes i % j (the remainder function); always succeeds; division by zero causes interpreter to halt. | |||||||||||||
ANCHOR() | Causes the current pattern match to operate in anchored mode; always succeeds; usually the first element in a pattern. | |||||||||||||
UNANCH() | Causes the current pattern match to operate in unanchored mode; always succeeds; usually the first element in a pattern. | |||||||||||||
PRINT(s) | cause the variable $(s) to operate in output mode. | |||||||||||||
READ(s) | cause the variable $(s) to operate in input mode. | |||||||||||||
MODE(s) | The argument string s specifies a series of one or more comma separated flags that affect the global operation of the interpreter. The currently defined mode flags are as follows. | |||||||||||||
Mode flags can be set either in the program using the MODE function or using the "-mode" option flag on the command line (Section ?). The expression "MODE('x,y')" is equivalent to specifying the option flag on the command line (e.g., -mode "x,y"). | ||||||||||||||
TRACE(s) | The argument string s specifies a series of one or more comma separated function names. After the execution of this function, the calls to this function and the return value will be traced and printed to stdout. | |||||||||||||
STRACE(s) | The argument string s specifies a series of one or more comma separated variable names. After the execution of this function, all assignments to any of these variables will be traced and printed to stderr. Note that the primer indicates that the argument, s, can only specify one variable name; it has been extended here to support multiple names. | |||||||||||||
CALL(s) | The argument string s specifies a function call. It is dynamically interpreted and the corresponding function is invoked as if it occurred at the point of the CALL function. | |||||||||||||
DEFINE(s,t,u...) | The argument string s specifies the definition of a function. The t argument specifies a label that represents the first statement in the function body. The sequence of zero or more u arguments represent the names of variables that should be given local scope when the function is executed. |
A function is defined using the "DEFINE" built-in function. This is treated in a non-standard way in that it is actually evaluated at compile time, and hence its arguments can only be constant strings. This also means that it cannot be invoked dynamically. The reasons for this is that it makes code generation overly difficult and that it seems to have no use.
The define "declaration" must precede the label which marks the function body. This means as a rule that you will have a sequence of defines interspersed with or preceding a sequence of function bodies. Appropriate unconditional branches must be associated with one or more defines to jump around the function bodies.
The general form of a function declaration is
DEFINE(s,t,u...)where the string s specifies the definition of a function. The t argument specifies a label that represents the first statement in the function body. The sequence of zero or more u arguments represent the names of variables that should be given local scope when the function is executed.
Syntactically, the function definition (string s)
and the locals list (strings u...) follow this grammar.
fcndecl: NAME LPAREN (formals)? RPAREN;
formals: namelist;
locals: namelist;
namelist: NAME (COMMA (NAME)?)*;
Invoking a function causes its actual arguments to be evaluated and assigned to the corresponding formal arguments. The formals arguments are treated as variables with local scope. The list of locals are initialized to the empty string and, of course, also have local scope. Finally, functions return values by assigning to a variable with the same name as the function. This pseudo-variable also has local scope.
The variables "SYSPOT" and "SYSPPT" are predefined. The language has been extended to also predefine the variables "stdin" and "stdout" as additional input and output variables respectively.
It is possible to define additional input and output variables using the primitive functions "READ(<var>)" and "PRINT(<var>)" respectively. The PRINT function has been extended to take a second string argument. The second argument is a comma separated list of flags. Currently the following flags are defined.
java -jar s3.jar [<options>] <program>The possible option flags are as follows. Boolean options (such as -exec) may be preceded by "no" (e.g., "-noexec") to negate the flag.
or
java -classpath s3.jar jsnobol3.Snobol3 [<options>] <program>
define('abs(abs)','abs') /(main) abs abs '-' = /(return) main stdout = abs('-5') stdout = abs('5')
Greatest Common Divisor
define('gcd(m,n)','gcd') /(main)
gcd gcd = .ne(m) m /f(return)
m = .remdr(n,m)
n = gcd /(gcd)
main stdout = gcd('12','18')
Recursive Factorial
define('rfact(n)','rfact') /(main)
rfact rfact = .le(n,'1') '1' /s(return)
rfact = n * rfact(n - '1') /(return)
main stdout = rfact('4')
Iterative Factorial
define('fact(n)','fact') /(main)
fact fact = '1'
fact2 fact = .gt(n) fact * n /f(return)
n = n - '1' /(fact2)
main stdout = fact('4')
Character | Action |
---|---|
"*" | This signals a comment; remove it. |
"." | This signals a continuation; combine with previous line to form one single line. |
"-" | This signals an option of form "name[=value]"; save in the options map and otherwise treat the line like a comment. The possible values are the same as for command line options (Section ?). |
Label Character | This signals that the line has a label; collect it and store in the "labels" table, otherwise do nothing. |
Whitespace ([ \t]) | This signals that the line has no label; do nothing. |
Pass 3 actually has a second sub-pass to handle forward referencing of labels. A binding list is kept that lists the operators that need to be bound to a specific address for a specific named label. This list is walked after all other code generation. Each specified operator is modified to refer the to address of the specified label.
The "grammar" for the AST is as follows.
program: PROGRAM (statement)*;
statement: STATEMENT LINENO (LABEL)? subj=(primary)? (pattern)? repl=(concat)? (branch)?;
concat: expr | CONCAT (expr)+;
expr: term | ADD term term | SUBSTRACT term term;
term: primary | MULT primary primary | DIV primary primary;
var: NAME String | reference;
reference: REFERENCE primary;
primary: unary | atom ;
unary: concat | NEGATE concat;
atom: STRING | var | fcncall;
fcncall: FCNCALL String arglist;
arglist: ARGLIST (concat)*;
pattern: PATTERN (pattest)*;
pattest: patmatch | expr;
patmatch: BALANCE (var)? | LEN (var)? primary | PATFCN (var)? fcncall | ARB (var)?;
branch: BRANCH dest dest dest;
dest: label | reference;
fcndel: DEFINITION String namelist;
namelist: NAMELIST (NAME)*;
LABEL: [String];
NAME: [String] ;
FCNCALL: [String];
STRING: [String];
LINENO: [Integer];
The right side describes the structure of an AST node. The names in capitals represent the type of the node. The subtrees, if any, are defined by the structures that follow the type. Thus, a "branch" AST has the type BRANCH and has three subtrees that have the structure defined by "dest".
Right sides that consist of a single bracketed type are leaves and have an associated value of that Java type. Thus, a LABEL is a leaf with the label name encoded as a Java String object.
To execute, the operator obtains its arguments, if any, and then performs its specific action. This action typically will pop its arguments, perform some computation, and push a result onto the stack.
An Operator can obtain arguments in two ways. First, it can access the stack using push(),pop(), and top(). Second, it can have a constant argument built into the Operator instance.
Operators
Add(a,b) | a = a + b |
AssignLocal(value,var) | Assign the value to a local variable. |
Assign(value,var) | Assign the value to a variable; If the variable is defined locally or globally, then assign to that variable, otherwise create locally if possible, globally otherwise. |
Begin | Do any necessary initialization at the start of the whole program. |
BeginFcn | Create the new local space for this invocation of the function. |
BeginStmt[dest,lineno] | Mark the stack for use by EndStmt, initialize the condition code, and save dest as a pointer to the end of the statement. Also, set the source line number for the statement. |
CVI | Convert the top stack value to an Integer. |
CVS | Convert the top stack value to a String. |
CVV | Convert the top stack value to a variable (the "$" operator). |
CallRet | Used by the CALL statement to cause a call to return after the CALL statement. |
Concat(a,b...)[N] | Concatenate the top N elements into a single string. |
Deref(var) | Replace the var with its value on the stack. |
Div(a,b) | a = a / b; using integer division |
Dup(a) | Push a copy of a onto the stack. |
End | Signal the virtual machine to stop executing. |
EndStmt[sdest,fdest] | Clear the stack down to the point marked by the previous BeginStmt. Use the condition code to start execution of the success or failure branches. |
Frame | Save the current frame and create a new one on entry to a function. |
FReturn | Set the condition code to failure and return to just after the call. |
IJump(var) | Pop the var, get its value, and use it as a label name to which to jump. |
JSR[dest] | Save the pc into the VM's frame and jump to the specified destination. This is effectively the call instruction for invoking a function. |
Jump[dest] | Jump to dest. |
Mult(a,b) | a = a * b |
Negate(a) | a = - a |
Nth | Extract the nth element in the eval stack, with n=0 as the stack top. |
Primitive[primfcn] | Invoke the specified primitive function. |
Push[value] | Push value onto the eval stack. |
Replace(var,range,rhs) | Replace the substring of the var as specified by the range with the rhs value. Pop all three from stack. |
RetVal | Extract the return value for a user defined function and load it into the return value in the VM. Recall that the return value for a user defined function is the variable with the same name as the function. |
Return | Push the VM's current return value onto the eval stack. |
Subtract(a,b) | a = a - b |
Swap(a,b) | tmp=a; a=b; b=tmp; (Swap top two eval stack elements). |
Unframe | Throw away the current frame and return to the previous frame. Used as part of the function return process. |
Pattern Related Operators
Arb | Implement the Arb pattern element. |
Balance | Implement the Balanced pattern element. |
BeginPattern(subject,var) | Load the subject string into the VM's match state. If the subject is also a variable, then load that as well. On retry, move the anchor point if allowed, otherwise fail. |
EndPattern(range...) | Successful conclusion to a match, so capture the full range matched by the pattern, remove all interim ranges, and leave the full range on the stack. |
Len | Implement the Len pattern element. |
StringMatch | Implement the StringMatch pattern element. |
It would be interesting to see if this set could be converted to use the Snobol4 macro operators.
To define the function, we add the following code to the
source file "Primitive.java".
class $REmatch // The primitive class names begin with $ as a convention.
extends Primitive
{
public $REmatch() {super(2);} // This function <= 2 arguments
public ArgType typeFor(int argi)
{return ArgType.CVS;} // All arguments are required to be a string
// The compiler will insert appropriate CVS
// operators to ensure this.
public void execute(VM vm, PrimFunction fcn) throws Failure
{
String re = (String)vm.pop(); // pop regular expression off the stack.
String s = (String)vm.pop(); // pop target string off the stack.
if(re.length() == 0)
throw new Failure(vm,"rematch: empty regular expression");
// See if the string can be matched by the RE
boolean match = false;
try {
match = s.matches(re);
} catch (PatternSyntaxException pse) {
throw new Failure(vm,"rematch: illegal regular expression: "+re);
}
vm.cc = !match; // if the match failed, then set the condition code.
setReturn(); // return empty string on success
}
}
This function is made known to the interpreter
by adding the following line to the procedure
"definePrimitives()" at the top of the file "Primitive.java".
fcnDef("rematch",(p=new $REMatch()));
*<var>|<fcncall>*
The example below defines a variant of the balanced pattern. It takes one argument: a string of length two and specifying the left and right brackets to match. Thus, the operator *(x)* would be equivalent to *x|bracket('()')*.
To define the function, we add the following code to the
source file "PatternOp.java".
class BracketOp extends PatternOp
{
char lbracket = 0;
char rbracket = 0;
public BracketOp() {super();}
public int nargs() {return 1;}
// On entry, the stack from top down contains the following items:
// top: function argument n
// function argument n-1
// ...
// function argument 1
// var name or null if none specified
public void initial() throws Failure
{
// get the bracket pair
String pair = (String)vm.pop();
// get name of our variable from stack and keep until failure
opvar = (VRef)vm.pop();
if(pair.length() != 2)
throw new Failure(vm,"Bracket: illegal bracket pair: "+pair);
// decompose the bracket pair
lbracket = pair.charAt(0);
rbracket = pair.charAt(1);
// create an initial range value
Range r = new Range(state.cursor,state.cursor);
if(!bracket(r)) {
fail(r.r0); // fail and revert the cursor
} else {
assign(r); // assign the range to the variable, if any
succeed(r); // succeed, saving the current range on the stack
}
}
public void retry() throws Failure
{
Range r = (Range)vm.pop(); // obtain the range from the last retry
if(!bracket(r)) fail(r.r0); else {assign(r); succeed(r);}
}
boolean bracket(Range r)
{
String subject = state.subject; // get the subject string
int len = subject.length(); // and length
if(r.rn >= len) return false; // if we have matched to the end,
// we cannot extend, so signal failure
char ch = subject.charAt(r.rn);
if(ch == rbracket)
return false; // if the next character is the rbracket, fail
if(ch != lbracket) {
r.rn++; // if next character is not an lbracket, advance one char
return true; // and succeed
}
// Only case left is that we are at an lbracket, need to scan
// forward to find a matching rbracket, taking nesting into account
int rn = r.rn+1;
int depth = 1;
while(rn < len && depth > 0) {
ch = subject.charAt(rn++);
if(ch == lbracket)
depth++;
else if(ch == rbracket)
depth--;
// else do nothing
}
if(depth == 0) {
r.rn = rn; // record our current last matched char + 1.
return true;
}
// we got to the end of the subject without balancing the brackets
return false;
}
}
This function is made known to the interpreter
by adding the following line to the procedure
"definePatterns()" at the top of the file "PatternOp.java".
Snobol3.patterns.put("bracket",new BracketOp());
There are some things to note.
Minor version levels are indicated in parentheses.
Author: | Dennis Heimbigner |
Software Engineering Research Laboratory | |
Computer Science Department | |
University of Colorado | |
Boulder, Colorado 80309-0430 USA | |
dennis.heimbigner@colorado.edu |
A Personal Note:
Snobol3 has always held a special fascination for me. It was the first language I learned (after FORTRAN), and it was the first language for which I built a partial interpreter for the IBM 1620 computer in about 1969. I recently found a copy of the Snobol 3 Primer, and since I had some time on my hands, I decided write an interpreter for it in Java.
Copyright (c) 2005, Dennis Heimbigner All rights reserved. Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met: * Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer. * Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution. * Neither the name of thenor the names of its contributors may be used to endorse or promote products derived from this software without specific prior written permission. THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.