SNOBOL4IO(1) | CSNOBOL4B 2.3.2 | January 1, 2024
NAME
snobol4io – SNOBOL4 file I/O
DESCRIPTION
Macro SNOBOL4 originally depended on FORTRAN libraries, unit
numbers and FORMATs for input and output. CSNOBOL4 uses the C
stdio(3) library instead, but unit numbers (INTEGERs between 1 and
256) and record lengths remain embedded in the Macro SNOBOL4 code.
I/O Associations
Output on a closed unit generates a fatal “Output error”,
see snobol4error(1).
The following variable/unit/file associations exist by default:
Variable | Unit | Association
INPUT | 5 | standard input (input)
OUTPUT | 6 | standard output (output)
TERMINAL | 7 | standard error (output)
TERMINAL | 8 | /dev/tty (input)
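As a minimal sketch of the default associations above, the following program copies standard input to standard output one line at a time. It relies only on the INPUT and OUTPUT variables listed in the table; the label name LOOP is an arbitrary choice.

```snobol4
* Copy standard input (unit 5) to standard output (unit 6).
* The assignment fails when INPUT reaches end of file, so control
* falls through to END and the program stops.
LOOP    OUTPUT = INPUT                          :S(LOOP)
END
```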
Named files
Input and output filenames can be supplied to the INPUT() and
OUTPUT() functions via an optional fourth argument.
-
filename - (hyphen)
-
is interpreted as stdin on INPUT() and stdout on OUTPUT().
-
sub-process I/O using PIPE and Pseudo-terminals
-
If the filename begins with a single vertical bar (|),
the remainder is used
as a shell command whose stdin (in the case of OUTPUT()) or stdout
(in the case of INPUT()) will be connected to the file variable via
a pipe. If a pipe is opened by INPUT() in “update” mode, the
connection will be bi-directional (on systems with socketpair and
Unix-domain sockets). See below for how to associate a variable for
I/O in both directions.
-
-
If the filename begins with two vertical bars (||) the remainder is
used as a shell command executed with stdin, stdout and stderr
attached to the slave side of a pseudo-terminal (pty), if the system C
library contains the forkpty(3) routine. Use of a pty is necessary
when the program to be invoked cannot be run without a “terminal” for
I/O. See below on how to properly associate the I/O variable.
-
magic paths /dev/stdin, /dev/stdout, and /dev/stderr
-
/dev/stdin, /dev/stdout, and /dev/stderr refer to the current
process's standard input, standard output and standard error I/O streams
respectively regardless of whether those special filenames exist on
your system.
-
magic path /dev/fd/n
-
/dev/fd/n uses fdopen(3) to open a new I/O stream associated
with file descriptor number n, regardless of whether the special
device entries exist.
-
magic paths /tcp/hostname/service, /udp/hostname/service and /tls/hostname/service
-
/tcp/hostname/service can be used to open a connection to a TCP
server. /udp/hostname/service behaves similarly
for UDP. /tls/hostname/service opens a TLS over TCP connection
(NOTE! It does not attempt to verify the certificate unless the "verify" option is used,
and even then does not handle SNI or SAN).
The path can be followed by a number of slash-separated options:
broadcast | Allow broadcast address (UDP only).
dontroute | Enables routing bypass for outgoing messages.
keepalive | Enables TCP connection keep alive messages.
nodelay | Send TCP data without waiting.
oobinline | Enables reception of out-of-band data in band.
priv | Bind local port number under 1024 (if allowed).
reuseaddr | Allow quick reuse of local addresses.
verify | Attempt to verify server TLS certificate.
-
magic pathname /dev/tmpfile
-
/dev/tmpfile opens an anonymous temporary file for reading and writing, see
tmpfile(3).
-
/dev/null and /dev/tty
-
On non-POSIX systems /dev/null and /dev/tty are magical, and
refer to the null device, and the user's terminal/console,
respectively.
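The named-file and pipe forms described above might be combined as in the following sketch. The file name data.txt, the unit numbers 10 and 11, and the variable and label names are all assumptions made for illustration; sort is used only as an example of a shell command reading its stdin.

```snobol4
* Read lines from a named file and feed them to a shell command.
* 'data.txt' is a hypothetical file; units 10 and 11 are arbitrary.
        INPUT(.IN, 10, '', 'data.txt')          :F(NOFILE)
* A leading vertical bar makes the rest of the name a shell command
* whose stdin is connected to the variable SORTED through a pipe.
        OUTPUT(.SORTED, 11, '', '|sort')        :F(NOPIPE)
COPY    SORTED = IN                             :S(COPY)
* Close both units; closing the pipe lets the command see end of file.
        ENDFILE(10)
        ENDFILE(11)                             :(END)
NOFILE  TERMINAL = 'cannot open data.txt'       :(END)
NOPIPE  TERMINAL = 'cannot start sort'
END
```

Replacing 'data.txt' with '-' would read standard input instead, per the hyphen rule above.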
I/O Options
Originally the third argument specified record length for INPUT(),
or a FORTRAN FORMAT for OUTPUT().
CSNOBOL4 interprets it as a string of single-letter options; commas are ignored.
Some options affect only the I/O variable named in the first argument,
others affect any variable associated with the unit number in the
second argument.
-
digits
-
A span of digits will set the input record length for the named I/O
variable. This controls the maximum string that will be returned for
regular text I/O, and the number of bytes returned for binary I/O.
Record length is per-variable association; multiple variables may be
associated with the same unit, but with different record lengths. The
default record length for input is 1024. Lines longer than the record
length will be silently truncated.
Since CSNOBOL4 2.2, record length is only honored for binary I/O, and all characters up to a newline (ASCII Line Feed) are interpreted as a single line.
-
A
-
For OUTPUT(), the unit will be opened for append access (the option is
ignored by INPUT()). All writes will occur at the end of the file at the time
of the write, regardless of the file position before the write.
-
B
-
The unit will be opened for binary access. On input, newline
characters have no special meaning; the number of bytes transferred
depends on record length (see above). On output, no newline is
appended.
-
B
-
For terminal devices, all input from this unit will be done without
special processing for line editing or EOF; the number of characters
returned depends on the record length.
Characters which deliver signals (including interrupt, kill, and suspend)
are still processed. Units (with different fds) opened on the same terminal
device operate independently; some can use binary mode, while others
operate in text mode.
-
C
-
Character at a time I/O. A synonym for B,1.
-
E
-
Set the "close on exec" flag for the underlying file descriptor.
Depends on support by the C library fopen(3) call for 'e' in the
mode string for regular files. Honored for sockets regardless (but
not on Windows).
-
J
-
Read and write compressed data in .xz format, using liblzma, as
written by xz(1). If a digit 0 through 9 immediately follows the
option, it will be interpreted as the compression level to use when
writing. It's claimed that level zero is "sometimes faster than gzip
-9 while compressing much better". The default compression level is
6; larger numbers will require more than 16 MiB of memory to
decompress, and are useful only when compressing files bigger
than 8 MiB (level 7), 16 MiB (level 8), and 32 MiB (level 9). Matches
the tar(1) command line option.
Added in CSNOBOL4 2.2.
-
j
-
Read and write compressed data in .bz2 format, using libbz2, as
created by bzip2(1). If a digit 1 through 9 immediately follows
the option, it will be interpreted as the compression level to use
when writing. Matches the tar(1) command line option.
Added in CSNOBOL4 2.2.
-
K
-
If an input line is longer than the input record length,
return the line in multiple reads (breaK up the line)
instead of discarding the extra characters.
Added in CSNOBOL4 2.0. Obsolete in CSNOBOL4 2.2.
-
T
-
Terminal mode. Writes are performed “unbuffered” (see below), and
no newline characters are added. On input newline characters are
returned. Terminal mode affects only the referenced unit, and does
not require opening a new file descriptor (i.e., by using a magic
pathname): OUTPUT(.TT, 8, "T", "-"). Terminal mode is useful for
outputting prompts in interactive programs.
-
Q
-
Quiet mode. Turns off input echo on terminals.
Affects only input on this file descriptor.
-
U
-
Update mode. The unit is opened for both input and output.
Example of associating a variable for I/O in both directions:
        unit = IO_FINDUNIT()
        INPUT(.name, unit, 'U', 'filepath')
        OUTPUT(.name, unit)
-
-
This is useful when filepath is /dev/fd/n, where n
is a file descriptor number returned by SERV_LISTEN(), or when
filepath specifies a pipe (|command) or pseudo-terminal
(||command).
-
-
The above sequence is also useful when combined with fixed record
length, binary mode and the SET() function for I/O to preexisting
files. Performing OUTPUT() first will create a regular file if it
does not exist, but will also truncate a preexisting file!
-
W
-
Unbuffered mode. Each output variable assignment causes an immediate I/O
transfer to occur by direct read(2) or write(2) system calls,
rather than collecting the data in a buffer for efficiency.
-
X
-
Open fails if file exists (meaningless for /dev/fd/n).
Depends on support by the C library fopen(3) call for
'x' in the mode string.
Added in CSNOBOL4 2.1 where it was ignored for sockets.
In CSNOBOL4 2.2 applies to sockets, and means don't allow local socket address reuse.
-
Z
-
Reserved for .Z (compress(1)) style compression?!
-
z
-
Read and write compressed data in .gz format using zlib(3),
as created by gzip(1). If a digit 0 through 9 immediately follows
the option, it will be interpreted as the compression level to use
when writing. Matches the tar(1) command line option.
Added in CSNOBOL4 2.2.
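The option letters above are passed in the third argument. The following sketch, under the assumption of a hypothetical file named run.log and arbitrary unit numbers 12 and 13, appends a line with the A option and then rereads the file in fixed 16-byte binary records with B and a record length; the variable and label names are illustrative only.

```snobol4
* Append one line to 'run.log' (hypothetical name), creating it if needed.
        OUTPUT(.LOGF, 12, 'A', 'run.log')       :F(END)
        LOGF = 'started'
        ENDFILE(12)
* Reopen the same file for binary input with a record length of 16.
* Commas in the option string are ignored, so 'B,16' equals 'B16'.
        INPUT(.REC, 13, 'B,16', 'run.log')      :F(END)
* Each read returns up to 16 bytes; newlines are not treated specially.
NEXT    CHUNK = REC                             :F(DONE)
        OUTPUT = SIZE(CHUNK)                    :(NEXT)
DONE    ENDFILE(13)
END
```

The sizes printed depend on what run.log already contained, since append mode leaves existing data in place.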
Other I/O extensions
-
SERV_LISTEN(), SET(), SSET()
-
see snobol4func(1).
I/O Layers
The Macro SNOBOL4 and POSIX I/O architectures
have subtleties which interact, and are explained here:
-
Variable association
-
Input and output is done by reading or writing variables associated
with a unit number for I/O.
-
-
Input (maximum) record lengths are associated with each variable association!
-
Unit number
-
Multiple variables can be associated with the same unit number
using the INPUT() and OUTPUT() functions.
-
-
Each unit number refers to a stdio(3) stream
(except on broken systems like Windows, where socket handles
are incompatible with file handles, and all network I/O
is performed “unbuffered”).
-
-
Sequential named files can be associated with an I/O unit
when the -r option is given on the command line!
REWIND() should return to the position after the program END label!
-
“Standard I/O” Stream
-
snobol4(1) performs MOST I/O through “Standard Input/Output”
streams. Multiple units can be associated with the same stdio stream
(FILE struct) using magic pathnames
(“-” and /dev/std{in,out,err}).
Buffering is performed by the stdio layer.
-
Operating System file descriptor
-
More than one stdio stream can be associated with the same O/S “fd”
(by opening magic pathname “/dev/fd/n”).
-
-
Each POSIX “fd” has a file position pointer, changed by
reading, writing and the REWIND(), SET() and SSET() functions.
-
-
Normally terminal device “special files” have one set of
mode settings, but CSNOBOL4 associates (saves and restores) different
terminal settings (echo and the number of characters returned on read)
based on fd numbers.
-
Operating System open file object
-
More than one “fd” slot can be associated with the same “open file”
object, either in multiple forks, or by dup(2) of the same fd.
This is often the case for stdin, stdout and stderr.
-
-
Open file objects have flags which affect all associated fds,
including input, output and append modes.
-
Operating System named file
-
Independent opens of the same named “regular” file will have
different open file objects, and thus have independent access modes
and file positions.
-
-
Terminal devices normally have one set of “line discipline”
mode settings, but CSNOBOL4 maintains different settings for each
file descriptor (see above).
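The layering above can be seen in a small sketch where two different units end up on the same stdio stream. Unit number 20 and the variable name DIAG are assumptions chosen for illustration; TERMINAL and the /dev/stderr magic path are as described earlier on this page.

```snobol4
* Associate a second unit with the standard error stream through the
* magic pathname; unit 20 and the name DIAG are arbitrary choices.
        OUTPUT(.DIAG, 20, '', '/dev/stderr')    :F(END)
* Both assignments write to standard error: DIAG through unit 20,
* TERMINAL through its default output association on unit 7.
        DIAG = 'written through unit 20'
        TERMINAL = 'written through unit 7'
* OUTPUT still goes to standard output on unit 6.
        OUTPUT = 'written through unit 6'
END
```

Redirecting standard error when running the program should capture the first two lines and leave the third on standard output.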
BUGS
This page was cut and pasted from various parts of the original
snobol4(1) man page, and still needs review and cleanup.
All extensions should be annotated with the version they appeared in
(and what other implementations they're compatible with or inspired by).
Record lengths.
Unit numbers.
SEE ALSO
snobol4(1),
snobol4ezio(3)