Some thoughts on a new shell grammar, because this is a convenient spot :)
Basics
- Parsing order is: word splitting, expansion L-R, redirect processing, execution.
- Variables are scoped; scopes nest.
- No implicit subshells. Probably requires MT impl, but that's OK.
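A rough sketch of "scopes nest", in Python since the shell itself doesn't exist yet. It assumes each scope is a plain dict and that lookup goes innermost-first; `enter_scope` and `leave_scope` are made-up helper names. Whether assigning inside an inner scope shadows or updates an outer variable isn't decided above; this sketch shadows.

```python
from collections import ChainMap

# Assumption: one dict per scope; ChainMap searches the innermost dict first.
outermost = ChainMap({"$foo": "outer"})

def enter_scope(chain):
    """Return a chain with a fresh, empty innermost scope (hypothetical helper)."""
    return chain.new_child()

def leave_scope(chain):
    """Drop the innermost scope and return the enclosing chain (hypothetical helper)."""
    return chain.parents

inner = enter_scope(outermost)
inner["$foo"] = "inner"        # written to the innermost dict, so it shadows
inner["$bar"] = "only here"

shadowed = inner["$foo"]
after = leave_scope(inner)
restored = after["$foo"]
bar_visible = "$bar" in after
```

After leaving the inner scope, `$foo` is back to its outer value and `$bar` is gone.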
Word splitting
Words in command lines ("outer expressions") are:
- Barewords: command names, globs, switches, parameters, ...
- Literal globs (below)
- Quoted strings
- Outer operators: =, redirections
- Parenthesized expressions
- Variable references, e.g., $foo or @bar[0]
- Separators and delimiters: semi, braces, && || &, ternary ?? ::
- Keywords: for in do while until loop next function if elif end select case time
Command substitution
- Command substitution: $(outer_expression), because it's familiar
Inner expressions
Expressions in parens are "inner" expressions. They are where substitutions happen, except that variable expansion can also happen in outer expressions.
Operators are C, plus:
- :- := :? :+ from bash - take action if the lhs is unset or empty.
- // //= //? //+ Analogous, but defined-or like Perl - take action if the lhs is unset.
- ?: Elvis - test truth of the lhs.
- So $foo // "hi" is hi unless $foo is defined, and $foo ?: "Hi" is Hi if $foo is either undefined or empty, since empty strings are considered false.
- ~ regex match (or glob), or sub returning new value
- =~ regex sub in place
- -~ and -~= regex or glob substring removal - suffix (in place with =)
- ~- and =~- likewise, but prefixes.
- Pattern is always to the left of the string for prefix removal and to the right of the string for suffix removal. For replacements, anchor regexes or use qg.
(inner expression not starting with !)
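To pin down the difference between :-, // and ?: from the operator list, here is a small Python sketch. `UNSET` is a made-up sentinel for an unset variable, and the function names are just labels. It only models strings, where "false" means unset or empty; for other types ?: and :- may well differ, which isn't modeled here.

```python
UNSET = object()  # hypothetical sentinel standing in for an unset variable

def colon_dash(lhs, rhs):
    """:- as in bash: use rhs if lhs is unset or empty."""
    return rhs if lhs is UNSET or lhs == "" else lhs

def defined_or(lhs, rhs):
    """// as in Perl: use rhs only if lhs is unset."""
    return rhs if lhs is UNSET else lhs

def elvis(lhs, rhs):
    """?: tests truth of lhs; in this sketch false means unset or empty string."""
    is_false = lhs is UNSET or lhs == ""
    return rhs if is_false else lhs
```

So with $foo set to the empty string, `defined_or("", "hi")` keeps the empty string, while `elvis("", "Hi")` gives Hi.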
Data types
- Scalars
- Text, number, file name, file descriptor, boolean, glob, regex
- Expansion within "" as usual, but the result becomes one word.
- No expansion in ''; only \' and
.
- qr// for regex literals
- qs/// for regex subs
- qg// for glob literals. In a qg, ^ and $ work at the start and end of globs.
- qt for transliteration: qt/// for specified charsets; qt^ and ^^ , ,, ~ ~~ for case changes
- Bools: true and false. True is 0; false is any nonzero, in both outer and inner expressions. Strings are true iff non-empty. Undefined vars are false in a bool context.
- Arrays - only numeric indices
- Hashes - only text indices. Numbers are converted to string keys per awk's CONVFMT or something similar.
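The truth rules in the Bools bullet are easy to get backwards, so here is a Python sketch of them. Assumptions: Python `None` stands in for an undefined variable, and a Python `int` stands in for a status-style number. `is_true` is a made-up name.

```python
UNDEFINED = None  # hypothetical stand-in for an undefined variable

def is_true(value):
    """Truth test per the Bools bullet, under the assumptions stated above."""
    if value is UNDEFINED:
        return False                # undefined vars are false in a bool context
    if isinstance(value, bool):
        return value                # already a bool
    if isinstance(value, str):
        return value != ""          # strings are true iff non-empty
    if isinstance(value, int):
        return value == 0           # 0 is true; any nonzero is false
    raise TypeError(f"type not modeled in this sketch: {type(value).__name__}")
```

Note that under these rules the string "0" is true (non-empty) and the number 1 is false.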
Sigils are $ for scalars, @ for arrays, and % for hashes. The sigil used is that of the variable, so %foo[bar]. Indexing is always []. Braces can be used after the sigil to disambiguate.
You can't have both $foo and %foo. Only one type per name!
Array or hash elements can be any scalar type; containers can include values of different types. TODO allow nested containers?
Sigils are used with var references for both lvalue and rvalue, so $foo=42, not foo=42.
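A sketch of the "only one type per name" rule, in Python. `SymbolTable` and `declare` are made-up names, and raising an error on a conflict is an assumption; the notes only say you can't have both.

```python
SIGILS = {"$": "scalar", "@": "array", "%": "hash"}

class SymbolTable:
    """Tracks which kind each bare name has (hypothetical helper)."""

    def __init__(self):
        self._kinds = {}  # bare name -> "scalar" | "array" | "hash"

    def declare(self, ref):
        """Register a reference such as "$foo" or "%foo"; reject a second type."""
        sigil, name = ref[0], ref[1:]
        kind = SIGILS[sigil]
        existing = self._kinds.setdefault(name, kind)
        if existing != kind:
            raise NameError(f"{name} is already a {existing}; it cannot also be a {kind}")
        return kind
```

Declaring `$foo` and then `%foo` fails, while `$foo` and `@bar` coexist.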
Contexts
The result of a term depends on its context. E.g. echo $(foo) prints the standard output of foo, but if $(foo) tests whether foo succeeded.
Contexts are the same as the scalar types.
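Here is roughly what the echo/if example means, sketched in Python with `subprocess`. Only the text and boolean contexts are modeled. `command_substitution` is a made-up name, and stripping the trailing newline in text context is borrowed from existing shells, not something decided above.

```python
import subprocess
import sys

def command_substitution(argv, context):
    """Run argv and return its result as seen from the given context."""
    done = subprocess.run(argv, capture_output=True, text=True)
    if context == "text":
        return done.stdout.rstrip("\n")   # like: echo $(foo)
    if context == "boolean":
        return done.returncode == 0        # like: if $(foo)
    raise ValueError(f"context not modeled in this sketch: {context}")

# Portable stand-ins for "foo": child Python processes.
foo = [sys.executable, "-c", "print('hello')"]
failing_foo = [sys.executable, "-c", "import sys; sys.exit(3)"]
```

The same term gives the output text in one context and a success flag in the other.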
Casting/accessors
<selector>`<expr> returns the result of <expr> indicated by <selector>. Multiple selectors can be given, separated by `. Selectors are:
- ? the exit status of a program.
- A non-negative integer: that file descriptor.
- To-do: specify whether you want the name or the pipe of a descriptor.
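A Python sketch of what a single selector picks out. Assumptions: only descriptors 1 and 2 are captured, a descriptor selector yields the captured text (the name-or-pipe question is still open), and chaining several selectors is left out because its result isn't specified yet. `run` and `select` are made-up names.

```python
import subprocess
import sys

def run(argv):
    """Run argv and keep everything a selector could ask for in this sketch."""
    done = subprocess.run(argv, capture_output=True, text=True)
    return {"?": done.returncode, 1: done.stdout, 2: done.stderr}

def select(selector, result):
    """Return the part of result named by selector: "?" or a descriptor number."""
    if selector == "?":
        return result["?"]
    if isinstance(selector, int) and selector >= 0:
        return result[selector]   # KeyError for descriptors this sketch did not capture
    raise ValueError(f"not a selector: {selector!r}")

# Portable stand-in for a program that writes to both streams and exits 2.
child = [sys.executable, "-c",
         "import sys; print('out'); print('err', file=sys.stderr); sys.exit(2)"]
result = run(child)
```

With this, `select("?", result)` is the exit status and `select(2, result)` is whatever the program wrote to stderr.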
Redirection
<, >, |, &> work as usual. They are shorthand for a more general mechanism:
[fetch] cmd [stash] |; [fetch] cmd [stash] ... |; ...
|; is a "Mack" because it can carry a whole lot of data and is larger than a regular semi ;) . A fetch or stash is an expression involving the -> operator.
Stashes are:
->"foo" or ->'foo': output to file foo. Quote processing is as usual. The quotes are required (to-do relax this?). By default, stdout is saved.
->&bar: Stash all selectors into special variable &bar. In this form, &bar only exists in the pipeline.
Either of these can be preceded by a selector. E.g., 2->"foo.txt" saves stderr to foo.txt.
<selector>->$bat: save the result picked by <selector> to variable $bat, which does last outside the pipeline. The selector is required.
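A Python sketch of the three stash forms, applied to an already-captured result. Everything here is an assumption layered on the notes: the result is a dict keyed by selector, `stash` and its `target` tuples are made-up, a selector in front of ->&bar is taken to store just that one value, and the pipeline-only lifetime of &bar is not modeled.

```python
import os
import tempfile

def stash(result, target, variables, selector=None):
    """Apply one stash; target is ("file", path), ("pipeline_var", "&name") or ("var", "$name")."""
    kind, name = target
    if kind == "file":
        chosen = 1 if selector is None else selector   # stdout is saved by default
        with open(name, "w") as fh:
            fh.write(str(result[chosen]))
    elif kind == "pipeline_var":
        # No selector: stash all selectors. With one: just that value (assumption).
        variables[name] = dict(result) if selector is None else result[selector]
    elif kind == "var":
        if selector is None:
            raise ValueError("->$var requires a selector")
        variables[name] = result[selector]
    else:
        raise ValueError(f"unknown stash kind: {kind}")

captured = {"?": 0, 1: "out\n", 2: "err\n"}   # made-up captured result
variables = {}
path = os.path.join(tempfile.mkdtemp(), "foo.txt")
stash(captured, ("file", path), variables, selector=2)     # 2->"foo.txt"
stash(captured, ("pipeline_var", "&bar"), variables)       # ->&bar
stash(captured, ("var", "$bat"), variables, selector="?")  # ?->$bat
```

After these three calls, foo.txt holds the stderr text, &bar holds every selector, and $bat holds the exit status.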