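\ Suppresses the special meaning of the character that follows it.
^ Matches at the beginning of a string (normally the start of the input line). ^ does not match the beginning of a line embedded in a string: the condition in if ("line1\nLINE 2" ~ /^L/) ... is not true.
$ Matches at the end of a string. $ does not match the end of a line embedded in a string: the condition in if ("line1\nLINE 2" ~ /1$/) ... is not true.
. Matches any single character, including newline.
[...] Matches any one of the characters in the specified bracket expression.
[^ ...] Matches any one character that is not in the specified bracket expression.
| Alternation: matches either the regexp on its left or the regexp on its right. It has the lowest precedence of all the regexp operators. The alternation applies to the
largest possible regexps on either side.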
(…) Parentheses are used for grouping in regular expressions, as in arithmetic. They can
be used to concatenate regular expressions containing the alternation operator.
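* Matches zero or more occurrences of the preceding regexp.
+ Matches one or more occurrences of the preceding regexp.
? Matches zero or one occurrence of the preceding regexp.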
{n}, {n,}, {n,m} One or two numbers inside braces denote an interval expression. If there is
one number in the braces, the preceding regexp is repeated n times. If there are two
numbers separated by a comma, the preceding regexp is repeated n to m times. If there
is one number followed by a comma, then the preceding regexp is repeated at least n
times:
wh{3}y Matches ‘whhhy’, but not ‘why’ or ‘whhhhy’.
wh{3,5}y Matches ‘whhhy’, ‘whhhhy’, or ‘whhhhhy’, only.
wh{2,}y Matches ‘whhy’ or ‘whhhy’, and so on.
Interval expressions were not traditionally available in awk. They were added as part
of the POSIX standard to make awk and egrep consistent with each other.
However, because old programs may use ‘{’ and ‘}’ in regexp constants, by default
gawk does not match interval expressions in regexps. If either --posix or --re-interval is
specified, then interval expressions are allowed in regexps.
For new programs that use ‘{’ and ‘}’ in regexp constants, it is good practice to
always escape them with a backslash. Then the regexp constants are valid and work the way
you want them to.
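
The three interval forms above can be tried directly. This is a minimal sketch that assumes gawk is installed; it passes --re-interval explicitly so that braces are read as interval expressions. The helper name match_interval and the sample words are made up for illustration.

```shell
# Hypothetical helper: print the sample words that match the regexp in $1.
match_interval() {
    printf '%s\n' why whhy whhhy whhhhy whhhhhy |
        gawk --re-interval -v re="$1" '$0 ~ re'
}

match_interval '^wh{3}y$'     # exactly three h: whhhy
match_interval '^wh{3,5}y$'   # three to five h: whhhy, whhhhy, whhhhhy
match_interval '^wh{2,}y$'    # at least two h: everything except why
```

The regexp is passed as a string with -v and used as a dynamic regexp, so the same helper can be reused for all three forms.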
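In regular expressions, the ‘*’, ‘+’, and ‘?’ operators, as well as the braces ‘{’ and ‘}’, have the highest precedence, followed by concatenation, and finally by ‘|’. As in arithmetic, parentheses can change how operators are grouped.
In POSIX awk and gawk, the ‘*’, ‘+’, and ‘?’ operators stand for themselves when nothing in the regexp precedes them.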
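
The anchoring and precedence rules can be checked with a few match tests. This is a small sketch using only POSIX awk features; the helper name anchor_demo and the test strings other than "line1\nLINE 2" are illustrative. Each print yields 1 for a match and 0 otherwise.

```shell
# Hypothetical helper: each print shows 1 (match) or 0 (no match).
anchor_demo() {
    awk 'BEGIN {
        s = "line1\nLINE 2"
        print (s ~ /^L/)                 # 0: "L" follows a newline, not the string start
        print (s ~ /1$/)                 # 0: "1" precedes a newline, not the string end
        print (s ~ /^l/)                 # 1
        print (s ~ /2$/)                 # 1
        print ("abxx" ~ /^ab|cd$/)       # 1: parsed as (^ab)|(cd$)
        print ("abcd" ~ /^a(b|c)d$/)     # 0: grouping limits the alternation to b or c
        print ("acd" ~ /^a(b|c)d$/)      # 1
        print ("color" ~ /^colou?r$/)    # 1: ? allows zero or one "u"
        print ("colouur" ~ /^colou?r$/)  # 0: two "u" are too many
    }'
}

anchor_demo
```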
gawk-Specific Regexp Operators
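\y Matches the empty string at either the beginning or the end of a word.
\B Matches the empty string within a word.
\< Matches the empty string at the beginning of a word (anchors the start of a word).
\> Matches the empty string at the end of a word (anchors the end of a word).
\w Matches any word-constituent character, that is, any letter, digit, or underscore.
\W Matches any character that is not word-constituent.
\` Matches the empty string at the beginning of a string.
\' Matches the empty string at the end of a string.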
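
A few of these operators in action. This sketch assumes gawk run without --posix or --traditional, since the operators are GNU extensions; the helper name gnu_ops_demo and the sample strings are illustrative.

```shell
# Hypothetical helper exercising the GNU word operators.
gnu_ops_demo() {
    gawk 'BEGIN {
        print ("stow away" ~ /\<away\>/)  # 1: "away" is a whole word
        print ("stowaway" ~ /\yaway\y/)   # 0: no word boundary before "away"
        print ("stowaway" ~ /\Baway/)     # 1: "away" starts inside a word
        print ("away" ~ /\Baway/)         # 0: "away" starts at a word boundary
        s = "foo_bar baz"
        gsub(/\w+/, "X", s)               # underscore counts as word-constituent
        print s                           # X X
    }'
}

gnu_ops_demo
```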
The various command-line options control how gawk interprets characters in regexps:
No options:
In the default case, gawk provides all the facilities of POSIX regexps and the
GNU regexp operators described previously. However, interval expressions are not
supported.
--posix:
Only POSIX regexps are supported; the GNU operators are not special (e.g., ‘\w’
matches a literal ‘w’). Interval expressions are allowed.
--traditional:
Traditional Unix awk regexps are matched. The GNU operators are not special, interval
expressions are not available, nor are the POSIX character classes ([[:alnum:]], etc.).
Characters described by octal and hexadecimal escape sequences are treated literally, even
if they represent regexp metacharacters. Also, gawk silently skips directories named on the
command line.
--re-interval:
Allow interval expressions in regexps, even if --traditional has been provided.
(--posix automatically enables interval expressions, so --re-interval is redundant when
--posix is used.)
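
The effect of these options on braces can be observed by running the same regexp under different modes. This is a sketch assuming gawk is installed; interval_mode_demo is a hypothetical helper, and the two input lines are chosen so that one matches only when braces are literal and the other only when they form an interval expression.

```shell
# Hypothetical helper: "$@" carries the gawk options under test.
interval_mode_demo() {
    printf '%s\n' 'whhhy' 'wh{3}y' | gawk "$@" '/^wh{3}y$/'
}

interval_mode_demo --traditional                 # braces are literal: wh{3}y
interval_mode_demo --traditional --re-interval   # interval expression: whhhy
interval_mode_demo --posix                       # interval expression: whhhy
```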
Class Meaning
[:alnum:] Alphanumeric characters.
[:alpha:] Alphabetic characters.
[:blank:] Space and TAB characters.
[:cntrl:] Control characters.
[:digit:] Numeric characters.
[:graph:] Characters that are both printable and visible.
[:lower:] Lowercase alphabetic characters.
[:print:] Printable characters (characters that are not control characters).
[:punct:] Punctuation characters.
[:space:] Space characters (such as space, TAB, and formfeed, to name a few).
[:upper:] Uppercase alphabetic characters.
[:xdigit:] Characters that are hexadecimal digits.
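
Character classes are only valid inside a bracket expression, so they are written with doubled brackets such as [[:digit:]]. The sketch below assumes gawk and counts how many characters of a string fall into several classes; classify_chars is a hypothetical helper, and it relies on gsub returning the number of substitutions it made.

```shell
# Hypothetical helper: count characters of $1 by POSIX class.
classify_chars() {
    printf '%s\n' "$1" | gawk '{
        s = $0
        # Replacing each match with itself ("&") leaves s unchanged;
        # only the substitution count returned by gsub is used.
        digits = gsub(/[[:digit:]]/, "&", s)
        uppers = gsub(/[[:upper:]]/, "&", s)
        puncts = gsub(/[[:punct:]]/, "&", s)
        blanks = gsub(/[[:blank:]]/, "&", s)
        printf "digits=%d upper=%d punct=%d blank=%d\n", digits, uppers, puncts, blanks
    }'
}

classify_chars 'Gawk 3.1, POSIX!'   # digits=2 upper=6 punct=3 blank=2
```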

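A range pattern matches all lines from the first occurrence of the first pattern through the next occurrence of the second pattern, inclusive. If the first pattern never appears, nothing is matched; if the second pattern never appears, the range runs to the end of the input. For example, $ awk '/root/,/mysql/' test prints every line of the file test from the first line containing root through the first following line containing mysql.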
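
The range behaviour can be reproduced on a few made-up lines. This is a sketch using POSIX awk; range_demo is a hypothetical helper, the sample input stands in for the file test, and the two patterns are passed as dynamic regexps so the same helper covers the missing-pattern cases.

```shell
# Hypothetical helper: print the lines from the first match of $1
# through the next match of $2.
range_demo() {
    printf '%s\n' 'daemon:x:1' 'root:x:0' 'bin:x:2' 'mysql:x:27' 'nobody:x:99' |
        awk -v start="$1" -v stop="$2" '$0 ~ start, $0 ~ stop'
}

range_demo root mysql     # root:x:0, bin:x:2, mysql:x:27
range_demo bin nosuch     # end pattern missing: bin:x:2 through the last line
range_demo nosuch mysql   # start pattern missing: prints nothing
```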

