These notes are partially based on lecture notes by Professor Nikolai Bezroukov at FDU.
String operators allow you to manipulate the contents of a variable without resorting to AWK or Perl. Modern shells such as bash 3.x or ksh93 support most of the standard string manipulation functions, but in a very idiosyncratic way. Still, the standard string functions are available. Strings can be concatenated by juxtaposition and by using double-quoted strings. You can ensure that variables exist (i.e., are defined and have non-null values), set default values for variables, and catch errors that result from variables not being set. You can also perform basic pattern matching. Several basic string operations are available in bash, ksh93 and similar shells.
String operators in shell use a curly-bracket syntax that is unique among programming languages. In shell any variable can be written as ${name_of_the_variable} instead of $name_of_the_variable. This notation is most often used to protect a variable name from merging with the string that comes after it. Here is an example in which it separates the variable $var from the string "_string":
$ export var='test'
$ echo ${var}_string   # var is a variable that uses syntax ${var} and its value will be substituted
test_string
$ echo $var_string     # var_string is a variable that doesn't exist, so echo doesn't print anything
In the Korn shell (ksh88) this notation was extended to allow expressions inside the curly brackets, for example ${var=moo}. Each operation is encoded using a special symbol or a pair of symbols (a "digram", for example :-, :=, etc.). Any argument that the operator needs is placed after the symbol of the operation. Later this notation was extended in ksh93 and adopted by bash and other shells.
This "ksh-originated" group of operators is the most popular and probably the most widely used group of string-handling operators, so it makes sense to learn them, if only to be able to modify old scripts. Bash 3.0 and later also has the =~ operator, which matches against POSIX extended regular expressions (close to basic Perl-style regular expressions). It can be used instead in many cases and is preferable in new scripts. Let's say we need to establish whether the variable $x appears to be a social security number:
if [[ $x =~ ^[0-9]{3}-[0-9]{2}-[0-9]{4}$ ]]
then
    : # process SSN
else
    : # print error message
fi
These operators can test for the existence of variables and allow substitution of default values under certain conditions.
Note: The colon (:) in each of these operators is actually optional. If the colon is omitted, then change "exists and isn't null" to "exists" in each definition, i.e., the operator tests for existence only.
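The difference made by the colon can be checked directly. The following is a minimal sketch; the variable names (missing, empty and the four result variables) are made up for illustration.

```shell
unset missing          # hypothetical variable that does not exist
empty=""               # exists, but is null

colon_missing=${missing:-fallback}   # unset       -> fallback
plain_missing=${missing-fallback}    # unset       -> fallback
colon_empty=${empty:-fallback}       # null        -> fallback
plain_empty=${empty-fallback}        # set to null -> stays empty

printf '[%s] [%s] [%s] [%s]\n' \
    "$colon_missing" "$plain_missing" "$colon_empty" "$plain_empty"
# [fallback] [fallback] [fallback] []
```

Only the last case differs: without the colon, a variable that is set to the empty string counts as existing, so no default is substituted.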
Bash and ksh also provide some (limited) regular expression functionality called pattern matching operators.
The notation, introduced in ksh88, was and still is very idiosyncratic. In the examples below we will assume that the variable var has the value "this is a test" (as produced by the statement export var="this is a test"). Because the expansions are unquoted, echo drops any leading or trailing blank left in the result.
echo ${var#t*is}    # prints: is a test
echo ${var##t*is}   # prints: a test
echo ${var%t*st}    # prints: this is a
echo ${var%%t*st}   # prints an empty string, because the whole value matches the pattern
Despite shell deficiencies in this area and idiosyncrasies preserved from the 1970s, most classic string operations can be implemented in shell. You can define functions that behave almost exactly like those in Perl or another "more normal" language. If the shell facilities are not enough, you can use AWK or Perl. It's actually sad that AWK was not integrated into shell.
There are several ways to get the length of a string.
The simplest one is ${#varname}, which returns the length of the value of the variable as a character string. For example, if filename has the value fred.c, then ${#filename} would have the value 6.
The second is to use the external expr utility (the length keyword is a GNU extension), for example
expr length "$string"
or
expr "$string" : '.*'
stringZ=abcABC123ABCabc
echo ${#stringZ}                 # 15
echo `expr length $stringZ`      # 15
echo `expr "$stringZ" : '.*'`    # 15
# check_length
# to call: check_length string max_length_of_string
check_length()
{
    # check we have the right params
    if (( $# != 2 )) ; then
        echo "check_length need two parameters: a string and max_length"
        return 1
    fi
    if (( ${#1} > $2 )) ; then
        return 1
    fi
    return 0
}
You could call the function check_length like this:
#!/usr/bin/bash
# test_name
while :
do
    echo -n "Enter customer name :"
    read NAME
    # call the function directly; wrapping it in [ ... ] would run test, not the function
    check_length "$NAME" 10 && break
    echo "The string $NAME is longer than 10 characters"
done
echo $NAME
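expr match "$string" '$substring'
where $substring is a regular expression. The command prints the length of the part of $string that the expression matches, starting at the beginning of the string.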
my_regex=abcABC123ABCabc
#        |------|
echo `expr match "$my_regex" 'abc[A-Z]*.2'`   # 8
echo `expr "$my_regex" : 'abc[A-Z]*.2'`       # 8
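The same length can be obtained without calling expr. This is a sketch that assumes bash 3.0 or later and uses the =~ operator together with the BASH_REMATCH array; the variable names are illustrative.

```shell
string=abcABC123ABCabc
re='^abc[A-Z]*.2'     # anchored with ^, because expr match always starts at the beginning

if [[ $string =~ $re ]]; then
    match_len=${#BASH_REMATCH[0]}   # length of the text matched by the whole expression
else
    match_len=0
fi
echo "$match_len"     # 8
```

Keeping the expression in a variable avoids the quoting differences between bash releases when the pattern is written inline.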
The function index returns the position in the string of the first character that belongs to the given set of characters, counting from one, or 0 if none of them is found.
expr index $string $substring
Numerical position in $string of first character in $substring that matches.
stringZ=abcABC123ABCabc
echo `expr index "$stringZ" C12`   # 6   C position.
echo `expr index "$stringZ" c`     # 3   'c' (in #3 position)
This is the close equivalent of strchr() in C.
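A rough pure-shell equivalent can be built from the %% operator. The sketch below is only an illustration: str_index is a hypothetical helper name, and it assumes the character set contains no characters that are special inside a bracket expression (such as ], ^ or -).

```shell
# str_index STRING CHARS  (hypothetical helper)
# Prints the 1-based position of the first character of STRING that occurs in CHARS, or 0.
str_index() {
    local s=$1 chars=$2 prefix
    prefix=${s%%[$chars]*}            # everything before the first matching character
    if [ "$prefix" = "$s" ]; then
        echo 0                        # nothing was removed: no character of CHARS occurs
    else
        echo $(( ${#prefix} + 1 ))    # 1-based position, like expr index
    fi
}

str_index abcABC123ABCabc C12   # 6
str_index abcABC123ABCabc c     # 3
str_index abcABC123ABCabc xyz   # 0
```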
The substring function is available as a part of the pattern matching operators in shell and has the form ${param:offset[:length]}.
If an offset evaluates to a number less than zero, it counts back from the end of the string defined by the variable $param.
Notes:
A common mistake is to write
a=12345678
echo ${a:-4}
intending to print the last four characters of $a. The problem is that ${param:-word} already has a special meaning in shell: it substitutes the word after the minus sign if the variable param is undefined or null. To use a negative offset that begins with a minus sign, separate the minus sign and the colon with a space, as in ${a: -4}.
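The pitfall and its workaround can be seen side by side in a short sketch (the result variable names are illustrative):

```shell
a=12345678
whole=${a:-4}          # default-value operator: a is set, so this is all of $a
last_four=${a: -4}     # space after the colon: offset -4, counted from the end
also_last=${a:(-4)}    # parentheses around the offset work as well

echo "$whole $last_four $also_last"   # 12345678 5678 5678
```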
${string:position}
Extracts substring from $string at $position.
If the $string parameter is "*" or "@", then this extracts the positional parameters, starting at $position.
${string:position:length}
Extracts $length characters of substring from $string at $position.
stringZ=abcABC123ABCabc
#       0123456789.....
#       0-based indexing.
echo ${stringZ:0}     # abcABC123ABCabc
echo ${stringZ:1}     # bcABC123ABCabc
echo ${stringZ:7}     # 23ABCabc
echo ${stringZ:7:3}   # 23A
                      # Three characters of substring.
If the $string parameter is "*" or "@", then this extracts a maximum of $length positional parameters, starting at $position.
echo ${*:2}     # Echoes second and following positional parameters.
echo ${@:2}     # Same as above.
echo ${*:2:3}   # Echoes three positional parameters, starting at second.
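Slicing the positional parameters is easiest to try inside a function, where the arguments are under your control. The helper names below are hypothetical and only serve as a sketch.

```shell
# rest_args: prints every argument from the second one on
rest_args() {
    echo "${@:2}"
}

# middle_args: prints at most three arguments, starting at the second
middle_args() {
    echo "${@:2:3}"
}

rest_args   one two three four five   # two three four five
middle_args one two three four five   # two three four
```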
expr substr $string $position $length
Extracts $length characters from $string starting at $position.
The first character has index one.
stringZ=abcABC123ABCabc
#       123456789......
#       1-based indexing.
echo `expr substr $stringZ 1 2`   # ab
echo `expr substr $stringZ 4 3`   # ABC
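Since expr substr counts from one and ${string:position:length} counts from zero, converting between the two only requires subtracting one from the position. A small sketch, with illustrative variable names:

```shell
stringZ=abcABC123ABCabc
position=4      # 1-based, as expr substr counts
length=3

piece=${stringZ:$(( position - 1 )):$length}   # same characters as: expr substr $stringZ 4 3
echo "$piece"   # ABC
```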
You can search for and replace a substring in a variable using ksh syntax:
alpha='This is a test string in which the word "test" is replaced.'
beta="${alpha/test/replace}"
The string "beta" now contains an edited version of the original string in which the first occurrence of the word "test" has been replaced by "replace". To replace all occurrences, not just the first, use this syntax:
beta="${alpha//test/replace}"
Note the double "//" symbol.
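Bash also accepts # or % directly after the first slash to anchor the match at the beginning or at the end of the value. These two forms are not covered in the text above, so treat the following as a supplementary sketch; the variable names are illustrative.

```shell
name=test_report_test

first=${name/test/demo}       # first occurrence only
all=${name//test/demo}        # every occurrence
at_start=${name/#test/demo}   # only if the value starts with "test"
at_end=${name/%test/demo}     # only if the value ends with "test"

echo "$first"      # demo_report_test
echo "$all"        # demo_report_demo
echo "$at_start"   # demo_report_test
echo "$at_end"     # test_report_demo
```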
Here is an example in which we replace one string with another in a multi-line block of text:
list="cricket frog cat dog"
poem="I wanna be a x\n\
A x is what I'd love to be\n\
If I became a x\n\
How happy I would be.\n"
for critter in $list; do
    echo -e ${poem//x/$critter}
done
Strings can be concatenated by juxtaposition and by using double-quoted strings. For example:
PATH="$PATH:/usr/games"
A double-quoted string in shell is almost identical to a double-quoted string in Perl and performs macro expansion of all variables in it. The minor difference is the treatment of escaped characters. If you want an exact match, you can use $'string':
#!/bin/bash
# String expansion. Introduced with version 2 of Bash.
# Strings of the form $'xxx' have the standard escaped characters interpreted.
echo $'Ringing bell 3 times \a \a \a'
     # May only ring once with certain terminals.
echo $'Three form feeds \f \f \f'
echo $'10 newlines \n\n\n\n\n\n\n\n\n\n'
echo $'\102\141\163\150'   # Bash
                           # Octal equivalent of characters.
exit 0
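The difference in escape handling is easy to see by comparing lengths. In this short sketch (variable names are illustrative) the same text is stored once in double quotes and once in $'...':

```shell
plain="a\nb"       # double quotes keep the backslash and the n as two separate characters
ansi=$'a\nb'       # $'...' turns \n into a single newline character

echo "${#plain}"   # 4
echo "${#ansi}"    # 3
```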
In bash-3.1, a string append operator (+=) was added:
PATH+=":$HOME/bin"   # use $HOME: a tilde inside double quotes is not expanded
echo "$PATH"
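The append operator is also convenient for building a string piece by piece in a loop. The sketch below additionally relies on ${csv:+,}, which expands to a comma only when csv is already non-empty; the variable names are made up for illustration.

```shell
csv=""
for word in alpha beta gamma; do
    csv+="${csv:+,}$word"   # prepend a comma except before the first item
done
echo "$csv"   # alpha,beta,gamma
```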
Using the wildcard character (?), you can imitate the Perl chop function (which cuts the last character of the string and returns the rest) quite easily:
test="~/bin/"
trimmed_last=${test%?}
trimmed_first=${test#?}
echo "original='$test', trimmed_first='$trimmed_first', trimmed_last='$trimmed_last'"
The first character of a string can also be obtained with printf:
printf -v char "%c" "$source"
Conditional chopping, as done by the Perl chomp function or the REXX trim function, can be done using a while loop, for example:
function trim
{
    target=$1
    while :    # this is an infinite loop
    do
        case $target in
            ' '*) target=${target#?} ;;   ## if $target begins with a space remove it
            *' ') target=${target%?} ;;   ## if $target ends with a space remove it
            *) break ;;                   # no more leading or trailing spaces, so exit the loop
        esac
    done
    echo "$target"    # return accepts only a numeric status, so print the result instead
}
A more Perl-style method to trim trailing blanks would be
spaces=${source_var##*[! ]}          ## get the trailing blanks in var $spaces
trimmed_var=${source_var%"$spaces"}  ## remove them from the end with %, not #
The same trick can be used for removing leading spaces.
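Applying the trick at both ends gives a loop-free trim. The following is a sketch under the assumption that only space characters (not tabs) need to be removed; trim_blanks is a hypothetical function name.

```shell
# trim_blanks STRING  (hypothetical helper)
# Prints STRING without leading and trailing spaces.
trim_blanks() {
    local s=$1 lead trail
    trail=${s##*[! ]}     # what remains after the last non-space: the trailing blanks
    s=${s%"$trail"}       # cut them off the end
    lead=${s%%[! ]*}      # what precedes the first non-space: the leading blanks
    s=${s#"$lead"}        # cut them off the front
    printf '%s\n' "$s"
}

trimmed=$(trim_blanks "   hello world  ")
echo "[$trimmed]"   # [hello world]
```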
Operator: ${var:-bar} is useful for assigning a variable a default value. It works the following way: if $var exists and is not null, return $var. If it doesn't exist or is null, return bar.
Example:
$ export var=""
$ echo ${var:-one}
one
$ echo $var

More complex example:
sort -nr $1 | head -${2:-10}
A typical use is to check whether arguments were passed to the script and, if not, to assign some default values:
#!/bin/bash
# the defaults are left unquoted so that the tilde is expanded
export FROM=${1:-~root/.profile}
export TO=${2:-~my/.profile}
cp -p "$FROM" "$TO"
An additional variant sets the variable if it is not defined. This is done with the operator ${var:=bar}
It works as follows: if $var exists and is not null, return $var. If it doesn't exist or is null, set $var to bar and return bar.
Example:
$ export var=""
$ echo ${var:=one}
one
Results:
$ echo $var
one
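In scripts this operator is often combined with the : (no-op) command, so that the default is assigned without printing anything. A minimal sketch follows; the variable name LOG_DIR and both paths are hypothetical.

```shell
unset LOG_DIR                   # hypothetical variable name
: "${LOG_DIR:=/tmp/logs}"       # ':' does nothing; the expansion itself assigns the default
first=$LOG_DIR

LOG_DIR=/var/log/myapp
: "${LOG_DIR:=/tmp/logs}"       # already set and non-null, so the value is kept
second=$LOG_DIR

echo "$first $second"           # /tmp/logs /var/log/myapp
```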
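There are two types of pattern matching in shell: the older ksh-style pattern-matching operators and the extended regular expressions available through the =~ operator.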
Unless you need to modify old scripts it does not make sense to use old ksh-style regex in bash.
(partially borrowed from)
Since version 3 (released in 2004), bash implements extended regular expressions, which are mostly compatible with Perl regular expressions. They are also called POSIX regular expressions because they are defined by the POSIX standard (which you should read and understand to use the full power provided). Extended regular expressions are also used by egrep, so most system administrators know them well. Please note that Perl regular expressions are equivalent to extended regular expressions with a few additional features.
Extended regular expressions support a set of predefined character classes. When used between brackets, these define commonly used sets of characters. The POSIX character classes implemented in extended regular expressions include [:alnum:], [:alpha:], [:digit:], [:lower:], [:upper:], [:punct:] and [:space:].
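As an illustration of character classes in use, the sketch below matches a word, some whitespace and a number. The function name and the sample strings are made up for this example.

```shell
# letters, then whitespace, then digits, and nothing else
re='^[[:alpha:]]+[[:space:]]+[[:digit:]]+$'

# matches_setting STRING  (hypothetical helper): succeeds if STRING fits the pattern
matches_setting() {
    [[ $1 =~ $re ]]
}

for candidate in "width 42" "width42" "42 width"; do
    if matches_setting "$candidate"; then
        echo "match:    $candidate"
    else
        echo "no match: $candidate"
    fi
done
```

Only the first candidate matches. Note that the class name is written inside a second pair of brackets, as in [[:digit:]].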
Modifiers are by and large similar to Perl:
Extended regex | Perl regex |
---|---|
a+ | a+ |
a? | a? |
a|b | a|b |
(expression1) | (expression1) |
{m,n} | {m,n} |
{,n} | {,n} |
{m,} | {m,} |
{m} | {m} |
The =~ operator returns 0 (success) if the regular expression matches the string, otherwise it returns 1 (failure).
In addition to doing simple matching, bash regular expressions support sub-patterns surrounded by parentheses for capturing parts of the match. The matches are assigned to an array variable BASH_REMATCH. The entire match is assigned to BASH_REMATCH[0], the first sub-pattern is assigned to BASH_REMATCH[1], etc.
The following example script takes a regular expression as its first argument and one or more strings to match against. It then cycles through the strings and outputs the results of the match process:
#!/bin/bash
if [[ $# -lt 2 ]]; then
    echo "Usage: $0 PATTERN STRINGS..."
    exit 1
fi
regex=$1
shift
echo "regex: $regex"
echo
while [[ $1 ]]
do
    if [[ $1 =~ $regex ]]; then
        echo "$1 matches"
        i=1
        n=${#BASH_REMATCH[*]}
        while [[ $i -lt $n ]]
        do
            echo "  capture[$i]: ${BASH_REMATCH[$i]}"
            let i++
        done
    else
        echo "$1 does not match"
    fi
    shift
done
Assuming the script is saved in "bashre.sh", the following sample shows its output (the script must be run with bash, since a plain sh may not support [[ and =~):
# bash bashre.sh 'aa(b{2,3}[xyz])cc' aabbxcc aabbcc
regex: aa(b{2,3}[xyz])cc

aabbxcc matches
  capture[1]: bbx
aabbcc does not match
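Capture groups are handy for splitting a value into fields. The following sketch pulls year, month and day out of a date-like string; the pattern, the sample value and the variable names are all illustrative.

```shell
date_re='^([0-9]{4})-([0-9]{2})-([0-9]{2})$'
stamp=2004-07-27      # hypothetical sample value

if [[ $stamp =~ $date_re ]]; then
    year=${BASH_REMATCH[1]}
    month=${BASH_REMATCH[2]}
    day=${BASH_REMATCH[3]}
    echo "year=$year month=$month day=$day"   # year=2004 month=07 day=27
else
    echo "not a date: $stamp"
fi
```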
Pattern-matching operators were introduced in ksh88 in a very idiosyncratic way. The notation is different from that used by Perl or utilities such as grep. That's a shame, but that's how it is. Life is not perfect. The operators are hard to remember, but there is a handy mnemonic tip: # matches the front because number signs precede numbers; % matches the rear because percent signs follow numbers.
There are two kinds of pattern matching available: matching from the left and matching from the right.
The operators, with their functions and an example, are shown in the following table:
Operator | Meaning | Example |
---|---|---|
${var#t*is} | Deletes the shortest possible match from the left: If the pattern matches the beginning of the variable's value, delete the shortest part that matches and return the rest. | export var="this is a test"; echo ${var#t*is} prints: is a test |
${var##t*is} | Deletes the longest possible match from the left: If the pattern matches the beginning of the variable's value, delete the longest part that matches and return the rest. | export var="this is a test"; echo ${var##t*is} prints: a test |
${var%t*st} | Deletes the shortest possible match from the right: If the pattern matches the end of the variable's value, delete the shortest part that matches and return the rest. | export var="this is a test"; echo ${var%t*st} prints: this is a |
${var%%t*st} | Deletes the longest possible match from the right: If the pattern matches the end of the variable's value, delete the longest part that matches and return the rest. | export var="this is a test"; echo ${var%%t*st} prints an empty line |
While the # and % identifiers may not seem obvious, they have another convenient mnemonic. The # key is on the left side of the $ key on the keyboard and operates from the left. The % key is on the right of the $ key and operates from the right.
These operators can be used to do a variety of things. For example, the following script changes the extension of all .html files to .htm.
#!/bin/bash
# quickly convert html filenames for use on a dossy system
# only handles file extensions, not filenames
for i in *.html; do
    if [ -f "${i%l}" ]; then
        echo "${i%l} already exists"
    else
        mv "$i" "${i%l}"
    fi
done
The classic use for pattern-matching operators is stripping off components of pathnames, such as directory prefixes and filename suffixes. With that in mind, here is an example that shows how all of the operators work. Assume that the variable path has the value /home/billr/mem/long.file.name; then:
Expression | Result |
---|---|
${path##/*/} | long.file.name |
${path#/*/} | billr/mem/long.file.name |
$path | /home/billr/mem/long.file.name |
${path%.*} | /home/billr/mem/long.file |
${path%%.*} | /home/billr/mem/long |
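In practice these operators are usually combined to split a path into directory, file name, stem and extension, much like the basename and dirname utilities do. A small sketch using the same sample path (the result variable names are illustrative):

```shell
path=/home/billr/mem/long.file.name

file=${path##*/}   # strip the longest prefix ending in "/"   -> long.file.name
dir=${path%/*}     # strip the shortest suffix starting at "/" -> /home/billr/mem
ext=${file##*.}    # strip the longest prefix ending in "."    -> name
stem=${file%.*}    # strip the shortest suffix starting at "." -> long.file

echo "$dir | $file | $stem | $ext"
```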
Example:
$ export var="this is a test"
$ echo ${var#t*is}
is a test
Example:
$ export var="this is a test"
$ echo ${var##t*is}
a test
Example:
$ export var="this is a test"
$ echo ${var%t*st}
this is a
for i in *.htm*; do
    if [ -f "${i%l}" ]; then
        echo "${i%l} already exists"
    else
        mv "$i" "${i%l}"
    fi
done
Example:
$ export var="this is a test"
$ echo ${var%%t*st}

A shell regular expression can contain regular characters, standard wildcard characters, and additional operators that are more powerful than wildcards. Each such operator has the form x(exp), where x is the particular operator and exp is any regular expression (often simply a regular string). The operator determines how many occurrences of exp a string that matches the pattern can contain.
Operator | Meaning |
---|---|
*(exp) | 0 or more occurrences of exp |
+(exp) | 1 or more occurrences of exp |
?(exp) | 0 or 1 occurrences of exp |
@(exp1|exp2|...) | exp1 or exp2 or... |
!(exp) | Anything that doesn't match exp |
Expression | Matches |
---|---|
x | x |
*(x) | Null string, x, xx, xxx, ... |
+(x) | x, xx, xxx, ... |
?(x) | Null string, x |
!(x) | Any string except x |
@(x) | x (see below) |
The following section compares Korn shell regular expressions to analogous features in awk and egrep. If you aren't familiar with these, skip to the section entitled "Pattern-matching Operators."
Shell | egrep/awk | Meaning |
---|---|---|
*(exp) | exp* | 0 or more occurrences of exp |
+(exp) | exp+ | 1 or more occurrences of exp |
?(exp) | exp? | 0 or 1 occurrences of exp |
@(exp1|exp2|...) | exp1|exp2|... | exp1 or exp2 or... |
!(exp) | (none) | Anything that doesn't match exp |
These equivalents are close but not quite exact. Actually, an exp within any of the Korn shell operators can be a series of exp1|exp2|... alternates. But because the shell would interpret an expression like dave|fred|bob as a pipeline of commands, you must use @(dave|fred|bob) for alternates.
For example, @(dave|fred|bob) matches dave, fred, or bob.
It is worth re-emphasizing that shell regular expressions can still contain standard shell wildcards. Thus, the shell wildcard ? (match any single character) is the equivalent of . in egrep or awk, and the shell's character set operator [...] is the same as in those utilities. For example, the expression +([0-9]) matches a number, i.e., one or more digits. The shell wildcard character * is equivalent to the shell regular expression *(?).
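In bash these operators are available once the extglob option is turned on. The sketch below tests two of the patterns discussed above; the function names and the sample words are made up for illustration.

```shell
shopt -s extglob    # bash needs this option for the ksh-style operators

# is_number STRING: succeeds if STRING consists of one or more digits
is_number() { [[ $1 == +([0-9]) ]]; }

# is_known_user STRING: succeeds if STRING is exactly dave, fred or bob
is_known_user() { [[ $1 == @(dave|fred|bob) ]]; }

for word in 2004 20x4 fred freddy; do
    if is_number "$word"; then
        echo "$word: number"
    elif is_known_user "$word"; then
        echo "$word: known user"
    else
        echo "$word: no match"
    fi
done
```

The pattern must match the whole string, which is why 20x4 and freddy are rejected.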
A few egrep and awk regexp operators do not have equivalents in the Korn shell. These include:
The first two pairs are hardly necessary, since the Korn shell doesn't normally operate on text files and does parse strings into words itself.
FROM: http://www.softpanorama.org/Scripting/Shellorama/Reference/string_operations_in_shell.shtml
Reposted from: http://egvai.baihongyu.com/