UNIX
© Copyright B. Brown, 1988-2000. All rights reserved. March 2000.
Assignment 5: awk /1
awk is a programming language designed to search files for patterns and perform actions on the lines that match. awk programs are generally quite small, and are interpreted. This makes awk a good language for prototyping.
awk scans input lines one after the other, searching each line to see if it matches a set of patterns or conditions specified in the awk program.
For each pattern, an action is specified. The action is performed when the pattern matches the input line.
Thus, an awk program consists of a number of patterns and associated actions. Each action is enclosed in curly braces, and the statements inside an action are separated by semi-colons.
pattern { action }
pattern { action }
When awk scans an input line, it breaks it down into a number of fields. By default, fields are separated by one or more space or tab characters. Fields are numbered beginning at one, and the dollar symbol ($) is used to represent a field.
For instance, the following line in a file
I like money.
has three fields. They are
$1   I
$2   like
$3   money.
Field zero ($0) refers to the entire line.
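The field splitting described above can be tried directly from the shell. The following is only a sketch: it passes the awk program on the command line inside single quotes and feeds it the sample line using printf. The shell variable name fields is an arbitrary choice for this illustration.

```shell
# Feed the sample line to awk and print field 1, field 3 and the whole line.
# The shell variable name "fields" is arbitrary, chosen for this sketch.
fields=$(printf 'I like money.\n' | awk '{ print $1; print $3; print $0 }')
echo "$fields"
```

The three output lines should be the first field, the third field (including its full stop, since only spaces and tabs separate fields) and the complete line.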
awk reads lines from one or more files, or from standard input.
Consider the following simple awk program.
{ print $0 }
There is no pattern to match, only an action. This means that the action is performed for every line encountered.
The action prints field 0 (the entire line).
Using a text editor, create a file called myawk1 and place the above statement in it. Save the file and return to the Unix shell prompt.
To run the above program, type the following command
awk -f myawk1 /etc/group
awk interprets the actions specified in the program file myawk1 and applies them to each line read from the file /etc/group. The effect is to print each input line, displaying the file on the screen (the same as the Unix command cat).
To search for an occurrence of a string in an input line, specify it as a pattern and enclose it between forward slashes. The example below searches each input line for the string brian, and the action prints the entire line.
/brian/ { print $0 }
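Before running the pattern against real system files, it can be tried on a few lines typed at the shell. This is a minimal sketch: the three input lines are made up for illustration (they are not taken from a real system file), and the shell variable name matches is arbitrary.

```shell
# Three made-up input lines (hypothetical data, not from a real system file).
# Only the line containing the string "brian" is passed through.
matches=$(printf 'root console\nbrian tty01\nsusan tty02\n' |
  awk '/brian/ { print $0 }')
echo "$matches"
```

Lines that do not contain the search string produce no output at all, because no action is associated with them.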
Edit myawk1 and change the search string to your username. Run the program on the files /etc/group and /etc/passwd.
awk -f myawk1 /etc/group
awk -f myawk1 /etc/passwd
Compared to the previous example, where no pattern was specified, what is the difference in the output of this program?
............................................................................................ ............................................................................................ ............................................................................................
Type the following command. It runs the program who, which lists the users logged on to the system, and pipes that output to the awk program, which scans each line for the search string. It will thus list the line containing your login name, terminal number and login date/time.
who | awk -f myawk1
Change the contents of myawk1 to read (replace the search string with your login name)
/brian/ { print $1, $2 }
What do you expect the output of the program to be? (what fields will it print out?)
............................................................................................ ............................................................................................ ............................................................................................
Now type the command
who | awk -f myawk1
What happened? How is the output different from before?
............................................................................................ ............................................................................................ ............................................................................................
awk programs are particularly suited to generating reports or forms. In the following examples, we shall use the following textual data as the input file. The file is called awktext. A heading has been provided here for clarity; there is no header in the data file.
Type      Memory (Kb)   Location   Serial #   HD Size (Mb)
XT        640           D402       MG0010     0
386       2048          D403       MG0011     100
486       4096          D404       MG0012     270
386       8192          A423       CC0177     400
486       8192          A424       CC0182     670
286       4096          A423       CC0183     100
286       4096          A425       CC0184     80
Mac       4096          B407       EE1027     80
Apple     4096          B406       EE1028     40
68020     2048          B406       EE1029     80
68030     2048          B410       EE1030     100
$unix     16636         A405       CC0185     660
"trs80"   64            Z101       EL0020     0
In addition, all examples (awk program files myawknn) are available.
A public domain MSDOS awk program (awk.exe) is also available.
Simple selection involves specifying a pattern to match against each input line scanned. The following awk program (myawk2) compares field one ($1) with the string "386"; if they match, the associated action is performed (the entire line is printed).
$1 == "386" { print $0 }
Note: The == symbol represents an equality test, thus in the above pattern, it compares the string of field one against the constant string "386", and performs the action if it matches.
Create the program
$ cat - > myawk2
$1 == "386" { print $0 }
<ctrl-d>
$
Note: <ctrl-d> is a keypress to terminate input to the shell. Hold down the ctrl key and then press d. User input is shown in bold type.
Run The Program
$ awk -f myawk2 awktext
Sample Program Output
386 2048 D403 MG0011 100
386 8192 A423 CC0177 400
The program prints out all input lines where the computer type is a
"386".
Write an awk program which prints out all input lines where a computer has 4096 Kb of memory. After running the program successfully, enter it in the space provided below.
..................................................................................
Comments begin with the hash (#) symbol and continue until the end of the line. The awk program below adds a comment to the awk program shown earlier.
#myawk3, same as myawk2 but has a comment in it
$1 == "386" { print $0 }
Comments can be placed anywhere on the line. The example below shows the comment placed after the action.
$1 == "386" { print $0 } # print all records where the computer is a 386
Remember that the comment ends at the end of the line. The following program is thus wrong, as the closing brace of the action is treated as part of the comment.
$1 == "386" { print $0 # print out all records }
We have already seen the equality test. Detailed below are the other relational operators used in comparing expressions.
<    less than
<=   less than or equal to
==   equal to
!=   not equal to
>=   greater than or equal to
>    greater than
~    matches
!~   does not match
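As a brief illustration of one of the operators listed above, the sketch below uses >= to select computers with a hard disk of 400 Mb or more. It is only an example under stated assumptions: the four input rows are a subset of awktext passed in with printf rather than read from the file, and the shell variable name big_disks is arbitrary.

```shell
# Four rows copied from awktext; field 5 is the hard disk size in Mb.
# Print the type and disk size where the disk is 400 Mb or larger.
big_disks=$(printf '%s\n' \
  '386 8192 A423 CC0177 400' \
  '486 8192 A424 CC0182 670' \
  '286 4096 A425 CC0184 80' \
  'Mac 4096 B407 EE1027 80' |
  awk '$5 >= 400 { print $1, $5 }')
echo "$big_disks"
```

Because field 5 looks like a number and is compared against a numeric constant, awk compares the values numerically rather than as text.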
Some Examples Of Using Relational Operators
# myawk4, an awk program to display all input lines for computers
# with less than 1024 Kb of memory
$2 < 1024 { print $0 }
myawk4 Program Output
XT 640 D402 MG0010 0
"trs80" 64 Z101 EL0020 0
===================================================================
# myawk5
# an awk program to print the location/serial number of 486 computers
$1 == "486" { print $3, $4 }
myawk5 Program Output
D404 MG0012
A424 CC0182
===================================================================
# myawk6
# an awk program to print out all computers belonging to management.
/MG/ { print $0 }
myawk6 Program Output
XT 640 D402 MG0010 0
386 2048 D403 MG0011 100
486 4096 D404 MG0012 270
The awk program myawk6 scans each input line searching for the occurrence of the string MG. When found, the action prints out the line. The problem with this is that the string MG might occur in another field of a line whose serial number indicates that the computer belongs to another department.
What is necessary is a means of matching only a specific field. To apply a search to a specific field, the match (~) symbol is used. The modified awk program shown below searches field 4 for the string MG.
# myawk6A
# improved awk program, print out all computers belonging to management.
$4 ~ /MG/ { print $0 }
myawk6A Program Output
XT 640 D402 MG0010 0
386 2048 D403 MG0011 100
486 4096 D404 MG0012 270
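With the awktext data, myawk6 and myawk6A print the same lines, so the benefit of matching a single field is not visible. The sketch below makes the difference apparent. It is an illustration only: the third input row is hypothetical (it is not in awktext) and has been invented so that MG appears in the location field of a computer whose serial number begins with CC. The shell variable names are arbitrary.

```shell
# Two rows from awktext plus one hypothetical row (the last) whose
# location field contains "MG" although its serial number is CC.
data='386 2048 D403 MG0011 100
486 8192 A424 CC0182 670
286 4096 MG01 CC0199 40'

# Whole-line search: also picks up the hypothetical row.
whole=$(printf '%s\n' "$data" | awk '/MG/ { print $0 }')

# Field search: only rows whose serial number (field 4) contains MG.
field=$(printf '%s\n' "$data" | awk '$4 ~ /MG/ { print $0 }')

echo "Whole-line search:"
echo "$whole"
echo "Field 4 search:"
echo "$field"
```

The whole-line search reports two rows, while the field search reports only the row whose serial number contains MG.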
What do the following examples do?
$2 != "4096" { print $0 }
....................................................................................
....................................................................................
$5 > 100 { print $4 }
....................................................................................
....................................................................................
$4 !~ /CC/ { print $0 }
....................................................................................
....................................................................................
Write an awk program to display the location of all computers belonging to the computer centre (code CC). Test the program, and after running the program successfully, enter the program in the space provided below.
..................................................................................
In all the previous examples, the output of the awk program has been either the entire line or fields within the line. Let's add some text to make the output more meaningful. Consider the following awk program.
# myawk7
# list computers located in D block, type and location
$3 ~ /D/ { print "Location = ", $3, " type = ", $1 }
myawk7 Program Output
Location =  D402  type =  XT
Location =  D403  type =  386
Location =  D404  type =  486
We shall tidy the output by using a built-in awk statement called printf. C programmers will have no difficulty using this, as it operates in the same way as printf in the C programming language.
Let's examine how to print out some simple text. Consider the following statement,
printf( "Location : " );
The printf statement is terminated by a semi-colon. Parentheses enclose the arguments, and the text is enclosed in double quotes. Now let's combine it into an actual awk program which displays the location of all 286 type computers.
#myawk8
$1 == "286" { printf( "Location : "); print $3 }
myawk8 Program Output
Location : A423
Location : A425
Let's now examine how to use printf to display a field which is a text string. In the previous program, a separate statement (print $3) was used to write the room location. In the program below, this is combined into the printf statement.
#myawk9
$1 == "286" { printf( "Location is %s\n", $3 ); }
myawk9 Program Output
Location is A423
Location is A425
Note: The symbol \n causes subsequent output to begin on a new line. The symbol %s informs printf to print out a text string, in this case it is the contents of the field $3.
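Unlike print, printf does not start a new line unless the text contains \n. The sketch below illustrates this by building each output line from three printf statements, only the last of which ends with \n. The two input rows are copied from awktext and supplied with printf from the shell; the shell variable name line is arbitrary.

```shell
# Two rows taken from awktext. Each output line is assembled from three
# printf statements; only the last one ends the line with \n.
line=$(printf '286 4096 A423 CC0183 100\n286 4096 A425 CC0184 80\n' |
  awk '$1 == "286" { printf("Location "); printf("is "); printf("%s\n", $3) }')
echo "$line"
```

The result should be the same two lines that myawk9 prints.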
Consider the following awk program which prints the location and serial number of all 286 computers.
#myawk10
$1=="286" { printf( "Location = %s, serial # = %s\n", $3, $4 ); }
myawk10 Program Output
Location = A423, serial # = CC0183
Location = A425, serial # = CC0184
Write an awk program which lists the serial numbers of all computers belonging to the management school. After running the program successfully, enter it in the space provided below.
...............................................................................
Let's now see how to print a numeric value. The symbol %d is used for numeric values. The following awk program lists the location and disk capacity of all 486 computers.
#myawk11
$1=="486" { printf("Location = %s, disk = %dMb\n", $3, $5 ); }
myawk11 Program Output
Location = D404, disk = 270Mb
Location = A424, disk = 670Mb
Write an awk program which lists the memory size and serial number of all computers which have a hard disk greater than 80Mb in size. After running the program successfully, enter it in the space provided below.
...............................................................................
Let's see how to format the output into specific field widths. A number placed between the % and the conversion letter specifies the field width; by default the value is right justified within that width.
#myawk12
# formatting the output using a field width
$1=="286" {printf("Location = %10s, disk = %5dMb\n",$3,$5);}
myawk12 Program Output
Location =       A423, disk =   100Mb
Location =       A425, disk =    80Mb
%10s specifies to print out field $3 using a field width of 10 characters, and %5d specifies to print out field $5 using a field width of 5 digits.
Below lists the options to printf covered above. [n] indicates optional arguments.
%[n]s   print a text string
%[n]d   print a numeric value
\n      print a new-line
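Padding is easier to see when each value is wrapped in visible delimiters. The sketch below is an illustration only: it prints one row from awktext with square brackets around the %10s and %5d fields. The brackets and the shell variable name padded are assumptions made for this example.

```shell
# One row from awktext. The square brackets make the padding visible:
# the location is right justified in 10 columns, the disk size in 5.
padded=$(printf '286 4096 A425 CC0184 80\n' |
  awk '{ printf("[%10s] [%5d]\n", $3, $5) }')
echo "$padded"
```

The location A425 is preceded by six spaces and the disk size 80 by three, which fills the requested widths.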
The keywords BEGIN and END are used to perform specific actions relative to the program's execution.
BEGIN   The action associated with this keyword is executed before the first input line is read.
END     The action associated with this keyword is executed after all input lines have been processed.
The BEGIN keyword is normally associated with printing titles and setting default values, whilst the END keyword is normally associated with printing totals.
Consider the following awk program, which uses BEGIN to print a title.
#myawk13
BEGIN { print "Location of 286 Computers" }
$1 == "286" { print $3 }
myawk13 Program Output
Location of 286 Computers
A423
A425
awk programs support a number of pre-defined variables.
NR the current input line number
NF number of fields in the input line
#myawk14
# print the number of computers
END { print "There are", NR, "computers" }
myawk14 Program Output
There are 13 computers
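The sketch below combines BEGIN, END and the two pre-defined variables in one small program, printing NR and NF for every input line. It is illustrative only: the two input rows are copied from awktext and supplied with printf, and the heading text and the shell variable name report are assumptions made for this example.

```shell
# Two rows from awktext. BEGIN prints a heading before any input is read,
# the middle action prints the line number and field count of each line,
# and END prints the final value of NR.
report=$(printf 'XT 640 D402 MG0010 0\n386 2048 D403 MG0011 100\n' |
  awk 'BEGIN { print "line fields" }
       { print NR, NF }
       END { print "lines read:", NR }')
echo "$report"
```

In the END action, NR still holds the number of the last line read, which is why it can be used as a total.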
awk programs support the use of variables. Consider an example where we want to count the number of 486 computers we have. Variables are implicitly initialised to zero by awk, so there is no need to assign a value of zero to them.
The following awk program counts the number of 486 computers, and uses the END keyword to print out the total after all input lines have been processed. When each input line is read, field one is checked to see if it matches 486. If it does, the awk variable computers is incremented (the symbol ++ means increment by one).
#myawk15
$1 == "486" { computers++ }
END { printf("The number of 486 computers is %d\n", computers); }
myawk15 Program Output
The number of 486 computers is 2
Note: There is no need to explicitly initialise the variable 'computers' to
zero. awk does this by default.
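The default value can be observed by printing a variable that is never assigned. In the sketch below, the single input row (copied from awktext) is not a 486, so computers is never incremented. The output labels and the shell variable name unset_demo are assumptions made for this illustration, and the second statement uses string concatenation (placing values side by side), which has not been covered above.

```shell
# The input row is not a 486, so "computers" is never incremented.
# As a number the variable is 0; as text it is the empty string.
unset_demo=$(printf 'XT 640 D402 MG0010 0\n' |
  awk '$1 == "486" { computers++ }
       END { printf("count = %d\n", computers); print "as text = [" computers "]" }')
echo "$unset_demo"
```

The count is reported as 0, while the text form shows nothing between the brackets.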
Write an awk program which counts the number of computers which have 8192 Kb or more of memory, then prints the number found at the end of the program. After running the program successfully, enter it in the space provided below.
............................................................................... ...............................................................................
Write an awk program which sums the disk space of all computers, then prints the total disk space at the end of the program. After running the program successfully, enter it in the space provided below.
............................................................................... ...............................................................................