UNIX
© Copyright B. Brown, 1988-2000. All rights reserved. March 2000.

Assignment 5: awk /1


awk is a programming language designed to search files for patterns and to perform actions on the lines that match. awk programs are generally quite small and are interpreted, which makes awk a good language for prototyping.


THE STRUCTURE OF AN AWK PROGRAM

awk scans input lines one after the other, searching each line to see if it matches a set of patterns or conditions specified in the awk program.

For each pattern, an action is specified. The action is performed when the input line matches the pattern.

Thus, an awk program consists of a number of patterns and associated actions. Each action is enclosed in curly braces, and the statements within an action are separated by semicolons or newlines.


	pattern  { action }
	pattern  { action }
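
As a minimal sketch of this structure, the shell commands below run a two-rule awk program against three made-up input lines. The program is given inline in single quotes rather than in a program file, and the sample text and the shell variable out are illustrative assumptions, not part of the assignment.

```shell
# Each input line is tested against both patterns in turn;
# a line that matches both patterns triggers both actions.
out=$(printf 'apple pie\nbanana split\napple tart\n' |
  awk '/apple/ { print "fruit:", $0 }
       /tart/  { print "tart:", $0 }')
echo "$out"
```

The line "banana split" matches neither pattern and so produces no output, while "apple tart" matches both and is printed twice.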


INPUT LINES TO awk

When awk scans an input line, it breaks it down into a number of fields. By default, fields are separated by one or more space or tab characters. Fields are numbered beginning at one, and the dollar symbol ($) followed by the field number is used to refer to a field.

For instance, the following line in a file


	I like money.

has three fields. They are


	$1	I
	$2	like
	$3	money.

Field zero ($0) refers to the entire line.
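
The field numbering can be checked with a short sketch such as the one below. It assumes a POSIX shell, feeds the sample line to awk on standard input, and deliberately uses several spaces and a tab between the words; the shell variable out is only there to capture the result.

```shell
# The input has three spaces and then a tab between words;
# awk still sees exactly three fields.
out=$(printf 'I   like\tmoney.\n' | awk '{ print $3, $1 }')
echo "$out"
```

This prints the third field followed by the first, which shows that the amount of white space between fields does not affect how they are numbered.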

awk reads its input lines from one or more files, or from standard input.


Your first awk program

Consider the following simple awk program.


	{ print $0 }

There is no pattern to match, only an action. When the pattern is omitted, the action is performed for every input line.

The action prints field 0 (the entire line).

Using a text editor, create a file called myawk1 and place the above statement in it. Save the file and return to the Unix shell prompt.


Running an awk program

To run the above program, type the following command


	awk   -f myawk1  /etc/group

awk interprets the program in the file myawk1 and applies it to each line read from the file /etc/group. The effect is to print each input line, displaying the file on the screen (the same as the Unix command cat).
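
The comparison with cat can be tried without touching any system files. The sketch below assumes a POSIX shell; the scratch directory, the two-line stand-in for /etc/group and the variable names are all hypothetical.

```shell
# Work in a scratch directory (assumed location) so no real files are changed.
dir=${TMPDIR:-/tmp}/awkdemo.$$
mkdir -p "$dir"
printf '{ print $0 }\n' > "$dir/myawk1"                 # the awk program file
printf 'root:x:0:\nstaff:x:50:brian\n' > "$dir/group"   # stand-in for /etc/group

out=$(awk -f "$dir/myawk1" "$dir/group")   # what the awk program prints
same=$(cat "$dir/group")                   # what cat prints
echo "$out"
rm -r "$dir"
```

The two variables hold identical text, since the program prints every input line unchanged.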


Searching for a string within an input line

To search for an occurrence of a string in an input line, specify it as a pattern enclosed between forward slashes. The example below searches each input line for the string brian, and the action prints the entire line.


	/brian/   { print  $0 }

Edit myawk1 so that it contains the line above, with the search string changed to your username. Run the program on the files /etc/group and /etc/passwd


	awk  -f myawk1  /etc/group

	awk  -f myawk1  /etc/passwd

Compared to the previous example, where no pattern was specified, what is the difference in the output of this program?


	............................................................................................

	............................................................................................

	............................................................................................

Type the following command. This runs the program who and sends its output (a list of who is logged on to the system) to the awk program, which scans each line for the search string. It will thus list the line containing your login name, terminal number and login date/time.


	who  |  awk -f myawk1

Change the contents of myawk1 to read (replace the search string with your login name)


	/brian/   { print $1, $2 }

What do you expect the output of the program to be? (what fields will it print out?)


	............................................................................................

	............................................................................................

	............................................................................................

Now type the command


	who | awk -f myawk1

What happened? How is the output different from before?


	............................................................................................

	............................................................................................

	............................................................................................


Using awk programs with form files

awk programs are particularly suited to generating reports or forms. In the following examples, we shall use the following textual data as the input file, which is called awktext. A heading has been provided here for clarity; there is no header in the data file.


	Type	Memory (Kb)	Location	Serial #	HD Size (Mb)
	XT	640		D402		MG0010		0
	386	2048		D403		MG0011		100
	486	4096		D404		MG0012		270
	386	8192		A423		CC0177		400
	486	8192		A424		CC0182		670
	286	4096		A423		CC0183		100
	286	4096		A425		CC0184		80
	Mac	4096		B407		EE1027		80
	Apple	4096		B406		EE1028		40
	68020	2048		B406		EE1029		80
	68030	2048		B410		EE1030		100
	$unix	16636		A405		CC0185		660
	"trs80"	64		Z101		EL0020		0

In addition, all examples (awk program files myawknn) are available.

A public domain MSDOS awk program (awk.exe) is also available.


Simple Pattern Selection

This involves specifying a pattern to match for each input line scanned. The following awk program (myawk2) compares field one ($1) and if the field matches the string "386", the specific action is performed (the entire line is printed).


	$1 == "386"  { print $0 }

Note: The == symbol represents an equality test, thus in the above pattern, it compares the string of field one against the constant string "386", and performs the action if it matches.
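
The difference between an equality test and a string search can be seen in the sketch below. It assumes a POSIX shell; the second input line (type 3860) is made up for the comparison and does not appear in awktext, and the variable names are illustrative.

```shell
# Two sample lines: type and location only. The 3860 line is hypothetical.
data=$(printf '386 D403\n3860 X001\n')

# Equality test: field one must be exactly the string 386.
exact=$(printf '%s\n' "$data" | awk '$1 == "386" { print $2 }')

# String search: any line containing 386 anywhere is selected.
search=$(printf '%s\n' "$data" | awk '/386/ { print $2 }')

echo "equality test: $exact"
echo "string search: $search"
```

The equality test selects only the first line, whereas the search pattern selects both.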


	Create the program

		$ cat  -  >   myawk2
		$1 == "386"  { print $0 }
		<ctrl-d>
		$
		
		Note: <ctrl-d> is a keypress which signals the end of input to cat. Hold 
		down the ctrl key and then press d. User input is shown in bold type.


	Run The Program

		$ awk  -f  myawk2   awktext


	Sample Program Output

		386     2048            D403            MG0011    100
		386     8192            A423            CC0177    400

		The program prints out all input lines where the computer type is a 
		"386".


Write an awk program which prints out all input lines where a computer has 4096 Kb of memory. After running the program successfully, enter it in the space provided below.


	..................................................................................


Using Comments In awk Programs

Comments begin with the hash (#) symbol and continue to the end of the line. The awk program below adds a comment to the awk program shown earlier.


	#myawk3, same as myawk2 but has a comment in it
	$1 == "386"  { print $0 }

Comments can be placed anywhere on the line. The example below shows the comment placed after the action.


	$1 == "386"  { print $0 }   # print all records where the computer is a 386

Remember that the comment ends at the end of the line. The following program is thus wrong, as the closing brace of the action is treated as part of the comment.


	$1 == "386"  { print $0    #print out all records  }


Relational Expressions

We have already seen the equality test. Listed below are the relational operators used in comparing expressions.


	<	less than
	<=	less than or equal to
	==	equal to
	!=	not equal to
	>=	greater than or equal to
	>	greater than
	~	matches
	!~	does not match

Some Examples Of Using Relational Operators


	# myawk4, an awk program to display all input lines for computers 
	# with less than 1024 Kb of memory 
	$2 <  1024  { print $0 }

	myawk4 Program Output
	XT	640		D402		MG0010	0
	"trs80"	64		Z101		EL0020	0

	===================================================================
	# myawk5
	# an awk program to print the location/serial number of 486 computers
	$1 == "486"  { print $3, $4 }

	myawk5 Program Output
	D404  MG0012
	A424  CC0182

	===================================================================
	# myawk6 
	# an awk program to print out all computers belonging to management.
	/MG/  { print $0 }

	myawk6 Program Output
	XT	640		D402		MG0010	0
	386	2048		D403		MG0011	100
	486	4096		D404		MG0012	270

The awk program myawk6 scans each input line searching for an occurrence of the string MG. When found, the action prints out the line. The problem with this is that the string MG might occur in another field of a line whose serial number indicates that the computer belongs to another department.

What is necessary is a means of matching only a specific field. To apply a search to a specific field, the match (~) symbol is used. The modified awk program shown below searches field 4 for the string MG.


	# myawk6A
	# improved awk program, print out all computers belonging to management.
	$4 ~ /MG/  { print $0 }

	myawk6A Program Output
	XT	640		D402		MG0010	0
	386	2048		D403		MG0011	100
	486	4096		D404		MG0012	270
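
The sketch below illustrates why matching on a specific field matters. It assumes a POSIX shell; the second data line, with a location of MG12 and a computer centre serial number, is hypothetical and is not part of awktext.

```shell
# First line is from awktext; the second is a made-up computer whose
# location (field 3) happens to contain MG but whose serial number is CC.
data=$(printf '386\t2048\tD403\tMG0011\t100\n286\t4096\tMG12\tCC0190\t80\n')

# Whole-line search: MG anywhere in the line selects it.
whole=$(printf '%s\n' "$data" | awk '/MG/ { print $4 }')

# Field search: only field 4 is examined.
field=$(printf '%s\n' "$data" | awk '$4 ~ /MG/ { print $4 }')

echo "whole-line search: $whole"
echo "field 4 search:    $field"
```

The whole-line search wrongly reports the CC0190 machine, while the field search reports only MG0011.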

What do the following examples do?


	$2 != "4096"  { print $0 }

		....................................................................................

		....................................................................................


	$5 >  100    { print $4 }

		....................................................................................

		....................................................................................


	$4 !~ /CC/    { print $0 }

		....................................................................................

		....................................................................................

Write an awk program to display the location of all computers belonging to the computer centre (code CC). Test the program, and after running the program successfully, enter the program in the space provided below.


	..................................................................................


Making the output a bit more meaningful

In all the previous examples, the output of the awk program has been either the entire line or fields within the line. Let's add some text to make the output more meaningful. Consider the following awk program,


	# myawk7
	# list computers located in D block, type and location
	$3 ~ /D/  { print "Location = ", $3, "  type = ", $1 }

	myawk7 Program Output
	Location =  D402   type =  XT
	Location =  D403   type =  386
	Location =  D404   type =  486


Text And Formatted Output Using printf

We shall tidy the output by using awk's built-in printf statement. C programmers will have no difficulty using this, as it operates in much the same way as printf in the C programming language.


Printing A Text String

Let's examine how to print out some simple text. Consider the following statement,


	printf( "Location : " );

The printf statement is terminated by a semicolon. Parentheses are used to enclose the argument, and the text is enclosed in double quotes. Now let's combine it into an actual awk program which displays the location of all 286 type computers.


	#myawk8
	$1 == "286" {  printf( "Location : ");   print $3 }

	myawk8 Program Output
	Location : A423
	Location : A425


Printing A Field Which Is A Text String

Let's now examine how to use printf to display a field which is a text string. In the previous program, a separate statement (print $3) was used to write the room location. In the program below, this will be combined into the printf statement also.


	#myawk9
	$1 == "286"  {  printf( "Location is %s\n", $3 );  }

	myawk9 Program Output
	Location is A423
	Location is A425

Note: The symbol \n causes subsequent output to begin on a new line. The symbol %s informs printf to print out a text string, in this case it is the contents of the field $3.
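
Unlike print, printf does not start a new line unless \n is included. The sketch below shows what happens when it is left out; it assumes a POSIX shell, and the two input lines and the variable out are illustrative.

```shell
# No \n in the format string, so both locations end up on one output line.
out=$(printf 'A423\nA425\n' | awk '{ printf("%s;", $1) }')
echo "$out"
```

Both values appear on a single line, each followed by a semicolon.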

Consider the following awk program which prints the location and serial number of all 286 computers.


	#myawk10
	$1=="286" { printf( "Location = %s, serial # = %s\n", $3, $4 ); }

	myawk10 Program Output
	Location = A423, serial # = CC0183
	Location = A425, serial # = CC0184

Write an awk program which lists the serial numbers of all computers belonging to the management school. After running the program successfully, enter it in the space provided below.


		...............................................................................


Printing A Numeric Value

Let's now see how to print a numeric value. The symbol %d is used for integer values. The following awk program lists the location and disk capacity of all 486 computers.


	#myawk11
	$1=="486" { printf("Location = %s, disk = %dMb\n", $3, $5 );  }

	myawk11 Program Output
	Location = D404, disk = 270Mb
	Location = A424, disk = 670Mb

Write an awk program which lists the memory size and serial number of all computers which have a hard disk greater than 80Mb in size. After running the program successfully, enter it in the space provided below.


	...............................................................................


Formatting Output

Let's see how to format the output into specific field widths. A number placed between the % and the s specifies the width of the field, within which the value is right justified by default.


	#myawk12
	# formatting the output using a field width
	$1=="286" {printf("Location = %10s, disk = %5dMb\n",$3,$5);}

	myawk12 Program Output
	Location =       A423, disk =   100Mb
	Location =       A425, disk =    80Mb

%10s specifies to print out field $3 using a field width of 10 characters, and %5d specifies to print out field $5 using a field width of 5 characters.
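
The padding added by a field width is easier to see when the fields are bracketed. The sketch below assumes a POSIX shell; the square brackets, the single input line and the variable out are there purely for illustration.

```shell
# Square brackets mark the edges of each field so the padding is visible.
out=$(printf 'A425 80\n' | awk '{ printf("[%10s] [%5d]\n", $1, $2) }')
echo "$out"
```

The four-character location is padded on the left with six spaces, and the two-digit number with three.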


Summary of printf so far

The printf options covered above are listed below. [n] indicates an optional field width.


	%[n]s		print a text string
	%[n]d		print an integer value
	\n		print a new-line


The BEGIN And END Statements Of An awk Program

The keywords BEGIN and END are used to perform specific actions relative to the program's execution.


	BEGIN	The action associated with this keyword is executed before the
		first input line is read.

	END	The action associated with this keyword is executed after all
		input lines have been processed.

The BEGIN keyword is normally associated with printing titles and setting default values, whilst the END keyword is normally associated with printing totals.

Consider the following awk program, which uses BEGIN to print a title.


	#myawk13
	BEGIN   { print "Location of 286 Computers" }
	$1 == "286"  { print $3 }

	myawk13 Program Output
	Location of 286 Computers
	A423
	A425
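
BEGIN and END can appear in the same program. The sketch below assumes a POSIX shell and uses three made-up input lines (type and location only); the title and footer text are illustrative.

```shell
# BEGIN runs once before any input, END once after the last line.
out=$(printf '286 A423\n486 A424\n286 A425\n' |
  awk 'BEGIN       { print "Location of 286 Computers" }
       $1 == "286" { print $2 }
       END         { print "End of list" }')
echo "$out"
```

The title is printed first, then one line per matching input line, then the footer.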


Introducing awk Defined Variables

awk programs support a number of pre-defined variables.


	NR	the current input line number
	NF	number of fields in the input line


	#myawk14
	# print the number of computers
	END	{ print "There are ", NR, "computers" }

	myawk14 Program Output
	There are  13 computers
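
NR and NF can also be used in ordinary actions, not just after END. The sketch below assumes a POSIX shell and two made-up input lines of different lengths.

```shell
# For every input line, print its line number and how many fields it has.
out=$(printf 'a b c\nd e\n' | awk '{ print NR, NF }')
echo "$out"
```

The first line is reported as line 1 with 3 fields, the second as line 2 with 2 fields.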


User Defined Variables In An awk Program

awk programs support the use of variables. Consider an example where we want to count the number of 486 computers we have. Variables are automatically initialised to zero by awk, so there is no need to assign a value of zero to them.

The following awk program counts the number of 486 computers, and uses the END keyword to print out the total after all input lines have been processed. When each input line is read, field one is checked to see if it matches 486. If it does, the awk variable computers is incremented (the symbol ++ means increment by one).


	#myawk15
	$1 == "486"  { computers++ }
	END	{  printf("The number of 486 computers is %d\n", computers);  }

	myawk15 Program Output
	The number of 486 computers is 2

	Note: There is no need to explicitly initialise the variable 'computers' to 
	zero. awk does this by default.
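
The default value can be confirmed by running the same kind of program on input that contains no 486 computers at all. The sketch below assumes a POSIX shell; the two input lines are made up.

```shell
# No input line matches, so computers is never incremented
# and still holds its default value when END is reached.
out=$(printf '386\n286\n' |
  awk '$1 == "486" { computers++ }
       END { printf("%d\n", computers) }')
echo "$out"
```

The program prints 0 even though the variable was never assigned a value.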

Write an awk program which counts the number of computers which have 8192Kb or greater amounts of memory, then prints the number found at the end of the program. After running the program successfully, enter it in the space provided below.


		...............................................................................

		...............................................................................

Write an awk program which sums the disk space of all computers, then prints the total disk space at the end of the program. After running the program successfully, enter it in the space provided below.


		...............................................................................

		...............................................................................


© Copyright Brian Brown, 1988-2000. All rights reserved.