AWK – For Beginners

Introduction

AWK is a powerful programming language for text processing. Most Linux distributions ship an implementation by default, commonly GNU awk (gawk). The name comes from the surnames of its three developers: Alfred Aho, Peter Weinberger, and Brian Kernighan. One of them, Alfred V. Aho, describes AWK in simple words in the definition below.

Definition

“AWK is a language for processing text files. A file is treated as a sequence of records, and by default each line is a record. Each line is broken up into a sequence of fields, so we can think of the first word in a line as the first field, the second word as the second field, and so on. An AWK program is a sequence of pattern-action statements. AWK reads the input a line at a time. A line is scanned for each pattern in the program, and for each pattern that matches, the associated action is executed.”

Command Line Examples

Let’s move on to a few command-line examples. As the definition above explains, an awk statement pairs a pattern with an action, and the action runs for every record that matches the pattern.

Pattern {action}
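As a minimal sketch of the pattern {action} form, the command below uses a regular-expression pattern so that the action runs only on matching lines. The two input lines are made up for illustration and are piped in with printf instead of being read from a file.

```shell
# Hypothetical input: two lines of words.
# Pattern /beta/ selects lines containing "beta"; the action prints their first field.
out=$(printf 'alpha beta gamma\none two three\n' | awk '/beta/ { print $1 }')
printf '%s\n' "$out"
```

Only the first line matches the pattern, so only its first field is printed.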

Here is a basic example that prints every line of a given file. With no pattern, the action applies to every record.

$ awk '{print}' file.txt

To print a specific field, use the variable $n, where n is the field number. The example below prints the first field of each line of file.txt.

$ awk '{print $1}' file.txt

To print multiple fields, separate them with commas so that awk puts a space between them in the output:

$ awk '{print $1, $2}' file.txt
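The difference between a comma and plain juxtaposition in print can be sketched as follows, using a made-up input line. With a comma, awk joins the values with the output field separator (a space by default); without it, the values are concatenated.

```shell
# Hypothetical one-line input with two fields.
with_comma=$(printf 'alpha beta\n' | awk '{ print $1, $2 }')   # joined by a space
no_comma=$(printf 'alpha beta\n' | awk '{ print $1 $2 }')      # concatenated
printf '%s\n%s\n' "$with_comma" "$no_comma"
```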

$0 holds the entire current record, so the following also prints every line of the file.

$ awk '{print $0}' file.txt

By default awk splits fields on whitespace (spaces and tabs). A different field separator can be set with the -F option. The example below uses : as the field separator.

$ awk -F ":" '{print $2}' file.txt
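To see -F at work without a file, here is a small sketch. The colon-separated line is a made-up sample in the style of /etc/passwd, not real data.

```shell
# Hypothetical colon-separated record: name, placeholder, numeric id, shell.
out=$(printf 'alice:x:1001:/bin/bash\n' | awk -F ':' '{ print $1, $3 }')
printf '%s\n' "$out"
```

With : as the separator, $1 is the name and $3 is the numeric id, so the two are printed separated by a space.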

Here is how to blank out a field, such as the 3rd, in the output. Assigning the empty string empties the field but keeps the separators around it.

$ awk '{$3=""; print $0}' file.txt
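A quick sketch of what blanking a field does to the output, using a made-up four-word line. Assigning to a field makes awk rebuild $0 with single spaces between fields, so the emptied third field leaves two adjacent spaces behind.

```shell
# Hypothetical input with four fields; the third is set to the empty string.
out=$(printf 'a b c d\n' | awk '{ $3 = ""; print $0 }')
printf '[%s]\n' "$out"   # brackets make the double space visible
```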

Let’s add a condition so that only the first 10 records (lines) of the file are processed. NR is a built-in variable that holds the number of records read so far. Similarly, NF holds the number of fields (columns) in the current record.

$ awk 'NR==1,NR==10{print $0}' file.txt

The range pattern NR==1,NR==10 matches records 1 through 10, so only those lines are printed.
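The roles of NR and NF can be sketched on a tiny made-up input. Here the condition NR <= 2 keeps only the first two records, and for each one the record number and its field count are printed.

```shell
# Hypothetical three-line input with 3, 2 and 1 fields respectively.
out=$(printf 'a b c\nd e\nf\n' | awk 'NR <= 2 { print NR, NF }')
printf '%s\n' "$out"
```

The third line is never printed because its record number fails the condition.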

Finding Difference Between Two Column Elements

Let’s take an example where we want the difference between consecutive elements of the same field. Suppose the file input.txt contains a single field of numbers, as below.

10

30

25

75

We want the difference between each element and the one before it: the first element is subtracted from the second, the second from the third, and so on. The output should look like this.

20

-5

50

The first record has no predecessor to subtract, so we skip it and start printing from the second record. This example needs two awk statements. One possible solution is given below.

$ awk 'NR>1{print $1 - OLD_VALUE} {OLD_VALUE = $1}' input.txt

root@ubuntu:/scripts# echo "Input:" ; cat input.txt ; echo "Output:" ; awk 'NR>1{print $1 - OLD_VALUE} {OLD_VALUE = $1}' input.txt
Input:
10
30
25
75
Output:
20
-5
50

The first statement runs only when the condition NR>1 is true, whereas the second statement has no condition and runs for every record. When awk reads the first record (NR==1), NR>1 is false, so the first statement is skipped. The second statement still runs, and OLD_VALUE stores the first field of the first record, which is 10. On the second record (NR==2), NR>1 is true, so the first statement {print $1 - OLD_VALUE} runs with $1 equal to 30 and OLD_VALUE equal to 10, printing 30 - 10 = 20. The second statement then updates OLD_VALUE to 30. This continues until the end of the file.
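The walkthrough above can be traced with a variant that prints each subtraction in full. This is an illustrative sketch, not the original solution: the variable name prev is an assumption standing in for OLD_VALUE, and the same four numbers are piped in with printf.

```shell
# Same sample numbers as input.txt, supplied inline.
# For every record after the first, show "current - previous = difference".
out=$(printf '10\n30\n25\n75\n' |
  awk 'NR > 1 { print $1 " - " prev " = " ($1 - prev) } { prev = $1 }')
printf '%s\n' "$out"
```

The parentheses around $1 - prev keep the subtraction separate from the string concatenation around it.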

Calculating Cumulative Sum

A slightly more complex example is computing the cumulative sum of the elements of a field. Suppose we have a field like the one below.

10

20

30

40

50

The output should be

10

30

60

100

150

One solution for this example is:

$ awk 'NR>1{print $1 + x ; x=$1+x} NR==1{x=$1 ; print $1}' sum.txt

root@ubuntu:/scripts# echo "Input:" ; cat sum.txt ; echo "Output:" ; awk 'NR>1{print $1 + x ; x=$1+x} NR==1{x=$1 ; print $1}' sum.txt
Input:
10
20
30
40
50
Output:
10
30
60
100
150

Again we have two statements, each guarded by its own condition, that together print the cumulative sum for each record.
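As an alternative sketch (not the solution shown above), the same result can be had with a single unconditional statement, relying on the fact that an unset awk variable counts as 0 in arithmetic. The variable name sum is an assumption, and the sample numbers are piped in with printf.

```shell
# Same sample numbers as sum.txt, supplied inline.
# sum starts out unset (treated as 0), grows by $1 on each record, and is printed.
out=$(printf '10\n20\n30\n40\n50\n' | awk '{ sum += $1; print sum }')
printf '%s\n' "$out"
```

Because the running total is updated before printing, the first record needs no special case.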

For more details, see the link below:

https://en.wikipedia.org/wiki/AWK
