Practical usage of awk

Many of us know very little about awk, a wonderful C-like scripting language that has been part of almost every Linux distro since the very beginning.

In this post, I want to shed some light on awk and share a few practical examples of its daily use.

The most basic awk construction is often used for printing selected columns. Let’s look at the output of the basic command ‘who’:

$ who
johnny   pts/0        2020-04-02 02:45 (98.33.30.36)
root     pts/1        2020-03-31 01:58 (tmux(25770).%0)
kkleim   pts/2        2020-03-31 01:59 (tmux(25770).%1)
howard   pts/3        2020-03-31 02:00 (tmux(25770).%3)
russ     pts/4        2020-03-31 02:01 (tmux(25770).%4)
dude1    pts/5        2020-03-31 02:02 (tmux(25770).%5)

The simplest use of awk might be:

$ who | awk '{print $1}'
johnny
root
kkleim
howard
russ
dude1

Generally speaking, awk is a self-sufficient C-like language that supports almost everything you would expect from a classic language. Here is an example of a loop that builds a string of “##” symbols; in awk, writing two strings next to each other concatenates them:

awk 'BEGIN { while(i<99){ str=str "##";i++} print str }'
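To make the concatenation easier to check by eye, here is a sketch of the same loop with a smaller count (5 is an arbitrary choice of mine) that also prints the length of the resulting string:

```shell
# Same loop as above, but only 5 iterations; "##" is appended each time.
# Uninitialized i starts as 0 and uninitialized str starts as "".
bar=$(awk 'BEGIN { while (i < 5) { str = str "##"; i++ } print str, length(str) }')
echo "$bar"
```

Five iterations of a two-character string give ten characters in total.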

This post does not aim to explain every nuance of the language. Instead, let’s focus on the basics: useful one-liners that come in handy for daily use.

Before we move on with practical examples, I want to mention the most essential construction that will be re-used multiple times in this post as well as in the typical text-parsing tasks.

condition { actions }

This construction can be expanded to something like:

awk '$1 ~ /pattern/ {print $1}'

In the example above, awk checks whether the first column matches the pattern, where the default delimiter is a space or tab. If the first column matches, awk prints it.
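As a small illustration of the condition { actions } form, the sketch below feeds a few made-up lines to awk (the names and terminals are hypothetical sample data, not real ‘who’ output) and prints the second column only for lines whose first column starts with “r”:

```shell
# Hypothetical sample input in the same shape as the `who` output.
sample='johnny pts/0
root pts/1
russ pts/4'

# Condition: the first field starts with "r". Action: print the second field.
matched=$(printf '%s\n' "$sample" | awk '$1 ~ /^r/ {print $2}')
printf '%s\n' "$matched"
```

Lines that do not satisfy the condition are skipped silently, so only the terminals of root and russ are printed.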

Field separators

By default, awk splits fields on whitespace (runs of spaces and tabs). However, we can specify any other separator: a comma, a letter of the alphabet, a special symbol, and so on.

$ echo 'a,b,c,d,e' | awk -F',' '{print $1 $2 $3 $4 $5}'
abcde
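The fields above run together because “print $1 $2 …” concatenates them. A minimal sketch of two related standard features: separating the print arguments with commas inserts the output field separator (OFS), and the input separator can be set through the FS variable in a BEGIN block instead of -F. The “-” used for OFS is an arbitrary choice for the example.

```shell
# Same input as above. FS is set in BEGIN instead of using -F,
# and OFS controls what print puts between comma-separated arguments.
joined=$(echo 'a,b,c,d,e' | awk 'BEGIN { FS = ","; OFS = "-" } {print $1, $3, $5}')
echo "$joined"
```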

We can just as easily set a colon as the FS (field separator) and parse /etc/passwd:

$ awk -F: '{print $1}' /etc/passwd | tail -10
_ctkd
_applepay
_hidd
_cmiodalassistants
_analyticsd
_fpsd
_timed
_nearbyd
_reportmemoryexception
_driverkit

How to print all columns except the first one?

Another typical example of daily awk usage is printing specific fields while leaving out other data. Let’s say we need to print all columns except the first one:

awk '{ $1 = ""; print $0 }'

As shown above, the first column “$1” is set to an empty string, and “$0” stands for the entire line. Altogether, this prints all columns except the first one (the line keeps a leading space where the first column used to be). It might be quite handy!
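Blanking $1 leaves the separator that followed it, so the output line starts with a space. One possible alternative, sketched here on a made-up input line, loops over fields 2 through NF and prints them explicitly:

```shell
# Hypothetical input line with four columns.
line='one two three four'

# Print fields 2..NF, separated by OFS and terminated by ORS (a newline).
rest=$(echo "$line" | awk '{
    for (i = 2; i <= NF; i++)
        printf "%s%s", $i, (i < NF ? OFS : ORS)
}')
echo "$rest"
```

This version produces no leading space, at the cost of being a little longer than the one-liner.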

Typical awk use-cases in one-liner scripts

awk '{print $1, $3, $5}' – prints the first, third, and fifth columns
awk '{print $0}' – prints the entire line, including all columns
awk '/pattern/ {print $2}' – prints the second column of every line that matches the pattern
awk 'BEGIN { print "Hello, world" }' – “Hello, world” in awk
awk '{ if (length($0) > max) max = length($0) } END { print max }' inputfile – prints the length of the longest line
awk 'length($0) > 99' inputfile – prints all lines longer than 99 characters
awk 'BEGIN { for (i = 1; i <= 10; i++) print int(101 * rand()) }' – generates ten random integers in the range 0…100
awk -F: '{ print $1 }' /etc/passwd | sort – prints a sorted list of usernames
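One caveat with the random-number one-liner: in some awk implementations (gawk, for example) rand() produces the same sequence on every run unless the generator is seeded. A minimal sketch that seeds it first with srand(), which uses the current time when called without an argument:

```shell
# Seed the generator from the clock, then print ten integers in 0..100.
nums=$(awk 'BEGIN { srand(); for (i = 1; i <= 10; i++) print int(101 * rand()) }')
printf '%s\n' "$nums"
```

Because the seed is based on the time of day, two runs within the same second may still print the same numbers.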

Another awk capability is math. For instance, awk can calculate square roots, logarithms, and trigonometric functions (sine, cosine, and arctangent are built in):

$ awk 'BEGIN {print sqrt(2020)}'
44.9444
$ awk 'BEGIN {print sin(2020)}'
0.044062
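Standard awk has no built-in tangent, but one can be derived from sin and cos. The sketch below defines a helper for this (the function name “tangent” is my own, not part of awk) and obtains pi from atan2:

```shell
# tangent() is a user-defined helper, not an awk built-in.
quarter=$(awk 'function tangent(x) { return sin(x) / cos(x) }
BEGIN {
    pi = atan2(0, -1)                  # atan2(0, -1) evaluates to pi
    printf "%.4f\n", tangent(pi / 4)   # tangent of 45 degrees
}')
echo "$quarter"
```

The result is rounded to four decimal places with printf, which hides the tiny floating-point error in the division.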

User-defined variables

Like any other programming language, awk lets the programmer use variables. Variable names can include letters, digits, and underscores, but they cannot start with a digit. No separate declaration is needed: you simply assign a value to a variable and use it in your code like this:

$ awk '
BEGIN{
test="This is a test"
print test
}'
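Variables can also be set from the shell with the standard -v option, which is handy in scripts. A minimal sketch, where the names greeting, msg, and count are arbitrary choices for the example:

```shell
# A shell variable whose value we want to use inside awk.
greeting="Hello from the shell"

# -v assigns the awk variable msg before the program starts running.
result=$(awk -v msg="$greeting" 'BEGIN { count = 3; print msg, count * 2 }')
echo "$result"
```

Passing values with -v avoids splicing shell variables directly into the awk program text.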

Conditional operator

Awk supports the standard if-then-else conditional format, similar to many other programming languages. The one-line version of the operator is the if keyword, followed by the expression being tested, in parentheses, and then the command to be executed if the expression is true.

For example, suppose there is a file called testfile with the following contents:

10
12
15
6
33
22
45

Let’s write a script that outputs numbers from this file greater than 20:

$ awk '{if ($1 > 20) print $1}' testfile
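To try this without creating a file, the same numbers can be piped in. The sketch below also uses the shorter pattern form of the same filter: a condition with no action prints every matching line.

```shell
# The numbers from testfile, supplied through a pipe instead of a file.
big=$(printf '10\n12\n15\n6\n33\n22\n45\n' | awk '$1 > 20')
printf '%s\n' "$big"
```

Only 33, 22, and 45 pass the test, in their original order.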

If you need to execute several statements in the “if” block, they must be enclosed in curly braces:

$ awk '{
if ($1 > 20)
{
x = $1 * 2
print x
}
}' testfile

As already mentioned, an awk conditional statement can contain an else block:

$ awk '{
if ($1 > 20)
{
x = $1 * 2
print x
} else
{
x = $1 / 2
print x
}}' testfile

The else clause can also be part of a one-line conditional statement in which each branch consists of a single command. In this case, you need to put a semicolon after the if branch, immediately before the else:

$ awk '{if ($1 > 20) print $1 * 2; else print $1 / 2}' testfile
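For a simple two-way choice like this, awk also offers the C-style conditional (ternary) operator. A minimal sketch on two sample values; the parentheses keep “>” from being read as output redirection:

```shell
# Double values above 20, halve the rest; input values are sample data.
halved_or_doubled=$(printf '10\n33\n' | awk '{ print ($1 > 20 ? $1 * 2 : $1 / 2) }')
printf '%s\n' "$halved_or_doubled"
```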

The “for” loop

“For” loops are used in many programming languages, and awk supports them too. Let’s calculate the average of the first three numeric fields on each line using the following loop (this assumes a testfile with at least three columns per line, unlike the single-column file above):

$ awk '{
total = 0
for (i = 1; i < 4; i++)
{
total += $i
}
avg = total / 3
print "Average:",avg
}' testfile

The counter variable’s initial value, the condition for terminating the loop, and the rule for changing the counter on each iteration are all specified at the beginning of the loop, in parentheses. As a result, unlike with the “while” loop, we don’t need to increment the counter ourselves.
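The loop above hard-codes three fields. A sketch of a more general variant uses the built-in NF variable (the number of fields on the current line), so each line can have a different number of columns; the input here is made-up sample data:

```shell
# Two sample lines with three and two numeric fields respectively.
averages=$(printf '10 20 30\n4 6\n' | awk '{
    total = 0
    for (i = 1; i <= NF; i++)
        total += $i
    if (NF > 0)                      # skip empty lines to avoid dividing by zero
        print "Average:", total / NF
}')
printf '%s\n' "$averages"
```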

Formatted data output

The awk “printf” command allows you to output formatted data. It allows you to customize the appearance of the output by using templates that can contain text data and formatting specifiers.

A format specifier is a short sequence starting with “%” that specifies the type of output data and how it should be printed. Awk uses format specifiers as placeholders marking where to insert the values passed to “printf”.

The first specifier matches the first variable, the second specifier matches the second, and so on.

Formatting specifiers are written as follows:

%[modifier]control-letter

Here are some of them:

c – takes the number passed to it as an ASCII character code and outputs this character.
d – displays a decimal integer.
i – the same as d.
e – displays a number in exponential form.
f – displays a floating-point number.
g – prints a number in either exponential notation or floating-point format, whichever is shorter.
o – displays the octal representation of a number.
s – displays a text string.

Here’s how to format the output using “printf”:

$ awk 'BEGIN{
x = 100 * 100
printf "The result is: %e\n", x
}'

Here, as an example, we display a number in exponential notation. This should be enough to convey the main idea behind working with “printf”.
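The optional modifier part of a specifier controls width, alignment, and precision. A minimal sketch on made-up data: “%-8s” left-aligns a string in 8 characters, and “%7.2f” right-aligns a number in 7 characters with 2 decimal places.

```shell
# Sample name/value pairs; the "|" characters only make the padding visible.
table=$(printf 'apple 1.5\nkiwi 12.25\n' | awk '{ printf "%-8s|%7.2f|\n", $1, $2 }')
printf '%s\n' "$table"
```

This kind of fixed-width output is useful for lining up columns in reports.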

This note could have been much longer, but I didn’t want to write yet another awk tutorial; there are plenty of them on the web. The main point of this short post is that awk is still powerful and handy more than forty years after its first release in 1977.
