Blurred Visions

Perl

Perl One Minute Hacker - Mirror

25/12/2004 23:45

The One Minute Perl Hacker
Copyright ©, 1995-2000 Ashish Gulhati.
This article may be reproduced and distributed freely without modification.

Perl is an almost absurdly easy language to learn. In a few minutes, a complete novice can learn enough Perl to start writing useful code. Don't be fooled, though, because Perl is also an incredibly versatile language. The learning curve may be gentle, but it's a long way to the top. You'll find yourself discovering new tricks all the time, and returning repeatedly to Perl, as you do to a good book, in order to delve deeper into its intricacies and discover the expressive richness that lies below the surface. Here I'll quickly cover the basics of the language, to pave the way for more detailed discussions of its power.

Hello, World!

As usual, let's start with the inevitable "Hello, World!" program. If you save the following lines to a file, you can run it with the command perl filename, or (if you make it executable first) simply ./filename.

#!/usr/bin/perl
print "Hello, World!\n"; #Say Hello

Since Perl is an interpreted language, it must be run through the Perl interpreter. The first line of the example above tells a Unix system to feed the file to /usr/bin/perl. The second line is pretty self-explanatory perl code - it simply prints "Hello, World!" and a newline character to the standard output, usually the terminal. And that's about all it takes to greet the universe in Perl.

Notice that Perl uses semicolons to separate statements, and that comments in Perl are introduced with the # character anywhere in a line, and run to the end of that line.
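One detail worth knowing: a # only starts a comment when Perl is reading code, not when it appears inside a quoted string. A minimal illustration (the variable name $tag is just for this example):

```perl
use strict;
use warnings;

my $tag = "#perl";   # the first # here is string data; the comment starts at this one
print "$tag\n";      # prints #perl
```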

Scalar Variables and Interpolation

Perl has three basic data types: scalars, lists and hashes. Scalars are simple things like numbers and strings. Scalar variables always begin with a $ character:

$pi = 3.14159;
$myname = "Ashish Gulhati";

Perl, unlike many other languages, automatically converts strings to numbers and vice versa, as required by the context. For example, the following code snippet:

$pi = "3.14159";
$bipi = $pi * 2;
print "$pi * 2 = ", $bipi, "\n";

will print "3.14159 * 2 = 6.28318". Other languages might flag a type error in the second line of code, hesitant to multiply a number with a string. Perl isn't that faint-hearted. In the third line, the number $bipi is automatically converted to a string for printing. The observant reader will also notice that the third line of code prints the value of the variable $pi instead of the string '$pi'. This is the magic of Perl's double quote interpolation. Double quotes cause variables in strings to be replaced with their values.
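To see both conversions and interpolation side by side, here is a small sketch (run under use strict and use warnings, which the snippets in this article omit). It also shows that single quotes, unlike double quotes, leave variable names alone:

```perl
use strict;
use warnings;

my $pi   = "3.14159";          # a string...
my $bipi = $pi * 2;            # ...used as a number, giving 6.28318
my $len  = length $bipi;       # ...and a number used as a string: "6.28318" is 7 characters

my $double = "$pi * 2 = $bipi";   # double quotes interpolate variables
my $single = '$pi * 2 = $bipi';   # single quotes leave them alone

print "$double\n";   # 3.14159 * 2 = 6.28318
print "$single\n";   # $pi * 2 = $bipi
```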

List Variables

Lists are arrays of scalars. Their variable names always begin with the @ character. They are indexed by number, starting from 0, and individual list elements can be accessed as scalars by providing an index in square brackets.

@weekdays = ('Tue', 'Wed', 'Thu', 'Fri');
print $weekdays[2];

will print the third element of the list @weekdays. Notice that a scalar member of a list is prefixed with a $ sign, not the @ sign, because $weekdays[2] is a scalar value ('Thu'), not a list value. Perl lists grow and shrink as required so there's never any need to specify the dimensions of a list beforehand. Adding elements to a list at the beginning can be done in either of two ways:

@weekdays = ('Mon', @weekdays);
unshift (@weekdays, 'Mon');

Both statements have the same effect. The list now looks like this:

('Mon', 'Tue', 'Wed', 'Thu', 'Fri');
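One thing to watch: unshift changes the array in place and returns the new number of elements, not the list itself. This sketch compares the two approaches (the array names are just for illustration):

```perl
use strict;
use warnings;

my @by_list    = ('Tue', 'Wed', 'Thu', 'Fri');
my @by_unshift = ('Tue', 'Wed', 'Thu', 'Fri');

@by_list = ('Mon', @by_list);               # build a new list and assign it back
my $count = unshift(@by_unshift, 'Mon');    # modify the array in place

print "@by_list\n";      # Mon Tue Wed Thu Fri
print "@by_unshift\n";   # Mon Tue Wed Thu Fri
print "$count\n";        # 5, the new number of elements
```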

We still don't have Saturday and Sunday as part of our week. So suppose we had another list:

@weekends = ('Sat', 'Sun');

We could now complete our week with either of the following statements:

@weekdays = (@weekdays, @weekends);
push (@weekdays, @weekends);

Notice that if lists occur as elements to be assigned to other lists, they get 'flattened', i.e. their members become members of the list being assigned to. A list itself cannot be a member of another list. There are no true multi-dimensional arrays in Perl, and every element of a list must be a scalar (which could also be a reference to some other data object, but we'll get to that later). Lists are best thought of as a bunch of scalars in an ordered sequence. This points the way to a number of other valid, and often used, Perl constructs:
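Flattening is easy to check by counting elements. In this sketch, push appends every element of @weekends in place, and the combined list holds seven scalars, not two nested lists:

```perl
use strict;
use warnings;

my @weekdays = ('Mon', 'Tue', 'Wed', 'Thu', 'Fri');
my @weekends = ('Sat', 'Sun');

my @week = (@weekdays, @weekends);   # both lists are flattened into one
push(@weekdays, @weekends);          # push flattens @weekends too, appending in place

print scalar(@week), "\n";           # 7
print "@weekdays\n";                 # Mon Tue Wed Thu Fri Sat Sun
```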

($a, $b) = ($c, $d); # $a = $c; $b = $d;
($a, $b) = @weekdays; # $a = 'Mon'; $b = 'Tue';
($bad_day, @week) = @weekdays;
(@week, $holiday) = @weekdays; # Not what you think!

The first two assignments are pretty easy to understand. What happens in the third one? $bad_day gets the value 'Mon', and @week gets all the rest of the values in @weekdays. But the last assignment doesn't do what you probably think it does. You don't get a holiday (or at least, $holiday is undefined) - because arrays are 'greedy'. @week will accept all the values from @weekdays, leaving nothing for $holiday. Avoid this last usage. At best it will make your program do something you didn't intend. At worst, you'll have to spend the rest of your life without Sundays.
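The assignments above can be tried directly. This sketch uses the full seven-day week and calls the greedy array @greedy so that the different forms fit in one program:

```perl
use strict;
use warnings;

my @weekdays = ('Mon', 'Tue', 'Wed', 'Thu', 'Fri', 'Sat', 'Sun');

my ($first, $second) = @weekdays;    # leftover elements are simply dropped
my ($bad_day, @week) = @weekdays;    # one for the scalar, the rest for the array
my (@greedy, $holiday) = @weekdays;  # the array swallows everything

print "$first $second\n";                    # Mon Tue
print scalar(@week), "\n";                   # 6
print defined $holiday ? "yes\n" : "no\n";   # no
```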

Hash Variables

The third Perl data type is the associative array. As the name implies, these data objects associate pairs of scalars.

%addresses = ('Ashish' => 'hash@netropolis.org',
'Larry' => 'lwall@netlabs.com',
'Dean' => 'roehrich@cray.com');

Now we have an associative array %addresses, which associates names and e-mail addresses. The % sign is the prefix for associative arrays. If we want to retrieve an e-mail address from this associative array, we can do it by supplying the name of the person as a key into the associative array, in curly brackets:

$hashmail = $addresses{'Ashish'};

Again, notice that the prefix we use above is $, not %, because the value we are accessing is a scalar value ('hash@netropolis.org'), not an associative array. You can add to associative arrays by simple assignment, like this:

$addresses{'God'} = 'god@heaven.org';

Unlike normal arrays, associative arrays aren't flat lists of scalars. Internally, Perl implements them as hash tables, which are pretty random about the order in which they store things, and depend on a hash function to match keys to values. The advantage of this is that it enormously speeds up the matching process - because there's no need to sort or search on key names to find the corresponding values. With big associative arrays, such searches could take a very long time. The hash function does the work faster. That's why associative arrays are also called hash arrays, or simply hashes.
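Here is a sketch that looks up, adds and lists entries. Two details are worth noticing: the addresses are in single quotes, because inside double quotes Perl would try to interpolate "@netropolis" as an array; and since a hash keeps no useful order, the keys are sorted before printing:

```perl
use strict;
use warnings;

# Single quotes keep Perl from treating "@netropolis" as an array to interpolate.
my %addresses = ('Ashish' => 'hash@netropolis.org',
                 'Larry'  => 'lwall@netlabs.com',
                 'Dean'   => 'roehrich@cray.com');

my $hashmail = $addresses{'Ashish'};       # look up a value by its key
$addresses{'God'} = 'god@heaven.org';      # add a new pair by assignment

# keys returns the keys in no particular order, so sort them when order matters.
for my $name (sort keys %addresses) {
    print "$name: $addresses{$name}\n";
}
```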

Input, Output and Interprocess Communication

Perl's built-in print function is the usual way to send output to a terminal, a file, or another process. You can specify an optional filehandle to the print function, to tell it where to send its output. What are filehandles and how do you use them? Essentially filehandles are objects that correspond to input or output streams. You create a filehandle by associating it with an I/O stream using the open function.

open (STUFF, ">file1");

creates a filehandle called STUFF and opens the file file1 for writing, associating its output stream with the filehandle STUFF. If the file didn't already exist, the open call would create it, and if it did exist, the open call would overwrite it. Now you have a filehandle for this output stream. Any output to this file will be handled through the STUFF filehandle. The > symbol is used in Unix shells for output redirection, so its use to indicate a file output stream in the open function is pretty apt. Similarly, a file input stream can be opened with:

open (STUFF, "<file1");

(The < character is the input redirection character in Unix shells). The call above can also be written as:

open (STUFF, "file1");

Three standard filehandles are defined by default, because the Unix shell provides three standard I/O streams to all executing programs. STDIN represents the standard input stream, usually the keyboard of your machine. STDOUT represents the standard output stream, usually the display device of your terminal. STDERR represents the standard error stream, which is also usually the computer display, but may sometimes be redirected to a log-file or another program. You can also open I/O streams to other processes, as we'll see in a moment.

The print function's default filehandle is STDOUT. So if you omit the filehandle, it'll print to STDOUT. If you explicitly specify a filehandle, as in:

print STUFF "Hello, World!\n";

the output will go to the file (or process) associated with the filehandle, in this case, file1. Input is even simpler than output. The diamond operator (a pair of angle brackets enclosing a filehandle) is Perl's basic input method:

$line = <STDIN>;

will read a single line from STDIN and store it in the variable $line. If you wanted to read a line from file1 instead, you'd simply replace STDIN above with STUFF. When you're done with a filehandle, remember to close its associated I/O stream:

close STUFF; # Done with "file1"
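Putting open, print, the diamond operator and close together, here is a sketch of a full round trip. It uses the three-argument form of open and lexical filehandles ($out, $in), which Perl 5.6 and later support alongside the bareword STUFF style shown above, and it checks each open with or die, since open returns false on failure. File::Temp, a core module, is only there to supply a scratch file so the example doesn't touch a real file1:

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# A scratch file stands in for "file1"; it is deleted when the program exits.
my (undef, $file) = tempfile(UNLINK => 1);

open(my $out, '>', $file) or die "Can't write $file: $!";   # like ">file1"
print $out "Hello, World!\n";
print $out "Second line\n";
close($out) or die "Can't close $file: $!";

open(my $in, '<', $file) or die "Can't read $file: $!";     # like "<file1"
my $line = <$in>;            # scalar context: reads one line
close($in);

print $line;                 # Hello, World!
```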

The One Minute Program

#!/usr/bin/perl
print "From: "; $from = <STDIN>;
print "To: "; $to = <STDIN>;
print "One-line Message: "; $message = <STDIN>;
open (MAIL, "| sendmail -t");
print MAIL "From: $from", "To: $to\n", $message;
close MAIL;

OK, so maybe you took a little longer than a minute to absorb the background information needed to understand this code, but who's got a stopwatch anyway? The Perl code above is a pretty useful little program. In fact, if you don't mind a little hyperbole once in a while, you could call it a rudimentary e-mail package for Unix systems. Let's have a closer look at it.

The first line just tells the system that this is a perl program, to be run through /usr/bin/perl. The next three lines prompt the user for input and store the responses in three scalar variables.

The fifth line creates an output stream to a process. Unix has long had the ability to run many programs at the same time, and allow them to communicate with each other. One method of communication between processes (running programs) is the pipe. When the output of one program is used as the input for another, you have a pipe.

Just like the > character is used in shells to redirect output to a file, so the | character is used to redirect output to a process. Perl's I/O capabilities allow you to create filehandles that represent pipes. As you'd expect, the syntax resembles that of Unix shells:

open (MAIL, "| sendmail -t");

You can think of this in the following way: "Pipe everything I send to the MAIL filehandle through the Unix command sendmail -t". This command (it may be different on some systems) invokes the mail delivery subsystem (the sendmail program) to accept a mail message, and the -t flag tells sendmail to take the recipients from the message's To: header. Everything we print to MAIL now will be sent to sendmail. The next line of code simply prints the message headers and body to MAIL, in the format required by sendmail: since $from and $to still end with the newline the user typed, the extra \n after $to leaves a blank line, which separates the headers from the body. After we've finished printing to the pipe, we close the output stream, and the mail is on its way. It did, indeed, take just a few minutes to get down to writing interesting code in Perl.
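A pipe to sendmail is hard to try out safely, but any command that reads its standard input behaves the same way. This sketch assumes a Unix-like system with sort on the PATH, and uses the three-argument mode '|-', which is equivalent to writing "| sort" as the second argument:

```perl
use strict;
use warnings;

# Everything printed to $pipe becomes the standard input of sort.
open(my $pipe, '|-', 'sort') or die "Can't start sort: $!";
print $pipe "Wed\n", "Mon\n", "Tue\n";
close($pipe) or die "sort exited with status $?";   # close waits for sort to finish
```

Run it and sort prints the three days in alphabetical order, just as sendmail would receive whatever the One Minute Program prints to MAIL.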

What You Want Is What You Get

One of the greatest things about Perl is that it's pretty smart. It knows how to do the right thing in the face of ambiguities and omissions on the part of the programmer. It derives much of this flexibility from its sensitivity to context.

Perl knows that some of its operators hold meaning only in a numeric context, and that others imply a string context. So it interprets your variables in a way that's meaningful in the relevant context. That's why strings get evaluated as numbers if you try arithmetic operations on them, and numbers look like text strings to the print operator. A more impressive example of Perl's context-sensitivity is visible in this modification of our One Minute Program:

#!/usr/bin/perl
print "From: "; $from = <STDIN>;
print "To: "; $to = <STDIN>;
print "Enter message, end with ^D\n\n";
@message = <STDIN>;
open (MAIL, "| sendmail -t");
print MAIL "From: $from", "To: $to\n", @message;
close MAIL;

A minor modification is in the message string on line 4. But look closer. The interesting modification is that I've turned $message into @message throughout. So how does this change things? Very significantly, as it so happens. This little mail program can now send as many lines of text as you like, instead of just a single-line message.

The key is Perl's sensitivity to scalar and list context. When we read from a filehandle into a scalar variable, we get the next available line from the input stream. When we read into a list variable, we get every remaining line as a separate list element. The right-hand side of the assignment is the same in both cases, but the left-hand side determines whether the assignment is occurring in a scalar context or a list context.

Another example of perl giving you What You Want: when an array is interpreted in a scalar context, it returns the number of array elements. When it's interpolated within double quotes, it returns a string of all the list elements separated by spaces:

$lines = @message;

sets $lines to the number of lines in the message. Whereas:

$lines = "@message";

sets $lines to a space-separated concatenation of all the lines in the @message array. And, as you can see from the code above, when we use an array with the print operator (or any operator that accepts a list of values), it's interpreted in list context, so it just gets flattened out into a number of scalars, just as if we'd provided individual scalar arguments instead of an array.

print MAIL @message;

is equivalent to:

print MAIL $message[0], $message[1], ...

While Perl's context-sensitivity is a major boon most of the time, it's also probably the single biggest source of side-effects and bugs if not properly understood. We'll come up against this over-smartness of Perl again and again, so keep it in mind and learn to use it instead of letting it confuse you.
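Here is a sketch that exercises both kinds of context without needing a terminal: an in-memory filehandle (supported since Perl 5.8) stands in for STDIN, and the three-line $text is made up for the example:

```perl
use strict;
use warnings;

my $text = "line one\nline two\nline three\n";
open(my $fh, '<', \$text) or die "Can't open in-memory file: $!";

my $first = <$fh>;      # scalar context: just the next line
my @rest  = <$fh>;      # list context: every remaining line
close($fh);

my $count  = @rest;     # array in scalar context: number of elements
my $joined = "@rest";   # array in double quotes: elements joined by spaces

print "first: $first";     # first: line one
print "count: $count\n";   # count: 2
```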

Conditionals

To alter the flow of control in a program on the basis of the values of certain variables or expressions, one requires conditional constructs, like the traditional if construct:

if ($raining) {
open_umbrella;
}

The curly brackets enclose a block of instructions. The if keyword can also be written after a statement, as a statement modifier. This form is often more readable (and less strenuous to type), especially when the conditional block contains only a single instruction, as in:

open_umbrella if $raining;

To aid in constructing more complex conditional tests, Perl adopts C's logical AND and OR operators, && and ||. The && operator returns true if the expressions on both sides of it are true. Similarly, the || operator returns true if either of its operands is true.

$rainbow = $raining && $sunny;
$bad_weather = $raining || $snowing;
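A quick sketch with made-up weather flags. It also shows a detail worth knowing: && and || stop as soon as the answer is known, and they return the last value they actually evaluated rather than a plain 1 or 0, which is why || is often used to supply a default:

```perl
use strict;
use warnings;

my ($raining, $sunny, $snowing) = (1, 1, 0);

my $rainbow     = $raining && $sunny;     # both true, so true
my $bad_weather = $raining || $snowing;   # $raining is true, so $snowing is never checked
my $blizzard    = $snowing && $raining;   # $snowing is false, so the result is false

my $name = '' || 'anonymous';             # || returns its right operand here: 'anonymous'
```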

Perl's conditionals are just as flexible as everything else. In general, since every valid expression in Perl has a value associated with it, any valid expression (including assignments, subroutine calls, and regular expression matches) may be used as the conditional expression in a boolean test. If the value of the expression is anything other than 0, '0', '' or the undefined value, it evaluates to true in the context of the boolean test. So if you wanted to modify the One Minute Program to send mail only if a message was actually read (that is, the user didn't just type ^D at the prompt), you could write:

if ($message = <STDIN>) {
open (MAIL, "| sendmail -t");
print MAIL ("From: $from", "To: $to\n", $message);
close MAIL;
}
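The false values are easy to confuse with their lookalikes. This sketch runs a handful of values through a boolean test; only the last four come out false (note that the strings '0.0' and "0\n" are true):

```perl
use strict;
use warnings;

my @values = (1, 'abc', '0.0', ' ', "0\n", 0, '0', '', undef);

my @truth = map { $_ ? 1 : 0 } @values;   # 1 for true, 0 for false
print "@truth\n";                          # 1 1 1 1 1 0 0 0 0
```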

Perl has no case or switch statement as in C or Pascal, but you could use something like this to accomplish the same thing:

red_thing if $color eq "red";
blue_thing if $color eq "blue";
green_thing if $color eq "green";

The eq operator is the string equality operator. Since strings are scalars in perl, unlike in C, they can be compared directly for equality. The eq operator provides a string context. If you want to do a numeric equality test, you must use the == operator, which provides a numeric context.
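Two sketches in one program. An if/elsif/else chain is another common stand-in for a switch, and it stops testing once a branch matches; describe_color is a made-up name for illustration. The second half shows why eq and == are not interchangeable: '10' and '10.0' are the same number but different strings:

```perl
use strict;
use warnings;

# Hypothetical example: an if/elsif chain stops at the first branch that matches.
sub describe_color {
    my ($color) = @_;
    if    ($color eq 'red')   { return 'stop' }
    elsif ($color eq 'green') { return 'go' }
    else                      { return 'unknown' }
}

my ($x, $y) = ('10', '10.0');
my $same_number = ($x == $y) ? 'yes' : 'no';   # numeric context: 10 == 10
my $same_string = ($x eq $y) ? 'yes' : 'no';   # string context: "10" vs "10.0"

print describe_color('green'), "\n";   # go
print "$same_number $same_string\n";   # yes no
```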

Loops and a Genie

The simplest loop in Perl just repeats a block of code while some condition holds true. Naturally, this is a while loop. If, in our simple mail program, we wished to keep reading mail till we encountered a line with a single '.' character on it, as the simplest Unix mail programs do, we could use a while loop, like this:

while (($line = <STDIN>) ne ".\n") {
print MAIL ($line);
}

or, using the until loop, which is simply the reverse of the while loop:

until (($line = <STDIN>) eq ".\n") {
print MAIL ($line);
}

This code is not quite optimal, though. For one, it doesn't correctly check for an end-of-file on the STDIN stream, so you can't finish your mail with a ^D character. And it still looks a lot like C, not magical like Perl. So we rewrite it like this:

while (<STDIN>) {
last if $_ eq ".\n";
print MAIL;
}

Now that looks a lot more mystical, and needs some explaining. For starters, notice that the loop will exit as soon as the user types the end-of-file character, since the diamond operator then returns the undefined value, which is false in a boolean context. (Perl actually checks whether the value read is defined, so a final line consisting of just "0" won't end the loop early.)

Also notice that we've done away with the $line variable. This is your first introduction to a friend who's destined to become a faithful ally as you continue to program in Perl - the wonderfully psychic $_ built-in variable (perl has lots of built-in variables). This $_ genie is always where you want him to be, holding exactly what you want him to be holding. In a while loop with just a diamond operation as the condition, $_ is always holding the line last read from the filehandle. And $_ also happens to be the default operand for a lot of operators, including the print operator.

So this loop reads from STDIN while there's stuff to be read, and on every pass through the loop, $_ grabs the line just read in. We check to see if $_ holds just a dot and a newline, and break out of the loop (using the last statement) if that's the case. Otherwise, we print ($_ by default, naturally) to the MAIL filehandle and loop again for the next line of input. Perl also has a for loop, which is interesting enough to save up as a nice surprise for later.
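You can watch the loop stop at the dot without typing anything. In this sketch an in-memory filehandle $in (Perl 5.8 or later) stands in for STDIN, the input text is made up, and the lines are collected in @body instead of being printed to MAIL:

```perl
use strict;
use warnings;

# An in-memory file stands in for STDIN.
my $input = "Hi there,\nsee you soon.\n.\nthis line is never read\n";
open(my $in, '<', \$input) or die "Can't open in-memory file: $!";

my @body;
while (<$in>) {              # each line read lands in $_
    last if $_ eq ".\n";     # a line holding just a dot ends the message
    push @body, $_;          # the article prints to MAIL; here we collect lines
}
close($in);

print @body;
```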

Subroutines

Years ago some hacker must have looked at his code and realized that he had a lot of repeating code segments that performed the same job each time they were run. Being the lazy guy he was, he decided to abandon all his current projects and first write a tool that would help him abstract out all these common bits of code into a separate sub-program that he could call every time he needed that specific job done. So he invented subroutines.

In a program that does some stuff over a network, you might often want to check if a host on the network is reachable. From the shell, you'd use the ping command to do this. In your program you could make a similar subroutine, and call it ping:

sub ping {
my ($host) = @_;
$_ = `/usr/etc/ping $host 1`;
return 1 if /is alive$/;
return 0;
}

This subroutine does some interesting stuff we haven't yet talked about, so let's take a close look at it. First off, notice the syntax for declaring a subroutine: following the sub keyword, you give the name of the subroutine, followed by a block of instructions. Control leaves a subroutine through a return statement, which can optionally pass a value back to the calling program (without one, the subroutine returns the value of the last expression it evaluated). You use the subroutine just like you'd use any built-in Perl function:

if (ping ("kal-el.netropolis.org")) {
# do network stuff with kal-el
}
else {
print STDERR "Sorry, kal-el is unreachable\n";
}

As you can guess, the value passed back by the return statement is the value of the ping ("kal-el.netropolis.org") expression: 1 if the host kal-el is alive, 0 if not.
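Since ping needs a network and /usr/etc/ping (whose location and output vary between systems), here is a sketch that keeps the same shape but takes the ping output as an argument. is_alive is a hypothetical name, and the sample reply string is made up:

```perl
use strict;
use warnings;

# Hypothetical variant of ping: judge a ping-style reply passed in as a string.
sub is_alive {
    my ($reply) = @_;
    return 1 if $reply =~ /is alive$/;
    return 0;
}

if (is_alive("kal-el.netropolis.org is alive\n")) {
    print "kal-el is up\n";
}
else {
    print STDERR "Sorry, kal-el is unreachable\n";
}
```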

Look at the first line in the subroutine block. The my operator declares variables that are private to the enclosing block. If you have a variable $host in your main program, you don't want its value to change every time you use $host in your ping subroutine. Declaring $host with my creates a brand-new variable that exists only inside the subroutine block, so the $host in your main program (if any) is never touched. You can use the my operator in any block, not just a subroutine block, and the variables listed will be private to that block (a block, in case you forgot, is any segment of code contained within curly brackets).
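A minimal sketch of that privacy, using a bare block rather than a subroutine (the host names are placeholders):

```perl
use strict;
use warnings;

my $host = 'localhost';        # $host in the main program

{                              # any block will do, not just a sub body
    my $host = 'kal-el';       # a brand-new $host, private to this block
    print "inside: $host\n";   # inside: kal-el
}

print "outside: $host\n";      # outside: localhost
```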

Then we have the funny looking @_ creature. This is another Perl built-in variable. It holds a list of all the arguments passed to a subroutine. If you remember your variable prefixes, you'll realize that the elements of @_ are $_[0], $_[1], and so on. (Despite the resemblance, $_[0] is an element of @_ and has nothing to do with the $_ genie.) In this case, @_ holds a single value, the hostname we wish to ping. We assign this value to the private variable $host before proceeding. The line:

$_ = `/usr/etc/ping $host 1`;

is equivalent to:

open (FOOBAR, "/usr/etc/ping $host 1 |");
$_ = join ('', <FOOBAR>);   # read all of the output, as back-ticks do
close FOOBAR;

Remember the output pipe to sendmail in the One Minute Program? Here we have an input pipe from ping. It runs the Unix program /usr/etc/ping and hands its output to $_. This is a particularly nice piece of trickery to use for quick and dirty hacks. Any string enclosed within back-ticks (the ` character) is run as a shell command, and the value of the whole expression is the command's output. (The command's exit status is not returned; Perl puts it in the special variable $? instead.)

Just like the double-quote, the back-tick is not so much a quote as an operator in its own right. And just like double-quotes, back-ticks interpolate, so $host gets replaced with kal-el.netropolis.org when the code is executed.
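Here is a sketch of back-ticks on a Unix-like system, assuming a Bourne-style shell with echo. It shows interpolation, the exit status in $? (shifted right by 8 bits to get the command's exit code), and list context, where you get one element per line of output:

```perl
use strict;
use warnings;

my $who    = 'World';
my $out    = `echo Hello, $who`;    # $who is interpolated before the shell runs the command
my $status = $? >> 8;               # exit code of the command; $out holds only its output
my @lines  = `echo one; echo two`;  # list context: one element per output line
```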

Finally, let's take a look at the second line in the subroutine block:

return 1 if /is alive$/;

This exits the subroutine and returns 1 if the output of the ping command ends in the substring "is alive". It uses the // operator, which checks its operand (by default, its operand is the magical $_ variable) to see if it matches a specific pattern, specified as a regular expression. For more on how regular expressions work, keep reading. Perl has many goodies waiting to be explored - this was just a one-minute overview of a language designed to give you many hours of programming pleasure.
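The pattern from the ping subroutine can be tried on its own, without running ping. In this sketch the reply strings are made up; note that $ also matches just before a final newline, which is why ping's trailing newline doesn't get in the way, and that =~ applies a match to something other than $_:

```perl
use strict;
use warnings;

$_ = "kal-el.netropolis.org is alive\n";       # pretend ping printed this
my $alive = /is alive$/ ? 1 : 0;                # matches against $_ by default: 1

my $other = "is alive? no idea\n";
my $also  = ($other =~ /is alive$/) ? 1 : 0;    # "is alive" isn't at the end: 0
```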

Time for some poetry...

#!/usr/bin/perl
open PERL, book
and read PERL, $one, $minute;
bless \@Larry
and exit;