Introducing the Shell

Objectives

  • Explain how the shell relates to the keyboard, the screen, the operating system, and users’ programs.
  • Explain when and why command-line interfaces should be used instead of graphical interfaces.

A Parable

Nelle Nemo, a marine biologist, has just returned from a six-month survey of the North Pacific Gyre, where she has been sampling gelatinous marine life in the Great Pacific Garbage Patch. She has 300 samples in all, and now needs to:

  1. Run each sample through an assay machine that will measure the relative abundance of 300 different proteins. The machine’s output for a single sample is a file with one line for each protein.
  2. Calculate statistics for each of the proteins separately using a program her supervisor wrote called goostat.
  3. Compare the statistics for each protein with corresponding statistics for each other protein using a program one of the other graduate students wrote called goodiff.
  4. Write up. Her supervisor would really like her to do this by the end of the month so that her paper can appear in an upcoming special issue of Aquatic Goo Letters.

It takes about half an hour for the assay machine to process each sample. The good news is, it only takes two minutes to set each one up. Since her lab has eight assay machines that she can use in parallel, this step will “only” take about two weeks.

The bad news is that if she has to run goostat and goodiff by hand, she’ll have to enter filenames and click “OK” 45,150 times (300 runs of goostat, plus 300×299/2 runs of goodiff). At 30 seconds each, that will take more than two weeks. Not only would she miss her paper deadline, the chances of her typing all of those commands right are practically zero.

The next few lessons will explore what she should do instead. More specifically, they explain how she can use a command shell to automate the repetitive steps in her processing pipeline so that her computer can work 24 hours a day while she writes her paper. As a bonus, once she has put a processing pipeline together, she will be able to use it again whenever she collects more data.

What and Why

At a high level, computers do four things:

  • run programs;
  • store data;
  • communicate with each other; and
  • interact with us.

They can do the last of these in many different ways; most of us use windows, icons, mice, and pointers. These technologies didn’t become widespread until the 1980s, and, critically, it is exceptionally difficult to automate mouseclicks.

There exists an automatable, text-only, interactive interface under the surface of our computers and phones. Called a command-line interface, or CLI, to distinguish it from the graphical user interface, or GUI, that most people now use. The heart of a CLI is a read-evaluate-print loop, or REPL: when the user types a command and then presses the enter (or return) key, the computer reads it, executes it, and prints its output. The user then types another command, and so on until the user logs off.

This description makes it sound as though the user sends commands directly to the computer, and the computer sends output directly to the user. In fact, there is usually a program in between called a command shell. What the user types goes into the shell; it figures out what commands to run and orders the computer to execute them.

A shell is a program like any other. What’s special about it is that its job is to run other programs rather than to do calculations itself. The most popular Unix shell is bash. Bash is the default shell on most modern implementations of Unix, and in most packages that provide Unix-like tools for Windows.

Using Bash or any other shell sometimes feels more like programming than like using a mouse. Commands are terse (often only a couple of characters long), their names are frequently cryptic, and their output is lines of text rather than something visual like a graph. On the other hand, the shell allows us to combine existing tools in powerful ways with only a few keystrokes and to set up pipelines to handle large volumes of data automatically. In addition, the command line is often the easiest way to interact with remote machines. As clusters and cloud computing become more popular for scientific data crunching, being able to drive them is becoming a necessary skill.

This no-frills text-only dialog with the computer can be your data file handling robot, downloading hundreds or thousands of data files and executing thousands or hundreds of thousands of programs on your behalf.

You just need some magic words.

Key Points

  • A shell is a program whose primary purpose is to read commands and run other programs.
  • The shell’s main advantages are its high action-to-keystroke ratio, its support for automating repetitive tasks, and that it can be used to access networked machines.
  • The shell’s main disadvantages are its primarily textual nature and how cryptic its commands and operation can be.

Table Of Contents

Previous topic

Automation Workshop Welcome

Next topic

Files and Directories

This Page

Edit this document!

This file can be edited directly through the Web. Anyone can update and fix errors in this document with few clicks -- no downloads needed.

  1. Go to Introducing the Shell on GitHub.
  2. Edit files using GitHub's text editor in your web browser (see the 'Edit' tab on the top right of the file)
  3. Fill in the Commit message text box at the bottom of the page describing why you made the changes. Press the Propose file change button next to it when done.
  4. Then click Send a pull request.
  5. Your changes are now queued for review under the project's Pull requests tab on GitHub!

For an introduction to the documentation format please see the reST primer.