1) Intro to basic text-based computer interfaces: bash and vim

Before we get into programming with Python, we’ll cover interacting with computers using bash, a convenient and fast route to file handling and much more.

Related text: Chapter 1 of Introduction to Scientific and Technical Computing

Optional related tutorials: http://swcarpentry.github.io/shell-novice/ https://danielmiessler.com/study/vim/#null

Shell overview

Many of the programs we use have GUIs: Graphical User Interfaces. Direct text input can often save quite a bit of time and allow additional functionality. While the tasks in this class can be completed on a laptop or desktop (in any operating system), scientific computing generally requires larger, remote resources that are typically Unix or Unix-like systems. On those systems, the primary mode of user interaction with the operating system is via text-based shell processes.

There are several different types of shells, including csh/tcsh, sh/bash, ksh, and zsh. The most common type, and the only one we will discuss, is bash, the Bourne-again shell. The associated reading gives more detail on why bash is the way to go. FYI: if you are given an account on a Unix-based computer, the default shell assigned is usually bash, but this is an option you can change. Resources like https://stackoverflow.com/ will give you instructions for doing that.

Mac OSX is built on a Unix platform and has a built-in program, Terminal, that provides a shell. The default type was bash through macOS 10.14; since macOS 10.15 (Catalina) the default is zsh, although bash is still installed. The preferences will allow you to change to another type. For this class, we’ll stick with bash as it is the one that will likely serve you best. FYI: bash on OSX is mostly the same as bash on a Linux machine, but there are a few differences (e.g., see https://unix.stackexchange.com/questions/82244/bash-in-linux-v-s-mac-os). This is good to remember if you see different behavior for the same command on these two types of machines.

Windows computers do not have a built-in Unix-like shell, but there are several ways to get one. Options include terminal programs such as https://www.putty.org/, which connect you to a shell on a remote machine. Windows 10 now allows you to install Linux (see, for example, https://docs.microsoft.com/en-us/windows/wsl/install-win10), which is another route to a bash shell.

For the first homework, you will need to write and run bash commands.

You will find bash scripts and commands useful throughout this course. They can even be invoked from within a python program.

Some key points from the Software Carpentry module on bash:

  • A shell is a program whose primary purpose is to read commands and run other programs.
  • The shell’s main advantages are its high action-to-keystroke ratio, its support for automating repetitive tasks, and its capacity to access networked machines.
  • The shell’s main disadvantages are its primarily textual nature and how cryptic its commands and operation can be.

Some basic, extremely useful bash commands

First, open a bash session (e.g. launch Terminal on your Mac).

Enter the following commands (one at a time), and note what you see:

pwd
ls
ls -F
ls -a
ls -lhatr
man ls
which ls

Use cd to navigate to a different directory. Then cd to yet another directory. Now, explore what happens when you:

  • enter only cd
  • then enter cd -
  • enter cd ..
  • enter cd .
  • enter cd ../..
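
If you would like to check your observations, here is a small sketch of the same cd shortcuts run in a throwaway directory. The directory names (project, data) are made up for this illustration, and the location chosen by mktemp will differ on your machine.

```shell
# Work inside a temporary directory so none of your real files are touched.
# The names project and data are hypothetical.
start=$(mktemp -d)
mkdir -p "$start/project/data"

cd "$start/project/data"

cd ..                          # up one level: now in project
after_up=$(basename "$PWD")

cd - > /dev/null               # back to the previous directory: project/data
after_dash=$(basename "$PWD")

cd ../..                       # up two levels: back where we started
after_two_up=$(basename "$PWD")

echo "after 'cd ..':    $after_up"
echo "after 'cd -':     $after_dash"
echo "after 'cd ../..': $after_two_up"
```

cd - normally prints the directory it switches to; the output is sent to /dev/null here only to keep the listing tidy.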

Other common, useful commands include:

cp
rm
mv
mkdir
cat
tail
head
wget
diff
wc
find

There are multiple ways to learn what they do, including reading the related text, using man, and, of course, asking the internet. There are great tutorials for those interested in learning more (see the links at the beginning of this notebook), which will pay off if you are a computational researcher. For this course, make sure you learn some basics, which we will go over today and practice on HW01.

Quick note on directories: to copy or delete a directory, you’ll need to add the flag -r, e.g. cp -r test_dir test_dir_copy or rm -r test_dir. However, be very, very careful with rm -r: there is no trash can and no undo. rm is forever.
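
A safe way to practice is inside a temporary directory. The sketch below assumes nothing about your own files; test_dir, test_dir_copy, and notes.txt are example names only.

```shell
# Practice in a temporary directory so a mistake cannot delete real work.
work=$(mktemp -d)
cd "$work"

mkdir test_dir
echo "some data" > test_dir/notes.txt

cp -r test_dir test_dir_copy     # copy the directory and everything in it
ls test_dir_copy                 # shows notes.txt

rm -r test_dir                   # delete the original directory and its contents
ls                               # only test_dir_copy is left
```

If you are nervous about a deletion, rm -ri asks for confirmation before removing each item.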

* is wild. It can be very helpful! e.g. ls *.dat will display all the files in the current folder that have the .dat extension.
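
To see what the wildcard matches before you trust it with something like rm, you can preview it with echo. This is only a sketch; the file names are invented for the example.

```shell
work=$(mktemp -d)
cd "$work"
touch run1.dat run2.dat notes.txt    # hypothetical files

ls *.dat                 # lists run1.dat and run2.dat, but not notes.txt

# The shell expands the pattern before the command runs,
# so echo shows exactly the list of names that ls received.
expanded=$(echo *.dat)
echo "The pattern *.dat expanded to: $expanded"
```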

Some of my favorite bash time-saving tips

  • When you change your mind about what you want to type, ctrl-u clears the line. This is especially helpful when you are typing a password and may have made a typo; this lets you quickly start again
  • Want to run a command again? The up arrow will allow you to look back through your history. You can also use ctrl-r to reverse search through your history
  • ctrl-a moves the cursor to beginning of the line
  • ctrl-e moves the cursor to the end of the line

Customizing bash

In your home directory (e.g. type cd $HOME), you may have both a .bash_profile and a .bashrc. It can be confusing why there are two places you can customize bash, and where to put what, as described in multiple places such as: http://www.joshstaiger.org/archives/2005/07/bash_profile_vs.html

This and other sources recommend editing .bashrc only, and making sure it is run by adding the following to your .bash_profile:

if [ -f ~/.bashrc ]; then
   source ~/.bashrc
fi

One customization I make sure to always add:

export CLICOLOR=1
export LS_COLORS='di=1;35'

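The first line turns on color coding (e.g. use specified colored text for directories, etc.), and the second defines the colors to use. Here “di” stands for directories, and the numbers after that mean bold;purple (you do not need to memorize these numbers!). Note that the two variables are read by different versions of ls: CLICOLOR by the BSD ls on macOS, and LS_COLORS by the GNU ls on Linux, which also needs the --color option (often set through an alias) to show colors.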

There are many options for defining colors, and many suggestions. This happens to be the choice I use. An internet search will provide many websites with details on the codes and alternate suggestions (for example, http://www.bigsoft.co.uk/blog/2008/04/11/configuring-ls_colors).

You can also set up aliases to save time. e.g.:

alias vmd="/Applications/VMD\ 1.9.3.app/Contents/MacOS/startup.command"
alias flux="ssh hbmayes@flux-login.arc-ts.umich.edu"

This lets me quickly launch vmd from my command line.

Pro tip: .bash_profile and .bashrc are only loaded when a new bash instance is launched. To make the changes take effect immediately, source the file with . ~/.bash_profile (since this file will run .bashrc, you do not need to source both).

FYI: The file is only sourced for that shell. If several instances were open before the change was made, the file needs to be sourced in each of them, or the instance closed and reopened.
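
Here is a small sketch of why the leading dot matters. It uses a temporary file as a stand-in for your real .bashrc, and COURSE_GREETING is a made-up variable name, so nothing in your actual setup is changed.

```shell
unset COURSE_GREETING

# Stand-in for ~/.bashrc (a temporary file, not your real settings).
rcfile=$(mktemp)
echo 'export COURSE_GREETING="hello from the rc file"' > "$rcfile"

# Running the file in a separate bash process does not change the current shell...
bash "$rcfile"
before=${COURSE_GREETING:-unset}

# ...but reading it with "." (or "source" in bash) does.
. "$rcfile"
after=$COURSE_GREETING

echo "before sourcing: $before"
echo "after sourcing:  $after"
rm "$rcfile"
```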

Reusable bash scripts

A series of bash commands can be saved as a script and run by executing the script. For example, make a file called bananas.sh with the following contents:

#!/usr/bin/env bash
first=2
last=8
incr=2
name=banana
echo 'May I have @@@num bananas?' > ${name}.tpl
for job in $(seq $first $incr $last); do
    sed "s/@@@num/${job}/" < ${name}.tpl > ${name}_${job}.txt
    cat ${name}_${job}.txt
done

The shebang and more

Before running this code, let’s look at the parts. #! is colloquially called the shebang. It is not strictly necessary in this case, but is good practice. If it were not included, the script would be run as a bash script if your shell is bash. However, including this in your script will ensure it is run as a bash script even if your shell is tcsh, etc., when you launch it with the shortcut shown below. It also specifies which bash instance will be used to run the script.

echo followed by the redirect symbol (>) will put the echoed text into a file. The lines first=2, etc., define variables. Variables are then used by typing $ followed by the variable name.

You do not need to enclose the variable name in curly braces ({}) unless the name is immediately followed by a character that could be part of a variable name (a letter, digit, or underscore), as in ${name}_${job}.txt. It is good practice to include them anyway: they can make the script more readable, it never hurts to include them, and it sometimes hurts to leave them out.
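
A quick sketch of a case where leaving out the braces hurts, using the same variable names as bananas.sh. It assumes no variable called name_ exists in your shell, which the unset line makes sure of.

```shell
name=banana
job=4
unset name_      # make sure there is no variable called name_

with_braces="${name}_${job}.txt"
# Without braces, bash reads $name_ as a variable called name_, which is empty.
without_braces="$name_$job.txt"

echo "with braces:    $with_braces"
echo "without braces: $without_braces"
```

The second file name silently loses the word banana, which is exactly the kind of bug that is hard to spot later.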

How do you run this script? Try:

$ ./bananas.sh

The above is a shortcut to running any executable in the current folder.

Did it work? Probably not. Before you can run a script, the computer has to know that it is an executable. We do that with the command chmod. This is a powerful command that can change read (r), write (w), and execute (x) permissions for everyone, a group, and yourself. Before changing access, see what it is by typing ls -l bananas.sh. Now, make the file an executable by entering:

$ chmod +x bananas.sh

Now see how the permissions have changed by typing ls -l bananas.sh, and then try executing the file again with the command ./bananas.sh.
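
The same before-and-after check can be scripted. This is a sketch with a made-up script called hello.sh in a temporary directory; the exact permission strings that ls -l prints depend on your system settings.

```shell
work=$(mktemp -d)
cd "$work"

# hello.sh is a hypothetical two-line script.
printf '#!/usr/bin/env bash\necho "hello from a script"\n' > hello.sh

# [ -x file ] is true only if the file is executable.
if [ -x hello.sh ]; then before=yes; else before=no; fi
ls -l hello.sh          # no x in the permissions yet

chmod +x hello.sh
if [ -x hello.sh ]; then after=yes; else after=no; fi
ls -l hello.sh          # x now appears in the permissions

output=$(./hello.sh)
echo "executable before: $before, after: $after"
echo "$output"
```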

FYI: you can also run the script by entering the command bash bananas.sh. That’s just more typing, though!

Note: The double quotes (" ") are important when you want to use a variable inside the quote, because it will allow the variable to be interpreted. What happens if you change the double-quotes on the line starting with sed to single-quotes?
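
To compare the two kinds of quotes side by side without touching bananas.sh, here is a minimal sketch using echo instead of sed:

```shell
job=4

double="May I have ${job} bananas?"    # double quotes: the variable is replaced by its value
single='May I have ${job} bananas?'    # single quotes: the text is kept exactly as typed

echo "$double"
echo "$single"
```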

There are multiple very handy ways to loop through values. Above was a numerical loop, with the first and last numbers explicitly entered at the top of the file, as well as the increment value. Here are some other examples of loops. What changes should be made to the text below?

#!/usr/bin/env bash
for student in Agnes Alex Alyssa; do
    echo 'May ${student} have some bananas?'
done

It can be very handy to have the lines of a text file be the list that is iterated over. I have a file named students.txt, with one name per line, such as

Agnes
Alex
Alyssa
Carolina
Chloe
Christina
...

#!/usr/bin/env bash
for student in $( < students.txt); do
    echo "May ${student} have some bananas?"
done
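
One thing to watch for: a for loop over the contents of a file splits on every space, not just on line breaks. The sketch below shows a common alternative, a while loop with read, that takes one whole line at a time. The entry "Mary Ann" is invented to show the difference, and $(cat students.txt) is used here as the equivalent of $( < students.txt).

```shell
work=$(mktemp -d)
cd "$work"
printf 'Agnes\nAlex\nMary Ann\n' > students.txt   # hypothetical list; one name contains a space

# for loop: "Mary Ann" is split into two separate items
for_count=0
for student in $(cat students.txt); do
    for_count=$((for_count + 1))
done

# while + read: each line is kept in one piece
read_count=0
while IFS= read -r student; do
    echo "May ${student} have some bananas?"
    read_count=$((read_count + 1))
done < students.txt

echo "for loop items: $for_count, while loop items: $read_count"
```

For a list of single-word names the two approaches give the same result, so the simpler for loop is fine for this class.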

With just the commands listed in this brief intro to bash scripting, you can make templates and thus automate many repetitive tasks!

A note on file names

Most file names are something.extension. The extension isn’t required and doesn’t guarantee anything, but it is best practice to use it to indicate the type of file: .txt is used for text files, .sh for bash scripts, .py for Python scripts, etc.

If you only learn one thing in this class, please learn this: please don’t use spaces or special characters in your file names. Use only letters, numbers, _, - (although - should never be the first character, since commands will interpret it as a flag; if . is used first, it will be a ‘hidden’ file), and then a period followed by the appropriate extension.
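
Why are spaces such a problem? The shell uses spaces to separate arguments, so an unquoted name with a space is seen as two names. A minimal sketch (the file name is made up, and set -- is used only to count how many arguments the shell sees):

```shell
file="my data.txt"    # hypothetical file name containing a space

set -- $file          # unquoted: the shell splits the name at the space
unquoted_args=$#

set -- "$file"        # quoted: the name stays in one piece
quoted_args=$#

echo "unquoted: $unquoted_args arguments; quoted: $quoted_args argument"
```

You can always work around this with quotes, but it is much easier to avoid the spaces in the first place.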

Please keep these files for some practice below.

Other great utilities and functions

To help us explore these utilities, make a file stp.txt with the contents:

Algorhyme

I think that I shall never see
A graph more lovely than a tree.

A tree whose crucial property
Is loop-free connectivity.

A tree which must be sure to span
So packets can reach every LAN.

First the root must be selected.
By ID it is elected.

Least cost paths from root are traced.
In the tree these paths are placed.

A mesh is made by folks like me
Then bridges find a spanning tree.

--From the paper "An Algorithm for Distributed Computation of a Spanning Tree in an Extended LAN" by Radia Perlman, inventor of the spanning tree protocol.

grep

This utility searches for strings in files. It has many functions. It can search in one or many files. By default, it returns the line with the matching word, but that can be adjusted (e.g. return lines before or after).

In the directory with the file stp.txt, enter the command: grep tree stp.txt. What do you see?

In the directory with the files banana_2.txt banana_4.txt banana_6.txt banana_8.txt, enter the command: grep banana *.txt. What do you see?
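
A few grep options I find useful are sketched below on a four-line excerpt of the poem, saved to a made-up file name (poem.txt) in a temporary directory so your stp.txt is left alone.

```shell
work=$(mktemp -d)
cd "$work"
printf '%s\n' \
    'I think that I shall never see' \
    'A graph more lovely than a tree.' \
    'A tree whose crucial property' \
    'Is loop-free connectivity.' > poem.txt

grep -n tree poem.txt        # -n: show the line number of each match
grep -A 1 graph poem.txt     # -A 1: also show one line after each match

matches=$(grep -c tree poem.txt)               # -c: count matching lines
case_sensitive=$(grep -c 'a tree' poem.txt)    # matches only lowercase "a tree"
case_insensitive=$(grep -c -i 'a tree' poem.txt)   # -i: ignore case, so "A tree" also matches

echo "tree: $matches lines; 'a tree': $case_sensitive; with -i: $case_insensitive"
```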

Directing input and output

< indicates that the file name after the character should be treated as input to the action preceding the character. For grep, the input file is known based on its position, but we could also directly specify it. Try entering the following bash commands:

grep banana  banana_2.txt
grep banana < banana_2.txt
grep < banana_2.txt banana
grep banana_2.txt banana

This case is quite trivial. For working with grep and these files, > can be very useful. Try this; the first command should return an error that there is no such file, which is what we want (otherwise the file would be overwritten in the following steps):

ls all_bananas.txt
grep banana *.txt > all_bananas.txt
ls all_bananas.txt
cat all_bananas.txt

In this case, the > will overwrite anything in the file all_bananas.txt if the file had existed. What if we just wanted to add to the end? >> will accomplish that. >> will act the same as > if the file did not previously exist, so let’s use it with the file we just made (all_bananas.txt). First, check that the file is still there. If not, make it again as above.

cat all_bananas.txt
grep root stp.txt >> all_bananas.txt
cat all_bananas.txt
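
The difference between > and >> is easy to see with a tiny example. This sketch uses a made-up file, log.txt, in a temporary directory:

```shell
work=$(mktemp -d)
cd "$work"

echo "first"  > log.txt     # creates log.txt
echo "second" > log.txt     # overwrites it: only "second" remains
after_overwrite=$(cat log.txt)

echo "third" >> log.txt     # appends: the file now holds "second" then "third"
line_two=$(sed -n '2p' log.txt)

cat log.txt
```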

Combining commands

The | character can feed the results of the first command to the second command. Compare the commands below:

grep tree stp.txt > trees.txt
wc -l trees.txt
rm trees.txt

Versus:

grep tree stp.txt | wc -l

This is just one of many ways that bash commands can increase efficiencies.
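
Pipes can be chained, with each command working on the output of the one before it. Here is a sketch on a short excerpt of the poem saved under a made-up name (poem.txt):

```shell
work=$(mktemp -d)
cd "$work"
printf '%s\n' \
    'A graph more lovely than a tree.' \
    'A tree whose crucial property' \
    'Is loop-free connectivity.' \
    'Then bridges find a spanning tree.' > poem.txt

# grep selects lines, wc -l counts them, and tr removes the
# padding spaces that some versions of wc add to the count.
count=$(grep tree poem.txt | wc -l | tr -d ' ')

# grep selects lines, sort orders them, head keeps only the first.
first_sorted=$(grep tree poem.txt | sort | head -n 1)

echo "lines mentioning tree: $count"
echo "first in sorted order: $first_sorted"
```

No intermediate files are created, so there is nothing to clean up afterwards.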

sed

sed is a stream editor. It can function from the command line, within a bash script (as in the example above), or within vim.

Try the following commands:

cp stp.txt stp_edited.txt
diff stp.txt stp_edited.txt
sed 's/I/We/' stp_edited.txt
diff stp.txt stp_edited.txt

sed -i 's/I/We/' stp_edited.txt

Note: The above command works on Linux (GNU sed), but not in the Mac OSX terminal (BSD sed). There, use sed -i '' 's/I/We/' stp_edited.txt, with an empty pair of quotes after -i. That form in turn will not work on Linux. This is an example of the small differences between implementations on these systems.

diff stp.txt stp_edited.txt

cp stp.txt stp_edited.txt
sed -i 's/I/We/g' stp_edited.txt

Note: again, on Mac OSX, follow -i with ''

diff stp.txt stp_edited.txt

Sed is a very powerful utility; this is just a quick taste of the different options that can be invoked, including search and replace in place (in the input file), changing only the first instance on a line, etc.
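
If you need a script that behaves the same on Linux and Mac, one option is to skip -i altogether: write the edited text to a temporary file and then move it over the original. A sketch on a two-line excerpt (stp_edited.tmp is a made-up name for the temporary file):

```shell
work=$(mktemp -d)
cd "$work"
printf '%s\n' \
    'I think that I shall never see' \
    'Is loop-free connectivity.' > stp_edited.txt

# Without g, only the first I on each line is changed.
first_only=$(sed 's/I/We/' stp_edited.txt | head -n 1)
echo "$first_only"

# Same result as an in-place edit, without relying on -i.
sed 's/I/We/g' stp_edited.txt > stp_edited.tmp && mv stp_edited.tmp stp_edited.txt
cat stp_edited.txt
```

Notice that the second line becomes "Wes loop-free connectivity.": sed matches the letter I wherever it appears, not only the word I, so it pays to run diff after an edit like this.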

If you are interested in learning more about sed, one resource (of many) is: https://www.gnu.org/software/sed/manual/sed.html

Sed can also be used while in vim. See http://vim.wikia.com/wiki/Search_and_replace for some nice tricks!

awk for text processing and extraction

We will not spend much time on this powerful utility. Here is one example of its functionality. There are many resources to learn more, should you be interested (e.g. https://likegeeks.com/awk-command/; https://www.gnu.org/software/gawk/manual/gawk.html#toc-Getting-Started-with-awk for a whole book on it).

I often use awk to print a column from a file. Key variables available include:

  • $0 for the whole line.
  • $1 for the first field.
  • $2 for the second field.
  • $n for the nth field.

Try the following commands:

awk '{print $0}' all_bananas.txt
awk '{print $1}' all_bananas.txt
awk '{print $2}' all_bananas.txt
awk '{print $1,$3}' all_bananas.txt

awk can be used in conjunction with other commands (e.g. using |). For example, to print the first and third fields of only the last two lines:

tail -n 2 all_bananas.txt | awk '{print $1,$3}'
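
By default awk splits fields on whitespace, but the -F option lets you choose another separator, and awk can also do arithmetic on a column. A sketch with a made-up comma-separated file (counts.csv and its contents are invented for this example):

```shell
work=$(mktemp -d)
cd "$work"
printf '%s\n' 'Agnes,2' 'Alex,4' 'Alyssa,6' > counts.csv

awk -F, '{print $1}' counts.csv          # first column: the names

# Add up the second column; the END block runs after the last line is read.
total=$(awk -F, '{sum += $2} END {print sum}' counts.csv)

# Combine with tail: second field of the last line only.
last_count=$(tail -n 1 counts.csv | awk -F, '{print $2}')

echo "total bananas: $total; last student's count: $last_count"
```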

Summary

This is just a quick taste to help us use the bash shell for navigating and editing files. We’ll practice using bash throughout the semester to work more efficiently, but it is not a main focus of the course.

A very important tool will be jupyter notebooks, discussed next!