1) Intro to basic text-based computer interfaces: bash and vim¶
Before we get into programming with python, we’ll cover interacting with computers using bash, a convient and fast route to file handing and much more.
Related text: Chapter 1 of Introduction to Scientific and Technical Computing
Optional related tutorials: http://swcarpentry.github.io/shell-novice/ https://danielmiessler.com/study/vim/#null
Shell overview¶
Many of the programs we use have GUIs: Graphical User Interfaces. Direct text input can often save quite a bit of time and allow additional functionality. While the tasks in this class will be able to be completed on a laptop or desktop (in any operating system), scientific computing generally requires larger, remote resources that are typically Unix or Unix-like systems. In those systems, the primary mode of user iteraction with the operating system is via text-based shell processes.
There are several different types of shells, including csh/tcsh, sh/Bash, ksh, and zsh. The most common type, and the only one we will discuss is bash, the Bourne-again shell. The associated reeading gives more detail on why bash is the way to go. FYI: if you are given an account on a Unix-based computer, the default shell assigned is usually bash, but this is an option you can change. Look to resources like https://stackoverflow.com/ will give you instructions to implement that.
Mac OSX is built on a Unix platform and has a built-in program, terminal, that provides shell. The default type is bash. The preferences will allow you to change to another type. For this class, we’ll stick with bash as it the one that will likely serve you best. FYI: bash on OSX is mostly the same as bash on a Linux machine, but there are a few differences (e.g., see https://unix.stackexchange.com/questions/82244/bash-in-linux-v-s-mac-os). This is good to remember if you see different behavior for the same command on these two types of machines.
Windows computers do not have a built-in Unix-like shell, but have shell emulators. There are many options, including https://www.putty.org/. Windows 10 now allows you to install Linux (see, for example, https://docs.microsoft.com/en-us/windows/wsl/install-win10) which is another route to get you to a bash shell.
For the first homework, you will need to write and run bash commands.
You will find bash scripts and commands useful throughout this course. They can even be invoked from within a python program.
Some key points from the Software carpentry module on bash:¶
- A shell is a program whose primary purpose is to read commands and run other programs.
- The shell’s main advantages are its high action-to-keystroke ratio, its support for automating repetitive tasks, and its capacity to access networked machines.
- The shell’s main disadvantages are its primarily textual nature and how cryptic its commands and operation can be.
Some basic, extremely useful bash commands¶
First, open a bash session (e.g. launch terminal on your mac).
Enter the following commands (one at a time), and note what you see:
pwd
ls
ls -F
ls -a
ls -lhatr
man ls
which ls
Use cd
to navigate to a different directory. Then cd
to yet
another directory. Now, explore what happens when you: - enter only
cd
- then enter cd -
- enter cd ..
- enter cd .
- enter
cd ../..
Other common, useful commands include:
cp
rm
mv
mkdir
cat
tail
head
wget
diff
wc
find
There are multiple ways to learn what they do, including reading the
related text, using man
’, and, of course, asking the internet.
There are great tutorials for those interested in learning more (see
link at the beginning of this notebook), which will pay off if you are a
computational researcher. For this course, make sure you learn some
basics, which we will go over today and practice on HW01.
Quick note on directories: to copy or delete a directory, you’ll need to
add the flag -r
, e.g. cp -r test_dir
. However, be very, very
careful with this. rm
is forever.
*
is wild. It can be very helpful! e.g. ls *.dat
will display
all the files in the current folder that have the .dat
extention.
Some of my favorite bash time-saving tips¶
- When you change your mind about what you want to type,
ctrl-u
clears the line. This is especially helpful when you are typing a password and may have made a typo; this lets you quickly start again - Want to run a command again? The up arrow will allow you to look back
through your history. You can also use
ctrl-r
to reverse search through your history ctrl-a
moves the cursor to beginning of the linectrl-e
moves the cursor to the end of the line
Customizing bash¶
In your home directory (e.g. type cd $HOME
), you may have both a
.bash_profile
and .bash_rc
. It can be confusing why there are
two places you can customize bash, and where to put what, as described
in multiple places such as:
http://www.joshstaiger.org/archives/2005/07/bash_profile_vs.html
This and other source recommend editing .bashrc
only, and making
sure this is run by adding the following to your .bash_profile
:
if [ -f ~/.bashrc ]; then
source ~/.bashrc
fi
One customization I make sure to always add:
export CLICOLOR=1
export LS_COLORS='di=1;35'
The first line turns on color coding (e.g. use specified colored text for directories, etc.), and the second defines colors to use. Here “di” stands for directories, and the numbers after that mean bold;purple (you do not need to memorize these numbers!).
There are many options for defining colors, and many suggestions. This happens to the be choice I use. An internet search will provide many websites with details on the codes and alternate suggestions (for example, http://www.bigsoft.co.uk/blog/2008/04/11/configuring-ls_colors).
You can also set up aliases to save time. e.g.:
alias vmd="/Applications/VMD\ 1.9.3.app/Contents/MacOS/startup.command"
alias flux="ssh hbmayes@flux-login.arc-ts.umich.edu"
This lets me quickly launch vmd from my command line.
Pro tip: .bash_profile
and .bashrc
are only loaded when a new
bash instance in launched. To take the changes take place immediately,
check the file in with . .bash_profile
(since this file will launch
.bashrc, both do not need to be checked in). > FYI: It will only be
checked in for that shell. If several instances were open before the
change was made, the file needs to be checked in for each, or the
instance closed and reopened.
Recommended text editor: vim¶
Because of the breadth of functinoality, there is some start-up time to learn how to use it. There are many tutorials, such as https://danielmiessler.com/study/vim/#null and http://www.openvim.com/, that will be extremely helpful to the computational researcher. Some basic essentials are shared below.
FYI: some people prefer emacs. Vim is lightweight (doesn’t require as much computer resources) and more popular, so it is introduced here.
Customizing vim¶
In your home directory, you can make a .vimrc
file (if there is not
already one there) to make changes to your default configuration. My
favorite additions are:
set ruler
:syntax on
The ruler shows you what line and column you are on. The :syntax on
option provides helping coloring of key words.
You can also set up spell check and may other options in your
.vimrc
. See https://danielmiessler.com/study/vim/#null for several
suggestions, if you are interested.
‘Normal’ and ‘insert’ modes¶
Some basics: When you first open a file with vim, you will be in
‘normal’ mode. To enter text, you need to be in ‘insert’ mode. Type
i
to enter insert mode. When you are done typing (or pasting from
your buffer), press esc
to return to ‘normal’ mode.
There are other ways to enter insert mode. Typing ‘o’ will enter insert
mode on a new line below your curser, while O
will enter insert mode
on a new line above your curser.
In the normal mode, u
undoes the last action. .
repeats the last
action.
FYI: the tutorials will show you ways of entering words or characters with a specified number of repeats, replacing a letters or words (without having to go in and out of insert mode), and more time-saving tips. There are also other modes, such as the visual mode, which can be handy in selecting/inserting/deleting text, especially column-wise.
Copy and Paste¶
There is a different vim buffer from your computer’s cut buffer. If you copy text to your computer buffer (command-c), you can paste into vim by first entering the insert mode and then pasting from the computer cut butter (command-v).
If you want to cut and paste within your file in vim, vim offers
additional functionality. d
stands for delete and acts like cutting;
y
for yank and acts like copying; p
is for paste.
Enter dd
or yy
to delete or copy one line. If you want to
perform the operation on the current line plus the next 2 lines, type
d3
and then enter/return (no enter/return is needed with just
dd
). Whether deleted or yanked, those line(s) will now be in vim’s
buffer. p
will place those lines below your cursor. Before pasting,
you can move your cursor as desired. You can paste multiple types by
entering a number before p
, e.g. 10p
to paste the line(s) 10
times.
Saving and exiting¶
To save: while in normal mode, enter :w
. To quit, enter :q
. If
you try to quit without first saving, vim will warn you, tell you how to
exit without saving (:q!
). To save and exit together, type :wq
.
Caution: vim will put your file in editing mode, which is not always appropriate!¶
You may want to look at the output of a file while it is being
generated. Opening it in vim can interfere with proper additional
output. You can safely view such files by using alternate bash commands
or unix utilities, such as cat
, tail -f
, less
, or more
.
Reusable bash scripts¶
Series of bash commands can be saved as a script and by executing the
script. For example, make a file called bananas.sh
with the
following contents:
The shebang and more¶
Before running this code, let’s look at the parts. #!
is
colloquially called the shebang. It is not strictly necessary in this
case, but is good practice. If it were not included, the script would be
run as a bash script if your shell is bash. However, including this in
your script will ensure it is run as a bash script even if your shell is
tsch, etc., with a shortcut (see below). It will also specify which bash
instance will be used to run the script.
echo
followed by the redirect symbol (>
) will put the echoed
text into a file. The strings first=
, etc., define variable names.
Variables are then used by typing the $
and then the variable name.
You do not need to enclose the variable name in curley braces ({}
)
unless there is a special character in the name (e.g. a space) but it is
good practice as it can make it more readable and it never hurts to
include them, but sometimes can hurt to exclude them.
How do you run this script? Try:
$ ./bananas.shThe above is a shortcut to running any executable in the current folder.
Did it work? Probably not. Before you can run a script, the computer has
to know that it is an executible. We do that with the command chmod
.
This is a powerful command that can change read (r), write (w), and
excute (x) abilities for everyone, a group, and yourself. Before
changing access, see what it is by typing ls -l bananas.sh
. Now,
make the file an excutable by entering:
Now see how the permissions have changed by typing ls -l bananas.sh
,
and then try excuting the file again with the command ./bananas.sh
.
FYI: you can also run the script by entering the command
bash bananas.sh
. That’s just more typing, though!
Note: The double quotes (" "
) are important when you want to use
a variable inside the quote, because it will allow the variable to be
interpreted. What happens if you change the double-quotes on the line
starting with sed
to single-quotes?
There are multiple very handy ways to loop through values. Above was an numerical loop, with the first and last numbers explicitly entered at the top of the file, as well as the increment value. Here are some other examples of loops. What changes should be made to the text below?
#!/usr/bin/env bash for student in Agnes Alex Alyssa; do echo 'May ${student} have some bananas?' doneIt can be very handy to have lines of a text file be the list that is
iterated. I have a file named students.txt
, with line name per line,
such as
Agnes
Alex
Alyssa
Carolina
Chloe
Christina
...
With just the commands listed here on this brief intro to bash scripting, you can make templates and thus automate many repative tasks!
A note on file names¶
Most file names are something.extension
. The extension isn’t
required and doesn’t guarantee anything, but is normally used, and best
practice to use it, to indicate the type file. txt
is used for text
files, and sh
for bash scripts, .py
for python scripts, etc.
If you only learn one thing in this class, please learn this: please don’t use spaces or special characters in your file names. Use only letters, numbers, _, - (although - should never be the first letter, bash will interpret it as a flag; if . is used first, it will be a ‘hidden’ file), and then a period followed by the appropriate extension
Please keep these files for some practice below.
Other great utilities and functions¶
To help us explore these utilities, make a file stp.txt
with the
contents:
Algorhyme
I think that I shall never see
A graph more lovely than a tree.
A tree whose crucial property
Is loop-free connectivity.
A tree which must be sure to span
So packets can reach every LAN.
First the root must be selected.
By ID it is elected.
Least cost paths from root are traced.
In the tree these paths are placed.
A mesh is made by folks like me
Then bridges find a spanning tree.
--From the paper "An Algorithm for Distributed Computation of a Spanning Tree in an Extended LAN" by Radia Perlman, inventor of the spanning tree protocol.
grep
¶
This utility searches for strings in files. It has many functions. It can search in one or many files. By default, it returns the line with the matching word, but that can be adjusted (e.g. return lines before or after).
In the directory with the files stp.txt
, enter the command:
grep tree stp.txt
. What do you see?
In the directory with the files
banana_2.txt banana_4.txt banana_6.txt banana_8.txt
, enter the
command: grep banana *.txt
. What do you see?
Directing input and output¶
<
indicates that the file name after the character should be treated
as input to the action preceeding the character. For grep, the input
file is known based on its position, but we could also directly specifcy
it. Try entering the following bash commands:
grep banana banana_2.txt
grep banana < banana_2.txt
grep < banana_2.txt banana
grep banana_2.txt banana
This case is quite trivial. For working with grep and these files, >
can be very useful. Try this; the first command should return a erorr
that there is no such file, which is what we want (or it would be
overwritten in following steps):
ls all_bananas.txt
grep banana *.txt > all_bananas.txt
ls all_bananas.txt
cat all_bananas.txt
In this case, the >
will overwrite anything in the file
all_bananas.txt
if the file had existed. What if we just wanted to
add it to the end? >>
will accomplish that. >>
will act the same
as >
if the file did not previously exist, so let’s use with the
file we just made (all_bananas.txt
). First, I check that the file is
still there. If not, make it again as above.
cat all_bananas.txt
grep root stp.txt >> all_bananas.txt
cat all_bananas.txt
Combining commands¶
The |
character can feed the results of the first command to the
second command. Compare the commands below:
grep tree stp.txt > trees.txt
wc -l trees.txt
rm trees.txt
Versus:
grep tree stp.txt | wc -l
This is just one of many ways that bash commands can increase efficiencies.
sed¶
sed is a stream-line editor. It can function from the command line, within a bash script (as in the example above), or within vim.
Try the following commands:
cp stp.txt stp_edited.txt
diff stp.txt stp_edited.txt
sed 's/I/We/' stp_edited.txt
diff stp.txt stp_edited.txt
sed -i 's/I/We/' stp_edited.txt
Note: The above command words for Unix, but not on the Max OSX
terminal. There, use sed -i ' ' 's/I/We/' stp_edited.txt
. This
command will not work on Unix. This is an example of some of the small
difference between implementations on thse systems.
diff stp.txt stp_edited.txt
cp stp.txt stp_edited.txt
sed -i 's/I/We/g' stp_edited.txt
Note: again, on Mac OSx, follow -i
with ' '
diff stp.txt stp_edited.txt
Sed it a very powerful utilitiy; this is just a quick taste of the different options that can be invoked, including search and replace in-place (in the input file), changing only the first instance on a line, etc.
If you are interested in learning more about sed, one resource (of many) is: https://www.gnu.org/software/sed/manual/sed.html
Sed can also be used while in vim. See http://vim.wikia.com/wiki/Search_and_replace for some nice tricks!
awk for text processing and extraction¶
We will not spend much time on this powerful utility. Hereis one example of its functionality. There are many resources to learn more, should you be interested (e.g. https://likegeeks.com/awk-command/; https://www.gnu.org/software/gawk/manual/gawk.html#toc-Getting-Started-with-awk for a whole book on it).
I often use awk to print a column from a file. Key variables available
include: - $0
for the whole line. - $1
for the first field. -
$2
for the second field. - $n
for the nth field.
Try the following commands:
awk '{print $0}' all_bananas.txt
awk '{print $1}' all_bananas.txt
awk '{print $2}' all_bananas.txt
awk '{print $1,$3}' all_bananas.txt
This can be used in conjunction with other commands, (e.g. using |
)
to, for example:
tail -2n all_bananas.txt | awk '{print $1,$3}'
Summary¶
This is just a quick taste to help us use the bash shell for navitaging and editing files. We’ll practice using bash throughout the semester for our efficiency. It is not a main focus.
A very important tool will be jupyter notebooks, discussed next!