Tutorial on basics of Python programming

Introduction

In this course, we will use Python for all the programming exercises (i.e. all code you return during the course should be written in Python). Python is an object-oriented, interpreted, high-level programming language. Along with pure Python, we will also use the NumPy array interface and the NetworkX package for network analysis. These are already installed on all Aalto student Linux workstations, but if you work e.g. on your own laptop you should install them yourself.

For writing Python, you can use the text editor of your choice (vim, emacs, etc.). Programming environments designed specifically for Python, such as Spyder, are also available. Python scripts are stored in .py files. These scripts can be executed from a terminal with the command python my_python_script.py.

There are plenty of Python tutorials on the web - feel free to use any of them! You can try for example this one: https://www.codecademy.com/tracks/python. For documentation, check https://docs.python.org/2/.

IPython: Interactive Python interpreter

IPython is good for getting to know Python, playing around and testing short parts of your scripts; this tutorial, for example, makes use of IPython. You can start it in a terminal by typing ipython. However, longer scripts and functions should be saved in .py files.

Getting started with Python: Hello world

Let's write the famous Hello world example in Python:

In [1]:
print 'Hello world!'
Hello world!

Comments (!)

Documenting your code well is as important as the code itself. In Python, text following a "#" on a line is treated as a comment, so you can write comments on their own lines or at the end of a line of code. Assistants will be grateful for every comment in your returned solution code!

In [2]:
print "Hello world" # printing Hello world without the exclamation mark
Hello world

Basic data types: ints, floats, strings

In [3]:
x = 2 # int data type for integers

print x
print x + 3
print x - 1

print 'Inverse of x when x is int: ', 1 / x
print 'Inverse of x when x is float: ', 1 / float(x) # converting x to float to obtain the correct result

s1 = 'this is a '
s2 = "test string"
# strings can be combined by +
print s1 + s2 
2
5
1
Inverse of x when x is int:  0
Inverse of x when x is float:  0.5
this is a test string
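
The integer result of 1 / x above is Python 2 behaviour: dividing two ints with / truncates there, whereas Python 3 returns 0.5. As a minimal sketch (the variable names are illustrative and not part of the course material), the // operator and explicit type conversions give the same results in both versions:

```python
x = 2

# // is floor division: the result is rounded down to an integer
floor_inverse = 1 // x

# converting one operand to float gives the exact quotient
true_inverse = 1 / float(x)

# converting between strings and numbers
number_from_text = int('42') + 1
text_from_number = 'x is ' + str(x)

print(floor_inverse)
print(true_inverse)
print(number_from_text)
print(text_from_number)
```

Note that + does not convert types automatically: 'x is ' + x would raise a TypeError, which is why str(x) is needed.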

Data structures: lists and dictionaries

In Python, a list is an ordered (and indexed) data structure. Items stored in a list can be of any Python data type. Let's create an empty list and add a couple of items to it!

In [4]:
my_list = []
print 'This is my_list: ', my_list
my_list.append(1) # append adds an element to the end of a list
my_list.append(2)
my_list.append(56)
my_list.append('test string')
print 'This is my_list after appending: ', my_list
This is my_list:  []
This is my_list after appending:  [1, 2, 56, 'test string']

List elements can be accessed by indices - indexing in Python starts from 0. With slicing, one can access more than one element at a time. A slice start:stop includes the element at index start but excludes the one at index stop.

In [5]:
print my_list[0]
print my_list[-1] # index -1 returns the last element of a list
print my_list[0:2]
1
test string
[1, 2]
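
Slices also accept omitted bounds, negative indices and a step. The following is a small sketch using a hypothetical list called letters (not used elsewhere in this tutorial):

```python
letters = ['a', 'b', 'c', 'd', 'e']  # hypothetical example list

first_three = letters[:3]   # omitted start: slice begins at index 0
from_second = letters[1:]   # omitted stop: slice runs to the end
last_two = letters[-2:]     # negative indices count from the end
every_other = letters[::2]  # the third number is the step

print(first_three)
print(from_second)
print(last_two)
print(every_other)
```

Slicing a list creates a new list, so modifying first_three does not change letters.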

List items can be modified and removed. New items can be inserted at any location in the list.

In [6]:
my_list[0] = 7 # replacing the first element
print my_list
[7, 2, 56, 'test string']

In [7]:
second = my_list.pop(1) # pop returns the element at the given index and removes it from the list
print 'Popped list element: ', second
print 'Remaining my_list: ', my_list
Popped list element:  2
Remaining my_list:  [7, 56, 'test string']

In [8]:
print 'Length of my_list: ', len(my_list) # length of the list
new_element = 'new element'
my_list.insert(2, new_element)  # insert adds a new element at the given index
print my_list
print 'Length of my_list after insert: ', len(my_list) # length increases by insertion
Length of my_list:  3
[7, 56, 'new element', 'test string']
Length of my_list after insert:  4

A Python dictionary is an unordered collection of key - value pairs. Keys can be of any immutable (more precisely, hashable) Python data type (e.g. number, string, NetworkX nodes, ...). Values can be any Python objects. Let's create an empty dictionary and add some key - value pairs!

In [9]:
my_dic = {}
my_dic['first'] = 1
my_dic['second'] = 2
my_dic[3] = 'three'
my_dic['fourth'] = [1, 2, 3]
print 'This is my_dic: ', my_dic
print 'Keys of my_dic: ', my_dic.keys()
print 'Values of my_dic: ', my_dic.values()
print 'Length of my_dic: ', len(my_dic) # returns the number of key-value pairs in the dictionary
This is my_dic:  {'second': 2, 3: 'three', 'fourth': [1, 2, 3], 'first': 1}
Keys of my_dic:  ['second', 3, 'fourth', 'first']
Values of my_dic:  [2, 'three', [1, 2, 3], 1]
Length of my_dic:  4

Values of the dictionary are accessed by the keys:

In [10]:
print my_dic['first']
print my_dic[3]
1
three
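
Accessing a key that is not in the dictionary raises a KeyError. As a brief sketch (the dictionary degrees and its contents are made up for illustration), you can test for a key with in, ask for a default value with get, and remove a pair with del:

```python
degrees = {'a': 2, 'b': 1}  # hypothetical example data

has_a = 'a' in degrees             # membership test looks at the keys
has_c = 'c' in degrees
degree_c = degrees.get('c', 0)     # returns the default 0 instead of raising KeyError

del degrees['b']                   # removes the key 'b' and its value
remaining_keys = sorted(degrees)

print(has_a)
print(has_c)
print(degree_c)
print(remaining_keys)
```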

Note that since dictionaries are unordered, values cannot be accessed by indices!

The "values()" method gives a list of the values in the dictionary in "arbitrary" order. In practice, this order is sometimes such that the values seem to be ordered by the keys. This happens in Python especially if the keys are small integers. NOTE THAT YOU CANNOT TRUST THIS TO HAPPEN every time, especially for larger dictionaries! Instead, you should order the values yourself if this is the desired outcome. See the example below (the for loops section explains how the fourth line of the example works):

In [11]:
my_other_dict = dict([(5,"fifth"),(1,"first"),(3,"third"),(4,"fourth"),(2,"second")])
print my_other_dict
print my_other_dict.values() # The output of this might fool you into thinking that the values are always ordered
list_in_order = [my_other_dict[key] for key in sorted(my_other_dict)]
print list_in_order
{1: 'first', 2: 'second', 3: 'third', 4: 'fourth', 5: 'fifth'}
['first', 'second', 'third', 'fourth', 'fifth']
['first', 'second', 'third', 'fourth', 'fifth']

For loops

In Python, one can loop over most data structures that contain multiple elements (these objects are said to be iterable and include e.g. list, array, dict). In contrast to e.g. Matlab, the loop variable can take values of any type, not only numbers. As an example, let's loop over a list (note the indentation, 4 spaces, in the loop block)!

In [12]:
my_list = range(0, 10)
print 'This is my_list: ', my_list
for item in my_list: # note the colon!
    print 'Item multiplied by 5: ', item * 5 # note the indentation of 4 spaces!
This is my_list:  [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
Item multiplied by 5:  0
Item multiplied by 5:  5
Item multiplied by 5:  10
Item multiplied by 5:  15
Item multiplied by 5:  20
Item multiplied by 5:  25
Item multiplied by 5:  30
Item multiplied by 5:  35
Item multiplied by 5:  40
Item multiplied by 5:  45
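
To illustrate that the loop variable is not restricted to numbers, here is a short sketch looping over a list with items of several types (mixed_items is a made-up example list):

```python
mixed_items = [3, 2.5, 'text', [1, 2]]  # hypothetical example list

type_names = []
for item in mixed_items:
    # item takes each element in turn, whatever its type
    type_names.append(type(item).__name__)
    print(item)

print(type_names)
```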

Looping also works for dictionaries (the loop goes over the keys). However, the order in which the keys are visited is arbitrary.

In [13]:
for key in my_dic:
    print 'Key is', key
    print 'Associated value is', my_dic[key]
Key is second
Associated value is 2
Key is 3
Associated value is three
Key is fourth
Associated value is [1, 2, 3]
Key is first
Associated value is 1
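
If you need a predictable order, you can loop over the sorted keys instead. The sketch below uses a hypothetical dictionary whose keys are all strings; sorting requires keys that can be compared with each other, so it would not work in Python 3 for a dictionary such as my_dic that mixes integer and string keys.

```python
degrees = {'c': 3, 'a': 1, 'b': 2}  # hypothetical example data

visited = []
for key in sorted(degrees):  # sorted() returns the keys as a sorted list
    visited.append((key, degrees[key]))
    print(key)

# items() gives (key, value) pairs, which can be sorted as well
pairs = sorted(degrees.items())
print(pairs)
```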

With the zip function, you can loop over two iterables of the same size at once.

In [14]:
my_second_list = range(10, 20)
print 'This is my_second_list: ', my_second_list
for item_from_my_list, item_from_my_second_list in zip(my_list, my_second_list): # zip to loop over both lists
    print 'Item from my_list: ', item_from_my_list
    print 'Item from my_second_list: ', item_from_my_second_list
This is my_second_list:  [10, 11, 12, 13, 14, 15, 16, 17, 18, 19]
Item from my_list:  0
Item from my_second_list:  10
Item from my_list:  1
Item from my_second_list:  11
Item from my_list:  2
Item from my_second_list:  12
Item from my_list:  3
Item from my_second_list:  13
Item from my_list:  4
Item from my_second_list:  14
Item from my_list:  5
Item from my_second_list:  15
Item from my_list:  6
Item from my_second_list:  16
Item from my_list:  7
Item from my_second_list:  17
Item from my_list:  8
Item from my_second_list:  18
Item from my_list:  9
Item from my_second_list:  19

Note that if you have two dictionaries with the same set of keys, these keys are not necessarily in the same order in both dictionaries. Thus, do not zip them to loop over both of them at once.
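
A safe alternative is to loop over the keys of one dictionary and use each key to look up the value in the other. A minimal sketch, with two made-up dictionaries in_degree and out_degree:

```python
in_degree = {'a': 1, 'b': 0, 'c': 2}   # hypothetical example data
out_degree = {'c': 0, 'b': 3, 'a': 1}  # same keys, created in a different order

total_degree = {}
for node in in_degree:
    # the key, not the position, links the two values together
    total_degree[node] = in_degree[node] + out_degree[node]

print(sorted(total_degree.items()))
```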

With the enumerate function, you easily get both an iteration count and the loop variable from the iterable.

In [15]:
for index, item in enumerate(my_list):
    print 'Item at ', index, ': ', item
Item at  0 :  0
Item at  1 :  1
Item at  2 :  2
Item at  3 :  3
Item at  4 :  4
Item at  5 :  5
Item at  6 :  6
Item at  7 :  7
Item at  8 :  8
Item at  9 :  9

You will often want to construct lists using for loops, and in Python there is a convenient short way of doing this in one line of code. This is called "list comprehension" and it works as follows:

In [16]:
# Let's first construct a list in the normal way with a for loop
my_list_plusone = []
for item in my_list:
    my_list_plusone.append(item+1)
print my_list_plusone

# The same thing using list comprehension
print [item+1 for item in my_list]
[1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
[1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
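
A list comprehension can also filter items with a trailing if, or choose between two values with a conditional expression. A short sketch (the names numbers, even_squares and labels are illustrative):

```python
numbers = list(range(0, 10))

# keep only the even numbers, then square them
even_squares = [n ** 2 for n in numbers if n % 2 == 0]

# pick one of two values for every item
labels = ['small' if n < 5 else 'large' for n in numbers]

print(even_squares)
print(labels)
```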

If-else statements

An if statement executes a block of code only if a condition holds. In the syntax, notice the use of the colon and indentation. Note that the syntax of many comparison and boolean operators in Python is slightly different from that in e.g. Matlab. A list of the operators can be found e.g. here: https://docs.python.org/2/library/stdtypes.html#boolean-operations-and-or-not.

In [17]:
n_larger_than_5 = 0
for item in my_list:
    print 'Current item: ', item
    if item > 5:
        n_larger_than_5 += 1
        print 'Larger than 5!'
    else:
        print 'Item too small...'
print 'Number of items larger than 5:', n_larger_than_5
Current item:  0
Item too small...
Current item:  1
Item too small...
Current item:  2
Item too small...
Current item:  3
Item too small...
Current item:  4
Item too small...
Current item:  5
Item too small...
Current item:  6
Larger than 5!
Current item:  7
Larger than 5!
Current item:  8
Larger than 5!
Current item:  9
Larger than 5!
Number of items larger than 5: 4
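
Conditions can be combined with the boolean operators and, or and not, and further alternatives can be chained with elif. The following sketch uses made-up test values and labels to show the syntax:

```python
labels = []
for item in [-1, 3, 6, 7, 12]:
    if item < 0 or item > 9:           # true if either comparison holds
        label = 'out of range'
    elif item > 5 and item % 2 == 0:   # true only if both comparisons hold
        label = 'large and even'
    elif not item > 5:                 # not inverts the condition
        label = 'small'
    else:
        label = 'large and odd'
    labels.append(label)

print(labels)
```

The branches are tested from top to bottom and only the first one whose condition holds is executed.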

Functions

In general, a function is a block of code that can be called from other parts of the program with a given set of arguments. It can return values or e.g. write some results into a file. In Python, functions are defined using the keyword def. Note: if you have functions and an executable script in the same file, you need to define the functions before calling them. In larger projects, it may make sense to define all your functions in a separate file and call them from the executable script.

In functions (or classes etc.) it is recommended to document what the function does in a triple-quoted string (""") placed right after the def line. This string is called a docstring.

In [18]:
def my_function(x):
    """
    Returns x*x
    """
    result = x ** 2
    return result

print my_list
for item in my_list:
    print my_function(item)
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
0
1
4
9
16
25
36
49
64
81
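
Functions can also have arguments with default values and can return several values at once. The function power_and_root below is a hypothetical example written for this illustration:

```python
def power_and_root(x, exponent=2):
    """
    Returns x raised to exponent (default 2) and the square root of x
    """
    powered = x ** exponent
    root = x ** 0.5
    return powered, root  # the two values are returned as a tuple

# the default exponent is used when the argument is left out
squared, root = power_and_root(9)

# arguments can be given by name; _ is a common name for an unused value
cubed, _ = power_and_root(2, exponent=3)

print(squared)
print(root)
print(cubed)
```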

Importing python modules

When you want to use a function defined outside the script you are running, you can import other Python modules with the import statement.

In [19]:
import math # now importing stuff from the python standard library

math.ceil(10.01)

##########################
#
# If you wanted to e.g. use a function called add, from
# a module you wrote yourself (my_fancy_module.py)
# you could correspondingly do (in a file called e.g. work_script.py)
#
# import my_fancy_module
#
# a = 1
# b = 2
# my_fancy_module.add(a,b)
#
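# Note that the file my_fancy_module.py should be located on the module search path
# (sys.path, i.e. the list of locations where the python interpreter looks for modules;
# it includes the directory of the script being run and the directories listed in PYTHONPATH)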
#
# Easiest way to do this is to have your script and the module in the same place like this:
#
# path_to_my_code/my_fancy_module.py
# path_to_my_code/work_script.py
#
##########################
Out[19]:
11.0

With import, you can also import modules, both those from the Python standard library and self-written ones, into the IPython console. This can be useful when you want to e.g. test a given function with various inputs.
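
The import statement has a few common variants. The sketch below uses only the standard math module; the variable names are illustrative:

```python
import math            # names are accessed as math.<name>
from math import sqrt  # imports a single name directly
import math as m       # imports the module under a shorter alias

rounded_up = math.ceil(10.01)
root = sqrt(16)
circle_area = m.pi * 2 ** 2  # area of a circle with radius 2

print(rounded_up)
print(root)
print(circle_area)
```

Note that math.ceil returns a float (11.0) in Python 2 and an int (11) in Python 3.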

NumPy: an array interface for Python

The NumPy package offers efficient numerical computations, multidimensional arrays and array operations. Start using NumPy with the import statement:

In [20]:
import numpy as np # later on, use np to refer to NumPy

All elements of an array must have the same data type (float, int, bool, ...). A NumPy array can be created either from a list or with various NumPy functions:

In [21]:
my_array = np.array([2, 5, 3, 15], dtype=float) # create an array from the given list
print 'This is my_array: ', my_array
my_zeros = np.zeros((2, 2,))
print 'This contains only zeros: '
print my_zeros
my_arange = np.arange(0, 10)
print 'This is a range array: '
print my_arange
my_ones = np.ones((3,3), dtype=bool) # this creates an array of ones, data type can be set to boolean
my_ones[1,2] = 0
print 'This is a boolean array: '
print my_ones
This is my_array:  [  2.   5.   3.  15.]
This contains only zeros: 
[[ 0.  0.]
 [ 0.  0.]]
This is a range array: 
[0 1 2 3 4 5 6 7 8 9]
This is a boolean array: 
[[ True  True  True]
 [ True  True False]
 [ True  True  True]]

Elements of NumPy arrays are accessed similarly to items in lists. However, notice that slicing does not create a copy of the array...

In [22]:
a = my_array[1]
print a
b = my_array[0:3] # b is not a new array but refers to the given elements in my_array!
print 'b: ', b
print 'my_array: ', my_array
b.sort()
print 'sorted b: ', b
print 'Also my_array is affected by sorting: ', my_array
5.0
b:  [ 2.  5.  3.]
my_array:  [  2.   5.   3.  15.]
sorted b:  [ 2.  3.  5.]
Also my_array is affected by sorting:  [  2.   3.   5.  15.]

To create a real copy of a NumPy array, use the copy() method:

In [23]:
my_array = np.array([2, 5, 3, 15])
b = my_array[0:3].copy()
b.sort()
print 'sorted b: ', b
print 'This does not change my_array: ', my_array
sorted b:  [2 3 5]
This does not change my_array:  [ 2  5  3 15]
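
If you are unsure whether an array is a view or an independent copy, one way to check is the base attribute: for a view created by slicing it refers to the original array, and for an array that owns its data it is None. A minimal sketch (the names view and independent are illustrative):

```python
import numpy as np

my_array = np.array([2, 5, 3, 15])
view = my_array[0:3]                # slice: shares data with my_array
independent = my_array[0:3].copy()  # copy: has its own data

view_is_view = view.base is my_array
copy_owns_data = independent.base is None

view[0] = 100        # also changes my_array
independent[1] = -1  # does not change my_array

print(view_is_view)
print(copy_owns_data)
print(my_array)
```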

In the NumPy package, there are a number of functions for array manipulation, numerical operations, linear algebra, etc. Check e.g. http://docs.scipy.org/doc/numpy/ for documentation.

In [24]:
my_second_array = np.array([-3, 4, -5, 0])
my_third_array = my_array + my_second_array
my_fourth_array = 2 * my_array
average = np.average(my_array)
nonzero_indices = np.nonzero(my_second_array)
positive_indices = np.where(my_second_array > 0)
print 'array summation: ', my_third_array
print 'array multiplication: ', my_fourth_array
print 'array average: ', average
print 'indices of nonzero elements: ', nonzero_indices
print 'indices of positive elements: ', positive_indices
array summation:  [-1  9 -2 15]
array multiplication:  [ 4 10  6 30]
array average:  6.25
indices of nonzero elements:  (array([0, 1, 2]),)
indices of positive elements:  (array([1]),)
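
Comparisons on arrays work elementwise and produce boolean arrays, which can in turn be used to select elements. The sketch below repeats the values of my_second_array under the illustrative name values:

```python
import numpy as np

values = np.array([-3, 4, -5, 0])

is_positive = values > 0               # boolean array, one entry per element
positive_values = values[is_positive]  # keeps the elements where the mask is True
n_negative = int(np.sum(values < 0))   # True counts as 1 when summing

# np.where with three arguments picks from values where the condition holds, else 0
clipped = np.where(values > 0, values, 0)

print(is_positive)
print(positive_values)
print(n_negative)
print(clipped)
```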

Visualization

In this course, we will use matplotlib.pyplot for plotting and visualization. Check the pyplot tutorial at http://matplotlib.org/users/pyplot_tutorial.html. The basic concept of pyplot is the figure, the main plot container. A figure can contain multiple axes, i.e. plotting areas. As an example, let's plot some basic trigonometric functions!

In [25]:
x = np.arange(0, 4 * np.pi, 0.1)
my_sin = np.sin(x)
my_cos = np.cos(x)

import matplotlib.pyplot as plt
# the command below is used only for producing this tutorial
# if copy-pasting code, ignore it
%matplotlib inline 

fig = plt.figure(1) # creating a figure object
ax = fig.add_subplot(111) # adding an axes object where the plot is actually drawn
ax.plot(x, my_sin, label='sin(x)') # making the plot
ax.plot(x, my_cos, label='cos(x)', color='r', ls='--')
ax.set_xlabel('x')
ax.set_ylabel('y = f(x)')
ax.legend(loc = 0)

fig2 = plt.figure(2)
ax2a = fig2.add_subplot(121) # subplot indexing: n rows, n columns, index - starts from 1!
ax2a.plot(x, x ** 2)
ax2a.set_xlabel('x')
ax2a.set_ylabel('f(x)')
ax2a.set_title('x^2')
ax2b = fig2.add_subplot(122)
ax2b.plot(x, np.sqrt(x))
ax2b.set_xlabel('x')
ax2b.set_ylabel('f(x)')
ax2b.set_title('sqrt(x)')
plt.tight_layout() # this command automatically fixes many subplot-related problems - try!


# SHOWING AND SAVING FIGURES!
#
# for showing your figures on your own desktop, use 
# plt.show()

# for automatic saving of a figure use something like
# plt.savefig("myfile.pdf")  # saves the figure to a file named myfile.pdf
# fig.savefig("myfile.png") # saves the specific figure object (fig) to file myfile.png

Examples on drawing histograms and distributions with pyplot will follow in the next tutorial. Check also the documentation: http://matplotlib.org/api/pyplot_api.html.

Network tools: NetworkX

NetworkX is a Python package of tools for creating, manipulating, and studying directed and undirected networks. In this course, we will use many tools from NetworkX. However, in many exercise problems you are also asked to write your own functions and compare them with the NetworkX tools. We warmly recommend the NetworkX documentation and tutorial at http://networkx.github.io/.

As an example, let's create a small undirected network and check some of its properties!

In [26]:
import networkx as nx

network = nx.Graph() # creating an empty network object
nodes = range(1, 9)
edges = [(1,4), (2,4), (3,4), (3,5), (5,4), (5,6), (6,4), (6,7), (7,8)]
network.add_nodes_from(nodes) # adding nodes to the network
network.add_edges_from(edges) # adding edges
print 'Nodes of the network: ', network.nodes()
print 'Edges of the network: ', network.edges()

fig = plt.figure()
nx.draw(network) # simple visualization, check the documentation for more parameters!

# Normally, you do not create the network from scratch but it is given in a file. One common format for storing network
# data is edgelist, .edg file. When the network is given as .edg, you can read it with the NetworkX functions like this:
# my_network = nx.read_edgelist(my_network_path) # returns an unweighted network
# or
# my_weighted_network = nx.read_weighted_edgelist(my_network_path) # returns a weighted network
Nodes of the network:  [1, 2, 3, 4, 5, 6, 7, 8]
Edges of the network:  [(1, 4), (2, 4), (3, 4), (3, 5), (4, 5), (4, 6), (5, 6), (6, 7), (7, 8)]

NetworkX has ready-made functions for calculating most network properties, e.g.:

In [27]:
nx.adjacency_matrix(network)
# NOTE: in NetworkX 1.9 that is installed on Aalto Linux machines, nx.adjacency_matrix returns a sparse matrix. Then, 
# if you want to print the matrix, you can do
# a = nx.adjacency_matrix(network)
# a.todense()
Out[27]:
<8x8 sparse matrix of type '<type 'numpy.int64'>'
	with 18 stored elements in Compressed Sparse Row format>
In [28]:
nx.degree(network) # returns a node - degree dictionary
Out[28]:
{1: 1, 2: 1, 3: 2, 4: 5, 5: 3, 6: 3, 7: 2, 8: 1}
In [29]:
nx.neighbors(network, 3)
Out[29]:
[4, 5]
In [30]:
nx.diameter(network)
Out[30]:
4
In [31]:
nx.clustering(network) # local clustering coefficient
Out[31]:
{1: 0.0,
 2: 0.0,
 3: 1.0,
 4: 0.2,
 5: 0.6666666666666666,
 6: 0.3333333333333333,
 7: 0.0,
 8: 0.0}
In [32]:
nx.average_clustering(network, count_zeros=True) # average clustering coefficient
Out[32]:
0.275
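
Since the exercises also ask you to write your own functions and compare them with the NetworkX tools, here is a minimal pure-Python sketch that computes the degrees and local clustering coefficients for the same edge list. The names neighbors and local_clustering are made up for this illustration, and this is only one possible approach, not how NetworkX implements these functions. Nodes without any edges would not appear in the result, which is not a problem in this network.

```python
edges = [(1, 4), (2, 4), (3, 4), (3, 5), (5, 4), (5, 6), (6, 4), (6, 7), (7, 8)]

# build a node -> set of neighbors dictionary from the edge list
neighbors = {}
for u, v in edges:
    neighbors.setdefault(u, set()).add(v)
    neighbors.setdefault(v, set()).add(u)

# the degree of a node is the number of its neighbors
degree = {}
for node in neighbors:
    degree[node] = len(neighbors[node])

def local_clustering(node):
    """
    Returns the fraction of neighbor pairs of node that are linked to each other
    """
    k = len(neighbors[node])
    if k < 2:
        return 0.0  # fewer than two neighbors: no pairs to check
    links = 0
    for u in neighbors[node]:
        for v in neighbors[node]:
            if u < v and v in neighbors[u]:  # u < v counts each pair once
                links += 1
    return 2.0 * links / (k * (k - 1))

clustering = {}
for node in neighbors:
    clustering[node] = local_clustering(node)

average_clustering = sum(clustering.values()) / len(clustering)

print(sorted(degree.items()))
print(sorted(clustering.items()))
print(average_clustering)
```

The results agree with the nx.degree, nx.clustering and nx.average_clustering outputs shown above.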

Working with IPython

When writing Python code you can use your favorite text editor or programming framework (IDE). In Python you can also have interactive sessions (similar to Matlab) to try out things using the Python interpreter. It is convenient to use IPython for this purpose. In the IPython shell you can, for example, get the documentation for any function by adding a ? symbol after the function name:

In [33]:
my_function?
nx.clustering?

For example, for my_function the documentation is the string that we specified in triple quotes when defining the function.

You can also run any external script file from the IPython shell with the %run command. For example, if you have a file called mycodefile.py that has the line print "Greetings from mycodefile.py!", then you can do the following:

In [34]:
%run mycodefile.py
Greetings from mycodefile.py!

After running the script you can access all of the variables that were defined in the script.

You can also do debugging using IPython. Just type the command %debug after any exception. For example:

In [35]:
my_function("foobar")
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-35-83448ef7317a> in <module>()
----> 1 my_function("foobar")

<ipython-input-18-a77d333ae348> in my_function(x)
      3     Returns x*x
      4     """
----> 5     result = x ** 2
      6     return result
      7 

TypeError: unsupported operand type(s) for ** or pow(): 'str' and 'int'
In [*]:
%debug
> <ipython-input-18-a77d333ae348>(5)my_function()
      4     """
----> 5     result = x ** 2
      6     return result


You can then use this mode to run any commands in the environment where the exception/error took place. In the example above you could "print x" to see what was in x and caused the error.