Python Workshop: NumPy

Based on:

this Git repository by Zhiya Zuo

NumPy is the fundamental package for scientific computing with Python. It contains, among other things:

A powerful N-dimensional array object.
Useful linear algebra, Fourier transform, and random number capabilities.
And much more.


NumPy installation

In the command line, run:

pip install numpy
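A quick way to confirm that the installation worked is to import the package and print its version. This is a minimal sanity check; the exact version string depends on what pip installed.

```python
# Sanity check that NumPy is importable after installation
import numpy as np

print(np.__version__)  # the installed version string; its value depends on your setup
```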

Arrays

In [1]:

# After you install numpy, load it
import numpy as np  # the alias np lets you write np.<function> instead of numpy.<function>

In [2]:

x = np.array([1, 2, 3])  # create a numpy array object
print(type(x))

<class 'numpy.ndarray'>

The shape attribute of a numpy.ndarray gives the size of each of its dimensions

In [3]:

x.shape  # comparable to len() for a list, but reports the size of every dimension

Out[3]:

(3,)
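Besides shape, two related attributes are handy: ndim (the number of dimensions) and size (the total number of elements). Calling len() on an array only reports the length of the first axis. A small sketch comparing them:

```python
import numpy as np

x = np.array([1, 2, 3])
m = np.array([[1, 2, 3, 8], [3, 2, 3, 2]])

print(x.shape, x.ndim, x.size)  # (3,) 1 3
print(m.shape, m.ndim, m.size)  # (2, 4) 2 8
print(len(m))                   # 2: len() only counts the first dimension
```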

Unlike a list, an array requires all of its elements to share a single data type

In [4]:

y = np.array([1, "yes"])  # automatic type conversion from int to str
y

Out[4]:

array(['1', 'yes'], dtype='<U11')
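The same single-type rule applies to numbers: mixing integers with a float upcasts the whole array to float, and astype() converts explicitly. A short sketch (the exact integer dtype, e.g. int64 or int32, can vary by platform, so only its kind is printed):

```python
import numpy as np

ints = np.array([1, 2, 3])
mixed = np.array([1, 2.5, 3])  # one float turns the whole array into floats

print(ints.dtype.kind)  # 'i': some integer type
print(mixed.dtype)      # float64

as_int = mixed.astype(int)  # explicit conversion; 2.5 is truncated to 2
print(as_int.tolist())      # [1, 2, 3]
```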

Multidimensional arrays

In [5]:

arr = np.array([[1, 2, 3, 8]])
arr.shape

Out[5]:

(1, 4)

In [6]:

arr

Out[6]:

array([[1, 2, 3, 8]])

In [7]:

arr = np.array([[1, 2, 3, 8], [3, 2, 3, 2], [4, 5, 0, 8]])
arr.shape

Out[7]:

(3, 4)

In [8]:

arr

Out[8]:

array([[1, 2, 3, 8],
       [3, 2, 3, 2],
       [4, 5, 0, 8]])

Special arrays

NumPy has several functions that create arrays with special contents:

In [9]:

np.zeros([3, 5], dtype=int)  # dtype sets the element type (the default is float)

Out[9]:

array([[0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0]])

In [10]:

np.ones([3, 5])

Out[10]:

array([[1., 1., 1., 1., 1.],
       [1., 1., 1., 1., 1.],
       [1., 1., 1., 1., 1.]])

In [11]:

np.eye(3)

Out[11]:

array([[1., 0., 0.],
       [0., 1., 0.],
       [0., 0., 1.]])
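A few other constructors that come up often are np.full, np.arange, np.linspace and np.zeros_like. This sketch shows one call of each; the argument values are arbitrary:

```python
import numpy as np

filled = np.full((2, 3), 7)             # 2x3 array where every element is 7
evens = np.arange(0, 10, 2)             # like range(), but returns an array
grid = np.linspace(0, 1, 5)             # 5 evenly spaced values, both endpoints included
blank = np.zeros_like(np.eye(3))        # zeros with the same shape and dtype as the argument

print(evens.tolist())  # [0, 2, 4, 6, 8]
print(grid.tolist())   # [0.0, 0.25, 0.5, 0.75, 1.0]
print(blank.shape)     # (3, 3)
```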

Operations

The rules are similar to R (and to MATLAB's dotted operators such as .*): operations are generally element-wise

In [12]:

arr

Out[12]:

array([[1, 2, 3, 8],
       [3, 2, 3, 2],
       [4, 5, 0, 8]])

In [13]:

arr - 5

Out[13]:

array([[-4, -3, -2,  3],
       [-2, -3, -2, -3],
       [-1,  0, -5,  3]])

In [14]:

arr * 6  # element-wise multiplication

Out[14]:

array([[ 6, 12, 18, 48],
       [18, 12, 18, 12],
       [24, 30,  0, 48]])

In [15]:

arr * arr  # element-wise multiplication of two arrays

Out[15]:

array([[ 1,  4,  9, 64],
       [ 9,  4,  9,  4],
       [16, 25,  0, 64]])

In [16]:

np.exp(arr)

Out[16]:

array([[2.71828183e+00, 7.38905610e+00, 2.00855369e+01, 2.98095799e+03],
       [2.00855369e+01, 7.38905610e+00, 2.00855369e+01, 7.38905610e+00],
       [5.45981500e+01, 1.48413159e+02, 1.00000000e+00, 2.98095799e+03]])
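Element-wise operations also work between arrays of different but compatible shapes, which NumPy calls broadcasting (arr - 5 above is the simplest case, with a scalar). A small sketch using the same arr plus two hypothetical vectors row and col:

```python
import numpy as np

arr = np.array([[1, 2, 3, 8], [3, 2, 3, 2], [4, 5, 0, 8]])  # shape (3, 4)
row = np.array([10, 20, 30, 40])                              # shape (4,)
col = np.array([[1], [2], [3]])                               # shape (3, 1)

print(arr + row)  # row is added to every row of arr
print(arr * col)  # row i of arr is multiplied by col[i]
```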

More examples:

In [17]:

arr_2 = np.array([[1], [3], [2], [0]])
arr_2

Out[17]:

array([[1],
       [3],
       [2],
       [0]])

In [18]:

arr_2_T = arr_2.T  # transpose
arr_2_T

Out[18]:

array([[1, 3, 2, 0]])

In [19]:

arr @ arr_2  # matrix multiplication

Out[19]:

array([[13],
       [15],
       [19]])
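For matrix multiplication with @, the inner dimensions must agree: (3, 4) @ (4, 1) gives (3, 1). The operator is shorthand for np.matmul, and mismatched shapes raise a ValueError instead of silently broadcasting. A sketch:

```python
import numpy as np

arr = np.array([[1, 2, 3, 8], [3, 2, 3, 2], [4, 5, 0, 8]])  # shape (3, 4)
arr_2 = np.array([[1], [3], [2], [0]])                        # shape (4, 1)

prod = arr @ arr_2
print(prod.shape)                                  # (3, 1)
print(np.array_equal(prod, np.matmul(arr, arr_2)))  # True

try:
    arr @ arr  # (3, 4) @ (3, 4): inner dimensions 4 and 3 differ
except ValueError as err:
    print("shape mismatch:", err)
```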

In [20]:

arr

Out[20]:

array([[1, 2, 3, 8],
       [3, 2, 3, 2],
       [4, 5, 0, 8]])

In [21]:

arr.max()

Out[21]:

8

In [22]:

arr.cumsum()

Out[22]:

array([ 1,  3,  6, 14, 17, 19, 22, 24, 28, 33, 33, 41])

Note: without an axis argument, cumsum runs through the elements row by row, unlike MATLAB, which works column by column. Many array methods can also compute statistics along a chosen axis:

axis=1 works along each row (across the columns), giving one result per row
axis=0 works down each column (across the rows), giving one result per column

In [23]:

arr.cumsum(axis=1)

Out[23]:

array([[ 1,  3,  6, 14],
       [ 3,  5,  8, 10],
       [ 4,  9,  9, 17]])
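The same axis argument works for other statistics such as sum, max and mean. A sketch on the same arr, with the expected values listed in the comments:

```python
import numpy as np

arr = np.array([[1, 2, 3, 8], [3, 2, 3, 2], [4, 5, 0, 8]])

total = arr.sum()             # 41: all elements
col_sums = arr.sum(axis=0)    # 8, 9, 6, 18: one total per column
row_sums = arr.sum(axis=1)    # 14, 10, 17: one total per row
col_max = arr.max(axis=0)     # 4, 5, 3, 8: largest value in each column
row_mean = arr.mean(axis=1)   # 3.5, 2.5, 4.25: average of each row

print(total, col_sums, row_sums, col_max, row_mean)
```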

Note about 1d arrays

A 1d array is neither a column vector nor quite a row vector, so it should be handled carefully in vector/matrix operations

In [24]:

a = np.array([1, 2, 3])
a, a.shape

Out[24]:

(array([1, 2, 3]), (3,))

In [25]:

c = np.array([[1, 2, 3]])
c, c.shape  # notice the shape diff

Out[25]:

(array([[1, 2, 3]]), (1, 3))

In [26]:

# a 3x2 matrix; a 1d array can multiply it like a row vector
b = np.array([[1, 2], [3, 4], [5, 6]])
b

Out[26]:

array([[1, 2],
       [3, 4],
       [5, 6]])

In [27]:

a @ b

Out[27]:

array([22, 28])

In [28]:

# transposing a 1d array has no effect!
a.T, a.T.shape

Out[28]:

(array([1, 2, 3]), (3,))

A trick to turn a 1d array into a 2d row vector:

In [29]:

a_2d = a.reshape((1, -1))  # -1 tells NumPy to infer that dimension's size from the number of elements
print(a_2d)
print(a_2d.T)

[[1 2 3]]
[[1]
 [2]
 [3]]
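An equivalent idiom indexes with np.newaxis (an alias for None) to insert a length-1 axis. Once a is a proper row or column, the usual matrix rules apply; for instance a column times a row gives the outer product. A sketch:

```python
import numpy as np

a = np.array([1, 2, 3])

row = a[np.newaxis, :]  # shape (1, 3), same as a.reshape((1, -1))
col = a[:, np.newaxis]  # shape (3, 1), same as a.reshape((-1, 1))
print(row.shape, col.shape)

outer = col @ row       # (3, 1) @ (1, 3) -> (3, 3) outer product
print(outer)
```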

Indexing and slicing

The most important part is how to index and slice a NumPy array. It works much like a list, except that we may now need several indices, one per dimension, because most real-life datasets have more than one dimension

1 dimensional case

In [30]:

a1 = np.array([1, 2, 8, 100])
a1

Out[30]:

array([  1,   2,   8, 100])

In [31]:

a1[0]

Out[31]:

In [32]:

a1[-2]

Out[32]:

8

In [33]:

a1[[0, 1, 3]]

Out[33]:

array([  1,   2, 100])

In [34]:

a1[1:4]

Out[34]:

array([  2,   8, 100])
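Slices also accept a step, as with lists. One difference from lists matters in practice: a basic slice of an array is a view that shares memory with the original, while indexing with a list of positions returns a copy. A sketch on the same a1:

```python
import numpy as np

a1 = np.array([1, 2, 8, 100])

print(a1[::2])   # every second element: 1, 8
print(a1[::-1])  # reversed: 100, 8, 2, 1

view = a1[1:3]   # a slice is a view on a1
view[0] = -1     # ...so this also changes a1[1]
print(a1)

copy = a1[[1, 2]]  # indexing with a list returns a copy
copy[0] = 0        # a1 is not affected
print(a1)
```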

We can also index with boolean values:

True means we want this element, False means we skip it

In [35]:

a1 > 3

Out[35]:

array([False, False,  True,  True])

Masking

Using a boolean mask to select elements of an array, or to replace them with other values

In [36]:

# select the elements where the mask a1 > 3 is True
a1[a1 > 3]

Out[36]:

array([  8, 100])

In [37]:

# assign 100 to every element selected by the same mask
a1[a1 > 3] = 100
a1

Out[37]:

array([  1,   2, 100, 100])
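Two related tools: conditions are combined with & (and) and | (or), each condition in parentheses, and np.where builds a new array from a mask without modifying the original. A sketch starting again from the unmodified a1:

```python
import numpy as np

a1 = np.array([1, 2, 8, 100])

# Combine conditions with & and |; the parentheses are required
mask = (a1 > 1) & (a1 < 50)
print(a1[mask])  # 2, 8

# np.where picks 100 where the mask is True, else the original value
capped = np.where(a1 > 3, 100, a1)
print(capped)    # 1, 2, 100, 100
print(a1)        # a1 itself is unchanged: 1, 2, 8, 100
```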

2 dimensional case

In [38]:

arr

Out[38]:

array([[1, 2, 3, 8],
       [3, 2, 3, 2],
       [4, 5, 0, 8]])

Indexing with a single number returns a subset of the original multidimensional array (here, its first row), which is itself an array

In [39]:

arr[0]

Out[39]:

array([1, 2, 3, 8])

In [40]:

type(arr[0])

Out[40]:

numpy.ndarray

Since there are now 2 dimensions, we can give 2 indices, one for each dimension

In [41]:

arr[0, 0]

Out[41]:

1

We can use : to select everything along an axis

In [42]:

arr[1]

Out[42]:

array([3, 2, 3, 2])

In [43]:

arr[1, :]

Out[43]:

array([3, 2, 3, 2])

In [44]:

arr[:, 1]  # watch out! we get a 1d array again, not the column vector you might expect

Out[44]:

array([2, 2, 5])
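If you do want a column vector, use a slice or a list of indices for that axis instead of a single integer: an integer index drops the axis, while a slice keeps it. Slices on both axes select a sub-block. A sketch on the same arr:

```python
import numpy as np

arr = np.array([[1, 2, 3, 8], [3, 2, 3, 2], [4, 5, 0, 8]])

print(arr[:, 1].shape)    # (3,): the integer index drops the axis
print(arr[:, 1:2].shape)  # (3, 1): a slice keeps it, giving a column vector
print(arr[:, [1]].shape)  # (3, 1): so does a list of indices
block = arr[0:2, 1:3]     # rows 0-1, columns 1-2
print(block)
```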

In [45]:

# 2D masking
arr[arr > 3] = 55
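Note that the cell above modifies arr in place. When a 2d mask is used for selection rather than assignment, the result is a 1d array of the matching elements in row order; a 1d mask on the first axis selects whole rows. A sketch on a fresh copy of the original arr:

```python
import numpy as np

arr = np.array([[1, 2, 3, 8], [3, 2, 3, 2], [4, 5, 0, 8]])

selected = arr[arr > 3]    # 2d mask -> 1d result, in row order
print(selected)            # 8, 4, 5, 8

rows = arr[arr[:, 0] > 2]  # mask on the first column picks whole rows
print(rows)                # the last two rows
```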

3 dimensional case

As a final example, we look at a 3d array:

In [46]:

np.random.seed(1234)
arr_3 = np.random.randint(low=0, high=100, size=24)
arr_3

Out[46]:

array([47, 83, 38, 53, 76, 24, 15, 49, 23, 26, 30, 43, 30, 26, 58, 92, 69,
       80, 73, 47, 50, 76, 37, 34])

We can use reshape to manipulate the shape of an array

In [47]:

arr_3 = arr_3.reshape(3, 4, 2)
arr_3

Out[47]:

array([[[47, 83],
        [38, 53],
        [76, 24],
        [15, 49]],

       [[23, 26],
        [30, 43],
        [30, 26],
        [58, 92]],

       [[69, 80],
        [73, 47],
        [50, 76],
        [37, 34]]])
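reshape also accepts -1 for one dimension, which NumPy infers from the element count, and it raises a ValueError when the sizes do not multiply out. Elements are filled with the last axis varying fastest. This sketch uses np.arange(24) instead of random numbers so the order is easy to follow:

```python
import numpy as np

flat = np.arange(24)

cube = flat.reshape(3, 4, 2)   # 3 * 4 * 2 must equal 24
same = flat.reshape(3, -1, 2)  # -1: NumPy infers the middle dimension (4)
print(cube.shape, same.shape)  # (3, 4, 2) (3, 4, 2)

print(cube[0, 0])  # the last axis fills first: elements 0 and 1
print(np.array_equal(cube.ravel(), flat))  # True: flattening restores the order

try:
    flat.reshape(5, 5)  # 25 != 24 elements
except ValueError as err:
    print("cannot reshape:", err)
```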

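Note: Is the printed array not what you thought it would be? Are the dimensions mixed up? No! The shape (3, 4, 2) means 3 blocks (axis 0), each with 4 rows (axis 1) and 2 columns (axis 2), and that is exactly how the array is printed.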

In [48]:

arr_3[0]

Out[48]:

array([[47, 83],
       [38, 53],
       [76, 24],
       [15, 49]])

In [49]:

arr_3[:, 3, 1]

Out[49]:

array([49, 92, 34])

In [50]:

arr_3[2, 3, 1]

Out[50]:

34
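A useful habit with 3d arrays is checking the shape each indexing pattern returns: every integer index removes one axis, while : keeps it. The ellipsis ... stands for as many : as needed. This sketch uses a hypothetical arr_3 built from np.arange(24) so the values are predictable:

```python
import numpy as np

arr_3 = np.arange(24).reshape(3, 4, 2)

print(arr_3[0].shape)        # (4, 2): the first block
print(arr_3[:, 3, 1].shape)  # (3,): one value per block
print(arr_3[..., 1].shape)   # (3, 4): ... stands for ":, :" here
print(arr_3[2, 3, 1])        # 23: one integer per axis gives a single value
```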