Python Workshop: NumPy


Based on:

This git of Zhiya Zuo


NumPy is the fundamental package for scientific computing with Python. It contains, among other things:

  • Powerful N-dimensional array object.
  • Useful linear algebra, Fourier transform, and random number capabilities.
  • And much more

NumPy installation

In a terminal (command prompt), run:

pip install numpy

Arrays

In [1]:
# After you install numpy, load it
import numpy as np  # you can use np instead of numpy to call the functions in numpy package
In [2]:
x = np.array([1, 2, 3])  # create a numpy array object
print(type(x))
<class 'numpy.ndarray'>

We can read the shape attribute of a numpy.ndarray to check its dimensions. Note that shape is an attribute, not a function, so it is used without parentheses.

In [3]:
x.shape  # comparable to len() for a list, but returns a tuple with one entry per dimension
Out[3]:
(3,)
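
As a small illustrative sketch (the names vec and mat are made up for this example), shape can be compared with the related attributes ndim and size and with the built-in len():

```python
import numpy as np

vec = np.array([1, 2, 3])
mat = np.array([[1, 2, 3], [4, 5, 6]])

# shape: size of each dimension, ndim: number of dimensions, size: total element count
print(vec.shape, vec.ndim, vec.size)  # (3,) 1 3
print(mat.shape, mat.ndim, mat.size)  # (2, 3) 2 6

# len() only reports the size of the first dimension
print(len(mat))  # 2
```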

Unlike a list, an array uses a single data type for all of its elements. If the inputs have mixed types, NumPy converts them to a common type:

In [4]:
y = np.array([1, "yes"])  # automatic type conversion from int to str
y
Out[4]:
array(['1', 'yes'], dtype='<U21')
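
The same conversion happens with numeric types. The sketch below (variable names are illustrative only) shows integers being promoted to floats, and an explicit conversion with astype:

```python
import numpy as np

ints = np.array([1, 2, 3])
mixed = np.array([1, 2.5, 3])   # one float is enough to make every element a float
as_float = ints.astype(float)   # explicit conversion, returns a new array

print(ints.dtype)      # an integer type; the exact width depends on the platform
print(mixed.dtype)     # float64
print(as_float.dtype)  # float64
```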

Multidimensional arrays

In [5]:
arr = np.array([[1, 2, 3, 8]])
arr.shape
Out[5]:
(1, 4)
In [6]:
arr
Out[6]:
array([[1, 2, 3, 8]])
In [7]:
arr = np.array([[1, 2, 3, 8], [3, 2, 3, 2], [4, 5, 0, 8]])
arr.shape
Out[7]:
(3, 4)
In [8]:
arr
Out[8]:
array([[1, 2, 3, 8],
       [3, 2, 3, 2],
       [4, 5, 0, 8]])

Special arrays

NumPy provides many functions for initializing special arrays:

In [9]:
np.zeros([3, 5], dtype=int)  # dtype sets the element type of the array
Out[9]:
array([[0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0]])
In [10]:
np.ones([3, 5])
Out[10]:
array([[1., 1., 1., 1., 1.],
       [1., 1., 1., 1., 1.],
       [1., 1., 1., 1., 1.]])
In [11]:
np.eye(3)
Out[11]:
array([[1., 0., 0.],
       [0., 1., 0.],
       [0., 0., 1.]])
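
A few more initializers, shown here as a short sketch rather than a complete list (the values are chosen arbitrarily):

```python
import numpy as np

filled = np.full((2, 3), 7)      # 2x3 array where every element is 7
steps = np.arange(0, 10, 2)      # like range(): start, stop (exclusive), step
grid = np.linspace(0.0, 1.0, 5)  # 5 evenly spaced points, both endpoints included

print(filled)
print(steps)  # [0 2 4 6 8]
print(grid)   # [0.   0.25 0.5  0.75 1.  ]
```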

Operations

The rules are very similar to R/Matlab: operations are generally element-wise.

In [12]:
arr
Out[12]:
array([[1, 2, 3, 8],
       [3, 2, 3, 2],
       [4, 5, 0, 8]])
In [13]:
arr - 5
Out[13]:
array([[-4, -3, -2,  3],
       [-2, -3, -2, -3],
       [-1,  0, -5,  3]])
In [14]:
arr * 6  # multiply every element by a scalar
Out[14]:
array([[ 6, 12, 18, 48],
       [18, 12, 18, 12],
       [24, 30,  0, 48]])
In [15]:
arr * arr  # element-wise multiplication of two matrices
Out[15]:
array([[ 1,  4,  9, 64],
       [ 9,  4,  9,  4],
       [16, 25,  0, 64]])
In [16]:
np.exp(arr)
Out[16]:
array([[2.71828183e+00, 7.38905610e+00, 2.00855369e+01, 2.98095799e+03],
       [2.00855369e+01, 7.38905610e+00, 2.00855369e+01, 7.38905610e+00],
       [5.45981500e+01, 1.48413159e+02, 1.00000000e+00, 2.98095799e+03]])
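
Element-wise operations also work between arrays of different but compatible shapes, which NumPy calls broadcasting. The following is a minimal sketch with made-up arrays row and col; m repeats the values of arr from above:

```python
import numpy as np

m = np.array([[1, 2, 3, 8], [3, 2, 3, 2], [4, 5, 0, 8]])  # shape (3, 4)
row = np.array([10, 20, 30, 40])                          # shape (4,)
col = np.array([[100], [200], [300]])                     # shape (3, 1)

print(m + row)  # row is added to every row of m
print(m + col)  # col is added to every column of m
```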

More examples:

In [17]:
arr_2 = np.array([[1], [3], [2], [0]])
arr_2
Out[17]:
array([[1],
       [3],
       [2],
       [0]])
In [18]:
arr_2_T = arr_2.T  # transpose
arr_2_T
Out[18]:
array([[1, 3, 2, 0]])
In [19]:
arr @ arr_2  # matrix multiplication
Out[19]:
array([[13],
       [15],
       [19]])
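
For matrix multiplication the inner dimensions must match: a (3, 4) array times a (4, 1) array gives a (3, 1) array. A small sketch, where A and v repeat the values of arr and arr_2:

```python
import numpy as np

A = np.array([[1, 2, 3, 8], [3, 2, 3, 2], [4, 5, 0, 8]])  # shape (3, 4)
v = np.array([[1], [3], [2], [0]])                        # shape (4, 1)

prod = A @ v       # inner dimensions are both 4, so this works
print(prod.shape)  # (3, 1)

try:
    v @ A          # (4, 1) @ (3, 4): inner dimensions 1 and 3 do not match
except ValueError as err:
    print("shape mismatch:", err)
```
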
In [20]:
arr
Out[20]:
array([[1, 2, 3, 8],
       [3, 2, 3, 2],
       [4, 5, 0, 8]])
In [21]:
arr.max()
Out[21]:
8
In [22]:
arr.cumsum()
Out[22]:
array([ 1,  3,  6, 14, 17, 19, 22, 24, 28, 33, 33, 41])

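Note: without an axis argument, cumsum() works on the flattened array, which NumPy reads row by row, unlike Matlab (column by column).

There are many methods that calculate statistics of the array along a given axis:

  • axis=1 means row-wise: the operation runs along each row (across the columns)
  • axis=0 means column-wise: the operation runs along each column (down the rows)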
In [23]:
arr.cumsum(axis=1)
Out[23]:
array([[ 1,  3,  6, 14],
       [ 3,  5,  8, 10],
       [ 4,  9,  9, 17]])
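
The axis argument works the same way for other reductions. A short sketch using sum and max (m repeats the values of arr):

```python
import numpy as np

m = np.array([[1, 2, 3, 8], [3, 2, 3, 2], [4, 5, 0, 8]])

col_sums = m.sum(axis=0)  # one value per column: the rows are collapsed
row_sums = m.sum(axis=1)  # one value per row: the columns are collapsed
row_max = m.max(axis=1)   # largest value in each row

print(col_sums)  # [ 8  9  6 18]
print(row_sums)  # [14 10 17]
print(row_max)   # [8 3 8]
```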

Note about 1d arrays

A 1d array is neither a column vector nor strictly a row vector: it has a single axis. It should therefore be treated carefully when used in vector/matrix manipulation.

In [24]:
a = np.array([1, 2, 3])
a, a.shape
Out[24]:
(array([1, 2, 3]), (3,))
In [25]:
c = np.array([[1, 2, 3]])
c, c.shape  # notice the shape diff
Out[25]:
(array([[1, 2, 3]]), (1, 3))
In [26]:
# can be multiplied like a row vector
b = np.array([[1, 2], [3, 4], [5, 6]])
b
Out[26]:
array([[1, 2],
       [3, 4],
       [5, 6]])
In [27]:
a @ b
Out[27]:
array([22, 28])
In [28]:
# can't be transposed! .T has no effect on a 1d array
a.T, a.T.shape
Out[28]:
(array([1, 2, 3]), (3,))

A trick to transform a 1d array into a 2d row vector:

In [29]:
a_2d = a.reshape((1, -1))  # '-1' tells NumPy to infer that dimension from the number of elements
print(a_2d)
print(a_2d.T)
[[1 2 3]]
[[1]
 [2]
 [3]]
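
An alternative to reshape, sketched here with an illustrative array v, is to index with np.newaxis, which inserts a new axis of length 1:

```python
import numpy as np

v = np.array([1, 2, 3])

row = v[np.newaxis, :]  # shape (1, 3): a 2d row vector
col = v[:, np.newaxis]  # shape (3, 1): a 2d column vector

print(row.shape, col.shape)  # (1, 3) (3, 1)
print(np.array_equal(col, v.reshape(-1, 1)))  # True: same result as reshape
```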

Indexing and slicing

The most important part is how to index and slice an np.array. It is very similar to indexing a list, except that we may need more than one index, because most real-life datasets have more than one dimension.

1 dimensional case

In [30]:
a1 = np.array([1, 2, 8, 100])
a1
Out[30]:
array([  1,   2,   8, 100])
In [31]:
a1[0]
Out[31]:
1
In [32]:
a1[-2]
Out[32]:
8
In [33]:
a1[[0, 1, 3]]
Out[33]:
array([  1,   2, 100])
In [34]:
a1[1:4]
Out[34]:
array([  2,   8, 100])

We can also index with boolean values. A comparison such as a1 > 3 produces a boolean array with the same shape as a1:

  • True means we want this element
In [35]:
a1 > 3
Out[35]:
array([False, False,  True,  True])
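
Boolean arrays can be combined with & (and), | (or) and ~ (not). The sketch below uses an illustrative array named values; note that each comparison needs its own parentheses:

```python
import numpy as np

values = np.array([1, 2, 8, 100])

in_range = (values > 1) & (values < 50)   # both conditions must hold
outside = (values <= 1) | (values >= 50)  # at least one condition holds

print(in_range)           # [False  True  True False]
print(values[in_range])   # [2 8]
print(values[~in_range])  # [  1 100]
```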

Masking

Masking means replacing values of an array with other values according to a boolean mask.

In [36]:
# select the elements where the mask (a1 > 3) is True
a1[a1 > 3]
Out[36]:
array([  8, 100])
In [37]:
# use the same mask to overwrite those elements in place
a1[a1 > 3] = 100
a1
Out[37]:
array([  1,   2, 100, 100])
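
Assigning through a mask modifies the array in place. If the original values should be kept, one option is np.where, which returns a new array. A minimal sketch with illustrative names:

```python
import numpy as np

values = np.array([1, 2, 8, 100])

# where the condition is True take 100, otherwise keep the original value
capped = np.where(values > 3, 100, values)

print(capped)  # [  1   2 100 100]
print(values)  # [  1   2   8 100]  (unchanged)
```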

2 dimensional case

In [38]:
arr
Out[38]:
array([[1, 2, 3, 8],
       [3, 2, 3, 2],
       [4, 5, 0, 8]])

Indexing with a single number returns a subset of the original multidimensional array, which is itself an array (here, the first row):

In [39]:
arr[0]
Out[39]:
array([1, 2, 3, 8])
In [40]:
type(arr[0])
Out[40]:
numpy.ndarray

Since we now have 2 dimensions, we can use 2 indices, one for each dimension:

In [41]:
arr[0, 0]
Out[41]:
1

We can use : to indicate everything along that axis

In [42]:
arr[1]
Out[42]:
array([3, 2, 3, 2])
In [43]:
arr[1, :]
Out[43]:
array([3, 2, 3, 2])
In [44]:
arr[:, 1]  # watch out! the result is a 1d array, not a column vector as you might expect
Out[44]:
array([2, 2, 5])
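
If a 2d column is needed, one way (sketched here, with m repeating the values of arr) is to index the column with a slice or a list instead of a single integer:

```python
import numpy as np

m = np.array([[1, 2, 3, 8], [3, 2, 3, 2], [4, 5, 0, 8]])

flat = m[:, 1]          # integer index: the axis is dropped
col_slice = m[:, 1:2]   # slice: the axis is kept
col_list = m[:, [1]]    # list of indices: the axis is kept

print(flat.shape, col_slice.shape, col_list.shape)  # (3,) (3, 1) (3, 1)
```
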
In [45]:
# 2D masking: overwrites every element greater than 3 in place (arr itself is modified)
arr[arr > 3] = 55
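
To see the effect of a 2D mask without touching the original array, the same assignment can be tried on a copy. In this sketch, m repeats the original values of arr:

```python
import numpy as np

m = np.array([[1, 2, 3, 8], [3, 2, 3, 2], [4, 5, 0, 8]])

masked = m.copy()        # work on a copy so that m keeps its values
masked[masked > 3] = 55  # every element greater than 3 becomes 55

print(masked)
# [[ 1  2  3 55]
#  [ 3  2  3  2]
#  [55 55  0 55]]
```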

3 dimensional case

As a final example, we look at a 3d array:

In [46]:
np.random.seed(1234)
arr_3 = np.random.randint(low=0, high=100, size=24)
arr_3
Out[46]:
array([47, 83, 38, 53, 76, 24, 15, 49, 23, 26, 30, 43, 30, 26, 58, 92, 69,
       80, 73, 47, 50, 76, 37, 34])

We can use reshape to manipulate the shape of an array

In [47]:
arr_3 = arr_3.reshape(3, 4, 2)
arr_3
Out[47]:
array([[[47, 83],
        [38, 53],
        [76, 24],
        [15, 49]],

       [[23, 26],
        [30, 43],
        [30, 26],
        [58, 92]],

       [[69, 80],
        [73, 47],
        [50, 76],
        [37, 34]]])

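Note: Is the printed array not what you thought it would be? Did NumPy mix up the shape? No! The first axis is the outermost one, so shape (3, 4, 2) prints as 3 blocks, each with 4 rows and 2 columns. See this for answers.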
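
The order in which reshape fills the new shape can be checked with a small sketch. It uses np.arange instead of random numbers (an assumption made here so that every value equals its position in the flat array):

```python
import numpy as np

cube = np.arange(24).reshape(3, 4, 2)  # values 0..23, so each value is its flat position

# The last axis changes fastest: element [i, j, k] sits at flat position i*(4*2) + j*2 + k
i, j, k = 2, 3, 1
print(cube[i, j, k])          # 23
print(i * (4 * 2) + j * 2 + k)  # 23

# NumPy can do the same computation for us
print(np.ravel_multi_index((i, j, k), cube.shape))  # 23
```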

In [48]:
arr_3[0]
Out[48]:
array([[47, 83],
       [38, 53],
       [76, 24],
       [15, 49]])
In [49]:
arr_3[:, 3, 1]
Out[49]:
array([49, 92, 34])
In [50]:
arr_3[2, 3, 1]
Out[50]:
34
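
Each integer index removes one dimension from the result, while : keeps it. The following sketch checks the resulting shapes on an illustrative array built with np.arange (not the random arr_3 above):

```python
import numpy as np

cube = np.arange(24).reshape(3, 4, 2)

print(cube[0].shape)        # (4, 2): one integer index, two axes left
print(cube[:, 3, 1].shape)  # (3,):   two integer indices, one axis left
print(cube[:, 3, 1])        # [ 7 15 23]
print(cube[2, 3, 1])        # 23: three integer indices give a single element
```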