Python is a great programming language for data science, and in particular, financial analytics. It has applications in a wide variety of industries, but to many people, it remains mysterious. Let's demystify Python for finance -- with a quick introduction of some of the building blocks, and then a basic analysis of a large dataset.
Start with NumPy arrays, a list-like data structure that holds numerical data of a single type. They are a great structure to use because calculations on them run much faster than on a normal list. To use these arrays, we first must import NumPy, the package that provides the array type along with other important functions and datatypes.
Next, create a simple list and convert it into a NumPy array.
import numpy as np
values = [1, 2, 3, 4, 5]  # named 'values' so the built-in 'list' is not overwritten
array = np.array(values)
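To see why the conversion matters, here is a minimal sketch contrasting the two types. The names 'prices' and 'price_array' are illustrative and not part of the example above.

```python
import numpy as np

prices = [1, 2, 3, 4, 5]          # plain Python list (illustrative values)
price_array = np.array(prices)    # the same data as a NumPy array

# Multiplying a list by 2 repeats the list; it does not double the numbers.
repeated = prices * 2
print(repeated)        # [1, 2, 3, 4, 5, 1, 2, 3, 4, 5]

# Doubling list elements needs an explicit loop or comprehension.
doubled_list = [p * 2 for p in prices]
print(doubled_list)    # [2, 4, 6, 8, 10]

# Multiplying an array by 2 doubles every element in one step.
doubled_array = price_array * 2
print(doubled_array)   # [ 2  4  6  8 10]
```

The array version avoids the Python-level loop, which is where most of the speed advantage on large datasets comes from.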
We can use NumPy methods such as 'mean' and standard deviation ('std') on the array to get a clearer understanding of the data.
array.mean()  # outputs the average of the elements
array.std()   # outputs the standard deviation of the elements
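As a rough sketch of what these two methods compute, the snippet below rebuilds both by hand. One detail matters in finance: by default 'std' divides by the number of elements (the population standard deviation), while passing 'ddof=1' divides by one less than that (the sample standard deviation). The name 'values' is illustrative.

```python
import math
import numpy as np

values = np.array([1, 2, 3, 4, 5])

# Mean: sum of the elements divided by how many there are.
mean = values.sum() / values.size

# Population variance: average squared distance from the mean.
variance = ((values - mean) ** 2).sum() / values.size
std = math.sqrt(variance)

print(mean, values.mean())   # both are 3.0
print(std, values.std())     # both are about 1.4142

# Sample standard deviation divides by (n - 1) instead of n.
sample_std = values.std(ddof=1)
print(sample_std)            # about 1.5811
```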
We can also do basic element-wise arithmetic, either between an array and a single number or between two arrays of the same shape.
array = np.array([[1, 2, 3], [4, 5, 6]])
print(array * 2)
# output: [[ 2  4  6]
#          [ 8 10 12]]
array2 = np.array([[1, 4, 6], [2, 5, 2]])
print(array / array2)
# output: [[1.  0.5 0.5]
#          [2.  1.  3. ]]
Notice that the output of the division contains only floating-point values. The '/' operator performs true division, which produces floats, and because a NumPy array holds only one type of data, every element of the result is stored as a float, including whole numbers such as 1. and 2. In the same way, if a single number in an array is a float, all numbers in that array are stored as floats.
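The single-type rule can be checked through an array's 'dtype' attribute. The following is a small sketch with made-up values; the exact integer type (for example int64) depends on the platform.

```python
import numpy as np

ints = np.array([1, 2, 3])
print(ints.dtype)             # an integer type, e.g. int64 on most systems

# One float in the input is enough to make the whole array float64.
mixed = np.array([1, 2.5, 3])
print(mixed.dtype)            # float64
print(mixed)                  # [1.  2.5 3. ]

# True division of integer arrays gives floats ...
quotient = ints / np.array([1, 2, 3])
print(quotient.dtype)         # float64

# ... while floor division keeps the integer type.
floor_quotient = ints // np.array([1, 2, 3])
print(floor_quotient.dtype)   # an integer type again
```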
As you saw above, a NumPy array can also be organized into rows and columns, much like an Excel sheet. This is called a 2D array. We can set one up like this:
two_dimensional_array = np.array([[1, 2], [3, 4], [5, 6]])
The example above isn't difficult to visualize, but sometimes we are given very large sets of data. When this is the case, an easy way to check the layout of the data is to use the attributes 'shape' and 'size', which give the dimensions of the array as (rows, columns) and the total number of elements (rows times columns for a 2D array), respectively.
print(two_dimensional_array.shape)  # output: (3, 2)
print(two_dimensional_array.size)   # output: 6
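As a quick sketch of how these attributes relate to one another, the snippet below unpacks 'shape' into rows and columns and compares their product with 'size'. The name 'grid' is illustrative.

```python
import numpy as np

grid = np.array([[1, 2], [3, 4], [5, 6]])

# shape is a tuple, so it can be unpacked into rows and columns.
rows, cols = grid.shape
print(rows, cols)                  # 3 2

# size is the total element count: rows * cols for a 2D array.
print(grid.size == rows * cols)    # True

# ndim gives the number of dimensions; len() gives the number of rows.
print(grid.ndim)                   # 2
print(len(grid))                   # 3
```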
Data can also be generated within a range using the NumPy function 'arange', which returns an array of numbers that starts at its first parameter, stops before its second parameter (the stop value itself is not included), and increases by the interval given in its third parameter.
range_array = np.arange(1, 10, 1)
print(range_array)  # output: [1 2 3 4 5 6 7 8 9]
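A few more calls, sketched below with made-up ranges, show how the start, stop, and step parameters interact. The variable names are illustrative.

```python
import numpy as np

# The stop value (10) is excluded, so the last element is 9.
one_to_nine = np.arange(1, 10, 1)
print(one_to_nine)    # [1 2 3 4 5 6 7 8 9]

# A step of 2 takes every second number.
evens = np.arange(0, 10, 2)
print(evens)          # [0 2 4 6 8]

# With a single argument, counting starts at 0 with a step of 1.
first_five = np.arange(5)
print(first_five)     # [0 1 2 3 4]
```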
It's often useful to transpose a 2D array in order to group the elements differently. We can do that by using 'transpose.'
# The numbers become strings because an array holds a single type of data.
array = np.array([[1, 2, 4], ['Brown', 'Yellow', 'Grey'], ['SUV', 'Truck', 'Jeep']])
array_trans = np.transpose(array)
print(array_trans)
# output: [['1' 'Brown' 'SUV']
#          ['2' 'Yellow' 'Truck']
#          ['4' 'Grey' 'Jeep']]
Each row of the transposed array now describes one item, which makes the data easier to read and to set up for display.
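With purely numerical data the effect on the layout is easy to verify. The sketch below uses an illustrative 2-by-3 array and the shorthand attribute 'T', which gives the same result as 'np.transpose' for a 2D array.

```python
import numpy as np

table = np.array([[1, 2, 3], [4, 5, 6]])   # 2 rows, 3 columns

flipped = table.T                          # shorthand for np.transpose(table)
print(table.shape)     # (2, 3)
print(flipped.shape)   # (3, 2)
print(flipped)
# [[1 4]
#  [2 5]
#  [3 6]]

# Both spellings produce the same array.
print(np.array_equal(flipped, np.transpose(table)))   # True
```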
Now that we have a grasp on arrays, we can use these NumPy arrays to display our data in graphs. For this task, we'll use matplotlib, a Python package that can visualize big sets of data as histograms, line graphs, and scatter plots. The code below lays out a basic template for creating a matplotlib graph.
import matplotlib.pyplot as plt
import numpy as np
data1 = np.array([1, 2, 3, 4, 5])  # x-axis data
data2 = np.array([5, 1, 4, 7, 8])  # y-axis data
plt.plot(data1, data2)  # plots the given x- and y-axis data
plt.show()  # displays the plot on screen
You can also edit the graph's color, line style, and labels to give the graph your preferred look and accurate info.
plt.plot(data1, data2, color='red', linestyle='--')  # plots a dashed, red line
plt.xlabel('x-axis')  # labels the x-axis
plt.ylabel('y-axis')  # labels the y-axis
plt.title('Sample graph')  # titles the entire graph/plot
plt.show()  # displays the plot on screen
These help to make the graph stand out to the readers/viewers. We come to a problem though: sometimes the data isn't fit for a line graph. Thankfully, matplotlib also supports scatter plots and histograms. You can create a scatter plot using almost the same syntax that you used to create a line graph.
plt.scatter(data1, data2)
plt.show()
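To compare the two plot types on the same data, one option is to draw them side by side with 'plt.subplots'. The sketch below is only one way to do it. It assumes the non-interactive 'Agg' backend so that it runs without opening a window, and the file name in the last comment is hypothetical.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: draw off-screen, no window required
import matplotlib.pyplot as plt
import numpy as np

data1 = np.array([1, 2, 3, 4, 5])  # x-axis data
data2 = np.array([5, 1, 4, 7, 8])  # y-axis data

# One figure with two side-by-side axes.
fig, (line_ax, scatter_ax) = plt.subplots(1, 2, figsize=(8, 3))

line_ax.plot(data1, data2)          # points joined by a line
line_ax.set_title('Line graph')

scatter_ax.scatter(data1, data2)    # the same points, unconnected
scatter_ax.set_title('Scatter plot')

fig.tight_layout()
# fig.savefig('comparison.png')     # hypothetical file name; saves the figure
```

In an interactive session, leaving out the 'matplotlib.use' line and calling 'plt.show()' at the end displays the figure instead.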
The histogram function uses a slightly different syntax, as the input data is one dimensional. Optionally, the transparency of the histogram and the number of bars on the histogram can be specified with the 'alpha' and 'bins' parameters, respectively.
plt.hist(x=data1, alpha=0.5, bins=3)
plt.show()
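To see what the 'bins' parameter does to the data, the counts can be computed without drawing anything. The sketch below uses NumPy's 'histogram' function, which by default splits the range of the data into equal-width bins. It is meant as an illustration of the binning idea rather than a description of matplotlib's internals.

```python
import numpy as np

data1 = np.array([1, 2, 3, 4, 5])

# Split the range 1..5 into 3 equal-width bins and count the values in each.
counts, edges = np.histogram(data1, bins=3)

print(counts)   # [2 1 2]
print(edges)    # 4 edges: 1, about 2.33, about 3.67, and 5
```

The first bin holds 1 and 2, the second holds 3, and the last holds 4 and 5, so a three-bar histogram of this data has heights 2, 1, and 2.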
A Real-World Example