Meet Mr. Eigenvalue and Mr. Eigenvector from SVD!

Vigneshwar Ilango
8 min read · Jul 2, 2020

Basic understanding of Singular Value Decomposition for Data Scientists — Part: Linear Algebra

You have probably come across SVD, eigenvalues, and eigenvectors in many contexts: dimensionality reduction, data reduction, matrix factorization, Principal Component Analysis, recommendation systems, and countless data science blogs.

This is a brief explanation of SVD: what eigenvalues and eigenvectors mean, and how they make data reduction possible.

Singular Value Decomposition is a way to represent a big, high-dimensional matrix in a form that is smaller and easier to compute with and to interpret. Suppose you have an image, say a 2056 x 2056 scenery picture, and you want to process such images in large quantities: you might run out of RAM and face memory issues, or a neural network model might take a very long time to train on images of that size. SVD helps you retain the information in the images while reducing the size of the matrices, so that the downstream process is still valuable.

Other use cases of SVD include solving Ax = b when A is not square (least-squares regression), forming the basis for Principal Component Analysis (PCA), and powering recommendation systems, where we work with a high-dimensional matrix of m users as rows and n movies (or features) as columns. This article focuses on its basic property of data reduction.

Let us consider a matrix A with a very large number of rows and 'm' columns. For example, say A consists of images of people's faces, where each column represents a different person and a single person's face is stretched (flattened) into one column. If we apply SVD, separating the matrix into another form in order to reduce its dimension, we represent A as follows:

Singular Value Decomposition

The SVD is represented as

X = U Σ Vᵀ

where,

X = The original Matrix

U = Eigenfaces (in the case of the people-faces dataset)

Σ = Singular values (a diagonal matrix; their squares turn out to be eigenvalues, as we will see below)

Vᵀ = Eigen-mixtures (in the case of the people-faces dataset)
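As a quick sketch of what these factors look like in code (the random 4096 x 100 matrix below is only a stand-in for a real faces dataset, with each 64 x 64 face flattened into one column), NumPy's svd returns all three:

```python
import numpy as np

# Stand-in for a faces dataset: 100 faces, each a 64x64 image
# flattened into a 4096-long column of X
X = np.random.rand(4096, 100)

# Full SVD: X = U @ diag(S) @ Vt
U, S, Vt = np.linalg.svd(X, full_matrices=True)

print(U.shape)    # (4096, 4096) -> columns are the "eigenfaces"
print(S.shape)    # (100,)       -> singular values, sorted in decreasing order
print(Vt.shape)   # (100, 100)   -> rows mix the eigenfaces back into each face
```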

So, what do these matrices represent?

The matrix A is represented as a product of three matrices, as shown above. The matrix U captures the variance among the faces in the dataset: its columns have the same length as a column of A and are arranged hierarchically by importance. Put simply, the values in U describe what distinguishes one person's face from another. The first column is more important than the second in explaining the differences between the faces, the second explains more of the variation than the third, and so on.

The Σ matrix is a diagonal matrix: it has non-zero values only along its diagonal. Since X is not square, every row of Σ below the m-th row is zero; the line drawn in the diagram marks where the rows of Σ become zero. The diagonal entries, the singular values, are sorted in decreasing order of importance, so that when the three matrices are multiplied back together everything stays correctly aligned and ordered.

The matrix Vᵀ holds the mixtures: the coefficients needed to combine the columns of U back into each person's face. Its rows are also ordered by hierarchical importance. If we consider a different example, images of a farm field taken by a drone at different times, then U describes the variance among the fields stored in the columns, Σ encodes the order of importance, and V would essentially hold the information about time. What the eigen-matrices represent depends entirely on the particular problem being solved.

The columns of U and V are the eigenvectors. You have probably seen the eigenvector equation written as Av = λv in other contexts, where λ is an eigenvalue and v is an eigenvector. If we take the correlation matrix of the faces matrix we defined, we arrive at exactly that form. In our case, where X = U Σ Vᵀ, the correlation matrix is

XᵀX, which represents the correlation of the people's faces with each other (each column of X is one face).

So XᵀX = (V Σ Uᵀ)(U Σ Vᵀ)

XᵀX = V Σ² Vᵀ (cancelling UᵀU = I)

(XᵀX)V = V Σ² (multiplying both sides by V on the right, since VᵀV = I)

Here, the columns of V are the eigenvectors of the correlation matrix and the diagonal entries of Σ² are its eigenvalues. That is essentially how the SVD relates to eigenvalues and eigenvectors.
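This relationship is easy to verify numerically. The sketch below uses a random stand-in matrix (not a real faces dataset) and checks that the V returned by NumPy's SVD satisfies (XᵀX)V = VΣ²:

```python
import numpy as np

X = np.random.rand(4096, 100)              # stand-in for the faces matrix
U, S, Vt = np.linalg.svd(X, full_matrices=False)
V = Vt.T

# (XᵀX) V = V Σ²: the columns of V are eigenvectors of the correlation
# matrix, and the squared singular values are its eigenvalues.
corr = X.T @ X
print(np.allclose(corr @ V, V * S**2))     # True

# Cross-check against NumPy's eigen-solver: same eigenvalues.
eigvals = np.linalg.eigvalsh(corr)[::-1]   # sorted in decreasing order
print(np.allclose(eigvals, S**2))          # True
```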

What is ‘Economical’ SVD?

Computing this in Python is as simple as passing the matrix to the SVD function from NumPy:

u, s, vt = np.linalg.svd(X)

u, s, vt = np.linalg.svd(X, full_matrices=False)

The second call is the NumPy equivalent of MATLAB's svd(X, 'econ'). So what does this 'economy' version mean?

As defined before, the values are arranged hierarchically by importance. If we look at the matrices, we find that the important values always sit in the first m columns of U and the first m rows of Σ; beyond that, the values are close to zero and carry little importance. If we drop all such rows and columns of low importance, which are essentially the noise present in the images, we get smaller matrices, which we call the economy (economical) SVD. Note that these matrices now carry a ~ symbol on top (in the diagram) to show that they are the economy versions. This does lose one property, because if we take,

UᵀU = I (identity) and UUᵀ = I (identity)

ŨᵀŨ = I (identity), but ŨŨᵀ ≠ I (not the identity)
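In NumPy, the economy SVD is requested with full_matrices=False. The small sketch below (an arbitrary 1000 x 50 matrix, chosen just for illustration) shows the shapes and the property above:

```python
import numpy as np

X = np.random.rand(1000, 50)                          # an arbitrary tall, skinny matrix

# Full SVD: U is 1000 x 1000, most of which multiplies zero rows of Σ
U, S, Vt = np.linalg.svd(X, full_matrices=True)
print(U.shape)                                        # (1000, 1000)

# Economy SVD: NumPy's equivalent of MATLAB's svd(X, 'econ')
U_e, S_e, Vt_e = np.linalg.svd(X, full_matrices=False)
print(U_e.shape)                                      # (1000, 50)

# ŨᵀŨ is still the identity, but ŨŨᵀ no longer is
print(np.allclose(U_e.T @ U_e, np.eye(50)))           # True
print(np.allclose(U_e @ U_e.T, np.eye(1000)))         # False
```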

I am not going deep into properties like unitary transformations, covariance, pseudo-inverse matrices, and linear systems of equations in this article. If any of you are interested, I will write the next part of the series about them. This article is meant to convey the meaning of eigenvalues and eigenvectors and the basic way SVD reduces matrices while still preserving their value.

So does taking the economy matrices mean we are done with data reduction? Definitely not. The economy SVD only excludes the unimportant rows and columns. We can go further and represent an image with just the first 'r' columns of the matrices; those first 'r' columns are sufficient to represent the images we need.

The choice of r varies

Let me illustrate with an example:

Let us take an image of a dog. Here, I have simply loaded the image and converted it to grayscale. You can see that the significant features of the dog, such as the eyes, nose, and ears, are all clearly noticeable.

Let us construct the SVD of this image and view it at different ranks 'r', to see how many values of 'r' are needed to represent the significant features of this dog with a reduced number of data columns.

Now I compute the SVD and start by taking r = 5.
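A minimal sketch of this experiment is shown below; the file name 'dog.jpg' is only a placeholder for whatever image you load, and it is assumed to be an RGB image:

```python
import numpy as np
from matplotlib import pyplot as plt
from matplotlib.image import imread

A = imread('dog.jpg')                 # 'dog.jpg' is a placeholder for your image
X = np.mean(A, axis=-1)               # average the RGB channels to get grayscale

U, S, Vt = np.linalg.svd(X, full_matrices=False)

r = 5                                 # try 5, 20, 50 as in the text
X_r = U[:, :r] @ np.diag(S[:r]) @ Vt[:r, :]

plt.imshow(X_r, cmap='gray')
plt.title(f'Rank-{r} approximation')
plt.axis('off')
plt.show()
```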

This image does not look much like a dog; you could just as easily see it as a man wearing a mask, or whatever else you care to observe.

Now let us take r = 20. We can identify this as a dog, but it is quite blurry. Suppose you want to train a CNN model that identifies different types of dogs: this is an acceptable image, but let us see whether we get a better-defined image, with better separation from the background, as we increase r.

Now we have taken r = 50. Yes, perfect. I think this image is sufficient for my neural network to learn from and classify the different types of dogs. Notice that we took only about 20% of the columns from the SVD matrices, yet it still represents essentially the same image. This is because images usually contain noise that is not visible to the human eye. SVD helps remove that noise and represent the images as efficiently as needed.

The left plot shows the cumulative sum of the singular values, and the right plot shows the degrees of information.

As you can see from the left plot, most of what represents the dog can be captured by taking just the first 100 values of r out of the full size available, using SVD. The right plot shows how the degrees of information vary as we take the first 'r' values; it is clearly visible that most of the information is captured within the first 100, after which the curve drops away.
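Choosing r from the singular values can be done directly in code. The sketch below recomputes the singular values of the grayscale image from the previous snippet (the file name is still a placeholder), and the 90% threshold is only an illustrative choice:

```python
import numpy as np
from matplotlib import pyplot as plt
from matplotlib.image import imread

# Reuse the grayscale image from the previous sketch ('dog.jpg' is a placeholder)
X = np.mean(imread('dog.jpg'), axis=-1)
S = np.linalg.svd(X, compute_uv=False)   # singular values only, in decreasing order

cum = np.cumsum(S) / np.sum(S)           # cumulative fraction of the singular values
r = np.argmax(cum > 0.90) + 1            # smallest r capturing ~90% of that sum
print(r)

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))
ax1.plot(cum)                            # left: cumulative sum of singular values
ax1.set_xlabel('r')
ax2.semilogy(S)                          # right: how quickly the information falls away
ax2.set_xlabel('r')
plt.show()
```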

Thus SVD, with its eigenvalues and eigenvectors, provides a powerful way of reducing data size, and it is applicable to many use cases. It forms the basis for PCA. Consider a recommendation system where, after building the user-item matrix, you find that it is 90% sparse and cannot produce well-personalized recommendations. You can take the SVD of the matrix and train on that instead, which will give better results than the naive approach.
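As a rough sketch of that idea (the matrix sizes, the sparsity, and the k = 20 latent factors are all illustrative choices, not a recipe), SciPy's truncated SVD can be applied to a sparse user-item matrix like this:

```python
import numpy as np
from scipy.sparse import random as sparse_random
from scipy.sparse.linalg import svds

# Hypothetical user-item ratings matrix: 1,000 users x 500 items, ~90% empty
R = sparse_random(1000, 500, density=0.10, format='csr', random_state=0)

# Truncated SVD keeps only the k strongest latent factors
k = 20
U, S, Vt = svds(R, k=k)

# Dense low-rank reconstruction, usable as a "filled-in" estimate of the ratings
R_hat = U @ np.diag(S) @ Vt
print(R_hat.shape)   # (1000, 500)
```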

I hope you are a little closer to your goal than you were yesterday. I would appreciate some claps for better reach, and please feel free to comment on the concepts you would like me to cover in more detail. Thanks!

“Data matures like wine, applications like fish.” — James Governor

References: Data-Driven Science and Engineering (Steven L. Brunton and J. Nathan Kutz)

Please feel free to connect on any further discussions:
LinkedIn : https://www.linkedin.com/in/vigneshwarilango/
Gmail: mr.vigneshwarilango@gmail.com

Regards,
Vigneshwar Ilango
