A 3D Playground for t-SNE

[Interactive 3D playground. Datasets: Square Grid, Trefoil, Link, Unlink, Subset Clusters, Three Clusters, Two Clusters, Sphere, Torus, Tetrahedron. Controls: Step, Points Per Side, Steps, Perplexity, Epsilon.]

Quick Description

Data visualization is an important tool for understanding and interpreting data in many different areas, but high-dimensional datasets are difficult to visualize directly. To visualize high-dimensional data, we first apply dimensionality reduction, which converts the data from a high-dimensional representation into a low-dimensional one that can be plotted. One of the most important techniques available is t-distributed stochastic neighbor embedding (t-SNE), first proposed by van der Maaten and Hinton in 2008 [1].

t-SNE is a user-interactive technique: the output depends on the parameters supplied by the user. The first parameter is the perplexity, which roughly represents the number of neighbors considered for each data point; different perplexity values can give visualizations with very different structures. The other user-defined parameter is the "epsilon", the learning rate of the algorithm. Epsilon usually lies in the range between 10 and 1000, and the user should try different values to reach a good result.
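As a concrete illustration (not the playground's own code), here is how these two parameters appear in scikit-learn's t-SNE implementation, sklearn.manifold.TSNE, applied to made-up Gaussian data:

```python
import numpy as np
from sklearn.manifold import TSNE

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 50))  # toy stand-in: 500 points in 50 dimensions

embedding = TSNE(
    n_components=2,
    perplexity=30,        # roughly how many neighbors each point considers
    learning_rate=200.0,  # the "epsilon" above; usually between 10 and 1000
    random_state=0,
).fit_transform(X)
print(embedding.shape)  # (500, 2)
```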

On this page, we resume the work started in "How to Use t-SNE Effectively" [2] on how to interpret t-SNE plots, but in 3D. We believe that 3D plots equipped with a movable camera can give the end user better insight and a better quality of experience. To this end, we use the three.js library.

In addition to using these templates, you can upload your own data to "Interactive t-SNE".


1. Perplexity really matters

Since t-SNE results depend on user-defined parameters, different perplexity values can give different results. As mentioned before, perplexity represents the number of nearest neighbors, so a suitable value depends on the size of the dataset. van der Maaten and Hinton recommend choosing a perplexity between 5 and 50. The diagrams below show t-SNE plots for five different perplexity values.
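A sketch of the corresponding sweep in scikit-learn, using stand-in Gaussian clusters rather than the playground's datasets; note that recent scikit-learn versions simply reject a perplexity that is not smaller than the number of samples:

```python
import numpy as np
from sklearn.manifold import TSNE

rng = np.random.default_rng(0)
# Three separated Gaussian clusters, 50 points total, in 10 dimensions.
X = np.vstack([rng.normal(loc=mu, scale=1.0, size=(17, 10))
               for mu in (0.0, 10.0, 20.0)])[:50]

for perplexity in (2, 5, 30, 50, 80):
    if perplexity >= len(X):
        # Recent scikit-learn raises an error when perplexity >= n_samples.
        print(f"perplexity {perplexity}: must be less than n_samples ({len(X)})")
        continue
    emb = TSNE(n_components=2, perplexity=perplexity,
               random_state=0).fit_transform(X)
    print(f"perplexity {perplexity}: embedding shape {emb.shape}")
```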

[Figure: the original data and five t-SNE runs at perplexity 2, 5, 30, 50, and 80 (epsilon 10, step 200).]

Each diagram shows a different embedding of the same 50 data points, caused only by the change in perplexity. Outside the suggested range of 5 to 50, t-SNE results become less interpretable. With perplexity 2, the clusters are not clearly separated. At perplexity 80, which is greater than the number of data points, t-SNE no longer works properly and the clusters are about to merge.

At perplexity 50 the clusters are very clear and the plot gives a good sense of the global geometry, but does that mean we should always use perplexity 50? The answer is no: the right perplexity depends on the number of samples. The diagrams above use 50 points, so as a test the five diagrams below use 500 points to check whether the best perplexity is still 50.

[Figure: the original 500-point data and five t-SNE runs at perplexity 2, 5, 30, 50, and 80 (epsilon 10; step 200, except step 100 at perplexity 5).]

Now none of the trial perplexity values gives a good sense of the global geometry. There may not be a single perplexity value that captures distances across all clusters, so the user should try many values to find a suitable one.

2. The number of iterations matters

The goal of t-SNE is to minimize the difference between the pairwise similarities of points in the high-dimensional space and in the desired low-dimensional space, but when should we stop? Should the number of iterations be unbounded? We cannot use an arbitrarily large number of iterations; instead, we should keep iterating until the result stabilizes, since at some step the points stop moving much.
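A minimal sketch of that stopping rule, assuming a hypothetical step_fn that performs a single t-SNE update and returns the new embedding (the playground advances steps interactively instead of exposing such a function):

```python
import numpy as np

def run_until_stable(step_fn, y0, tol=1e-4, max_steps=5000):
    """Iterate a t-SNE-style update until the embedding stops moving much.

    step_fn: hypothetical stand-in for one t-SNE gradient-descent step;
    it maps an (n, d) embedding to the next one.
    tol: mean per-point displacement below which we declare convergence.
    """
    y = np.asarray(y0, dtype=float)
    for step in range(1, max_steps + 1):
        y_next = step_fn(y)
        # Stop once points barely moved during this step.
        if np.mean(np.linalg.norm(y_next - y, axis=1)) < tol:
            return y_next, step
        y = y_next
    return y, max_steps  # hit the cap without stabilizing
```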

[Figure: the original data and five t-SNE runs at perplexity 30, epsilon 10, after 50, 100, 200, 400, and 1,000 steps.]

The images above show five different runs at perplexity 30. After 200 steps the clusters become clearly separated and convergence is reached. Unfortunately, the number of iterations required differs from one dataset to another.

3. Learning rate (epsilon) really matters

The second main parameter of t-SNE is the learning rate, referred to here as "epsilon". This parameter controls how far the points move at each step, and it is difficult to decide whether a large or a small value is needed before running the algorithm with a few trial values. The five diagrams below show how different epsilon values lead to different results.
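One practical way to compare learning rates, sketched here with scikit-learn and stand-in data, is to look at the final KL divergence that TSNE exposes after fitting (lower is generally better):

```python
import numpy as np
from sklearn.manifold import TSNE

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 50))  # stand-in for a playground dataset

for epsilon in (10, 50, 100, 500, 1000):
    tsne = TSNE(n_components=2, perplexity=30,
                learning_rate=epsilon, random_state=0)
    tsne.fit_transform(X)
    # kl_divergence_ is the cost t-SNE minimizes, reported after fitting.
    print(f"epsilon={epsilon:4d}  final KL divergence={tsne.kl_divergence_:.3f}")
```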

[Figure: the original data and five t-SNE runs at perplexity 30 with epsilon 10, 50, 100, 500, and 1000 (1,000 steps each).]

4. Early Exaggeration matters too

Another user-defined parameter for t-SNE is the early exaggeration, which is optional. It controls the initial spacing between the clusters in the embedding: during the first stage of optimization, the high-dimensional similarities are multiplied by a fixed factor, so larger values make the space between clusters initially larger. There is no single best value for early exaggeration; the user should try several values, and if the cost function increases during the initial optimization, the early exaggeration should be reduced.
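In scikit-learn this knob is exposed as early_exaggeration (default 12.0); a sketch of trying a few values on stand-in data while watching the final cost:

```python
import numpy as np
from sklearn.manifold import TSNE

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 50))  # stand-in data

# Larger values push clusters further apart during the first optimization
# phase; if the final cost ends up worse, reduce the value.
for exaggeration in (4.0, 12.0, 24.0):
    tsne = TSNE(perplexity=30, early_exaggeration=exaggeration, random_state=0)
    tsne.fit_transform(X)
    print(f"early_exaggeration={exaggeration:4.1f}  KL={tsne.kl_divergence_:.3f}")
```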

5. More plots may be needed for topology

Depending on the parameters discussed before, perplexity and epsilon, t-SNE plots can sometimes reveal topological structure. One of the topological properties illustrated by "How to Use t-SNE Effectively" [2] is containment. The plots below show two groups of 200 points in 50-dimensional space. Both are sampled from symmetric Gaussian distributions centered at the origin, but one is 50 times more tightly dispersed than the other, so the "small" distribution is in effect contained in the large one.
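For readers who want to reproduce this setup, a sketch that generates data matching the description above:

```python
import numpy as np

rng = np.random.default_rng(0)
# 200 points from a wide symmetric Gaussian in 50 dimensions...
outer = rng.normal(scale=1.0, size=(200, 50))
# ...and 200 points from one 50 times more tightly dispersed.
inner = rng.normal(scale=1.0 / 50.0, size=(200, 50))
X = np.vstack([outer, inner])
labels = np.array([0] * 200 + [1] * 200)  # for coloring the two groups
```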

[Figure: the original containment data and five t-SNE runs at perplexity 2, 5, 30, 50, and 80 (epsilon 10; step 200, except step 100 at perplexity 5).]

Starting from perplexity 30, we begin to see the topology, with one cluster contained in the other. At perplexity 50, the outer group becomes a sphere around the inner group, with all of its points at roughly the same distance from it. Viewed in 2D instead of 3D, this topology would be hard to see, and the clusters would not appear clearly separate.

t-SNE can produce other kinds of topology with different parameter values, such as a link or a knot in three dimensions. As before, none of the topology is recovered below perplexity 30. Looking at the link example at perplexity 30, the 2D view suggests two circles linked together, but by moving the plot around we can see that the two circles are actually unlinked.
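A sketch of one plausible construction of such a link, two interlocked unit circles in 3D (an assumption about the construction; the playground's exact Link dataset may differ), embedded back into 3D with scikit-learn:

```python
import numpy as np
from sklearn.manifold import TSNE

n = 100
t = np.linspace(0.0, 2.0 * np.pi, n, endpoint=False)
# Circle A in the xy-plane, centered at the origin.
a = np.stack([np.cos(t), np.sin(t), np.zeros(n)], axis=1)
# Circle B in the xz-plane, shifted along x so it threads through circle A.
b = np.stack([1.0 + np.cos(t), np.zeros(n), np.sin(t)], axis=1)
X = np.vstack([a, b])  # two linked circles in 3D

# Embed into 3D and rotate the result to check whether the link survives.
emb = TSNE(n_components=3, perplexity=30, random_state=0).fit_transform(X)
```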

[Figure: the original link data and five t-SNE runs at perplexity 2, 5, 30, 50, and 80 (epsilon 10; step 200, except step 100 at perplexity 5).]

[Figure: the original data for a second topology example and five t-SNE runs at perplexity 2, 5, 30, 50, and 80 (epsilon 10; step 200, except step 1,000 at perplexity 50).]

6. The desired low dimension

Although t-SNE results are affected by the user-defined parameters discussed above, one more choice affects the result: the desired low dimension. In this section, we provide an option for users to upload their own data and compare the results in 3D and 2D. For some of the datasets we tested, the clusters were not clear in 2D, depending on the parameters and the number of samples, while they were clear in 3D. For example, two clusters that are close to each other can look like a single cluster with two colors in 2D, but in 3D, moving the camera around shows that they are separate.
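The comparison itself is straightforward to script; a sketch with scikit-learn, where only n_components changes and the uploaded dataset is replaced by stand-in data:

```python
import numpy as np
from sklearn.manifold import TSNE

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 20))  # stand-in for an uploaded dataset

# Same data, same parameters; only the target dimension differs.
emb_2d = TSNE(n_components=2, perplexity=30, random_state=0).fit_transform(X)
emb_3d = TSNE(n_components=3, perplexity=30, random_state=0).fit_transform(X)
```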

Having some results where 3D was better than 2D does not mean that 3D is always better. For some datasets, 3D gave higher error rates than 2D. The answer to which low dimension to choose is "it depends": on the data and on the parameters chosen.

7. Going from high dimension to low dimension, will it preserve the shape?

t-SNE is known to preserve the local structure of the data, but that does not mean the embedding conveys the same information. To test this, we provide an option below that displays the data in its original dimension, in our case 3D, and next to it the result of reducing it to a lower dimension, 2D, using t-SNE.
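A sketch of this round trip for the sphere, using scikit-learn and a standard way of sampling the unit sphere:

```python
import numpy as np
from sklearn.manifold import TSNE

rng = np.random.default_rng(0)
# Roughly uniform points on the unit sphere: normalize Gaussian samples.
X = rng.normal(size=(500, 3))
X /= np.linalg.norm(X, axis=1, keepdims=True)

# Reduce the 3D sphere to 2D and inspect how recognizable the shape remains.
emb = TSNE(n_components=2, perplexity=30, random_state=0).fit_transform(X)
```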

For some parameter settings, it was hard to guess the original high-dimensional shape. The sphere was not rendered as anything close to a circle, the natural 2D projection of a sphere, and the same held for the other shapes such as the torus and the tetrahedron. The user may have to try many parameter settings, with perplexity values as high as 1,000, to obtain the closest shape in 2D.

Conclusion

t-SNE is now considered one of the top dimensionality-reduction algorithms. It is a very flexible, user-interactive tool, but its main limitations are its computational complexity and the need to try many parameter values to get good results. The desired low dimension also plays an important role in the result.

Acknowledgments

Authors: Wedad Anbtawi and Mohamed Nassar.

This work was made possible by the support of the American University of Beirut.

References

  1. Visualizing Data using t-SNE [PDF]
    van der Maaten, L. and Hinton, G., 2008. Journal of Machine Learning Research, Vol. 9 (Nov), pp. 2579-2605.
  2. How to Use t-SNE Effectively [Link]
    Wattenberg, M., Viégas, F. and Johnson, I., 2016. Distill.