The cosine similarity is a calculation used in data mining. As far as I’m aware, this is the first and only online cosine similarity calculator. The form is below. Sweet. Enjoy!

**Explanation**

This Cosine Similarity Calculator will teach you how to calculate the Cosine Similarity (a.k.a. how to calculate the Cosine Measure) of two vectors. Useful for both math homework and data mining.

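The Cosine Similarity of two vectors is a mathematical measure of how similar their directions are. In general it falls on a scale of [-1, 1]; for vectors with no negative values (common in data mining, e.g. term counts) it falls on a scale of [0, 1]. A value of 1 means the vectors are either identical or differ only by a positive constant factor.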

The Cosine Similarity of two vectors (d1 and d2) is defined as:

cos( d1, d2 ) = dot(d1,d2) / ( ||d1|| * ||d2|| )

where dot(d1,d2) = d1[0]*d2[0] + d1[1]*d2[1] …

and where ||d1|| = sqrt(d1[0]^2 + d1[1]^2 …)
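As a rough illustration, the definition above can be sketched in Python. This is not the calculator's actual script; the function name `cosine_similarity` and the choice to raise an error for a zero-length vector (where the denominator would be zero) are assumptions made for this example.

```python
import math


def cosine_similarity(d1, d2):
    """Cosine similarity of two equal-length numeric sequences (illustrative sketch)."""
    if len(d1) != len(d2):
        raise ValueError("vectors must be of equal length")
    # dot(d1, d2) = d1[0]*d2[0] + d1[1]*d2[1] + ...
    dot = sum(a * b for a, b in zip(d1, d2))
    # ||d|| = sqrt(d[0]^2 + d[1]^2 + ...)
    norm1 = math.sqrt(sum(a * a for a in d1))
    norm2 = math.sqrt(sum(b * b for b in d2))
    # Assumption: treat an all-zero vector as an error, since the
    # denominator would be zero and the measure is undefined.
    if norm1 == 0 or norm2 == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm1 * norm2)


if __name__ == "__main__":
    print(cosine_similarity([1, 2, 3], [2, 4, 6]))  # vectors differ by a constant factor
    print(cosine_similarity([1, 0], [0, 1]))        # orthogonal vectors
```

The first call uses vectors that differ by a constant factor, so the result should be 1 up to floating-point rounding; the second uses orthogonal vectors, whose dot product is 0.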

(Additional Info For Data Miners: The Centroid Similarity Measure is simply the Cosine Measure of your clustering output. For example, after clustering some data, if you only have two centroids, you get the Centroid Similarity Measure by taking the Cosine Measure of the two resulting centroid vectors. If you have k centroids with k > 2 (and this formula works for k=2 as well), then it is the Summation From i=1 to k ( Summation From j=1 to k ( Cosine Similarity(Ci,Cj) ) ).)
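A minimal Python sketch of that double summation follows. It takes the formula literally as written, so it includes the i = j terms (each contributing 1) and counts every pair of centroids twice; whether the intended measure excludes those terms is not stated here. The names `cosine` and `centroid_similarity` are hypothetical, and the centroids are assumed to be non-zero vectors of equal length.

```python
import math


def cosine(u, v):
    """Cosine similarity of two non-zero, equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def centroid_similarity(centroids):
    """Sum of cosine(Ci, Cj) over all i and j, taken literally from the formula."""
    return sum(cosine(ci, cj) for ci in centroids for cj in centroids)


if __name__ == "__main__":
    # Two hypothetical, orthogonal centroids.
    print(centroid_similarity([[1, 0], [0, 1]]))
```

With two orthogonal centroids the cross terms are 0, so only the two i = j terms contribute to the total.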

**Directions**

This is a Cosine Similarity Calculator. There is currently little data validation, so make sure your vectors are of **equal length**, are **numeric** in type, and have each value separated by a **single space**. For example ~> “1 2 3” (without the quote marks) would be a valid input.

After you press the “Calculate” button, the page will reload and your calculation will be below. Voilà! Please leave comments or send me feedback with any changes you’d like to see.
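For anyone curious how those three input rules might be checked, here is a small Python sketch. It is not the calculator's own code; `parse_vector` and `parse_pair` are hypothetical names, and using Python's `float()` as the test for "numeric" is an assumption (it also accepts forms such as `1e5`).

```python
def parse_vector(text):
    """Split on single spaces and convert each value to a float."""
    values = []
    for token in text.split(" "):
        try:
            values.append(float(token))
        except ValueError:
            # Report the offending value so the user can find the typo.
            # A doubled space yields an empty token, which also lands here.
            raise ValueError(f"not a numeric value: {token!r}") from None
    return values


def parse_pair(text1, text2):
    """Parse two vectors and check that they have the same length."""
    v1, v2 = parse_vector(text1), parse_vector(text2)
    if len(v1) != len(v2):
        raise ValueError(f"vectors differ in length: {len(v1)} vs {len(v2)}")
    return v1, v2


if __name__ == "__main__":
    print(parse_pair("1 2 3", "4 5 6"))
```

Splitting on a literal single space (rather than on any whitespace) means a doubled space produces an empty value and is rejected, which mirrors the single-space rule above.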

**Calculator**

**Your calculations will appear here after you push the Calculate button!**

## 9 Responses

## deepak

very useful!!!!!!

## Andreas

You forgot to handle division by zero, in case that the two vectors are orthogonal… 😉 But thank you anyway for this nice web app!

## Tyler

Good catch Andreas. I have updated the script. Give it a try now and let me know if it looks better.

## Vaso

Thanks a lot for posting this. was a helpful refresher while I am attempting to implement an algorithm for hierarchical taxonomies (collaborative creation of communal hierarchical taxonomies in social tagging systems, 2006)

## Alice

I originally tried to calculate the correlation for the following two little series:

0.585076 8.91039 5.219482 0.475492 20.29347

0.585076 8.32.07743 23.48767 3.42354 219.1695

Each number is, I believe, separated by one space. I got an error that there was a non-numeric character.

I then typed in

.58 8.9 5.2 .47 20.29

.58 32 23 3.4 219

Now it worked OK. Does your algorithm not like lots of decimal places?

Thanks, Alice

## Tyler

Hi Alice,

Thanks for the feedback. There is a typo in the original list of numbers at “8.32.07743”. I have updated the script so the error message explains which value was not numeric. If you remove the second decimal point from this number it will work. Thanks!

## Bert Carremans

Excellent explanation!! Thanks.

## gao

thank u, it helps a lot

## Arka Roy

Thanks for the calculator. Nice to see the output during the intermediate steps.