Cosine Similarity Calculator


Cosine similarity is a measure of how similar two vectors are, and it is widely used in data mining. As far as I’m aware, this is the first and only online cosine similarity calculator. The form is below. Sweet. Enjoy! 🙂

Explanation

This Cosine Similarity Calculator shows you how to calculate the Cosine Similarity (a.k.a. the Cosine Measure) of two vectors. It is useful for both math homework and data mining.

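The Cosine Similarity of two vectors is a measure of how closely their directions match, on a scale of [-1, 1]. When every value is non-negative (as with word counts in data mining), the scale narrows to [0, 1]. A value of 1 means the vectors point in the same direction: they are either identical, or one is a positive constant multiple of the other.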

The Cosine Similarity of two vectors (d1 and d2) is defined as:
cos(d1, d2) = dot(d1, d2) / (||d1|| * ||d2||)

where dot(d1, d2) = d1[0]*d2[0] + d1[1]*d2[1] + …
and ||d1|| = sqrt(d1[0]^2 + d1[1]^2 + …)
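To make the formula concrete, here is a minimal Python sketch of it. This is not the calculator's actual script; the function name `cosine_similarity` and the choice to raise `ValueError` for an all-zero vector are assumptions for illustration. Note that the denominator can only be zero when one of the vectors is all zeros; two non-zero orthogonal vectors simply give a similarity of 0.

```python
import math
from typing import Sequence


def cosine_similarity(d1: Sequence[float], d2: Sequence[float]) -> float:
    """Return cos(d1, d2) = dot(d1, d2) / (||d1|| * ||d2||)."""
    if len(d1) != len(d2):
        raise ValueError("vectors must have the same length")
    # dot(d1, d2) = d1[0]*d2[0] + d1[1]*d2[1] + ...
    dot = sum(a * b for a, b in zip(d1, d2))
    # ||d|| = sqrt(d[0]^2 + d[1]^2 + ...)
    norm1 = math.sqrt(sum(a * a for a in d1))
    norm2 = math.sqrt(sum(b * b for b in d2))
    # An all-zero vector has length 0, so the ratio would divide by zero.
    if norm1 == 0 or norm2 == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm1 * norm2)


if __name__ == "__main__":
    print(cosine_similarity([1, 2, 3], [2, 4, 6]))  # ≈ 1.0: same direction
    print(cosine_similarity([1, 0], [0, 1]))        # 0.0: orthogonal
    print(cosine_similarity([1, 2], [-1, -2]))      # ≈ -1.0: opposite direction
```

The first call shows the constant-factor case ([2, 4, 6] is 2 × [1, 2, 3]). Because of floating-point rounding, results that are mathematically exactly 1 or -1 may print as something like 0.9999999999999998.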

(Additional info for data miners: the Centroid Similarity Measure is simply the Cosine Measure applied to the centroids produced by your clustering. For example, if clustering leaves you with only two centroids, C1 and C2, the Centroid Similarity Measure is just cos(C1, C2). If you have k centroids with k > 2 (and this formula works for k = 2 as well), it is the double sum Σ (i = 1 to k) Σ (j = 1 to k) cos(Ci, Cj).)
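A minimal Python sketch of the double sum, written exactly as stated above. The names `cosine_similarity` and `centroid_similarity` are hypothetical, and the sketch assumes every centroid is a non-zero vector (a zero centroid would raise `ZeroDivisionError`).

```python
import math
from typing import Sequence


def cosine_similarity(d1: Sequence[float], d2: Sequence[float]) -> float:
    # cos(d1, d2) = dot(d1, d2) / (||d1|| * ||d2||); assumes non-zero vectors
    dot = sum(a * b for a, b in zip(d1, d2))
    norm1 = math.sqrt(sum(a * a for a in d1))
    norm2 = math.sqrt(sum(b * b for b in d2))
    return dot / (norm1 * norm2)


def centroid_similarity(centroids: Sequence[Sequence[float]]) -> float:
    """Sum of cos(Ci, Cj) over all i = 1..k and all j = 1..k."""
    k = len(centroids)
    return sum(
        cosine_similarity(centroids[i], centroids[j])
        for i in range(k)
        for j in range(k)
    )


if __name__ == "__main__":
    centroids = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    print(centroid_similarity(centroids))  # ≈ 5.828, i.e. 3 + 2*sqrt(2)
```

Taken literally, the double sum includes the k diagonal terms cos(Ci, Ci), each equal to 1, and counts every pair twice, since cos(Ci, Cj) = cos(Cj, Ci).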

Directions

This is a Cosine Similarity Calculator. There is currently little data validation, so make sure your vectors are of equal length, contain only numeric values, and have each value separated by a single space. For example, “1 2 3” (without the quote marks) would be a valid input. After you press the “Calculate” button, the page will reload and your calculation will appear below. Voilà! Please leave a comment or send me feedback with any changes you’d like to see.
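The input rules above (numeric values, separated by spaces, equal-length vectors) can be checked with something like the following Python sketch. It is an assumption about how such validation might look, not the calculator's actual script; `parse_vector` and `parse_pair` are hypothetical names.

```python
from typing import List, Tuple


def parse_vector(text: str) -> List[float]:
    """Parse space-separated numbers, e.g. "1 2 3", into a list of floats."""
    values = []
    for token in text.split():
        try:
            values.append(float(token))
        except ValueError:
            # Report which value failed, e.g. "8.32.07743" (two decimal points).
            raise ValueError(f"non-numeric value: {token!r}") from None
    return values


def parse_pair(text1: str, text2: str) -> Tuple[List[float], List[float]]:
    """Parse both inputs and check that they are non-empty and equal in length."""
    v1, v2 = parse_vector(text1), parse_vector(text2)
    if not v1 or not v2:
        raise ValueError("vectors must not be empty")
    if len(v1) != len(v2):
        raise ValueError(f"vectors differ in length: {len(v1)} vs {len(v2)}")
    return v1, v2


if __name__ == "__main__":
    print(parse_pair("1 2 3", "4 5 6"))
```

Calling `str.split()` with no argument splits on any run of whitespace, so this sketch is more forgiving than the single-space rule. Python's `float()` also accepts forms such as `.58`, `1e3`, and `nan`, which you may or may not want to allow.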

Calculator

Enter Vector 1 (Values Separated by Spaces)
Enter Vector 2 (Values Separated by Spaces)


Your calculations will appear here after you push the Calculate button!



10 Responses

  1. very useful!!!!!!

  2. You forgot to handle division by zero, in case that the two vectors are orthogonal… 😉 But thank you anyway for this nice web app!

    • Good catch Andreas. I have updated the script. Give it a try now and let me know if it looks better.

  3. Thanks a lot for posting this. It was a helpful refresher while I am attempting to implement an algorithm for hierarchical taxonomies (collaborative creation of communal hierarchical taxonomies in social tagging systems, 2006)

  4. I originally tried to calculate the correlation for the following two little series:
    0.585076 8.91039 5.219482 0.475492 20.29347
    0.585076 8.32.07743 23.48767 3.42354 219.1695
    Each number is, I believe, separated by one space. I got an error that there was a non-numeric character.
    I then typed in
    .58 8.9 5.2 .47 20.29
    .58 32 23 3.4 219
    Now it worked OK. Does your algorithm not like lots of decimal places?
    Thanks, Alice

    • Hi Alice,

      Thanks for the feedback. There is a typo in the original list of numbers at “8.32.07743”. I have updated the script so the error message explains which value was not numeric. If you remove the second decimal point from this number it will work. Thanks!

  5. Excellent explanation!! Thanks.

  6. thank u, it helps a lot

  7. Thanks for the calculator. Nice to see the output during the intermediate steps.

  8. Very useful. Thanks
