
Document Similarity Social Network

Flow

Calculate tf-idf feature.

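TF-IDF stands for “Term Frequency, Inverse Document Frequency.” It scores how important a word (or “term”) is to a document within a collection of documents: the score rises with how often the term appears in that document and falls with how many documents in the collection contain it, so terms that are common everywhere receive little weight.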

[Figure: TF-IDF]
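A minimal sketch of this step, using only the Python standard library. It assumes a naive whitespace tokenizer and the plain `tf * log(N / df)` weighting; the project itself may use a library or a smoothed variant, and the helper names `tokenize` and `tfidf_vectors` are hypothetical.

```python
import math
from collections import Counter


def tokenize(text):
    # Naive lowercase whitespace tokenizer (an assumption for illustration).
    return text.lower().split()


def tfidf_vectors(documents):
    """Return (vocabulary, vectors) with one TF-IDF vector per document."""
    tokenized = [tokenize(doc) for doc in documents]
    vocabulary = sorted({term for tokens in tokenized for term in tokens})
    n_docs = len(documents)
    # Document frequency: number of documents that contain each term.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))

    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens) or 1  # avoid division by zero for empty documents
        vectors.append([
            (counts[term] / total) * math.log(n_docs / doc_freq[term])
            for term in vocabulary
        ])
    return vocabulary, vectors


docs = ["the cat sat", "the dog sat", "the bird flew"]
vocab, vectors = tfidf_vectors(docs)
```

With this unsmoothed formula, a term that occurs in every document (here "the") gets a weight of zero, which is the behavior described above.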

Calculate cosine similarity matrix.

[Figure: similarity matrix]
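The similarity matrix can be sketched as follows, assuming each document is already represented as a numeric feature vector of equal length. The function names are hypothetical, and treating a zero vector as having similarity 0 is a choice made for this sketch.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        # A zero vector has no direction; report no similarity.
        return 0.0
    return dot / (norm_a * norm_b)


def similarity_matrix(vectors):
    """Symmetric matrix where entry [i][j] compares document i with document j."""
    n = len(vectors)
    return [
        [cosine_similarity(vectors[i], vectors[j]) for j in range(n)]
        for i in range(n)
    ]


# Toy feature vectors standing in for TF-IDF output.
vectors = [[1.0, 0.0, 1.0], [1.0, 0.0, 0.0], [0.0, 2.0, 0.0]]
matrix = similarity_matrix(vectors)
```

Because TF-IDF weights are non-negative, the entries fall between 0 (no shared terms) and 1 (identical direction).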

Calculate node relationships by picking, for each document, the neighbors whose similarity meets a threshold.

[Figure: pick neighbors]
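One way to sketch the neighbor selection: walk the upper triangle of the similarity matrix and keep each pair whose score reaches the threshold. The cutoff of `0.3` and the name `pick_neighbors` are assumptions for illustration; the source does not state the threshold it uses.

```python
def pick_neighbors(matrix, threshold):
    """Return (i, j, similarity) edges for document pairs at or above the threshold."""
    edges = []
    n = len(matrix)
    for i in range(n):
        # Start at i + 1 to skip self-similarity and avoid duplicate pairs.
        for j in range(i + 1, n):
            if matrix[i][j] >= threshold:
                edges.append((i, j, matrix[i][j]))
    return edges


# Toy similarity matrix for three documents.
matrix = [
    [1.0, 0.8, 0.1],
    [0.8, 1.0, 0.4],
    [0.1, 0.4, 1.0],
]
edges = pick_neighbors(matrix, threshold=0.3)  # hypothetical cutoff
```

A higher threshold yields a sparser network with only the strongest relationships; a lower one connects more documents but adds weaker links.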

What is Vis.js?

A dynamic, browser-based visualization library. The library is designed to be easy to use, to handle large amounts of dynamic data, and to enable manipulation of and interaction with the data. The library consists of the components DataSet, Timeline, Network, Graph2d and Graph3d.
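The Network component draws a graph from a list of nodes and a list of edges. As a rough sketch, the thresholded pairs could be serialized to JSON in that shape, with `id` and `label` on nodes and `from`, `to`, and `value` on edges. The helper `to_vis_network` and the sample titles are hypothetical, and how the project actually passes data to the browser is not stated in the source.

```python
import json


def to_vis_network(titles, edges):
    """Build node and edge lists in the shape the Vis.js Network component reads."""
    nodes = [{"id": i, "label": title} for i, title in enumerate(titles)]
    vis_edges = [
        # "value" lets the Network component scale edge width by similarity.
        {"from": source, "to": target, "value": round(weight, 3)}
        for source, target, weight in edges
    ]
    return {"nodes": nodes, "edges": vis_edges}


titles = ["Doc A", "Doc B", "Doc C"]      # hypothetical document titles
edges = [(0, 1, 0.8), (1, 2, 0.4)]        # (i, j, similarity) pairs
graph = to_vis_network(titles, edges)
payload = json.dumps(graph, indent=2)
```

On the page, the two lists would then be handed to the Network constructor along with a container element and options.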

Document Social Network Result

[Figure: document network result]

Practical application

[Figure: document similarity application]

Methodology references: