Large Scale Data Mining with Applications in Social Computing

Shagun Jhaver (2014). “Large Scale Data Mining with Applications in Social Computing.” Master's Thesis, The University of Texas at Dallas.


The aim of this thesis is to analyze large scale data mining and its applications in the domain of social computing through four case studies. First, a new framework for stream classification is described. This framework predicts class labels for a set of instances in a data stream and uses various machine learning techniques to perform this classification. The framework is evaluated on both real-world and synthetic datasets, including a dataset used to perform a website fingerprinting attack, which is treated as a setwise classification problem. Second, the parallelization of edit distance computation for a large set of string pairs using the MapReduce framework is investigated. This study demonstrates how large scale data mining opens new avenues for designing dynamic programming algorithms. Third, a comparative analysis of classifiers that predict politeness within a framework proposed by Danescu-Niculescu-Mizil et al. is detailed, along with an application of this framework to study politeness in various weblogs. Finally, different approaches to sentiment analysis of Twitter posts are discussed, and this processing is applied to predict the ratings of newly released movies.
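To make the second case study more concrete, the following is a minimal sketch of one simple way to distribute edit distance computation over many string pairs in a map/shuffle/reduce style. It is not the thesis's method: the abstract does not say how the MapReduce job is organized, and this sketch assumes each string pair is an independent task identified by a hypothetical integer `pair_id`. The functions `map_phase`, `reduce_phase`, and `run_job` are illustrative names that simulate the phases in a single process rather than calling Hadoop or any real MapReduce API.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance via the classic dynamic-programming recurrence,
    keeping only one row of the DP table in memory at a time."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i] + [0] * len(b)   # distance from a[:i] to "" is i deletions
        for j, cb in enumerate(b, start=1):
            curr[j] = min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if chars match)
            )
        prev = curr
    return prev[-1]


def map_phase(record: Tuple[int, str, str]) -> Iterator[Tuple[int, int]]:
    """Hypothetical mapper: an input record is (pair_id, s1, s2);
    emit (pair_id, distance) as a key/value pair."""
    pair_id, s1, s2 = record
    yield pair_id, edit_distance(s1, s2)


def reduce_phase(key: int, values: List[int]) -> Tuple[int, int]:
    """Hypothetical reducer: a pair_id normally has exactly one distance;
    min() also tolerates duplicate input records for the same pair."""
    return key, min(values)


def run_job(records: Iterable[Tuple[int, str, str]]) -> Dict[int, int]:
    """Single-process simulation of a MapReduce job over string pairs."""
    groups: Dict[int, List[int]] = defaultdict(list)
    # Map step, followed by a shuffle that groups mapper output by key,
    # as a MapReduce runtime would do across machines.
    for record in records:
        for key, value in map_phase(record):
            groups[key].append(value)
    # Reduce step: one call per distinct key.
    return dict(reduce_phase(k, v) for k, v in groups.items())


if __name__ == "__main__":
    pairs = [(0, "kitten", "sitting"), (1, "flaw", "lawn"), (2, "", "abc")]
    print(run_job(pairs))  # {0: 3, 1: 2, 2: 3}
```

Because every pair is independent under this assumption, the map step is embarrassingly parallel: a real cluster could split the input records across many mappers without any coordination. Parallelizing the dynamic-programming table of a single long pair would require a different decomposition, which this sketch does not attempt.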

BibTeX citation

  title={Large scale data mining with applications in social computing},
  author={Jhaver, Shagun},
  publisher={The University of Texas at Dallas}