ANALYSIS OF GROUPS ON TEXTS: A STUDY OF THE ABSTRACTS OF THE BANCO DE TESES AND DISSERTATIONS OF CAPES

A study of the abstracts in the CAPES theses and dissertations database

  • Fernando Melo Faraco Universidade Federal de Santa Catarina
  • Alexandre Leopoldo Gonçalves, Dr
  • João Arthur de Souza, Dr
  • José Leomar Todesco, Dr
  • Ronnie Carlos Tavares Nunes

Abstract

Knowledge discovery in large volumes of information has a wide range of applications. Its main tasks (classification, clustering, and association) have been used in many fields to extract useful knowledge from large datasets. This article analyzes the application of data mining techniques, in particular the K-Means clustering algorithm, to assess their effectiveness for analyzing data from the Brazilian Open Data Portal, a public data repository organized and made available to the population. The dataset was extracted from the thesis and dissertation database made available by CAPES (Coordination for the Improvement of Higher Education Personnel). The data were processed and indexed in the Apache Solr® platform, and the clusters were generated with the Carrot2 software, using the K-Means algorithm with customized configurations. Clusters were generated year by year and in consolidated form, under different configurations of the algorithm, which made it possible to compare the resulting terms. It was concluded that the results produced by these tools depend directly on the chosen initial number of clusters, but that the approach shows clear potential for discovering non-obvious clusters.
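
As a rough illustration of the clustering step described in the abstract, the sketch below groups a handful of short texts with K-Means and labels each cluster by its highest-weighted terms. It is not the Carrot2 or Apache Solr configuration used in the study, whose details are not given here. The sample texts, the stopword list, the TF-IDF weighting, the farthest-first initialization, and the use of cosine similarity (the spherical variant of K-Means) are all assumptions made for the example, as are the function names.

```python
import math
import re
from collections import Counter

STOPWORDS = {"and", "the", "for", "with", "from"}  # tiny illustrative list (assumption)

def tokenize(text):
    # Lowercase, keep alphabetic tokens longer than two characters, drop stopwords.
    words = re.findall(r"[^\W\d_]+", text.lower())
    return [w for w in words if len(w) > 2 and w not in STOPWORDS]

def tfidf_vectors(docs):
    # Sparse, L2-normalised TF-IDF vectors stored as dicts (term -> weight).
    tokenized = [tokenize(d) for d in docs]
    df = Counter(term for toks in tokenized for term in set(toks))
    n = len(docs)
    vectors = []
    for toks in tokenized:
        vec = {t: c * (math.log((1 + n) / (1 + df[t])) + 1.0)
               for t, c in Counter(toks).items()}
        norm = math.sqrt(sum(w * w for w in vec.values())) or 1.0
        vectors.append({t: w / norm for t, w in vec.items()})
    return vectors

def dot(a, b):
    # Cosine similarity, since both vectors are unit length.
    return sum(w * b.get(t, 0.0) for t, w in a.items())

def mean_vector(vectors):
    # Normalised sum of the member vectors (the cluster centroid).
    total = Counter()
    for v in vectors:
        for t, w in v.items():
            total[t] += w
    norm = math.sqrt(sum(w * w for w in total.values())) or 1.0
    return {t: w / norm for t, w in total.items()}

def init_centroids(vectors, k):
    # Deterministic farthest-first seeding (an assumption, chosen for repeatability).
    chosen = [0]
    while len(chosen) < k:
        candidates = [i for i in range(len(vectors)) if i not in chosen]
        chosen.append(min(candidates,
                          key=lambda i: max(dot(vectors[i], vectors[c]) for c in chosen)))
    return [dict(vectors[i]) for i in chosen]

def kmeans(vectors, k, iterations=20):
    centroids = init_centroids(vectors, k)
    assignment = None
    for _ in range(iterations):
        # Assign each document to its most similar centroid.
        new_assignment = [max(range(k), key=lambda j: dot(v, centroids[j]))
                          for v in vectors]
        if new_assignment == assignment:
            break  # converged
        assignment = new_assignment
        # Recompute each centroid from its current members.
        for j in range(k):
            members = [v for v, a in zip(vectors, assignment) if a == j]
            if members:
                centroids[j] = mean_vector(members)
    return assignment, centroids

def top_terms(centroid, n=3):
    # Highest-weighted terms serve as a simple cluster label.
    ranked = sorted(centroid.items(), key=lambda kv: (-kv[1], kv[0]))
    return [t for t, _ in ranked[:n]]

# Made-up sample texts standing in for abstracts.
docs = [
    "soil erosion and crop yield in maize farming",
    "maize crop irrigation and soil nutrients",
    "deep neural network image classification",
    "neural network training with image data",
    "crop rotation effects on soil fertility",
    "convolutional neural network image segmentation",
]
labels, centroids = kmeans(tfidf_vectors(docs), k=2)
for j, centroid in enumerate(centroids):
    print(j, top_terms(centroid), [i for i, a in enumerate(labels) if a == j])
```

Rerunning `kmeans` with a different `k` on the same vectors changes both the grouping and the terms that label each group, which is the kind of sensitivity to the initial number of clusters that the abstract reports.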

Published
2018-09-19
How to Cite
FARACO, Fernando Melo et al. ANALYSIS OF GROUPS ON TEXTS: A STUDY OF THE ABSTRACTS OF THE BANCO DE TESES AND DISSERTATIONS OF CAPES. International Congress of Knowledge and Innovation - Ciki, [S.l.], v. 1, n. 1, sep. 2018. ISSN 2318-5376. Available at: <http://proceeding.ciki.ufsc.br/index.php/ciki/article/view/589>. Date accessed: 22 may 2019.