USE OF SUPERVISED MACHINE LEARNING ALGORITHMS FOR DATA LABELING (USO DE ALGORITMOS DE APRENDIZADO DE MÁQUINA SUPERVISIONADO PARA ROTULAÇÃO DE DADOS)

JAIME, Tarcisio Franco.

URI: http://hdl.handle.net/123456789/2208

Date: 2020-06-09

Abstract:

With the advance of technology, more and more devices are connecting to networks, generating data flows and data processing. As a result, more machine learning algorithms are being studied to extract relevant information from these large volumes. As this flow of data grows, interpreting it becomes harder, with the degree of difficulty proportional to the growth.

In this context, this research uses supervised machine learning algorithms, which are algorithms capable of learning from given examples or behaviors. The objective of this work is to identify, in groups of data, the most significant attributes together with the values that recur often enough to represent each group; this technique is called labeling. To that end, supervised algorithms, which correlate input data with a desired output, are applied to all attributes to find the most significant one in each cluster. Then, for this most significant attribute, the value range with the highest incidence of values is selected, and the two together compose the label (attribute/value range). Of the four databases tested, only one had clusters whose labels reached accuracies below 70%; in all the others the labels had accuracies above that value, indicating that the groups can be identified through the labels found.
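The labeling idea summarized in the abstract (pick the most significant attribute of each cluster, then the value range where most of its values fall) can be illustrated with a minimal Python sketch. This is not the thesis implementation: the function names `bin_index` and `label_clusters`, the equal-width binning, the `n_bins` parameter, and the toy data are all assumptions made for illustration. In particular, attribute relevance is approximated here by the accuracy of a simple one-attribute range rule, as a stand-in for the supervised learners used in the work.

```python
from collections import Counter


def bin_index(value, low, width, n_bins):
    """Map a value to an equal-width bin index in [0, n_bins - 1]."""
    if width == 0:
        return 0
    return min(int((value - low) / width), n_bins - 1)


def label_clusters(rows, clusters, n_bins=3):
    """Return {cluster: (attribute index, (range low, range high), accuracy)}.

    Assumption: an attribute's relevance is scored by how well the rule
    "value falls in the cluster's most frequent range" separates that
    cluster from the rest. The thesis uses supervised learners instead.
    """
    n_attrs = len(rows[0])
    labels = {}
    for cluster in sorted(set(clusters)):
        best = None
        for attr in range(n_attrs):
            column = [row[attr] for row in rows]
            low, high = min(column), max(column)
            width = (high - low) / n_bins
            bins = [bin_index(v, low, width, n_bins) for v in column]
            # Most frequent bin among the members of this cluster.
            inside = Counter(b for b, c in zip(bins, clusters) if c == cluster)
            top_bin = inside.most_common(1)[0][0]
            # Accuracy of the rule "in top_bin <=> belongs to this cluster".
            hits = sum((b == top_bin) == (c == cluster)
                       for b, c in zip(bins, clusters))
            accuracy = hits / len(rows)
            value_range = (low + top_bin * width, low + (top_bin + 1) * width)
            if best is None or accuracy > best[2]:
                best = (attr, value_range, accuracy)
        labels[cluster] = best
    return labels


# Hypothetical toy data: attribute 0 separates the two clusters,
# attribute 1 is spread across both.
rows = [(1.0, 5.0), (1.2, 9.0), (0.9, 1.0), (5.0, 5.5), (5.3, 1.2), (4.8, 8.7)]
clusters = [0, 0, 0, 1, 1, 1]
labels = label_clusters(rows, clusters)
for cluster, (attr, value_range, accuracy) in labels.items():
    print(cluster, attr, value_range, accuracy)
```

In this sketch each label is an (attribute, value range) pair with an accuracy attached, which mirrors the per-cluster accuracies reported in the abstract; the 70% figure there refers to the thesis experiments, not to this toy data.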

Description:

Advisor: Prof. Dr. Vinicius Ponte Machado. Internal Examiner: Prof. Dr. Erico Meneses Leão. Internal Examiner: Prof. Dr. Raimundo Santos Moura. External Examiner: Prof. Dr. Ivanovitch M. Dantas da Silva.
