UFPI Institutional Repository

SOURCE CODE EXPERTISE: Improving Knowledge Models and Assessing Generative AI Impact

dc.contributor.author CASTRO, Otávio Cury da Costa
dc.date.accessioned 2025-09-16T15:51:43Z
dc.date.available 2025-09-16T15:51:43Z
dc.date.issued 2025-09-16
dc.identifier.uri http://hdl.handle.net/123456789/4059
dc.description Advisor: Guilherme Amaral Avelino; Co-advisor: Prof. Dr. Pedro de Alcantara dos Santos Neto; Internal examiner: Prof. Dr. Vinicius Ponte Machado; Internal examiner: Prof. Dr. Romuere Rodrigues Veloso e Silva; External examiner: Prof. Dr. Lincoln Souza Rocha; External examiner: Prof. Dr. André Cavalcante Hora pt_BR
dc.description.abstract Identifying developer expertise in source code is valuable in many Software Engineering contexts: knowledgeable developers are best suited to tasks such as code review and onboarding. Numerous models have been proposed to estimate source code knowledge, making it a well-explored topic; however, important gaps remain that affect the accuracy and applicability of these models. Moreover, the increasing use of Generative Artificial Intelligence (GenAI) tools may influence how code expertise is acquired and measured. This study aims to develop more accurate models for identifying source code experts. We first investigate the correlation between development history variables and developers’ knowledge of source code files. We extract metrics from public and private repositories and survey developers about the files they contributed to. Based on these data, we propose a linear model and train machine learning classifiers, comparing their performance with existing models. We also apply the proposed models to the Truck Factor (TF) metric to assess their practical implications for identifying critical developers. To examine the impact of GenAI, we build a dataset combining code expertise metrics with information on ChatGPT-generated code integrated into open-source projects. We simulate different usage scenarios by attributing a portion of contributions to GenAI instead of developers, and we survey developers about their perception of GenAI’s effects on code comprehension. Our results show that First Authorship and Recency of Modification are the variables most strongly correlated with source code knowledge. The proposed machine learning models outperform linear baselines, achieving F-scores between 71% and 73%. When applied to the TF algorithm, they improved developer identification, reaching a best average F-score of 74%. GenAI usage negatively affected TF reliability, even at low proportions. Developers reported mixed perceptions, with particular concern about use by novice programmers. pt_BR
dc.language.iso other pt_BR
dc.subject Software Repository Mining pt_BR
dc.subject Code Expertise pt_BR
dc.subject Knowledge Concentration pt_BR
dc.subject Generative Artificial Intelligence pt_BR
dc.title SOURCE CODE EXPERTISE: Improving Knowledge Models and Assessing Generative AI Impact pt_BR
dc.type Preprint pt_BR
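
The Truck Factor (TF) metric named in the abstract can be illustrated with a minimal sketch. This is not the implementation used in the thesis: it assumes a commonly described greedy heuristic in which the developer covering the most files is removed repeatedly until more than half of the files have no remaining expert. The function name `truck_factor`, the `file_experts` mapping, and the 50% threshold are assumptions made for illustration only.

```python
from collections import defaultdict


def truck_factor(file_experts, coverage_threshold=0.5):
    """Greedy Truck Factor estimate (illustrative sketch, not the thesis code).

    file_experts: hypothetical mapping of file path -> set of developers
    judged knowledgeable about that file by some expertise model.
    Returns (truck_factor, removed_developers_in_order).
    """
    remaining = {path: set(devs) for path, devs in file_experts.items()}
    total = len(remaining)
    if total == 0:
        return 0, []

    removed = []
    while True:
        # Stop once more than the threshold share of files has no expert.
        orphaned = sum(1 for devs in remaining.values() if not devs)
        if orphaned / total > coverage_threshold:
            return len(removed), removed

        # Count how many files each remaining developer still covers.
        counts = defaultdict(int)
        for devs in remaining.values():
            for dev in devs:
                counts[dev] += 1
        if not counts:
            return len(removed), removed

        # Remove the developer covering the most files (ties: alphabetical).
        top = min(counts, key=lambda d: (-counts[d], d))
        removed.append(top)
        for devs in remaining.values():
            devs.discard(top)


# Toy input with made-up developer and file names.
experts = {
    "a.py": {"ana"},
    "b.py": {"ana", "bob"},
    "c.py": {"bob"},
    "d.py": {"ana"},
}
tf, critical = truck_factor(experts)
print(tf, critical)
```

In this toy input, removing "ana" leaves exactly half of the files without an expert, which does not exceed the threshold, so "bob" must also be removed. Attributing part of a developer's contributions to a GenAI tool would shrink the expert sets passed in, which is one way the simulated scenarios described in the abstract could change the result.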