dc.contributor.author |
CASTRO, Otávio Cury da Costa |
|
dc.date.accessioned |
2025-09-16T15:51:43Z |
|
dc.date.available |
2025-09-16T15:51:43Z |
|
dc.date.issued |
2025-09-16 |
|
dc.identifier.uri |
http://hdl.handle.net/123456789/4059 |
|
dc.description |
Advisor: Guilherme Amaral Avelino
Co-advisor: Prof. Dr. Pedro de Alcantara dos Santos Neto
Internal examiner: Prof. Dr. Vinicius Ponte Machado
Internal examiner: Prof. Dr. Romuere Rodrigues Veloso e Silva
External examiner: Prof. Dr. Lincoln Souza Rocha
External examiner: Prof. Dr. André Cavalcante Hora |
pt_BR |
dc.description.abstract |
Abstract: Identifying developer expertise in source code is valuable in various Software Engineering
contexts. Knowledgeable developers are best suited to perform tasks such as code review
and onboarding. Numerous models have been proposed to estimate source code
knowledge, making it a well-explored topic; however, important gaps remain that affect the
accuracy and applicability of these models. Moreover, the increasing use of Generative
Artificial Intelligence (GenAI) tools may influence how code expertise is acquired and
measured. This study aims to develop more accurate models for identifying source code
experts. We first investigate the correlation between development history variables and
developers’ knowledge of source code files. We extract metrics from public and private
repositories and survey developers about the files they contributed to. Based on these
data, we propose a linear model and train machine learning classifiers, comparing their
performance with existing models. We also apply the proposed models to the Truck Factor
(TF) metric to assess their practical implications in identifying critical developers. To
examine the impact of GenAI, we build a dataset combining code expertise metrics with
information on ChatGPT-generated code integrated into open-source projects. We
simulate different usage scenarios by assigning a portion of contributions to GenAI instead
of developers and survey developers about their perception of GenAI’s effects on code
comprehension. Our results show that First Authorship and Recency of Modification are
the variables most strongly correlated with source code knowledge. The proposed
machine learning models outperform linear baselines, achieving F-scores between 71%
and 73%. When applied to the TF algorithm, they improved developer identification,
reaching a best average F-score of 74%. GenAI usage negatively affected TF reliability,
even at low proportions. Developers reported mixed perceptions, with particular
concerns about its use by novice programmers. |
pt_BR |
dc.language.iso |
other |
pt_BR |
dc.subject |
Software Repository Mining |
pt_BR |
dc.subject |
Code Expertise |
pt_BR |
dc.subject |
Knowledge Concentration |
pt_BR |
dc.subject |
Generative Artificial Intelligence |
pt_BR |
dc.title |
SOURCE CODE EXPERTISE: Improving Knowledge Models and Assessing Generative AI Impact |
pt_BR |
dc.type |
Preprint |
pt_BR |