Abstract
This thesis presents several works on relation extraction and classification in articles from Ouest-France, the largest newspaper in France. This use case raises several challenges related to the available data, including a lack of annotated corpora and imbalanced data. The present work therefore discusses two possible ways to apply high-performing state-of-the-art methods to this scenario, while questioning the relevance of state-of-the-art models here. The first approach detects irrelevant entity pairs and filters them out before they reach a classification model, so as to improve classification quality by improving the quality of the samples to predict. The second approach is active learning, where samples are fed to the model incrementally, with the samples at each iteration chosen to maximize the prediction performance of the relation classification model. Both approaches improved the performance of simple relation classification models, whereas the complexity of state-of-the-art models proved incompatible with the type and amount of data currently available at Ouest-France. Additionally, we briefly explore several options for unsupervised relation extraction, which proves unsuitable for our task, and for self-supervised representation of relations, which shows encouraging enough results to warrant future exploration.
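To make the active learning loop described above concrete, here is a minimal sketch of pool-based active learning. The abstract does not state which selection criterion the thesis uses; this sketch assumes least-confidence uncertainty sampling (query the pool samples whose top predicted class probability is lowest), which is only one common choice. The function names `least_confidence_query` and `active_learning_loop`, and the caller-supplied `train`, `predict_proba` and `oracle` callables, are hypothetical placeholders, not the thesis's code.

```python
from typing import Any, Callable, List, Sequence, Tuple


def least_confidence_query(probs: Sequence[Sequence[float]], k: int) -> List[int]:
    """Return the indices of the k pool samples the model is least confident about.

    Assumption: uncertainty is measured as 1 - max class probability
    (least-confidence sampling); other criteria (margin, entropy) are possible.
    """
    scores = [1.0 - max(p) for p in probs]
    ranked = sorted(range(len(probs)), key=lambda i: scores[i], reverse=True)
    return ranked[:k]


def active_learning_loop(
    labeled: List[Tuple[Any, int]],
    pool: List[Any],
    oracle: Callable[[Any], int],                   # hypothetical: human annotator
    train: Callable[[List[Tuple[Any, int]]], Any],  # hypothetical: fits a relation classifier
    predict_proba: Callable[[Any, Any], Sequence[float]],
    k: int,
    iterations: int,
) -> Tuple[Any, List[Tuple[Any, int]]]:
    """Incrementally grow the labeled set, retraining after each query batch."""
    labeled = list(labeled)  # copy so the caller's list is not mutated
    pool = list(pool)
    model = train(labeled)
    for _ in range(iterations):
        if not pool:
            break
        # Score every unlabeled sample with the current model.
        probs = [predict_proba(model, x) for x in pool]
        picked = set(least_confidence_query(probs, k))
        # Ask the oracle to annotate the selected samples.
        for i in sorted(picked):
            labeled.append((pool[i], oracle(pool[i])))
        # Remove them from the pool and retrain on the enlarged labeled set.
        pool = [x for i, x in enumerate(pool) if i not in picked]
        model = train(labeled)
    return model, labeled
```

In practice `train` and `predict_proba` would wrap whatever relation classifier is being evaluated, and `oracle` would stand for a manual annotation step; the loop itself is independent of the model choice.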
Source: http://www.theses.fr/2022ISAR0018