Jesus Armenta-Segura

Data Scientist and AI researcher

h-index: 5

Erdős Number: 3

Hello there! I'm a data scientist based in Monterrey with 3 years of experience in NLP, machine learning, and deep learning. During this time, I've worked on several projects involving deep neural networks. My research has also extended to other AI areas such as computer vision, whose techniques I applied to a financial problem in the entertainment industry. Currently, I am working on an explainable AI (XAI) project in which we aim to detect and explain violence descriptors in Spanish school reports.

Experience

Detailed CV 🔗

   -   NLP engineer at IFE-ITESM

August 2024 - Dec 2024

We are working on a project to detect and explain violence
descriptors in school reports, leveraging NLP techniques.

 

   -   NLP engineer at CIC-IPN

August 2022

  • Research paper writing (publication list below)
  • Academic collaborations (highlights):
    • Anime Popularity Prediction before Huge Investments
    • Hope2024@IberLEF2024
    • Multimodal Hate Speech Detection 2023
    • The PolitiMX2023 dataset
    • Dravidianlangtech2023
  • Server maintenance (Ubuntu 20.04)

 

   -   Adjunct Professor at UNAM

January 2019 - July 2021

My Latest AI Projects

Explaining Violence Descriptors

In this work, we will propose AI models for explaining violence descriptors in school reports.
(Outcomes coming soon)

 
Popularity Prediction in Anime 🔗

In this work, we propose multimodal text-image AI methods to predict the popularity of an anime.

 

Other AI projects:

Mistral-7b and Llama2 Embeddings for Anime Synopsis using Ollama 🔗
Custom BERT models for Hope Speech Detection 🔗
BERT for Hate Speech Detection in Memes 🔗
Anime Popularity Prediction with Traditional Classifiers (ENHANCED VERSION) 🔗

Stack & Skills

   
Programming/Markup Languages
Python
HTML
CSS
LaTeX
 
 
AI (ML/DL)
PyTorch
TensorFlow
Ollama
The Transformers Library
Hugging Face
 
 
Web Scraping
BeautifulSoup4
Selenium
 
 
Environments
Git
VS Code
Jupyter Notebook
 
 
OS
Linux
Windows
Ubuntu Server
PowerShell

Academic Publications

List of publications
 
 
About the Anime Project:
 
Anime Popularity Prediction: A Multimodal Approach Using Deep Learning
Jesús Armenta-Segura, Grigori Sidorov.

Submitted to: PeerJ CS (21-June-2024)

Abstract:

In the Japanese anime industry, predicting whether an upcoming product will be popular is crucial. This paper presents a dataset and methods for predicting anime popularity using a multimodal text-image dataset constructed exclusively from freely available internet sources. The dataset was built following rigorous standards based on real-life investment experiences. A deep neural network architecture leveraging GPT-2 and ResNet-50 to embed the data was employed to investigate the correlation between the multimodal text-image input and a popularity score, discovering relevant strengths and weaknesses in the dataset. To measure the accuracy of the model, mean squared error (MSE) was used, obtaining a best result of 0.011 when considering all inputs and the full version of the deep neural network, compared to the benchmark MSE of 0.412 obtained with traditional TF-IDF and PILToTensor vectorizations. This is the first proposal to address this task with multimodal datasets, revealing the substantial benefit of incorporating image information, even when a relatively small model (ResNet-50) was used to embed the images.

Anime success prediction based on synopsis using traditional classifiers
Jesús Armenta-Segura, Grigori Sidorov.

Published at: Research on Computer Science. Issue 152(9) (2023)

Abstract:

For predicting the success of an anime in its early stages of development, a baseline is proposed in this paper, based on the synopsis of its plot. AniSyn7 is presented, which is a corpus consisting of 6,928 anime synopses with binary labels of successful/unsuccessful. The corpus was explored by vectorizing the synopses using n-grams and dependence trees, so that three traditional machine learning classifiers (Support Vector Machine, Gaussian Naive Bayes, and Logistic Regression) can be employed to study the correlation between synopsis and success.

 
 
Other Journal Publications:
 
Two-agent Approximate Agreement from an Epistemic Logic Perspective
Jesús Armenta-Segura, Jeremy Ledent, Sergio Rajsbaum.

Published at: Computación y Sistemas Vol 26, No 2 (2022)

Abstract:

We investigate the two-agent approximate agreement problem in a dynamic network in which topology may change unpredictably, and where consensus is not solvable. It is known that the number of rounds necessary and sufficient to guarantee that the two agents output values 1/3^k away from each other is k. We distil ideas from previous papers to provide a self-contained, elementary introduction that explains this result from the epistemic logic perspective.

 
 
Shared Tasks and Contests:
 
Ometeotl@Multimodal Hate Speech Event Detection 2023: Hate Speech and Text-Image Correlation Detection in Real Life Memes Using Pre-Trained BERT Models over Text
Jesús Armenta-Segura, César Núñez-Prado, Grigori Sidorov, Alexander Gelbukh, Rodrigo Román-Godínez.

Published at: Proceedings of the 6th Workshop on Challenges and Applications of Automated Extraction of Socio-political Events from Text

Abstract:

Hate speech detection during times of war has become crucial in recent years, as evident with the recent Russo-Ukrainian war. In this paper, we present our submissions for both subtasks from the Multimodal Hate Speech Event Detection contest at CASE 2023, RANLP 2023. We used pre-trained BERT models in both submissions, achieving an F1 score of 0.809 in subtask A, and an F1 score of 0.567 in subtask B. In the first subtask, our result was not far from the first place, which led us to realize the lower impact of images in real-life memes about feelings, when compared with the impact of text. However, we observed a higher importance of images when targeting hateful feelings towards a specific entity.

Ometeotl at HOPE2024@IberLEF: Custom BERT Models for Hope Speech Detection (TO BE PUBLISHED)
Jesús Armenta-Segura, Grigori Sidorov.

Published at: Proceedings of IberLEF2024 (coming soon)

Abstract:

Hope speech has the potential to mitigate hostile environments and alleviate illnesses and depression, making it crucial to detect automatically. In this paper, we present our submissions for the shared task HOPE at IberLEF 2024. This shared task encompasses two datasets: HopeEDI, consisting of tweets in Spanish, and PolyHope, which includes tweets in both English and Spanish. We proposed the use of custom BERT models, specifically pretrained on multilingual datasets and tailored for sentiment analysis. We achieved fourth, sixth and eighth place in the HopeEDI, PolyHope in Spanish, and PolyHope in English datasets, respectively, with respect to the averaged F1 score.

LIDOMA at HOPE2023@IberLEF: Hope Speech Detection Using Lexical Features and Convolutional Neural Networks
Moein Shahiki-Tash, Jesús Armenta-Segura, Olga Kolesnikova, Grigori Sidorov, Alexander Gelbukh.

Published at: Proceedings of IberLEF2023

Abstract:

Hope speech can help to reduce hostile environments and alleviate illnesses and depression, which makes it important to detect automatically. In this paper, we present our submission for the HOPE: Multilingual Hope Speech Detection shared task at IberLEF 2023, which includes two sub-tasks: detecting hope speech in Spanish tweets and English YouTube comments. We proposed a word-based tokenization approach to train a Convolutional Neural Network (CNN). Our decision to use CNNs was inspired by previous works in hope speech detection that achieved good results using this method. Our approach achieved fourth place in both sub-tasks.

Lidoma@DravidianLangTech2023: Convolutional neural networks for studying correlation between lexical features and sentiment polarity in Tamil and Tulu languages
Moein Shahiki-Tash, Jesús Armenta-Segura, Zahra Ahani, Olga Kolesnikova, Grigori Sidorov, Alexander Gelbukh.

Published at: Proceedings of the Third Workshop on Speech and Language Technologies for Dravidian Languages

Abstract:

With the prevalence of code-mixing among speakers of Dravidian languages, DravidianLangTech proposed the shared task on Sentiment Analysis in Tamil and Tulu at RANLP 2023. This paper presents the submission of LIDOMA, which proposes a methodology that combines lexical features and Convolutional Neural Networks (CNNs) to address the challenge. A fine-tuned 6-layered CNN model is employed, achieving macro F1 scores of 0.542 and 0.199 for Tulu and Tamil, respectively.

Lidoma at HOMO-MEX2023@IberLEF2023: Hate Speech Detection Towards the Mexican Spanish-Speaking LGBT+ Population. The Importance of Preprocessing Before Using BERT-Based Models
Moein Shahiki-Tash, Jesús Armenta-Segura, Zahra Ahani, Olga Kolesnikova, Grigori Sidorov, Alexander Gelbukh.

Published at: Proceedings of IberLEF2023.

Abstract:

Hate speech targeting LGBT+ individuals poses a deeply ingrained problem with wide-ranging consequences, encompassing substance abuse disorders and discrimination. These specific concerns are particularly amplified in Mexico. In this paper, we present our submission on the first track of the HOMOMEX: Hate Speech Detection towards the Mexican Spanish-Speaking LGBT+ Population. We explore the dataset and employ transformer architectures, which have demonstrated significant efficacy in similar sentiment analysis tasks. Specifically, we utilize BERT-based models and we show the importance of preprocessing by reaching the last place in the competition with a Macro F1 score of 0.73.

 
 
Thesis and Terminal Projects:
 
Consensus Solvability Through Combinatorial Topology (Graduate Thesis in Spanish)
Jesús Armenta-Segura

Abstract:

One of the most important goals of distributed computing is to imitate sequential computing with multiple computers working at the same time. To achieve it, the consensus task arises as a fundamental task for getting several computers working as one. However, cases have been found in which this task is unsolvable, which generates the side problem of determining the solvability of consensus. This thesis presents a study of this problem using Combinatorial Topology.

Combinatorial Topology Models for Russian Card Problems (Undergraduate Thesis in Spanish)
Jesús Armenta-Segura

Abstract:

This thesis addresses the Russian Card problems, which are related to aspects of cryptography and information exchange from an atypical perspective. It seeks to provide a way to verify their possible solutions, for which combinatorial topology models are built and some of the variables involved are generalized. These types of models are the result of 60 years of studying problems and puzzles of information exchange (as old as humanity itself), and thanks to recent task concepts and action models they allow the fulfillment of the desired objective with great rigor and formality.

Courses Taught

As Adjunct

Calculus I (in Spanish)
feb-jul 2021
Calculus I (in Spanish)
aug-dec 2020
Differential Equations I (in Spanish)
feb-jul 2020
Analytic Geometry I (in Spanish)
feb-jul 2020
Introduction to Abstract Algebra I (Álgebra Superior I)
aug-dec 2019
Calculus I (in Spanish)
aug-dec 2019
Differential Equations I (in Spanish)
feb-jul 2019