TL;DR
In this blog, we look at a practical machine learning problem: detecting machine-generated academic essays. We explore research by Mohammad AL-Smadi of Qatar University, who used pre-trained transformer-based models to detect machine-generated English and Arabic essays. The models, ELECTRA for English and AraELECTRA for Arabic, achieved F1-scores of 99.7% and 98.4%, respectively. We'll break down the technical aspects of these models, discuss their significance, and provide practical guidance on how to apply these technologies in your own projects.
Introduction to the Research and its Innovations
The rise of machine learning has brought numerous advances, including models that can generate entire academic essays. The challenge lies in detecting them. Mohammad AL-Smadi's research tackles this problem head-on for essays in English and Arabic, using pre-trained transformer-based models: ELECTRA for English and AraELECTRA for Arabic. During a preprocessing phase, stylometric features were extracted from each essay and fed to the models alongside the text. A classification head consisting of dropout, batch normalization, fully connected layers with a ReLU activation, and an output layer was added on top of each transformer to further improve performance.
Pseudocode: Extract stylometric features and prepare data for the model.
import statistics
from collections import Counter

# Extract simple stylometric features from raw text
def extract_stylometric_features(text):
    words = text.split()
    sentences = [s for s in text.replace('!', '.').replace('?', '.').split('.') if s.strip()]
    features = {
        'avg_word_length': sum(len(w) for w in words) / max(len(words), 1),
        'sentence_length_variance': statistics.pvariance([len(s.split()) for s in sentences]) if sentences else 0.0,
        'word_frequency': Counter(w.lower() for w in words)
    }
    return features

# Example usage
with open('essay.txt') as f:
    essay_text = f.read()
features = extract_stylometric_features(essay_text)
print("Stylometric Features:", features)
Key Developments and Their Significance
The research by Mohammad AL-Smadi is a significant milestone in this field. It provides a robust solution for detecting machine-generated academic essays, a problem that has grown with the proliferation of generative AI. The models developed in this study, ELECTRA and AraELECTRA, achieved F1-scores of 99.7% for English and 98.4% for Arabic. This level of performance is a testament to the effectiveness of the approach and its potential applications in academia and beyond.
Pseudocode: Evaluate model performance using the F1-score.
from sklearn.metrics import f1_score
# Ground truth and predictions
true_labels = [1, 0, 1, 1, 0]
predictions = [1, 0, 1, 0, 0]
# Calculate F1-score
score = f1_score(true_labels, predictions)
print("F1-Score:", score)
Broader Implications of the Research
The implications of this research are far-reaching. The ability to accurately detect machine-generated academic essays can have profound effects on academia, where the integrity of scholarly work is paramount. It can also impact other fields such as journalism and content creation, where the authenticity of written content is crucial. However, the research also presents challenges, such as the need for continuous model training and updates to keep up with the evolving capabilities of AI technologies.
Pseudocode: Apply the model across different content types.
# Note: `english_model`, `preprocess`, and the *_essay.txt files are
# placeholders for the trained detector, its input pipeline, and your data.
document_types = ['academic', 'journalism', 'blog']

# Apply the detector across document types
for doc_type in document_types:
    with open(f'{doc_type}_essay.txt') as f:
        essay = f.read()
    prediction = english_model.predict(preprocess(essay))
    print(f"{doc_type.capitalize()} Essay Detection: {prediction}")
Technical Analysis of the Models
Both ELECTRA and AraELECTRA are transformer architectures, which use self-attention mechanisms to model the context of each word in a sentence. ELECTRA is pre-trained as a discriminator that learns to distinguish original tokens from plausible replacements produced by a small generator network, a sample-efficient alternative to masked-language-model pre-training; AraELECTRA applies the same objective to Arabic text. In this study, the pre-trained encoders were combined with the extracted stylometric features and topped with a classification head: dropout, batch normalization, fully connected layers with a ReLU activation, and an output layer.
Pseudocode: Model structure integrating stylometric features.
import torch
import torch.nn as nn

# Classification head that combines the transformer output with stylometric features
class StylometryEnhancedModel(nn.Module):
    def __init__(self, base_model):
        super().__init__()
        self.base_model = base_model  # e.g. a pre-trained ELECTRA or AraELECTRA encoder
        # 768 = hidden size of ELECTRA-base; 10 = assumed number of stylometric features
        self.fc1 = nn.Linear(768 + 10, 256)
        self.batch_norm = nn.BatchNorm1d(256)  # batch normalization, as described above
        self.dropout = nn.Dropout(0.3)
        self.output = nn.Linear(256, 1)

    def forward(self, text_features, stylometric_features):
        # Assumes base_model returns a pooled (batch, 768) sentence representation
        base_output = self.base_model(text_features)
        combined_features = torch.cat((base_output, stylometric_features), dim=1)
        x = self.dropout(self.batch_norm(torch.relu(self.fc1(combined_features))))
        return torch.sigmoid(self.output(x))
Practical Guidance on Applying the Technology
To apply these models in your own projects, follow several steps. First, preprocess your data and extract the necessary stylometric features. Next, fine-tune the models on this data, adjusting hyperparameters as needed for the best performance. Finally, evaluate the models on a held-out benchmark dataset to verify their accuracy.
Pseudocode: Full pipeline from preprocessing to model evaluation.
from sklearn.metrics import f1_score

EPOCHS = 3  # assumed number of training epochs

# Preprocessing and training pipeline. `preprocess`, `model.train_step`, and
# `model.predict` are placeholders for your framework's tokenization and
# optimization steps.
def train_model(model, dataset):
    for epoch in range(EPOCHS):
        for text, label in dataset:
            features = extract_stylometric_features(text)
            processed_text = preprocess(text)
            model.train_step(processed_text, features, label)
    print("Training complete.")

# Evaluate the trained model using the F1-score
def evaluate_model(model, test_data):
    predictions, labels = [], []
    for text, label in test_data:
        features = extract_stylometric_features(text)
        processed_text = preprocess(text)
        predictions.append(model.predict(processed_text, features))
        labels.append(label)
    return f1_score(labels, predictions)

# Training and evaluation
train_model(english_model, training_data)
score = evaluate_model(english_model, test_data)
print("Model F1-Score on Test Data:", score)
Key Takeaways and Call to Action
Mohammad AL-Smadi's research offers a robust approach to detecting machine-generated academic essays, and the strong F1-scores achieved by ELECTRA and AraELECTRA demonstrate the potential of transformer-based models for this task. We encourage you to explore these models further and consider how they can be applied in your own projects.
FAQ
Q1: What are transformer-based models?
A1: Transformer-based models are a type of model architecture in machine learning that uses self-attention mechanisms to better understand the context of words in a sentence.
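As a rough sketch (not the implementation used in the paper), the core self-attention operation can be written in a few lines of NumPy, where Q, K, and V are the query, key, and value matrices derived from token embeddings:
Pseudocode: Scaled dot-product self-attention.
import numpy as np

def self_attention(Q, K, V):
    # softmax(QK^T / sqrt(d)) V
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)                   # pairwise token affinities
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # row-wise softmax
    return weights @ V                              # context-weighted values

# Toy example: 3 tokens with 4-dimensional embeddings
tokens = np.random.rand(3, 4)
print(self_attention(tokens, tokens, tokens))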
Q2: What is a preprocessing phase in machine learning?
A2: The preprocessing phase in machine learning involves preparing the data for the model. This can include tasks such as cleaning the data, normalizing it, and extracting features.
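For illustration, a minimal cleaning step might lowercase the text, strip punctuation, and collapse whitespace before feature extraction; the exact steps in the paper's pipeline may differ.
Pseudocode: Minimal text cleaning before feature extraction.
import re

def clean_text(text):
    text = text.lower()                       # normalize case
    text = re.sub(r'[^\w\s]', ' ', text)      # strip punctuation
    return re.sub(r'\s+', ' ', text).strip()  # collapse whitespace

print(clean_text("  Hello, World! This is   a TEST. "))
# -> 'hello world this is a test'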
Q3: What are stylometric features?
A3: Stylometric features are characteristics of a text that can be used to identify the author's style. This can include things like word length, sentence length, and the use of certain words or phrases.
Q4: How can these models be applied in other fields?
A4: These models can be applied in any field that requires the detection of machine-generated text. This can include fields like journalism, content creation, and academia.
Q5: What are the challenges in applying these models?
A5: Some of the challenges in applying these models include the need for continuous model training and updates to keep up with the evolving capabilities of AI technologies.
Q6: What is the significance of the F1-score in this research?
A6: The F1-score is the harmonic mean of precision and recall (F1 = 2 x precision x recall / (precision + recall)), a standard measure of classification performance. The high F1-scores achieved by the models in this study demonstrate their effectiveness in detecting machine-generated academic essays.
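As a concrete check, here is the F1 computation worked out by hand for the small example used earlier in this post.
Pseudocode: Manual F1 computation for true_labels = [1, 0, 1, 1, 0] and predictions = [1, 0, 1, 0, 0].
tp, fp, fn = 2, 0, 1                # 2 correct positives, 0 false alarms, 1 miss
precision = tp / (tp + fp)          # 2/2 = 1.0
recall = tp / (tp + fn)             # 2/3 ~= 0.667
f1 = 2 * precision * recall / (precision + recall)
print(f"Precision={precision:.3f}, Recall={recall:.3f}, F1={f1:.3f}")  # F1 = 0.800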