Research Experience for Undergraduates (REU) Research Projects


The proposed research projects are subject to change, contingent on new mentors and/or the research interests of the students.



Genotype error correction using deep learning sequence models

High-throughput sequencing technology, combined with methods such as restriction site-associated DNA sequencing (RAD-Seq) (Miller et al., 2007) and Genotyping-by-Sequencing (GBS) (Elshire et al., 2011), is used to identify and score large numbers of genome-wide genetic markers such as SNP variants in order to study segregation in genetic mapping studies. The output of such technology is a genotype file that contains the genotype calls at the genetic markers for the different individuals in the study population. Subsequent analysis, such as QTL (Quantitative Trait Locus) analysis, can be performed using this genotype file.
A challenge to such studies is the presence of missing data and incorrect genotype calls at some of the genetic markers. These may arise because of low sequencing coverage and/or imperfections in the software tool used to generate the genotype file. Missing data are cases where no genotype call is made for a given SNP or marker site in an individual. An incorrect genotype call is a site where the genotype call differs from the true genotype. Both issues have a significant impact on constructing accurate genetic maps. Various studies have addressed the missing data problem using statistical approaches (Browning and Browning, 2007; Howie et al., 2009). Variants of the sliding-window-based approach (Huang et al., 2009) and Hidden Markov Models have been adopted to correct erroneous genotype calls in high-throughput genotype data. However, such approaches do not handle the long-range dependencies that exist in genotype data. We propose using a deep learning model, the Long Short-Term Memory (LSTM) autoencoder, to detect and correct errors in genotype data. An LSTM is a neural network that captures long-range dependencies in sequential data. An autoencoder is a neural network trained to reconstruct its input at its output. An LSTM autoencoder combines both properties to detect and correct errors.
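
As a rough illustration of this idea (the framework, window length, and dimensions below are assumptions, not the project's final design), an LSTM autoencoder over windows of integer-encoded genotype calls might look like the following sketch:

```python
# Minimal sketch: an LSTM autoencoder over windows of genotype calls, where each
# marker is encoded as an integer (e.g., 0/1/2 for genotype classes, 3 = missing).
import torch
import torch.nn as nn

class LSTMAutoencoder(nn.Module):
    def __init__(self, n_states=4, embed_dim=8, hidden_dim=64):
        super().__init__()
        self.embed = nn.Embedding(n_states, embed_dim)
        self.encoder = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.decoder = nn.LSTM(hidden_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, n_states)   # per-marker genotype logits

    def forward(self, x):                  # x: (batch, window_length) integer genotype codes
        z, _ = self.encoder(self.embed(x))
        # Repeat the final encoder state as the decoder input at every position
        summary = z[:, -1:, :].repeat(1, x.size(1), 1)
        recon, _ = self.decoder(summary)
        return self.out(recon)             # (batch, window_length, n_states)

# Training would minimize cross-entropy between reconstructed and observed calls;
# markers whose reconstruction disagrees strongly with the observed call are
# candidate errors, and the argmax of the reconstruction is the corrected call.
model = LSTMAutoencoder()
window = torch.randint(0, 4, (16, 200))    # toy batch: 16 windows of 200 markers
logits = model(window)
loss = nn.CrossEntropyLoss()(logits.reshape(-1, 4), window.reshape(-1))
print(loss.item())
```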


Prerequisites:

  • Basic programming skills (Python preferred)

 

Learning Outcomes:

Undergraduate researchers will:

  • Learn and apply different network analysis algorithms.
  • Learn how to use different network analysis software tools.
  • Be exposed to applications of graph theory in biology.

Mentor:

  • Benjamin Soibam - Associate Professor, Department of Computer Science and Engineering Technology

Privacy-Preserving Machine Learning Models

Machine learning models are increasingly used to extract insights from sensitive datasets such as health records, financial transactions, or user activity logs. However, directly training models on such data can lead to privacy breaches through inference or reconstruction attacks. Differential Privacy (DP) provides a mathematically rigorous framework to protect individual data by introducing controlled noise during data processing or model training. In particular, Local Differential Privacy (LDP) enables each user to perturb their data before contributing it to the model, ensuring that even an untrusted data collector cannot infer sensitive information about individuals. This makes LDP a powerful approach for building privacy-preserving learning systems in decentralized environments.

This project aims to design and evaluate machine learning algorithms that preserve privacy while maintaining utility. Students will implement basic models such as Naïve Bayes or linear regression under both centralized and local differential privacy settings. They will study the tradeoff between model accuracy and privacy strength (quantified by the privacy budget ε), and compare how different privacy mechanisms affect learning outcomes. The project will also involve analyzing potential privacy leakage scenarios and demonstrating how privacy-preserving models mitigate such risks. Through this project, students will gain both theoretical understanding of differential privacy and practical experience in developing and testing secure machine learning models.
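
As a small, hedged illustration of the local setting (not a prescribed part of the project), the classic randomized response mechanism perturbs each user's binary attribute before collection, and the collector later debiases the aggregate:

```python
# Local differential privacy via randomized response on a binary attribute.
import numpy as np

def randomized_response(bits, epsilon):
    """Each user keeps their true bit with probability e^eps / (e^eps + 1), else flips it."""
    p_truth = np.exp(epsilon) / (np.exp(epsilon) + 1.0)
    keep = np.random.rand(len(bits)) < p_truth
    return np.where(keep, bits, 1 - bits)

def debias(reported, epsilon):
    """Unbiased estimate of the true proportion from the perturbed reports."""
    p_truth = np.exp(epsilon) / (np.exp(epsilon) + 1.0)
    return (reported.mean() - (1.0 - p_truth)) / (2.0 * p_truth - 1.0)

# Hypothetical sensitive attribute: 1 = "has condition", 0 = otherwise
true_bits = np.random.binomial(1, 0.3, size=10_000)
for eps in [0.5, 1.0, 2.0]:
    noisy = randomized_response(true_bits, eps)
    print(f"epsilon={eps}: true={true_bits.mean():.3f}, estimate={debias(noisy, eps):.3f}")
```

Smaller values of ε flip more bits, giving stronger privacy but noisier aggregate estimates; this is the accuracy-privacy tradeoff students will quantify.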


Prerequisites:

  • Basic programming skills (Python preferred)

 

Learning Outcomes:

Undergraduate researchers will:

  • Understand the principles of Differential Privacy (DP) and Local Differential Privacy (LDP), including how noise addition protects individual data contributions in machine learning models.
  • Gain hands-on experience in implementing and training simple machine learning models (e.g., Naïve Bayes, linear regression) with privacy-preserving mechanisms using Python libraries.
  • Quantitatively assess the impact of privacy parameters (e.g., privacy budget ε) on model accuracy and data utility, and interpret the tradeoffs between privacy protection and predictive performance.

Mentors:

  • Emre Yilmaz - Assistant Professor, Department of Computer Science and Engineering Technology

Control Loop Performance Monitoring

Control valves play a crucial role in the performance and stability of industrial control loops. However, issues such as valve wear, stiction, hysteresis, improper sizing, or malfunction can significantly degrade control loop performance, leading to inefficiencies, instability, and increased maintenance costs. Traditional methods for diagnosing valve-related issues are often labor-intensive, relying on manual inspection and process knowledge; statistics-based methods have also been proposed over the years.

In recent years, however, machine learning (ML) techniques have emerged as a powerful tool for classification problems. This research aims to develop ML-based methods to identify control loop performance issues caused by valve stiction, hysteresis, and process nonlinearity. It will use large amounts of process data from a control loop (setpoint, process variable, controller output, valve position) to develop models capable of detecting the control valve issues mentioned above. Case studies from manufacturing industries will be used to demonstrate the effectiveness of the developed methods.
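
One possible starting point, sketched below with entirely illustrative features and placeholder data, is to summarize each loop record with a few statistics and train an off-the-shelf classifier; the project's actual features and models may differ:

```python
# Illustrative sketch: simple features from control-loop time series feeding a classifier.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

def loop_features(sp, pv, op):
    """Summarize one loop record (setpoint, process variable, controller output)."""
    err = sp - pv
    return [
        np.std(err),                         # control error variability
        np.mean(np.abs(err)),                # mean absolute error
        np.corrcoef(op[:-1], pv[1:])[0, 1],  # lagged OP-PV correlation
        np.std(np.diff(op)),                 # controller/valve movement activity
    ]

# Placeholder dataset: X holds one feature row per loop record,
# y holds labels such as 0 = healthy, 1 = stiction, 2 = hysteresis.
rng = np.random.default_rng(0)
X = np.array([loop_features(*rng.normal(size=(3, 500))) for _ in range(200)])
y = rng.integers(0, 3, size=200)             # placeholder labels, illustration only

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)
clf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_tr, y_tr)
print("held-out accuracy:", clf.score(X_te, y_te))
```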


Prerequisites:

  • Basic programming skills (Python and MATLAB preferred)

 

Learning Outcomes:

Undergraduate researchers will:

  • Develop research skills by working in an interdisciplinary environment with engineering and computer science professionals.
  • Be exposed to process dynamics and control concepts
  • Learn how to identify and quantify issues with control systems reliability and performance
  • Cross train in computational sciences and engineering

Mentors:

  • Vassilios Tzouanas – REU PI, Professor and Chair, Department of Computer Science and Engineering Technology

Understanding Student Behavior While Using Generative AI for Coding

Generative AI tools, such as ChatGPT, have become increasingly popular among students for educational purposes. A recent study reported that approximately 40% of college students have used ChatGPT as part of their higher education experience. Generative AI can be highly beneficial for learning difficult concepts. However, it also poses risks of misuse, such as having it complete assignments, including essays and computer programs. This is particularly concerning in Computer Science, where Generative AI can produce complex code in response to user prompts. This research aims to examine students' physiological responses as they engage with Generative AI for writing computer programs.

 

In the first phase, the research will focus on designing a study to collect data in a laboratory setting. Study participants will be asked to generate Python programs of varying complexity, both with and without the assistance of Generative AI. Throughout the study, physiological responses - such as eye-gaze movement, body motion, and facial expressions - will be recorded. In the next phase, machine learning algorithms will be developed to identify physiological patterns associated with the use of Generative AI tools.
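
As a purely hypothetical sketch of the second phase (the features, labels, and data below are placeholders, not the study's design), session-level physiological summaries could be fed to a standard classifier:

```python
# Hypothetical sketch: classifying AI-assisted vs. unassisted coding sessions
# from summary physiological features.
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(1)
n_sessions = 120
# Illustrative per-session features: mean fixation duration, saccade rate,
# head-motion energy, facial-expression activation count.
X = rng.normal(size=(n_sessions, 4))
y = rng.integers(0, 2, size=n_sessions)   # 1 = AI-assisted, 0 = unassisted (placeholder)

model = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
scores = cross_val_score(model, X, y, cv=5)
print("cross-validated accuracy:", scores.mean())
```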

 

Through this project, students will gain exposure to computer science research, including data analysis, algorithm development, model evaluation, and results presentation. Additionally, they will develop soft skills such as teamwork, problem-solving, and adaptability to different approaches.


Prerequisites:

  • Basic programming skills (Python preferred)
  • Having a background in machine learning is preferred.

Learning Outcomes:

Undergraduate researchers will:

  • Be exposed to human-centered computing research
  • Learn how to design and conduct experiments and analyze the collected data
  • Cross-train in computational sciences and experimental methods
  • Enhance soft skills such as teamwork, problem-solving, and adaptability to different problem-solving approaches.

Mentors:

  • Dvijesh Shastri – Professor and Assistant Chair, Department of Computer Science and Engineering Technology

Forensic Engineering

This project develops a comprehensive plan for conducting a forensic inspection of property damage attributed to global warming. The primary objectives are to assess the extent of the damage, identify specific climate-related causes, and provide actionable recommendations for remediation and future resilience strategies. Our approach involves a thorough on-site assessment using advanced technologies, analysis of relevant climate data, and interviews with stakeholders to gather qualitative insights. Ultimately, it aims to deliver a detailed report that applies numerical modeling, thermal assessment, and root cause analysis to classify the types of damage resulting from material degradation, construction defects, global warming, and design deficiencies. The assessment aims to identify the types and extent of damage, underlying causes, and potential mitigation strategies.

This systematic approach allows for a thorough understanding of how these factors contribute to structural vulnerabilities and weak points. By employing advanced modeling techniques and thermal imaging, we can identify specific issues, such as insulation failures and material fatigue, exacerbated by climate change.


Prerequisites:

  • Basic computer skills (Microsoft Office), high school physics, and writing skills

Learning Outcomes:

Undergraduate researchers will:

  • Develop research skills by challenging themselves to find the cause of damage or failures.
  • Be exposed to damage measurement technology.
  • Learn how to assess and write a technical report.
  • Be multi-trained in mathematics, physics, and material science.

Mentors:

  • Arash Rahmatian – Associate Professor, Department of Computer Science and Engineering Technology


Utilizing Interpolation Functions to Solve Hyperbolic Systems

A shock tube is a device used to study shock waves and other wave phenomena in gas dynamics. These waves can be described mathematically by Euler’s equations, an example of a hyperbolic system. Hyperbolic systems are challenging because of nonlinearities in their boundary conditions, and they can be solved by various means, including interpolation functions.

I’m working on a book, Fundamentals of Shock Waves, and this work supports the book as well as an experimental apparatus I hope to design, build, and use for experimentation. This work is also important for furthering the understanding of gas dynamics governed by Euler’s equations and the use of interpolation functions in solving differential equations.
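
As a minimal illustration of the kind of computation involved (the first-order Lax-Friedrichs scheme used here is only one simple choice, not necessarily the interpolation-function approach the project will develop), the following sketch solves Sod's shock tube problem for the 1D Euler equations:

```python
# Sketch: Lax-Friedrichs solution of Sod's shock tube for the 1D Euler equations.
import numpy as np

gamma = 1.4
nx, L, t_end, cfl = 400, 1.0, 0.2, 0.9
dx = L / nx
x = (np.arange(nx) + 0.5) * dx

# Sod initial condition: (rho, u, p) = (1, 0, 1) on the left, (0.125, 0, 0.1) on the right
rho = np.where(x < 0.5, 1.0, 0.125)
u = np.zeros(nx)
p = np.where(x < 0.5, 1.0, 0.1)
U = np.array([rho, rho * u, p / (gamma - 1) + 0.5 * rho * u**2])  # conserved variables

def flux(U):
    rho, mom, E = U
    u = mom / rho
    p = (gamma - 1) * (E - 0.5 * rho * u**2)
    return np.array([mom, mom * u + p, u * (E + p)])

t = 0.0
while t < t_end:
    rho, mom, E = U
    u = mom / rho
    p = (gamma - 1) * (E - 0.5 * rho * u**2)
    c = np.sqrt(gamma * p / rho)                       # sound speed
    dt = min(cfl * dx / np.max(np.abs(u) + c), t_end - t)

    F = flux(U)
    Up, Um = np.roll(U, -1, axis=1), np.roll(U, 1, axis=1)
    Fp, Fm = np.roll(F, -1, axis=1), np.roll(F, 1, axis=1)
    U = 0.5 * (Up + Um) - dt / (2 * dx) * (Fp - Fm)    # Lax-Friedrichs update
    U[:, 0], U[:, -1] = U[:, 1], U[:, -2]              # simple outflow boundaries
    t += dt

print("final density profile (first 5 cells):", U[0, :5])
```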




Prerequisites:

  • Basic programming skills (Python preferred)

 

Learning Outcomes:

Undergraduate researchers will:

  • Gain a foundation in the numerical solution of differential equations
  • Explore various interpolation functions and their applications
  • Explore shock tubes, Euler’s equations, and other hyperbolic systems
  • Develop computer code to solve a shock tube problem
  • Discuss Sod’s test cases and other test cases
  • Develop a novel interpolation function scheme to solve a given shock tube problem

Mentors:

  • Henry Foust – Assistant Professor, Department of Computer Science and Engineering Technology




Detection and Simulation of Hazardous Chemical Spills Using AI Technology

Chemical spills are a common problem that industries face. Due to its strategic location and growing energy demands, Houston and its nearby cities face this challenge more than ever. Through this research, we want to use computer simulations to explore the parameters that govern how chemicals spread during spills. In particular, we aim to study hydrogen sulfide (H2S) spills, which are a major problem in Houston as well as in other cities that handle petroleum. The research will give the REU students a deeper understanding of chemical spills as they learn computer simulation. The research findings may inform mitigation techniques and the environmental parameters relevant to dealing with chemical spills.
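
The project does not prescribe a particular dispersion model; as one illustrative possibility, a basic Gaussian plume estimate of downwind concentration from a continuous release could look like the sketch below (all coefficients and release parameters are placeholders):

```python
# Illustrative only: Gaussian plume estimate of downwind concentration from a
# continuous release. Dispersion coefficients and release parameters are placeholders.
import numpy as np

def gaussian_plume(Q, u, x, y, z, H, a=0.08, b=0.06):
    """Concentration (kg/m^3) at (x, y, z) for source strength Q (kg/s),
    wind speed u (m/s), and effective release height H (m).
    sigma_y and sigma_z use simple placeholder power laws in downwind distance x."""
    sigma_y = a * x * (1 + 0.0001 * x) ** -0.5
    sigma_z = b * x
    lateral = np.exp(-y**2 / (2 * sigma_y**2))
    vertical = (np.exp(-(z - H)**2 / (2 * sigma_z**2))
                + np.exp(-(z + H)**2 / (2 * sigma_z**2)))  # ground-reflection term
    return Q / (2 * np.pi * u * sigma_y * sigma_z) * lateral * vertical

# Example: 0.5 kg/s H2S release, 3 m/s wind, ground-level receptor 200 m downwind
print(gaussian_plume(Q=0.5, u=3.0, x=200.0, y=0.0, z=1.5, H=2.0))
```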


Prerequisites:
  • Fluid mechanics, thermodynamics, and an introductory computer science course

Learning Outcomes:

  • What challenges do safety professionals face in dealing with chemical spills?
  • What work has been conducted to address these challenges?
  • What were the financial burdens of addressing these challenges?
  • What numerical techniques have researchers used previously?
  • What are the environmental parameters, and what are their ranges?
  • Propose alternative solutions to the existing ones.

Mentors:

  • Mahmud Hasan – Assistant Professor, Department of Computer Science and Engineering Technology



 


Automatic Generation of Medical Image Reports using Vision-Language Models

This project focuses on using Vision-Language Models (VLMs) to automatically generate medical image reports, addressing the growing demand for faster and more consistent diagnostic interpretations. Three state-of-the-art models - LLaMA 3.2 Vision, CLIP, and Florence-2 - will be fine-tuned on a medical imaging dataset using Low-Rank Adaptation (LoRA) and Parameter-Efficient Fine-Tuning (PEFT) methods.

Each model learns to transform chest radiographs into structured, clinically relevant reports, evaluated through BLEU and ROUGE metrics for accuracy and reliability. The study aims to identify which model most effectively balances precision, interpretability, and efficiency - supporting radiologists in delivering timely, high-quality diagnostics while reducing workload variability.
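
As a minimal sketch of the fine-tuning setup (the base checkpoint path, target modules, and hyperparameters are placeholders rather than the project's final configuration), LoRA adapters can be attached with the Hugging Face peft library roughly as follows:

```python
# Sketch: attaching LoRA adapters for parameter-efficient fine-tuning.
# The checkpoint path and target_modules below are placeholders.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

base = "path/to/vision-language-checkpoint"   # placeholder model identifier
model = AutoModelForCausalLM.from_pretrained(base)
tokenizer = AutoTokenizer.from_pretrained(base)

lora_config = LoraConfig(
    r=8,                                   # rank of the low-rank update matrices
    lora_alpha=16,                         # scaling factor for the LoRA updates
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],   # attention projections to adapt (model-dependent)
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()         # only the LoRA adapters are trainable
```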


Prerequisites:
  • Basic programming skills (Python preferred)

Learning Outcomes:

Undergraduate researchers will:

  • Learn and apply different Vision-Language Models for medical imaging.
  • Quantitatively assess the efficiency of VLMs.

Mentors:

  • Subash Pakhrin – Assistant Professor, Department of Computer Science and Engineering Technology



 


TLE-SafeguardNet Project

The TLE-SafeguardNet project will introduce a privacy-enhancing framework designed to protect Space Situational Awareness (SSA) data from adversarial manipulation and inference attacks. Since Two-Line Element (TLE) data will continue to play a vital role in tracking satellites and managing orbital debris, ensuring its integrity will be critical to prevent collisions and maintain space safety. To address the growing cybersecurity risks in satellite operations, the proposed framework will integrate Singular Value Decomposition (SVD) with a forward diffusion noising process to obscure sensitive orbital features while retaining their usefulness for downstream machine learning and deep learning tasks. This approach will allow sensitive information to be masked without degrading the precision and reliability of SSA applications.

The framework will be rigorously tested on Space-Track.org’s 2023 Q4 and 2024 Q1 datasets using algorithms such as Random Forest, XGBoost, 1D-CNN, LSTM, and Tabular Transformers. It is expected that Random Forest will achieve the highest accuracy and stability, maintaining strong classification performance even after 401 diffusion steps, as validated by high Peak Signal-to-Noise Ratio (PSNR) values. Visualization through t-SNE and SHAP will likely reveal that features such as Eccentricity, Inclination, and Drag Term (B*) are the most influential in classifying payloads and debris. Overall, TLE-SafeguardNet will aim to demonstrate a robust balance between privacy protection and operational performance, offering a deployable and scalable solution for safeguarding orbital data within SSA pipelines.
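
A rough sketch of the core idea, under the assumption that the feature matrix is first reduced with a truncated SVD and then noised with the standard closed-form forward diffusion step (the schedule, rank, and integration details are assumptions, not the project's final design):

```python
# Sketch: truncated SVD of a TLE feature matrix followed by forward diffusion noising.
import numpy as np

def truncated_svd(X, rank):
    """Low-rank reconstruction of the feature matrix via SVD."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return (U[:, :rank] * s[:rank]) @ Vt[:rank, :]

def forward_diffusion(X, t, betas):
    """Noise X for t steps using the closed form
    x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * eps."""
    alpha_bar = np.prod(1.0 - betas[:t])
    eps = np.random.normal(size=X.shape)
    return np.sqrt(alpha_bar) * X + np.sqrt(1.0 - alpha_bar) * eps

# Hypothetical standardized TLE features (rows = objects; columns could include
# eccentricity, inclination, drag term B*, etc.)
X = np.random.normal(size=(1000, 8))
betas = np.linspace(1e-4, 2e-3, 500)            # placeholder linear noise schedule

X_lowrank = truncated_svd(X, rank=4)            # obscure fine-grained structure
X_private = forward_diffusion(X_lowrank, t=401, betas=betas)  # 401 noising steps
print(X_private.shape)
```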


Prerequisites:
  • Basic programming skills (Python preferred)

Learning Outcomes:

Undergraduate researchers will:

  • Learn about machine learning and deep learning techniques.
  • Apply such techniques to protect Space Situational Awareness (SSA) data.

Mentors:

  • Subash Pakhrin – Assistant Professor, Department of Computer Science and Engineering Technology