Hanna ZENG

Welcome! Bienvenue! 欢迎
* This web framework version was built back in 2022 when I was a junior.
“The life you live will expand or shrink in proportion to the courage you display.”
 
My preferred first name is Hanna, and I am a graduate student in the SM-60 program at Harvard. Before joining Harvard's HDS program on a full scholarship, I completed my undergraduate studies at the Faculty of Arts and Science, University of Toronto St. George, where I pursued a double major in Computer Science and Statistical Science with a minor in Mathematics. I graduated in June 2023 with an honours bachelor's degree from three departments, with high distinction. Before joining the University of Toronto, I completed over 1.5 years of non-degree training at Beijing Normal University-HKBU UIC. I am immensely grateful for the guidance and support of my mentors and advisors, who have played a crucial role in helping me achieve my goals.
My passion lies in exploring data-driven interdisciplinary topics, and I have collaborated on AI and machine learning projects with applications in healthcare, education, finance, and social economics. I am currently seeking 2024 summer industry internships (Data Engineering, Software Development, or Industry Researcher) and December 2024 new-grad (NG) full-time Data/Cloud Engineer opportunities. Having dedicated a significant amount of time to research, I am enthusiastic about gaining hands-on experience and discovering my capabilities in industry roles.
I have a wide range of interests and unlimited potential. I believe in continuous learning and growing with the advancements of the era.
If you have open positions, please reach out! I would be happy to learn more about the qualifications you are looking for and to discuss opportunities to work together. I am eager to contribute my skills and knowledge to meaningful projects in the field. 🙂
 

📚 Education

Master of Science (in progress; expected graduation December 2024 or May 2025, flexible; current plan: December 2024), Harvard University
At Harvard, I serve as Academic Program Representative and International Students Advocate in the Harvard Chan Student Association.
Cross-registration in EECS (in progress; expected graduation December 2024 or May 2025, flexible; current plan: December 2024), MIT
Honours BSc, Computer Science major (degree conferred 📅 June 2023), University of Toronto
Double major in Statistics, minor in Mathematics (degree conferred 📅 June 2023), University of Toronto
📑 Selected Relevant Coursework
*Graduate Level
CS, STAT, MATH, plus domain knowledge in ECO/FIN/BIO/CGS/PHL, etc.
CS
Stat
Math
Eco & Finance & other social science
Quant Business
other Sciences
Computer organization
Probability & Statistics I&II
Linear Algebra I&II
Microeconomics
Accounting
Nutrition Science
*Computer graphics
*Advanced probability (pending)
Multivariate Calculus I&II (several variables & vector calculus)
Macroeconomics
Business/marketing intro
Philosophy, Science ethics
AI & ML (Oxford)
*Time series analysis (pending)
Ordinary differential equations
Political science quant empirical research
Business data analytics
*Neural networks and deep learning
*Statistical machine learning I&II
Combinatorics
Financial planning and investment analysis
Theory of Computation
*Methods of data analytics
Elements of Analysis (real + complex) (pending)
Data structures and algorithms
*Categorical data analysis
Complex Analysis
Cognitive science of AI (Yale)
Regression analysis
Symbolic logic
Data visualization
*Image understanding (pending)
*Bayesian statistics
Computational media (HCI)
Database systems
Object-oriented programming
Structured programming
Discrete mathematics
Software design
 
Note: Courses finished or ongoing at Harvard and MIT have not yet been added to the list above.
Coursework page
grade

 

💻 Labs

On the academic side, my most recent AI work is in information retrieval, data mining, recommendation, and NLP, with prior experience in health-tech and education applications.
I cross-registered for the NLP course at MIT in Fall 2023, a great course taught by Prof. Yoon Kim.
Deep Learning in Optimal Individualized Omni-channel Treatment Decision Rule from a Multi-label Classification Perspective Code
| Python |  QuLab (Research Exchange) Individual Study July 2022 - Present
➢ Designed a Multi-Label Residual Weighted Learning (MLRWL) framework with a novel ITR estimation method for combination treatments incorporating interaction effects among treatments with applications in precision medicine
➢ Co-authored a paper published in the Electronic Journal of Statistics (EJS), Vol. 18 (2024), 1517–1548
[PDF] Multi-Label Residual Weighted Learning for Individualized Combination Treatment Rule | Semantic Scholar
Individualized treatment rules (ITRs) have been widely applied in many fields such as precision medicine and personalized marketing. Beyond the extensive studies on ITR for binary or multiple treatments, there is considerable interest in applying combination treatments. This paper introduces a novel ITR estimation method for combination treatments incorporating interaction effects among treatments. Specifically, we propose the generalized $\psi$-loss as a non-convex surrogate in the residual weighted learning framework, offering desirable statistical and computational properties. Statistically, the minimizer of the proposed surrogate loss is Fisher-consistent with the optimal decision rules, incorporating interaction effects at any intensity level - a significant improvement over existing methods. Computationally, the proposed method applies the difference-of-convex algorithm for efficient computation. Through simulation studies and real-world data applications, we demonstrate the superior performance of the proposed method in recommending combination treatments.
➢ Aimed to identify an optimal Individualized Omni-channel Treatment Rule for precision medicine settings
Deep learning-based Recommender Systems Code
| TensorFlow/PyTorch | Data Intelligence Lab Research Assistant May 2022
Supervisor: Chao Huang, Department of Computer Science, Hong Kong University Data Intelligence Lab
➢ Re-implemented the Neural Graph Collaborative Filtering model and Hypergraph Contrastive Collaborative Filtering
➢ Suited to recommendation scenarios where the collaborative signals hidden in user-item interactions cannot be ignored and the embedding process must comprehensively capture complex higher-order dependencies among users
Interplay of factors associated with risk in the pre-disease stage of Crohn's disease Code
| Python | Data Sciences Institute of University of Toronto | Research Scholar March 2022
Supervisor: Kenneth Croitoru, Clinician-Scientist and Professor, Lunenfeld-Tanenbaum Research Institute, Mount Sinai
➢ Implemented Logistic Regression and Causal Inference tools to predict the risk of Crohn’s Disease from bile acid profiles
➢ Supported clinical disease screening and early-stage risk detection in patients
AI-Based Language Chatbot2.0 Code
| Python/SQL/WXML | contribution acknowledged in publication Emerging Technologies for Education Dec 2020 – Jul 2021
Supervisor: Raymond Lee, Hong Kong Quantum Finance Forecast Center
➢ Designed a database (MongoDB on Tencent Cloud) and used AI technologies for automatic option generation and speech recognition in the development of an English Language Concept Learning Agent App
➢ Combined English concept learning with assisted practice, providing exercises such as multiple-choice and phonetic questions (WordNet, Word2vec, BERT)

⌨ Projects

Python Package Development | Python | Code (Group project)
Pediatric Sleep Patterns Detection from Wrist Activity Using Random Forests | Python+R | Code + Web (Self-guided) Affiliation: Department of Biostatistics, Harvard University - T.H. Chan School of Public Health
➢ Trained a Random Forest sleep pattern detection model on wrist-worn accelerometer data, with accuracy, recall, and specificity above 98% and an out-of-bag error rate of 0.0173
➢ Identified the top three most influential features (hour of the day, enmo, and anglez), indicating the significance of movement intensity, orientation, and time in determining sleep states
Multi-source Transfer Learning Models: NST and Conditional CycleGAN | Python | Paper (co-authored with my UofT collaborators) Affiliation: Department of Computer Science, University of Toronto
Semantic Segmentation in Autonomous Driving | PyTorch | Code (Self-guided) Advisor: Zheng Wu, Department of Mechanical Engineering, University of California, Berkeley
➢ Labeled road pixels in cityscape images using DeepLab and Fully Convolutional Network (FCN) models
➢ Used VGG as the pre-trained model and achieved an mIoU of 50%
Gratitude to Strangers | Figma | Designer of interactive computational media | Demo Advisor: Fanny Chevalier, Department of Computer Science, University of Toronto
➢ Designed the user interface for our app Samaritan, whose mission is to enable real-time anonymous feedback on acts of kindness
➢ Determined the effective sample size and conducted user research with k-means clustering for target segmentation
➢ Drew paper sketches for the user interface and designed interactive prototypes using Figma
Charity Online System Platform Construction | SQL/Python+Django | Demo Code Advisor: Changjiang Zhang, Department of Data Science, BNU-Hong Kong Baptist University UIC
➢ Leveraged Python to crawl and collect charity information from official charity websites
➢ Cleaned unstructured data, conducted exploratory data analysis, and stored the results in MySQL
➢ Used Django as a framework to connect the front-end interface and back-end database
➢ Supplemented by visualizations such as customized donation maps
Productivity Calendar App | Java, Shell | Code
➢ An application for scheduling events/tasks and reminding users of them, usable either personally or by a company. Tasks/events can last for a timeframe or occur at a single time, be auto-generated every day/week/month, be given descriptions, be displayed as a checklist or on a calendar, and be given thematic labels for filtering. Tasks can be divided into subtasks, given progress labels, and have comments added. The history of a task can be tracked, and reminders can be auto-generated for upcoming tasks. Users can also share their calendars with other users, who can access a calendar given its calendarID and be granted different permissions for it.
Computer Graphics Practices | C++/OpenGL | Code Advisor: Karan Sher Singh, Department of Computer Science, University of Toronto
➢ Raster Images, Ray Casting, Ray Tracing, Bounding Volume Hierarchy, Meshes, Transformation, Shader Pipeline, Kinematics, Mass-Spring Systems
Education agents — who are my customers? | Python | Demo (Self-guided)
➢ Took the role of an education agent who helps undergraduates apply for overseas postgraduate studies, with the aim of better understanding the customers
➢ Collected data from undergraduate students, including demographic data and preferences concerning overseas postgraduate studies, and used the data to identify different segments of undergraduates
➢ Implemented a k-means clustering tool for market segmentation and the study of consumer behaviour
➢ Chose the Davies-Bouldin index (DBI) to select the number of clusters k and Mixed Euclidean Distance as the nearest-neighbour measure; identified the top three most influential factors
Construction and Analysis of Music Genre Maps | Python/MATLAB/R | Sample (Self-guided)
➢ Used clustering models to determine whether artists within a genre are more similar to one another than to artists in other genres; processed the genres’ high-dimensional feature data with the PCA-CS/PCA model; visualized similarities between pairs of genres on a heat map; found that, among all music features, danceability and energy are the most "contagious" and show stronger correlations with popularity
➢ Used a Box-Jenkins multivariate time series model and found that average music feature values fluctuated greatly between the 1920s and 1960s, particularly danceability and valence; analyzed the fluctuation alongside the emergence of blues and the rising popularity of genres such as rock and jazz; deduced the key impact factors for followers; and constructed a graph showing the intertwined relationships among musicians
Improved Multiple Regression Analysis and Prediction | R | Code
➢ Examined the house price dataset and selected the best-performing multivariate regression model, which explains 86% of the variation in the data after appropriate transformations and outlier handling; eliminated seven candidate models using two different methods; predicted house prices and concluded that buyers prefer houses near rivers, far from factories, with lower nitrogen oxide emissions and more rooms, and that houses meeting these criteria sell at higher prices
Monte Carlo Simulation - Maximum Likelihood Method | R | PreSlides
➢ Contribution: background knowledge, data analysis, and simulation. Analyzed admission status using maximum likelihood estimation and the sample method; used the Monte Carlo method to generate the assumed distribution from the analytical results; performed a distribution test and comparison; found that the maximum likelihood method gave more reliable estimators than the sample method, and that estimators based on large samples were more reliable than those based on small samples
➢ Concluded that universities show signs of preference in admitting students who have an undergraduate CGPA around 86% and that admission officers value a student's comprehensive ability rather than solely focusing on CGPA statistics
Factors Impact Student Performance on Course Assessment Under Pandemic | R | Code
➢ Identified the best-fitting MLR model with interaction terms; concluded that COVID is not a key factor, whereas weekly study time and its interaction with office-hour attendance are
Causal Relationship Between Countries' GDP per Capita and Education Index Value | R/Python | Draft Acknowledgement to UofT Political Science department
➢ Collected data from Human Development Index (HDI) reports by the United Nations Development Program (UNDP) covering 1990 to 2017
➢ Applied causal theory to study the relationship between countries’ GDP per capita and its determinant, namely the education index value, with four control variables: median age of countries, inequality-adjusted education index, unemployment rate, and the gender development index
How Toronto Movie Release Dates Vary with the Holiday Schedule | R/Python | Visualization Code
Toronto Bicycle Theft | R/Python | Data Storytelling & Visualization Code
Differences in Education Systems Worldwide | Tableau/HTML5/CSS/JavaScript | Weblog

🎮 Games For Fun:

Goldminer Demo, Rotating Rose Demo | MATLAB
Breakout (Assembly) | Report | Demo coming soon

💼 Work Exploring Experiences

Apple | Database/Data Crawling/Large Language Model (LLM) | AI/ML Intern 2024 (offer declined due to a change in personal plans and interests, as well as pay considerations; passed the background check required for this unconditional offer)
Harvard Library Digital Reserves Team Specialist | Adobe/LaTeX/Digitizing | Support Digital Accessibility  2024
➢ Digital Reserves Team, Harvard Widener Library. Manage the digitizing and scanning of materials in support of teaching and learning at Harvard University each semester. Create accessible PDFs, track and process digitizing requests, and maintain the Digital Reserves collection to better serve the Harvard community. Transcribe complex mathematical equations into digital formats, ensuring their accuracy and readability for academic use
Rotman School of Management | Python/AWS/SQL/ETL | Data Role Work Study Paid  2022-2023
➢ Used AWS to support data pipelines and data processing workflows for terabytes of retail financial data, in support of car buyers’ behavior analysis. Collaborated on ETL tasks and on the creation of data visualization tools. Worked on statistical modeling to uncover insights into car buyers’ behaviors based on demographic factors, such as gender, and car configuration preferences (private repo)
Hongzheng Information Technology Co., Ltd | SPSS/R/Python/Axure | Product Intern 2020 Summer (I knew early on that I was not a great fit for a product role, haha, but it was a good trial anyway! I later switched my interests to Data and AI Engineer/Scientist roles.)
➢ Designed product strategies based on analysis of market demand, communicated with the technical department and commercial companies, presented visual charts and series of Axure-based prototype diagrams and product research documents

🏫 Professional Activities

  • Publications:
Qi Xu, Xiaoke Cao, Geping Chen, Hanqi Zeng, Haoda Fu, Annie Qu. "Multi-label residual weighted learning for individualized combination treatment rule." Electron. J. Statist. 18 (1) 1517 - 1548, 2024. https://doi.org/10.1214/24-EJS2236
  • Conference Reviewer:
International Conference on Artificial Intelligence and Statistics (AISTATS), 2023
International Conference on Artificial Intelligence and Statistics (AISTATS), 2024

🗣 Languages

Mandarin (native)
English (fluent) IELTS✔️GRE✔️
French (beginner)

🏆 Honours and Awards

  • First Class Scholarship 2020
  • President’s Honour Roll for 2 academic years
  • Dean’s List Scholar for 2 academic years
  • Harvard Central Grant & Stipend (full tuition & stipend admission)
  • Victor and William Fung Fellowship
  • Dr. Theodore Montgomery Scholarship Fund
  • Load Relief Award

Arts🎨 Music🎸 & STEM🤖️

Integrating Arts, Music, and STEM
💡
Mentorship DCS Women Mentorship@Utoronto
💡
Elevate Festival 2022
💡
🎼 Music Time & My 📷 Photography Gallery

📝 Other

2020 Yale Summer Session (Cognitive Neuroscience, Ethics of AI) Dr. Joanna Lawson, Department of Philosophy
2021 Mathematical/Interdisciplinary Contest in Modelling
2021 Spring Huawei Kunpeng Training Program
2021 Summer Huawei Seeds for The Future in Canada
2022 Oxford Study Abroad (AI and Machine Learning) Dr. Rob Collins and Dr. Nigel Mehdi, CS Department Individual Assignment Project: astrophysics research Code
UofT Computer Science Student Union (CSSU) member; Women in Computer Science (WiCS) community; 3M Running Club member
Canadian Open Mathematics Challenge (COMC) grader
Leadership Portfolio (partial, mostly freshman year)
Cocurricular Transcript download by clicking here

Contact ☎️ Me
Check my LinkedIn 👔  / GitHub ⌨️  by clicking the floating icons on this website
Longwood Medical Area, Boston, MA 02115, USA / Harvard Square, Cambridge, MA 02138, USA
 