Hi, I’m Zeyu, a PhD student at ETH Zurich. I’m fortunate to be advised by Prof. April Wang and Prof. Dennis Komm. My research interests include Human-Computer Interaction, Human-AI Collaboration, Assistive Technology, Educational Technology, Ubiquitous Computing, and Computer Music. Besides research and programming, I enjoy basketball and rock music.
Download my CV.
PhD in Computer Science, 2028 (expected)
ETH Zurich
MPhil in Computational Media and Arts, 2024
The Hong Kong University of Science and Technology
BSc in Computer Science with Artificial Intelligence, 2022
University of Nottingham, Ningbo China
BSc in Computer Science with Artificial Intelligence, 2022
University of Nottingham
The PEACH Lab is at ETH Zurich within the Department of Computer Science (D-INFK). The group is led by Prof. April Wang.
My responsibilities include:
The APEX research group spans the Computational Media and Arts Thrust at The Hong Kong University of Science and Technology (Guangzhou) and the Division of Integrative Systems and Design at The Hong Kong University of Science and Technology. The group is led by Prof. Mingming Fan.
My responsibilities include:
A start-up company focusing on beauty makeup services, affiliated with the Ningbo Intelligent Technology Research Institute, which was founded by the Ningbo government and Prof. Zexiang Li of HKUST.
My responsibilities include:
A start-up company building a life-sharing and livestreaming platform for older adults. The team is led by Mr. Shenshen Li (co-founder of Zhihu.com) and Mr. Qiangning Hong (former Chief Architect of Douban.com).
My responsibilities include:
An undergraduate on-campus research society with a particular focus on digital computing technology.
My responsibilities include:
A start-up company focusing on students’ academic and daily-life services.
My responsibilities include:
Parcel lockers have become an increasingly prevalent last-mile delivery method, yet a recent study revealed the accessibility challenges they pose to blind and low-vision (BLV) people. Informed by that study, we designed FetchAid, a standalone intelligent mobile app that assists BLV people in using a parcel locker in real time by integrating computer vision and augmented reality (AR) technologies. FetchAid first uses a deep network to detect the user’s fingertip and the relevant buttons on the locker’s touch screen, guiding the user to reveal and scan the QR code that opens the target compartment door; it then guides the user to reach the door safely with AR-based, context-aware audio feedback. Moreover, FetchAid provides an error-recovery mechanism and real-time feedback to keep the user on track. In a study with 12 BLV participants, FetchAid substantially improved task completion and efficiency and reduced frustration and overall effort, regardless of participants’ vision conditions and previous experience.
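To make the guidance loop concrete, here is a minimal, illustrative sketch of how a per-frame fingertip-to-button audio cue could be computed. All names here (Box, guidance_cue, detect_fingertip, detect_buttons, speak) are hypothetical placeholders for exposition, not FetchAid’s actual implementation.

```python
# Illustrative sketch only: hypothetical helpers approximating the kind of
# per-frame guidance loop the FetchAid abstract describes.
from dataclasses import dataclass


@dataclass
class Box:
    x: float  # detection center x, normalized to [0, 1]
    y: float  # detection center y, normalized to [0, 1]


def guidance_cue(fingertip: Box, target: Box, tol: float = 0.05) -> str:
    """Turn the offset between the detected fingertip and the target button
    into a short spoken direction ("left", "right", "up", "down", "press")."""
    dx, dy = target.x - fingertip.x, target.y - fingertip.y
    if abs(dx) < tol and abs(dy) < tol:
        return "press"
    horizontal = "right" if dx > 0 else "left"
    vertical = "down" if dy > 0 else "up"
    return horizontal if abs(dx) > abs(dy) else vertical


# Per-frame loop (placeholders: detect_fingertip, detect_buttons, speak would
# wrap the detection model and the text-to-speech engine):
# for frame in camera_stream():
#     fingertip = detect_fingertip(frame)
#     buttons = detect_buttons(frame)
#     speak(guidance_cue(fingertip, buttons["show_qr"]))
```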
Chinese traditional opera (Xiqu) performers often experience skin problems due to the long-term use of heavy-metal-laden face paints. To explore the skincare challenges Xiqu performers currently face, we conducted an online survey (N=136) and semi-structured interviews (N=15) as a formative study. We found that incomplete makeup removal is the leading cause of human-induced skin problems, with eye makeup being especially difficult to remove. We therefore proposed EyeVis, a prototype that visualizes residual eye makeup and records how long Xiqu performers have worn it. We conducted a 7-day deployment study (N=12) to evaluate EyeVis. Results indicate that EyeVis helps increase Xiqu performers’ awareness of makeup removal and boosts their confidence and sense of security in skincare. Overall, this work also provides implications for studying the work of people who wear makeup daily, and helps promote and preserve the intangible cultural heritage of its practitioners.
Chinese traditional opera (Xiqu) is an important form of intangible cultural heritage, and one key characteristic of Xiqu is the visual effect on the face achieved through makeup. However, the Xiqu makeup process, especially for the eye area, is complex and time-consuming, which poses a learning challenge for potential younger inheritors. We introduce OperARtistry, an interactive augmented reality (AR) application that offers in-situ Xiqu makeup guidance for beginners. The application provides a step-by-step guide for Xiqu eye-area makeup, incorporating AR effects at each stage. We conducted an initial user study (n=6) comparing our approach with existing video-based tutorials to assess its effectiveness and usefulness. Our findings show that OperARtistry helped participants achieve high-quality eye-area makeup with less learning time.
Stuttering is a speech disorder affecting over 70 million people worldwide, including 13 million in China. It causes low self-esteem, among other detrimental effects, in people who stutter (PwS). Although prior work has explored approaches to assist PwS, it has primarily focused on Western contexts. In our formative study, we found unique practices and challenges among Chinese PwS. We then iteratively designed an online tool, CoPracTter, to support Chinese PwS in practicing speaking fluency with 1) targeted stress-inducing practice scenarios, 2) real-time speech indicators, and 3) personalized, timely feedback from the community. We further conducted a seven-day deployment study (N=11) to understand how participants used these key features. Results indicate that personalized practice with targeted scenarios and timely feedback from a supportive community, which participants valued more than the quantitative indicators, helped PwS speak fluently, stay positive, and face similar real-life situations. Finally, we present design implications for better supporting PwS.
We propose a method for generating music from a given image through three stages of translation: image to caption, caption to lyrics, and lyrics to instrumental music, which forms the content to be combined with a given style. We train our proposed model, which we call BGT (BLIP-GPT2-TeleMelody), on two open-source datasets, one containing over 200,000 labeled images and another containing more than 175,000 MIDI music files. In contrast with pixel-level translation, our system retains the semantics of the input image. We verify this claim through a user study in which participants were asked to match input images with generated music without access to the intermediate captions and lyrics. The results show that, while the matching rate among participants with little music expertise is essentially random, the rate among those with composition experience is significantly higher, which strongly indicates that some semantic content of the input image is retained in the generated music. The source code is available at https://github.com/BILLXZY1215/BGT-G2G.
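As a rough illustration of the first two translation stages, the sketch below chains an off-the-shelf BLIP captioner and a GPT-2 text generator via Hugging Face transformers. The checkpoints shown ("Salesforce/blip-image-captioning-base", vanilla "gpt2") are assumptions for illustration rather than the fine-tuned models used in the paper, and the lyrics-to-melody stage is left as a placeholder since TeleMelody is a separate MIDI-generation system; see the repository above for the full pipeline.

```python
# Minimal sketch of the image -> caption -> lyrics stages, assuming generic
# Hugging Face checkpoints rather than the paper's fine-tuned models.
from transformers import pipeline

captioner = pipeline("image-to-text", model="Salesforce/blip-image-captioning-base")
lyricist = pipeline("text-generation", model="gpt2")


def image_to_lyrics(image_path: str) -> str:
    # Stage 1: image -> caption (BLIP)
    caption = captioner(image_path)[0]["generated_text"]
    # Stage 2: caption -> lyrics (GPT-2, prompted with the caption)
    prompt = f"Write song lyrics about: {caption}\n"
    lyrics = lyricist(prompt, max_new_tokens=80)[0]["generated_text"]
    return lyrics


# Stage 3 (lyrics -> instrumental MIDI) would hand the lyrics to TeleMelody;
# that step is omitted here because it is not a transformers pipeline.
```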