Andrew (Ching-Yuan) Bai
Hello! Welcome to my personal website.
I am a third-year PhD student in Computer Science at UCLA, where I work with Cho-Jui Hsieh. I am interested in understanding why machine learning methods work and in debunking machine learning myths. Specifically, I aim to make black-box machine learning models more interpretable to allow better control.
Currently, I am developing simple, easy-to-adopt sample selection schemes for prioritizing data samples during training (sample-based interpretability). I have also worked on practical interpretation methods for black-box models, helping humans better understand and trust machine learning models in real-world applications (concept-based interpretability).
Previously, I was an undergraduate student in Computer Science at National Taiwan University. I worked with Hsuan-Tien Lin on generative modeling and time series forecasting. We held the first-ever generative modeling competition in collaboration with Kaggle. I also worked with Chung-Wei Lin on system verification and falsification.
Email: andrewbai [AT] cs.ucla.edu
Links: [CV] [GitHub] [LinkedIn]
Publications
2024
- Andrew Bai, Chih-Kuan Yeh, Cho-Jui Hsieh, Ankur Taly. Which Pretrain Samples to Review when Fine-tuning Pretrained Models? Under review.
- Tong Xie*, Haoyu Li*, Andrew Bai, Cho-Jui Hsieh. Data Attribution for Diffusion Models: Timestep-induced Bias in Influence Estimation. Under review.
[bib | arxiv]
2023
- Andrew Bai, Chih-Kuan Yeh, Pradeep Ravikumar, Neil Y. C. Lin, Cho-Jui Hsieh. Concept Gradient: Concept-based Interpretation Without Linear Assumption. In Proceedings of the 11th International Conference on Learning Representations (ICLR), May 2023.
[bib | arxiv | code]
2022
- Andrew Bai, Cho-Jui Hsieh, Wendy Chih-wen Kan, Hsuan-Tien Lin. Reducing Training Sample Memorization in GANs by Training with Memorization Rejection. arXiv preprint.
[bib | arxiv | code]
2021
- Ching-Yuan Bai, Hsuan-Tien Lin, Colin Raffel, and Wendy Chih-wen Kan. On training sample memorization: Lessons from benchmarking generative modeling with a large-scale competition. In Proceedings of the 27th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD), August 2021.
[bib | data | arxiv | code]
2020
- Shih-Lun Wu*, Ching-Yuan Bai*, Kai-Chieh Chang, Yi-Ting Hsieh, Chao Huang, Chung-Wei Lin, Eunsuk Kang, and Qi Zhu. Efficient system verification with multiple weakly-hard constraints for runtime monitoring. In Proceedings of the International Conference on Runtime Verification (RV), October 2020.
[bib | pdf]
- Ching-Yuan Bai, Buo-Fu Chen, and Hsuan-Tien Lin. Benchmarking tropical cyclone rapid intensification with satellite images and attention-based deep models. In Proceedings of the European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML/PKDD), September 2020.
[bib | data | arxiv | code]