This talk offers a deep dive into data privacy in language model (LM) applications, spotlighting opaque prompts as a key strategy for safeguarding sensitive information. We explore how opaque prompts sanitize user inputs by substituting sensitive data with non-identifiable placeholders, so that the LM never sees personally identifiable information (PII). The discussion extends to the intricacies of implementing these prompts, highlighting the technical challenges of reliably masking PII and the need for customizable identification mechanisms. The talk also addresses privacy concerns in LM training data, focusing on the challenges of anonymizing datasets and the implications for model accuracy and utility. This session aims to provide insights into advancing data protection methodologies for language models.
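As a rough illustration of the placeholder substitution described above, here is a minimal Python sketch. It is not the speaker's implementation or any particular product's API: the `sanitize` and `desanitize` helpers, the `PII_PATTERNS` table, and the `<LABEL_n>` placeholder format are all hypothetical, and the two regular expressions cover only simple email addresses and US-style phone numbers. Reliable PII detection would need a much broader and customizable set of detectors, which is the challenge the talk highlights.

```python
import re

# Hypothetical, deliberately narrow patterns (assumption, not a complete PII detector).
PII_PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "PHONE": re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b"),
}


def sanitize(text):
    """Replace detected PII with placeholders; return the text and the mapping."""
    mapping = {}   # placeholder -> original value
    counters = {}  # label -> number of placeholders issued so far

    def make_replacer(label):
        def replace(match):
            value = match.group(0)
            # Reuse the same placeholder when a value appears more than once.
            for placeholder, original in mapping.items():
                if original == value:
                    return placeholder
            counters[label] = counters.get(label, 0) + 1
            placeholder = f"<{label}_{counters[label]}>"
            mapping[placeholder] = value
            return placeholder
        return replace

    for label, pattern in PII_PATTERNS.items():
        text = pattern.sub(make_replacer(label), text)
    return text, mapping


def desanitize(text, mapping):
    """Restore original values in a model response, outside the LM call."""
    for placeholder, original in mapping.items():
        text = text.replace(placeholder, original)
    return text


prompt = "Email jane.doe@example.com or call 555-123-4567."
sanitized, mapping = sanitize(prompt)
print(sanitized)
print(desanitize("Reply to <EMAIL_1>", mapping))
```

In this sketch only the sanitized string would be sent to the LM; the mapping stays on the caller's side and is used to restore the original values in the response.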
Zairah Mustahsan
Zairah Mustahsan is a Staff Data Scientist at You.com, an AI chatbot for search, where she leverages her expertise in statistical and machine-learning techniques to build analytics and experimentation platforms. Previously, Zairah was a Data Scientist at IBM Research, researching Natural Language Processing (NLP) and AI Fairness topics. Zairah obtained her M.S. in Computer Science from the University of Pennsylvania, where she researched scikit-learn model performance. Her findings have since been used as guidelines for machine learning. Zairah is a regular speaker at AI conferences such as NeurIPS, AI4, AI Hardware & Edge AI Summit, and ODSC. Zairah has published her work in top AI conferences such as AAAI and has over 300 citations. Aside from work, Zairah enjoys adventure sports and poetry.