Research | Parameter Lab

Auditing Foundation Models for Safe and Compliant Deployment

Scientific Research in Transparency, Reliability, and Risk Assessment of Black-Box Foundation Model Systems

At Parameter Lab, we are experts in auditing black-box foundation models—such as large language models (LLMs)—to enable safe and compliant deployment. Our scientific research addresses key challenges including privacy risks, agent behavior, uncertainty estimation, model fingerprinting, membership inference, and benchmarking.

Our Research

We follow a set of core values in our research: openness, clarity, and real-world impact. We support open research by publishing our papers, code, and data. We focus on simple, transparent methods that are easy to apply. And we aim for practical solutions that address real challenges in the deployment of foundation models. Below are our contributions in this area.

Tags: SEO, Benchmark, RAG, conversational search engines, LLM

C-SEO Bench: Does Conversational SEO Work?

Haritz Puerto, Martin Gubri, Tommaso Green, Seong Joon Oh, Sangdoo Yun

We introduce C-SEO Bench, the first benchmark to evaluate conversational search engine optimization (C-SEO) methods across tasks, domains, and multiple adopting actors. We find that C-SEO is mostly ineffective, in contrast to traditional SEO.

Scaling Up Membership Inference: When and How Attacks Succeed on Large Language Models

Haritz Puerto, Martin Gubri, Sangdoo Yun, Seong Joon Oh

NAACL 2025 (Findings)

This paper investigates membership inference attacks (MIA), which aim to determine whether specific data, such as copyrighted text, was included in the training of large language models. By examining a continuum from single sentences to large document collections, we address a gap in understanding when MIA methods begin to succeed, shedding light on their potential to detect misuse of copyrighted or private materials in training data.
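As a rough illustration of why the scale of the text matters, the sketch below averages weak per-sentence membership scores into one collection-level score and measures how often a simple threshold gets the decision right. It is a toy simulation and not the paper's method or results: the Gaussian score distributions, the `shift` value, the mean aggregation, and the midpoint threshold are all assumptions made for illustration. In practice the per-sentence scores would come from an actual MIA signal computed on the model.

```python
import random

def collection_score(sentence_scores):
    """Aggregate per-sentence membership scores by averaging them."""
    return sum(sentence_scores) / len(sentence_scores)

def simulate_accuracy(n_sentences, trials=200, shift=0.2, seed=0):
    """Accuracy of a thresholded mean score on synthetic data.

    Assumption: member sentences score N(shift, 1), non-members N(0, 1).
    """
    rng = random.Random(seed)
    threshold = shift / 2  # midpoint between the two assumed means
    correct = 0
    for _ in range(trials):
        member = [rng.gauss(shift, 1.0) for _ in range(n_sentences)]
        non_member = [rng.gauss(0.0, 1.0) for _ in range(n_sentences)]
        correct += collection_score(member) > threshold
        correct += collection_score(non_member) <= threshold
    return correct / (2 * trials)

if __name__ == "__main__":
    # Accuracy as the amount of aggregated text grows.
    for n in (1, 10, 100, 400):
        print(n, round(simulate_accuracy(n), 3))
```

Under these assumptions a single sentence is barely better than a coin flip, while the average over hundreds of sentences separates the two cases much more reliably, because the noise of the mean shrinks as more scores are combined.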

Tags: Confidence, Uncertainty, LLM

Calibrating Large Language Models Using Their Generations Only

Dennis Ulmer, Martin Gubri, Hwaran Lee, Sangdoo Yun, Seong Joon Oh

ACL 2024

As large language models (LLMs) are integrated into user applications, accurately measuring a model's confidence in its predictions is crucial for trust and safety. We introduce APRICOT, a method that trains a separate model to predict an LLM's confidence using only its text input and output. This method is simple, does not require direct access to the LLM, and preserves the original language generation process.
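To make the idea of an auxiliary confidence model concrete, here is a minimal sketch that uses a toy bag-of-words logistic regression as a stand-in for the separate model. It is not the APRICOT implementation: the `featurize`, `train`, and `predict` helpers, the training triples, and the correctness labels are all hypothetical. Only the overall shape follows the description above: the auxiliary model sees the LLM's text input and output, never its internals, and predicts how likely the answer is to be correct.

```python
import math
from collections import Counter

def featurize(question, answer):
    """Bag-of-words features from the LLM's text input and output only."""
    return Counter((question + " " + answer).lower().split())

def predict(weights, bias, feats):
    """Predicted probability that the LLM's answer is correct."""
    z = bias + sum(weights.get(tok, 0.0) * cnt for tok, cnt in feats.items())
    return 1.0 / (1.0 + math.exp(-z))

def train(examples, epochs=300, lr=0.1):
    """Fit logistic regression by SGD on (question, answer, was_correct)."""
    weights, bias = {}, 0.0
    for _ in range(epochs):
        for question, answer, label in examples:
            feats = featurize(question, answer)
            err = predict(weights, bias, feats) - label
            bias -= lr * err
            for tok, cnt in feats.items():
                weights[tok] = weights.get(tok, 0.0) - lr * err * cnt
    return weights, bias

# Hypothetical training data: LLM outputs labeled as correct (1) or not (0).
TRAIN = [
    ("capital of France?", "Paris.", 1),
    ("capital of Australia?", "I think it might be Sydney.", 0),
    ("2 + 2?", "4.", 1),
    ("author of Hamlet?", "Possibly Marlowe, I am not sure.", 0),
]

if __name__ == "__main__":
    w, b = train(TRAIN)
    feats = featurize("capital of Spain?", "I think it might be Barcelona.")
    print(round(predict(w, b, feats), 3))
```

Because the auxiliary model is trained and queried separately, the LLM's own generation process is left untouched; a real system would replace the bag-of-words features with a stronger text encoder.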

Tags: Fingerprinting, Compliance, LLM

TRAP: Targeted Random Adversarial Prompt Honeypot for Black-Box Identification

Martin Gubri, Dennis Ulmer, Hwaran Lee, Sangdoo Yun, Seong Joon Oh

ACL 2024 (Findings)

Large Language Models (LLMs) come with usage rules to protect interests and prevent misuse. This study introduces Black-box Identity Verification (BBIV), the task of determining through chat interaction alone whether a service uses a specific LLM, so that compliance can be checked. The proposed method, Targeted Random Adversarial Prompt (TRAP), uses adversarial suffixes that elicit a pre-defined answer from the specific LLM, while other models give random answers. TRAP offers a novel approach for ensuring compliance with LLM usage policies.
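The verification step can be sketched as follows, assuming a suffix has already been found for the reference model. This is an illustrative sketch rather than the authors' code: `query_service`, the prompt wording, the `<optimized-suffix>` placeholder, the target answer, and the decision threshold are all hypothetical, and the two stubs merely imitate the behavior described above. The search for the suffix itself is not shown.

```python
import random

def verify_identity(query_service, prompt, target_answer,
                    n_queries=20, min_rate=0.5):
    """Return (is_match, hit_rate) for a black-box chat service.

    query_service: hypothetical callable, prompt string -> reply string.
    """
    hits = sum(
        query_service(prompt).strip() == target_answer
        for _ in range(n_queries)
    )
    rate = hits / n_queries
    return rate >= min_rate, rate

# Hypothetical prompt: an instruction plus a placeholder for a suffix that
# would have been optimized against the reference model beforehand.
PROMPT = "Write a random three-digit number. <optimized-suffix>"
TARGET = "314"

def reference_model_stub(prompt):
    # Stand-in for the reference LLM: the suffix steers it to the target.
    return TARGET

def make_other_model_stub(seed=0):
    rng = random.Random(seed)

    def other_model_stub(prompt):
        # Stand-in for any other LLM: the suffix has no effect on it.
        return f"{rng.randrange(1000):03d}"

    return other_model_stub

if __name__ == "__main__":
    print(verify_identity(reference_model_stub, PROMPT, TARGET))
    print(verify_identity(make_other_model_stub(), PROMPT, TARGET))
```

Asking for a random answer is what makes the check informative: a model that is not steered by the suffix matches the target only by chance, so a high hit rate is strong evidence that the service runs the reference model.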

AI Security

Personally Identifiable Information (PII) Detection

Explore the challenges and ethical concerns of using unauthorised personal data in training Large Language Models (LLMs). Understand the significance of Personally Identifiable Information (PII) security and discover ProPILE, a tool designed to assess the recoverability of your own PII in popular LLMs.

Tags: Privacy, LLM

ProPILE: Probing Privacy Leakage in Large Language Models

Siwon Kim, Sangdoo Yun, Hwaran Lee, Martin Gubri, Sungroh Yoon*, Seong Joon Oh*

NeurIPS 2023 (spotlight)

Large language models (LLMs) absorb web data, which may include sensitive personal information. Our tool, ProPILE, acts as a detective, helping individuals assess potential personal data exposure within LLMs. It allows users to tailor prompts to monitor their data, fostering awareness and control over personal information in the age of LLMs.
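A probing prompt of this kind could look like the sketch below, which formats the PII a user already knows into a prompt asking for one missing field and then checks a model completion for the true value. It is a hypothetical illustration, not the ProPILE tool: the template, the helper names `build_probe` and `contains_pii`, and the fictitious person are all assumptions, and the call to an actual LLM is left out.

```python
def build_probe(known_fields, target_field):
    """Format known PII into a prompt that asks for one missing field."""
    known = "; ".join(
        f"{name}: {value}" for name, value in known_fields.items()
    )
    return f"{known}; {target_field}:"

def contains_pii(completion, true_value):
    """Case-insensitive check for the true value in a model completion."""
    return true_value.lower() in completion.lower()

if __name__ == "__main__":
    # Fictitious data owner probing for their own email address.
    probe = build_probe(
        {"name": "Jane Doe", "employer": "Example Corp"}, "email"
    )
    print(probe)
    # In practice the completion would come from the LLM being probed.
    print(contains_pii("Her email is jane.doe@example.com",
                       "jane.doe@example.com"))
```

A user would vary the template and the set of known fields, then count how often the true value shows up across completions.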