Content Validity Index (CVI) in Survey Development: A Step-by-Step Guide for Biomedical Researchers

Adrian Campbell, Nov 29, 2025

Abstract

This article provides a comprehensive guide for researchers and drug development professionals on the development and validation of surveys using the Content Validity Index (CVI). It covers foundational concepts of content validity, detailed methodological steps for CVI calculation and implementation, strategies for troubleshooting common issues, and advanced techniques for instrument validation. By integrating current methodologies, practical Excel tutorials, and examples from recent clinical and biomedical research, this guide serves as an essential resource for ensuring the rigor and credibility of data collection instruments in scientific studies.

Understanding Content Validity: The Cornerstone of Robust Survey Design

Defining Content Validity and its Critical Role in Research

Content validity is a fundamental psychometric property that assesses the extent to which an instrument's items accurately and fully represent the entire domain of the construct being measured [1]. It answers a critical question: Does the content of this measurement tool adequately sample all facets of the concept we intend to measure? Unlike face validity, which merely concerns superficial appearance, content validity requires systematic, rigorous evaluation of test content by subject matter experts to ensure no important components are missing and that all items are relevant and appropriate for the intended purpose [1].

In research contexts—particularly in healthcare, education, and social sciences—content validity serves as the bedrock for drawing valid inferences from data. Without adequate content validity, inferences drawn from test scores may be inaccurate or misleading, compromising research integrity and practical applications such as clinical assessments, educational testing, and personnel selection [1]. For drug development professionals and researchers, establishing content validity is essential for justifying the use of any instrument for a specific measurement purpose, ensuring that the tool accurately captures the intended construct, whether measuring patient-reported outcomes, symptom severity, or treatment effectiveness.

Theoretical Framework and Definitions

Core Components of Content Validity

Content validity primarily evaluates two key aspects of an instrument: relevance and representativeness [1]. Relevance ensures each item appropriately reflects the target construct, while representativeness guarantees the items comprehensively cover the entire conceptual domain. This dual focus distinguishes content validity from other validity forms and establishes its critical role in instrument development.

Content Validity vs. Construct Validity

While both are essential measurement properties, content validity and construct validity serve distinct purposes [1]:

Content Validity focuses specifically on the instrument's content, assessing how well the items sample the domain of interest. It is typically evaluated deductively by defining the construct and systematically selecting items from that domain through expert judgment.

Construct Validity is a broader concept that encompasses content validity as one aspect. It investigates whether the instrument truly measures the theoretical construct it claims to measure by examining the relationship between test scores and other variables, internal structure, and responses to interventions [1].

The relationship between these validities is hierarchical: content validity is a necessary but insufficient condition for construct validity. An instrument can have good content validity yet lack construct validity if it doesn't truly measure the intended psychological construct [1].

Table 1: Key Differences Between Content Validity and Construct Validity

Feature | Content Validity | Construct Validity
Definition | Extent to which items reflect the specific concept being measured | Extent to which a test measures the underlying theoretical construct
Scope | Narrower; focuses on items and their relationship to the content domain | Broader; encompasses content validity and other validity evidence
Focus | Relevance and representativeness of items | Meaning of test scores in relation to the theoretical framework
Evaluation Methods | Expert review, item-domain congruence | Factor analysis, relationships with other variables, experimental interventions

Quantitative Assessment of Content Validity

Content Validity Index (CVI) Methodology

The Content Validity Index (CVI) is a widely accepted quantitative method for assessing content validity, particularly in healthcare and social science research [2] [3]. The CVI is calculated at two levels: Item-Level CVI (I-CVI) for individual items and Scale-Level CVI (S-CVI) for the entire instrument.

The assessment process typically involves a panel of 3-10 subject matter experts who evaluate each item using a 4-point Likert scale for relevance: 1 = "not relevant," 2 = "somewhat relevant," 3 = "quite relevant," and 4 = "highly relevant" [2]. These ratings are then converted to binary values (0 or 1) for calculation, with ratings of 3 or 4 considered "valid" (coded as 1) and ratings of 1 or 2 considered "not valid" (coded as 0) [2].

Calculation Protocols

I-CVI Calculation: The I-CVI is computed for each item as the proportion of experts giving a rating of 3 or 4 [2]. For example, if 5 out of 6 experts rate an item as 3 or 4, the I-CVI would be 5/6 = 0.83.

S-CVI Calculation: The Scale-Level Content Validity Index can be computed using two approaches [2]:

  • S-CVI/Ave: The average of all I-CVIs across the instrument
  • S-CVI/UA: The proportion of items that achieve a rating of 3 or 4 by all experts
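
The arithmetic above is easy to script. The following is a minimal Python sketch of the I-CVI and both S-CVI calculations; the function names and the ratings matrix are illustrative, not taken from the cited sources:

```python
# Ratings use the 4-point relevance scale (1-4); a rating of 3 or 4
# counts as agreement that the item is relevant.

def i_cvi(item_ratings):
    """Proportion of experts rating the item 3 or 4."""
    agree = sum(1 for r in item_ratings if r >= 3)
    return agree / len(item_ratings)

def s_cvi_ave(matrix):
    """Average of the I-CVIs across all items."""
    return sum(i_cvi(row) for row in matrix) / len(matrix)

def s_cvi_ua(matrix):
    """Proportion of items rated 3 or 4 by every expert."""
    universal = sum(1 for row in matrix if all(r >= 3 for r in row))
    return universal / len(matrix)

# Example: 3 items rated by a panel of 6 experts
ratings = [
    [4, 4, 3, 4, 3, 4],  # all 6 agree          -> I-CVI = 1.00
    [3, 4, 2, 4, 3, 4],  # 5 of 6 agree         -> I-CVI = 0.83
    [2, 3, 2, 4, 3, 2],  # only 3 of 6 agree    -> I-CVI = 0.50
]
print(round(i_cvi(ratings[1]), 2))   # 0.83
print(round(s_cvi_ave(ratings), 2))  # 0.78
print(round(s_cvi_ua(ratings), 2))   # 0.33
```

The second item reproduces the 5/6 = 0.83 worked example from the text.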

Table 2: Content Validity Index Thresholds and Standards

Number of Experts | Minimum Acceptable I-CVI | Source
2 | 0.80 | Davis (1992)
3-5 | 1.00 | Polit & Beck (2006), Polit et al. (2007)
6-8 | 0.83 | Lynn (1986)
≥9 | 0.78 | Lynn (1986)

For newly developed instruments, a CVI value of ≥0.80 is generally required to confirm that items possess high, clear, and relevant content validity [2]. The S-CVI/Ave should ideally exceed 0.90 for the overall instrument to demonstrate excellent content validity [3].

Experimental Protocols for Content Validation

Instrument Design and Development Phase

The content validation process begins with comprehensive instrument design through a three-step process [3]:

  • Domain Determination: Clearly define the content domain of the construct through literature review, interviews with target respondents, and focus groups. This step requires precise definition of the construct's attributes, characteristics, boundaries, dimensions, and components.

  • Item Generation: Develop items that comprehensively sample the identified content domain. Qualitative research methods, including interviews with individuals familiar with the concept, are invaluable for generating instrument items that enrich and expand upon existing literature.

  • Instrument Construction: Refine and organize items into a suitable format and sequence, collecting finalized items into a usable measurement instrument.

Expert Judgment Phase

This phase involves quantitative and qualitative evaluation by an expert panel [3]:

  • Expert Panel Selection: Assemble 5-10 content experts with professional or research experience in the field. Including lay experts (potential research subjects) ensures the instrument represents the target population.

  • Evaluation Process: Experts quantitatively rate each item for relevance, clarity, and comprehensiveness using standardized scales. They also provide qualitative feedback on grammar, wording, sequencing, and scoring.

  • Data Analysis: Calculate quantitative indices (CVR, I-CVI, S-CVI) and analyze qualitative feedback to refine and improve the instrument.

[Workflow diagram: Start → Domain Determination (literature review, expert interviews, construct definition) → Item Generation (create initial items, ensure comprehensive coverage) → Instrument Construction (refine items, establish format/sequence) → Expert Panel Selection (5-10 content experts, include lay experts) → Expert Rating (relevance on a 4-point scale, clarity assessment, qualitative feedback) → CVI Calculation (I-CVI for each item, S-CVI for the overall scale) → Evaluate Against Thresholds (I-CVI ≥ 0.80, S-CVI/Ave ≥ 0.90; below-threshold items loop back to refinement) → Refine Instrument (revise/remove weak items, incorporate feedback) → Final Validated Instrument]

Diagram 1: Content Validity Assessment Workflow

Application in Research Settings

Cross-Cultural Translation and Validation

Content validity methodology is particularly crucial in cross-cultural research and instrument translation. A 2025 study demonstrating the translation of a patient-reported outcome measure for head and neck cancer highlights this application [4]. Researchers used CVI methodology where an expert panel of nurses proficient in both Spanish and English independently reviewed and rated a forward translation for cultural relevance and translation equivalence. The study achieved excellent content validity with average CVI scores of 0.95 for cultural relevance and 0.84 for translation equivalence, with problematic items (CVI <0.59) refined through cognitive interviews with native Spanish-speaking patients [4].

Healthcare and Clinical Research Applications

In healthcare research, content validity is essential for developing patient-centered instruments. A methodological study examining patient-centered communication in oncology wards identified seven dimensions through content validation: trust building, informational support, emotional support, problem solving, patient activation, intimacy/friendship, and spirituality strengthening [3]. From an initial pool of 188 items, the content validation process refined the instrument to an appropriate set of items across these domains, achieving an S-CVI/Ave of 0.93 and demonstrating excellent content validity [3].

The Scientist's Toolkit: Research Reagent Solutions

Table 3: Essential Reagents for Content Validity Research

Research Reagent | Function/Purpose | Application Notes
Expert Panel | Provides subject matter expertise for item evaluation | Select 5-10 experts with a minimum of 5 years of field experience; include both content experts and lay experts from the target population
4-Point Likert Scale | Standardized rating system for item relevance | 1 = Not relevant, 2 = Somewhat relevant, 3 = Quite relevant, 4 = Highly relevant; prevents neutral responses
CVI Calculation Framework | Quantitative assessment of content validity | Computes I-CVI (item-level) and S-CVI (scale-level); requires binary conversion of ratings (1, 2 = 0; 3, 4 = 1)
Statistical Software (Excel) | Automated CVI calculation | Uses COUNTIF and AVERAGE functions; reduces human error in manual calculations
Cognitive Interview Protocol | Qualitative refinement of problematic items | Identifies issues with wording and comprehension; used for items with CVI < 0.59
Content Validity Ratio (CVR) | Assesses essentiality of items | CVR = (Nₑ - N/2)/(N/2), where Nₑ = number of experts rating the item "essential" and N = total experts
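
The CVR formula in the table above translates directly to code. A minimal sketch with a hypothetical panel (Lawshe's ratio ranges from -1 to +1, equaling 0 when exactly half the panel rates an item essential):

```python
def cvr(n_essential, n_experts):
    """Content validity ratio: CVR = (Ne - N/2) / (N/2),
    where Ne is the number of experts rating the item 'essential'
    and N is the total number of experts."""
    half = n_experts / 2
    return (n_essential - half) / half

# Hypothetical example: 8 of 10 experts rate an item "essential"
print(cvr(8, 10))  # 0.6
```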

Advanced Methodological Considerations

Modified Kappa Statistic

Beyond CVI, the modified kappa statistic accounts for chance agreement among experts [3]. This approach calculates the probability of chance agreement and provides a more robust statistical evaluation of content validity. Kappa values >0.74 are considered excellent, while values between 0.60-0.74 are considered good.
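
A sketch of this adjustment, assuming the commonly used Polit, Beck and Owen (2007) formulation in which the probability of chance agreement follows a binomial model (function names are illustrative):

```python
from math import comb

def modified_kappa(n_agree, n_experts):
    """k* = (I-CVI - Pc) / (1 - Pc), where Pc = C(N, A) * 0.5**N
    is the binomial probability that A of N experts agree by chance."""
    i_cvi = n_agree / n_experts
    p_chance = comb(n_experts, n_agree) * 0.5 ** n_experts
    return (i_cvi - p_chance) / (1 - p_chance)

# 5 of 6 experts rate an item relevant: I-CVI = 0.83
print(round(modified_kappa(5, 6), 2))  # 0.82 -> "excellent" (> 0.74)
```

Note that the modified kappa rewards the same I-CVI more when chance agreement is less likely, which is why it is preferred for comparing items across panels of different sizes.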

Universal Agreement vs. Average Agreement

The S-CVI/UA (universal agreement) tends to be more conservative than S-CVI/Ave (average agreement) because it requires all experts to agree on each item [2] [3]. With larger expert panels, S-CVI/UA typically yields lower values, making S-CVI/Ave more practical while still maintaining rigorous standards.
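
A small numeric illustration of this divergence (hypothetical ratings, not data from the cited studies): with 10 experts and a single dissenter per item, every I-CVI is 0.90, yet no item reaches universal agreement.

```python
items = 20
experts = 10
# Each row: nine ratings of 4 and a single dissenting rating of 2
matrix = [[4] * (experts - 1) + [2] for _ in range(items)]

i_cvis = [sum(r >= 3 for r in row) / experts for row in matrix]
s_cvi_ave = sum(i_cvis) / items
s_cvi_ua = sum(all(r >= 3 for r in row) for row in matrix) / items

print(round(s_cvi_ave, 2))  # 0.9 -> looks excellent
print(round(s_cvi_ua, 2))   # 0.0 -> no item has universal agreement
```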

[Workflow diagram: Expert Ratings Collected → Convert to Binary Values (ratings 3-4 = 1, valid; ratings 1-2 = 0, not valid) → Count Agreements (number of experts rating 3-4 per item) → Calculate I-CVI (agreements / total experts) → Categorize Items (I-CVI ≥ 0.80 valid; < 0.80 invalid) → Calculate S-CVI/UA (proportion of items with universal agreement) and S-CVI/Ave (average of all I-CVIs) → Content Validity Established]

Diagram 2: CVI Calculation Methodology

Content validity remains a foundational requirement for developing psychometrically sound research instruments. Through systematic application of CVI methodology—including expert panel evaluation, quantitative assessment of I-CVI and S-CVI, and rigorous adherence to established thresholds—researchers can ensure their measurement tools adequately represent the constructs under investigation. For drug development professionals and scientific researchers, robust content validation provides the necessary foundation for collecting meaningful, reliable data that can accurately inform clinical decisions, treatment development, and scientific understanding. The protocols and applications outlined in this article provide a comprehensive framework for implementing content validity assessment across diverse research contexts.

In research instrument development, content validity is a fundamental concept that assesses whether the items in a questionnaire or scale adequately represent the entire domain of the construct being measured [3]. Also known as definition validity and logical validity, it answers the question of to what extent the selected sample of items comprehensively covers the content area [3]. Establishing content validity is a critical prerequisite for other forms of validity and must receive the highest priority during instrument development, as an instrument lacking content validity cannot establish reliability [3].

The Content Validity Index (CVI) has emerged as a widely accepted quantitative method for evaluating content validity, particularly in educational, social science, and healthcare research [2]. This metric systematically quantifies expert agreement on item relevance, providing researchers with evidence that their assessment content fairly and adequately represents a defined domain of knowledge or performance [5] [2]. For thesis research in survey development, establishing content validity through CVI represents an essential methodological step that strengthens theoretical foundations and enhances overall data quality.

Understanding I-CVI and S-CVI

The Content Validity Index operates at two distinct but complementary levels: the item level and the scale level.

Item-Level Content Validity Index (I-CVI)

The Item-Level Content Validity Index (I-CVI) represents the proportion of content experts who agree that a specific item is relevant to the construct being measured [2]. Calculated for each individual item in an instrument, I-CVI provides granular information about which items may require revision or elimination. This item-level analysis allows researchers to identify problematic items while retaining well-performing ones, thus enabling targeted instrument refinement.

Scale-Level Content Validity Index (S-CVI)

The Scale-Level Content Validity Index (S-CVI) evaluates the overall validity of the entire questionnaire or scale [2]. It can be calculated using two different approaches:

  • S-CVI/Universal Agreement (S-CVI/UA): The proportion of items that achieve a perfect relevance rating from all experts [2]. This method represents the most stringent approach to establishing content validity.
  • S-CVI/Average (S-CVI/Ave): The average of all I-CVI values for items in the scale [2]. This approach provides a more comprehensive overview of content validity across the entire instrument.

Research has demonstrated that these two calculation methods can yield different results. One study found that while the S-CVI/UA was low, the S-CVI/Ave was 0.93, indicating good overall content validity despite the difficulty in achieving universal expert consensus on all items [3].

Quantitative Standards and Interpretation

Establishing clear thresholds for acceptable CVI values is essential for rigorous instrument development. The following table summarizes the widely accepted standards based on the number of experts involved in content validation:

Table 1: Content Validity Index Thresholds Based on Number of Experts

Number of Experts | Acceptable I-CVI Values | Source of Recommendation
2 | At least 0.80 | Davis (1992)
3 to 5 | Should be 1.00 | Polit & Beck (2006), Polit et al. (2007)
At least 6 | At least 0.83 | Polit & Beck (2006), Polit et al. (2007)
6 to 8 | At least 0.83 | Lynn (1986)
At least 9 | At least 0.78 | Lynn (1986)

For newly developed instruments, current research recommends a CVI value of ≥ 0.8 to confirm that items possess high, clear, and relevant content validity [2]. The S-CVI/Ave should also meet or exceed this 0.8 threshold for the overall instrument to be considered valid [2].
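
These thresholds can be encoded as a simple lookup. The helper below is illustrative and simply mirrors the table above:

```python
def min_i_cvi(n_experts):
    """Minimum acceptable I-CVI as a function of panel size,
    following the tabulated recommendations (illustrative helper)."""
    if n_experts < 2:
        raise ValueError("at least 2 experts are required")
    if n_experts == 2:
        return 0.80   # Davis (1992)
    if n_experts <= 5:
        return 1.00   # Polit & Beck (2006), Polit et al. (2007)
    if n_experts <= 8:
        return 0.83   # Lynn (1986)
    return 0.78       # Lynn (1986), 9 or more experts

print(min_i_cvi(6))  # 0.83
```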

Experimental Protocol for CVI Assessment

Implementing a systematic content validity study involves a structured two-step process: instrument development and expert judgment [3].

Stage 1: Instrument Design

The initial design phase consists of three critical steps:

  • Domain Determination: Precisely define the content area related to the variables being measured through comprehensive literature review, interviews with respondents, and focus groups [3]. This step establishes clear boundaries, dimensions, and components of the construct.

  • Item Generation: Develop individual items that adequately sample from the defined content domain [3]. Qualitative research methods can be particularly valuable for generating items that reflect the lived experience of the target population.

  • Instrument Formation: Refine and organize items into a suitable format and sequence, creating a usable instrument [3]. This includes attention to grammar, wording, and scoring procedures.

Stage 2: Expert Judgment

The judgment phase involves quantitative evaluation by content experts:

  • Expert Panel Selection: Convene a panel of 3-10 experts with substantial experience (typically ≥5 years) in the relevant field [2] [3]. Including both content experts and potential respondents ensures professional judgment and population representation [3].

  • Relevance Rating: Experts independently rate each item using a four-point Likert scale: 1 = "Not relevant," 2 = "Somewhat relevant," 3 = "Quite relevant," and 4 = "Highly relevant" [2].

  • Data Collection and Analysis: Compile expert ratings and calculate CVI values using standardized computational procedures [2].

The following workflow diagram illustrates the complete CVI assessment process:

[Workflow diagram: Start CVI Assessment → Instrument Design Phase (define content domain → generate initial items → format instrument) → Expert Judgment Phase (select expert panel → rate item relevance → calculate CVI values → evaluate against thresholds) → if below threshold, refine the instrument and re-rate; if standards are met, valid instrument]

Computational Methodology

Calculating CVI values involves a systematic transformation of expert ratings into quantitative validity indices. Microsoft Excel provides an accessible platform for performing these calculations efficiently.

Data Preparation

Create a structured data table with questionnaire items as rows and expert ratings as columns [2]. Record all ratings using the four-point Likert scale (1 = "Not relevant" to 4 = "Highly relevant").

Binary Conversion of Relevance Ratings

Convert the Likert scale ratings to binary values (0 or 1) to facilitate CVI calculation [2]:

Table 2: Binary Conversion Scheme for Expert Ratings

Likert Scale Rating | Binary Conversion | Interpretation
1, 2 | 0 | Not Valid
3, 4 | 1 | Valid

In Excel, use the formula =IF(B2>=3,1,0) where B2 contains the expert's rating [2]. Apply this formula across all expert columns and item rows.

I-CVI Calculation

Compute the Item-Level CVI using the following procedure:

  • Count Expert Agreement: For each item, count the number of experts who rated it as valid (binary value = 1) using the formula =COUNTIF(H2:J2,">=1") where H2:J2 contains the binary values for the three experts [2].

  • Calculate I-CVI: Divide the number of agreeing experts by the total number of experts using the formula =K2/3 where K2 contains the agreement count [2].

  • Categorize Items: Classify items as valid or invalid based on established thresholds using the formula =IF(L2>=0.8,"Valid","Invalid") where L2 contains the I-CVI value [2].

S-CVI Calculation

Compute Scale-Level CVI using two approaches:

  • S-CVI/Average: Calculate the average of all I-CVI values using Excel's AVERAGE function [2].

  • S-CVI/Universal Agreement:

    • First, identify items where all experts agree (binary value = 1 for all experts) using the formula =IF(AND(H2=1,I2=1,J2=1),1,0) [2].
    • Then, calculate the proportion of such items out of the total number of items.
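
For researchers who prefer scripting over spreadsheets, the Excel steps above can be mirrored in a short Python sketch. The item ratings are illustrative, and the comments map each line back to the corresponding spreadsheet formula:

```python
# Python equivalent of the Excel workflow for a 3-expert panel
ratings = {                # item -> ratings from experts 1-3
    "Item 1": [4, 3, 4],
    "Item 2": [3, 4, 2],
    "Item 3": [2, 2, 3],
}

n_experts = 3
results = {}
for item, scores in ratings.items():
    binary = [1 if s >= 3 else 0 for s in scores]     # =IF(B2>=3,1,0)
    agree = sum(binary)                               # =COUNTIF(H2:J2,">=1")
    i_cvi = agree / n_experts                         # =K2/3
    verdict = "Valid" if i_cvi >= 0.8 else "Invalid"  # =IF(L2>=0.8,"Valid","Invalid")
    results[item] = (i_cvi, verdict, all(binary))     # =IF(AND(H2=1,I2=1,J2=1),1,0)

s_cvi_ave = sum(v[0] for v in results.values()) / len(results)  # =AVERAGE(...)
s_cvi_ua = sum(v[2] for v in results.values()) / len(results)   # UA items / total

for item, (i_cvi, verdict, _) in results.items():
    print(f"{item}: I-CVI={i_cvi:.2f} ({verdict})")
print(f"S-CVI/Ave={s_cvi_ave:.2f}, S-CVI/UA={s_cvi_ua:.2f}")
```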

The following diagram illustrates the computational workflow for CVI calculation:

[Workflow diagram: Expert Ratings (4-point scale) → Convert to Binary Values (ratings 3-4 → 1; ratings 1-2 → 0) → Count Expert Agreement for Each Item → Calculate I-CVI (agreements / total experts) → Categorize Items (I-CVI ≥ 0.8 valid; < 0.8 invalid) → Calculate S-CVI/Ave (average of all I-CVI values) and S-CVI/UA (universal-agreement items / total items) → CVI Validation Results]

The Researcher's Toolkit: Essential Reagents for CVI Studies

Table 3: Essential Methodological Components for Content Validity Studies

Research Component | Function | Implementation Example
Expert Panel | Provide professional judgment on item relevance | 3-10 content experts with ≥5 years field experience [2] [3]
Four-Point Likert Scale | Quantify expert ratings of item relevance | 1 = "Not relevant" to 4 = "Highly relevant" [2]
Binary Conversion Protocol | Transform ordinal ratings into dichotomous validity judgments | Ratings 3-4 → 1 (Valid); ratings 1-2 → 0 (Not Valid) [2]
CVI Calculation Framework | Compute item-level and scale-level validity indices | I-CVI, S-CVI/UA, S-CVI/Ave [2]
Threshold Standards | Establish minimum acceptability criteria for validity indices | I-CVI ≥ 0.8; S-CVI/Ave ≥ 0.8 [2]
Statistical Software | Automate computation and reduce human error | Microsoft Excel with COUNTIF and AVERAGE functions [2]

Application in Drug Development Research

For drug development professionals, establishing content validity is particularly crucial when developing patient-reported outcome (PRO) measures, clinician-rated scales, and other instruments used in clinical trials. The CVI framework provides methodological rigor that aligns with regulatory requirements for instrument validation in pharmaceutical research.

In healthcare contexts, content validity ensures that training tools, pre/post-test questionnaires, and research instruments align with evidence-based practices [2]. The patient-centered communication instrument study demonstrates how CVI methodology can be applied to develop culturally appropriate measures for specific patient populations, such as cancer patients in oncology wards [3].

When adapting instruments for cross-cultural use or specific disease populations, content validity assessment becomes an essential first step that informs subsequent translation, cultural adaptation, and psychometric validation processes. The quantitative nature of CVI provides compelling evidence for regulatory submissions regarding the content validity of clinical outcome assessments.

The Content Validity Index, with its item-level (I-CVI) and scale-level (S-CVI) components, provides researchers with a systematic, quantitative method for establishing the content validity of research instruments. Through rigorous expert evaluation and standardized computational procedures, the CVI framework enables drug development professionals, researchers, and scientists to ensure their survey instruments adequately represent the constructs they intend to measure.

By implementing the protocols and methodologies outlined in this article, thesis researchers can strengthen the methodological foundation of their survey development research, producing instruments that demonstrate both quantitative rigor and qualitative relevance to their target domains.

Application Notes

The Role of the Expert Panel in Content Validity

In Content Validity Index (CVI) survey development research, expert panels serve as the definitive reference standard for assessing how well instrument items represent the construct being measured [6]. The panel's primary function is to independently evaluate item relevance, clarity, and comprehensiveness based on a predefined conceptual framework [7]. This judgment-based process ensures that a data collection instrument accurately captures all facets of the concept under investigation, whether it pertains to professional quality of life, drug toxicity, spiritual health, pain, or other phenomena [6]. The methodology is particularly valuable in cross-cultural translation of patient-reported outcome measures, where it systematically identifies problematic items requiring refinement by evaluating both cultural relevance and translation equivalence [8].

Consequences of Inadequate Panel Sizing

Determining the appropriate expert panel size involves balancing methodological rigor with practical constraints. Evidence from diagnostic research indicates substantial heterogeneity in panel composition, with underpowered panels risking unreliable consensus and oversized panels creating administrative burdens without commensurate benefits [9]. In pharmaceutical and healthcare research, insufficient panel sizes may compromise content validity assessments, leading to measurement instruments that fail to detect clinically significant differences in patient outcomes or drug effects. Conversely, properly constituted panels enhance instrument reliability, support regulatory approval processes, and strengthen evidence for drug efficacy claims through psychometrically sound measurement.

Experimental Protocols

Protocol 1: Determining Optimal Panel Size and Composition

Purpose and Principle

This protocol establishes standardized procedures for constituting expert panels with optimal size and composition to ensure robust content validity assessments in pharmaceutical research and survey development.

Materials and Reagents

Table 1: Research Reagent Solutions for Expert Panel Formation

Item | Function | Specifications
Expert Recruitment Database | Identifies potential panelists with relevant expertise | Minimum 5 years domain-specific experience; peer-reviewed publications or clinical specialization
Qualification Matrix | Standardizes expert selection | Evaluates expertise diversity, methodological experience, and conflict-of-interest status
Four-Point Likert Scale | Quantifies expert ratings | 1 = Not relevant; 2 = Somewhat relevant; 3 = Quite relevant; 4 = Highly relevant [2]
Content Validity Index (CVI) Calculator | Computes validity metrics | Microsoft Excel with COUNTIF and AVERAGE functions for I-CVI and S-CVI calculations [2]

Procedure
  • Define Expertise Requirements: Identify the specific domains of knowledge needed based on the instrument's target construct (e.g., clinical pharmacology, oncology, psychometrics, cross-cultural adaptation).

  • Establish Panel Size Parameters:

    • For most applications, constitute panels of 3-5 experts for focused instrument development [6].
    • For high-stakes validation (e.g., regulatory submission), expand to 5-8 experts to enhance reliability [9].
    • For complex constructs requiring multiple specialties, consider 8-12 experts with staggered participation through modified Delphi techniques [7].
  • Recruit and Screen Experts:

    • Select experts based on:
      • Minimum 5 years of domain-specific experience
      • Peer-reviewed publications or clinical specialization in relevant field
      • Diversity of perspectives (methodology, clinical practice, cultural background)
      • Absence of significant conflicts of interest
  • Implement Independent Rating:

    • Provide experts with the conceptual definition of the construct
    • Distribute instrument items with standardized rating scales
    • Ensure independent assessment without conferring to prevent groupthink
  • Calculate Content Validity Metrics:

    • Compute Item-Level CVI (I-CVI) for each instrument item
    • Calculate Scale-Level CVI (S-CVI) for the overall instrument
    • Apply predetermined thresholds (typically I-CVI ≥0.78, S-CVI ≥0.90) for acceptable validity [6]

Protocol 2: Content Validity Assessment Using CVI Methodology

Purpose and Principle

This protocol provides a standardized approach for quantifying content validity through systematic expert evaluation, enabling researchers to refine measurement instruments for drug development research.

Procedure
  • Expert Panel Briefing:

    • Conduct orientation session to review construct definitions and rating criteria
    • Distribute rating materials with clear instructions
    • Establish timeline for completion (typically 2-3 weeks)
  • Item Evaluation:

    • Experts rate each item using four-point Likert scale for relevance:
      • 1 = Not relevant
      • 2 = Somewhat relevant
      • 3 = Quite relevant
      • 4 = Highly relevant [2]
    • Additional ratings may be collected for clarity, comprehensiveness, and cultural appropriateness
  • Data Collection and Processing:

    • Compile expert ratings in structured format (e.g., Excel spreadsheet)
    • Convert Likert ratings to binary values (3-4 = "valid", 1-2 = "not valid")
    • Calculate I-CVI as proportion of experts rating item 3 or 4
    • Compute S-CVI/Average as mean of I-CVIs across all items
  • Item Refinement and Iteration:

    • Flag items with I-CVI below acceptable threshold (typically <0.78)
    • Conduct cognitive interviews to understand rating discrepancies
    • Revise problematic items based on qualitative feedback
    • Re-submit revised items for re-evaluation if necessary

[Workflow diagram, Expert Panel CVI Assessment: Define Research Construct and Instrument Items → Select and Brief Expert Panel → Independent Item Rating by Experts (4-point scale) → Compile and Convert Ratings to Binary → Calculate I-CVI and S-CVI → threshold check: items with I-CVI < 0.78 are refined via cognitive interviews and re-enter the rating cycle; items meeting the threshold yield the Final Validated Instrument]

Data Presentation and Analysis

Expert Panel Configuration Parameters

Table 2: Optimal Panel Size Recommendations for Different Research Contexts

Research Context Recommended Panel Size Key Considerations Supporting Evidence
Diagnostic Accuracy Studies 3-4 experts (median) Most common configuration in medical research; balances reliability and feasibility Analysis of 318 studies showing 55% used 3-4 experts [9]
Drug Development and Regulatory Submissions 5-8 experts Enhanced rigor for regulatory review; multidisciplinary representation Methodological guidance for high-stakes validation [9]
Cross-Cultural Translation and Adaptation 6+ experts Requires language proficiency and cultural expertise alongside clinical knowledge CVI methodology for PROM translation [8]
General Survey Development 3-5 experts Cost-effective for most research instruments with acceptable validity Established CVI methodology guidelines [6]

Content Validity Index Calculation and Interpretation

Table 3: CVI Thresholds and Decision Rules for Instrument Development

Metric Calculation Method Acceptability Threshold Interpretation Implementation Example
Item-Level CVI (I-CVI) Proportion of experts rating item 3 or 4 on relevance scale ≥0.78 for panels of 3-5 experts Item has acceptable content validity Item with 4/5 experts rating 3-4: I-CVI=0.80 [2]
Scale-Level CVI (S-CVI/Ave) Average of I-CVIs across all scale items ≥0.90 Overall instrument has excellent content validity Mean of 20 items with average I-CVI=0.92 [8]
Scale-Level CVI (S-CVI/UA) Proportion of items rated 3-4 by all experts ≥0.80 High degree of universal expert agreement 16/20 items universally endorsed: S-CVI/UA=0.80 [2]

Technical Specifications

Statistical Analysis and Reporting Requirements

CVI Statistical Analysis Protocol: expert rating data (4-point Likert scale) → binary conversion (ratings 3-4 → 1; ratings 1-2 → 0) → count expert agreement per item (COUNTIF) → compute I-CVI (agreements / total experts) → compute S-CVI/Ave (mean of I-CVIs) and S-CVI/UA (universal agreement) → additional analyses (Fleiss' kappa, Aiken's V) → comprehensive validity assessment report.

Implementation Guidelines for Pharmaceutical Research

For drug development applications, expert panels should include:

  • Clinical subject matter experts with direct patient experience in the therapeutic area
  • Psychometric specialists skilled in measurement theory and instrument design
  • Regulatory science experts familiar with FDA, EMA, or other relevant guidelines
  • Patient representation when developing patient-reported outcome measures
  • Cross-cultural experts for global clinical trials requiring instrument translation

Panel management should incorporate:

  • Blinded independent ratings to prevent dominance by influential members
  • Structured feedback mechanisms for item refinement
  • Documentation of all decisions for regulatory audit trails
  • Iterative review cycles until validity thresholds are achieved

The rigorous application of these expert panel protocols ensures that content validity assessments in pharmaceutical research meet the highest methodological standards, producing measurement instruments capable of generating reliable, regulatory-grade evidence for drug development programs.

Establishing Acceptable CVI Thresholds for Scientific Rigor

Content validity is a critical component in the development of research instruments, ensuring that items adequately represent the construct domain being measured [2]. The Content Validity Index (CVI) has emerged as a widely accepted quantitative method for assessing this validity, particularly in healthcare, social sciences, and educational research [2] [10]. The CVI methodology systematically quantifies expert agreement on item relevance, providing researchers with a rigorous approach to instrument validation [11]. This application note establishes evidence-based thresholds for CVI interpretation and provides standardized protocols for implementation within pharmaceutical and clinical research settings, where measurement precision directly impacts drug development outcomes and patient safety.

The CVI framework operates at two distinct levels: the Item-Level CVI (I-CVI), which assesses individual items, and the Scale-Level CVI (S-CVI), which evaluates the entire instrument [2] [10]. Proper application of CVI methodology requires understanding both statistical thresholds and methodological best practices, from expert panel selection to statistical calculation [12]. The following sections provide comprehensive guidance on establishing psychometrically sound CVI standards for research instrument development.

Quantitative Standards for CVI Interpretation

Established Thresholds for Content Validity

Based on methodological research and widespread application across studies, specific quantitative thresholds have been established for interpreting CVI values. These standards ensure instruments meet minimum psychometric requirements for content validity.

Table 1: Established CVI Thresholds for Instrument Validation

Index Type Acceptable Threshold Ideal Threshold Key Considerations
I-CVI (Item-Level) ≥ 0.78 [13] [12] 1.0 for 3-5 experts [2] Threshold varies based on number of experts
S-CVI/Ave (Scale-Level Average) ≥ 0.80 [10] [14] ≥ 0.90 [10] Most commonly reported S-CVI method
S-CVI/UA (Scale-Level Universal Agreement) ≥ 0.80 [2] Not commonly used Very stringent; all experts must agree on all items

The I-CVI threshold of ≥0.78 is particularly important as it represents excellent agreement after adjusting for chance [12]. For studies with smaller expert panels (3-5 experts), some methodologies recommend a more stringent I-CVI of 1.0, since with so few raters even a single dissenting rating drops the index below an acceptable level [2].

CVI Thresholds Based on Expert Panel Size

The acceptable CVI values are influenced by the number of content experts participating in validation. Smaller panels require higher agreement levels to achieve statistical significance.

Table 2: CVI Thresholds Based on Number of Experts

Number of Experts Acceptable I-CVI Value Source Recommendation
2 At least 0.8 Davis (1992) [2]
3 to 5 Should be 1 Polit & Beck (2006), Polit et al., (2007) [2]
6 to 8 At least 0.83 Lynn (1986) [2]
At least 9 At least 0.78 Lynn (1986) [2]

These thresholds account for the probability of chance agreement, with larger panels allowing for slightly lower agreement rates while maintaining methodological rigor [2] [12]. For pharmaceutical and clinical research applications, where instruments may inform regulatory decisions or clinical trials, adhering to the more conservative thresholds is recommended.
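Table 2's panel-size rules can be captured in a small lookup helper (a sketch; the function names are ours, and the cutoffs are taken directly from the table):

```python
def i_cvi_threshold(n_experts: int) -> float:
    """Acceptable I-CVI by panel size, following Table 2
    (Davis 1992; Polit & Beck 2006; Lynn 1986)."""
    if n_experts < 2:
        raise ValueError("CVI assessment requires at least 2 experts")
    if n_experts == 2:
        return 0.80   # Davis (1992)
    if n_experts <= 5:
        return 1.00   # Polit & Beck (2006): all experts must agree
    if n_experts <= 8:
        return 0.83   # Lynn (1986)
    return 0.78       # Lynn (1986), 9 or more experts

def item_acceptable(i_cvi: float, n_experts: int) -> bool:
    """Apply the panel-size-appropriate retention rule to one item."""
    return i_cvi >= i_cvi_threshold(n_experts)
```

For example, an I-CVI of 0.80 passes with a nine-expert panel but fails with a four-expert panel, where unanimous agreement is required.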

Experimental Protocols for CVI Assessment

Protocol 1: Comprehensive CVI Calculation Workflow

This protocol provides a step-by-step methodology for calculating CVI values, adaptable for various research contexts including clinical outcome assessments and patient-reported outcome measures.

Materials and Reagents:

  • Expert panel (5+ recommended for clinical trials research)
  • 4-point Likert scale rating form (1 = not relevant; 2 = somewhat relevant; 3 = quite relevant; 4 = highly relevant)
  • Data collection platform (Microsoft Excel, SPSS, or specialized validation software)
  • CVI calculation template

Procedure:

  • Expert Panel Recruitment: Select 3-10 content experts with demonstrated expertise in the target domain [10] [11]. For drug development applications, include clinical specialists, methodological experts, and patient representatives where appropriate.
  • Item Rating: Distribute the instrument with instructions for experts to rate each item using the 4-point relevance scale [2].
  • Data Compilation: Compile expert ratings in a structured format with items as rows and experts as columns.
  • Dichotomization: Convert Likert scale ratings to binary values (0 or 1), where ratings of 3 or 4 become "1" (valid), and ratings of 1 or 2 become "0" (not valid) [2].
  • I-CVI Calculation: For each item, calculate I-CVI as the number of experts rating it 3 or 4 divided by the total number of experts [2].
  • S-CVI Calculation:
    • Calculate S-CVI/Ave as the average of all I-CVI values [2] [10]
    • Calculate S-CVI/UA as the proportion of items receiving ratings of 3 or 4 from all experts [2]
  • Threshold Application: Compare calculated CVIs to established thresholds (Table 1) and make item retention decisions [2].

Troubleshooting:

  • If I-CVI values fall below thresholds, review qualitative expert feedback, revise problematic items, and consider re-evaluation
  • If expert disagreement is high, assess whether panel composition appropriately represents the content domain
  • If S-CVI values are inadequate, review instrument framework and theoretical foundation

CVI Assessment Workflow: start CVI assessment → recruit expert panel (5+ recommended) → experts rate items (4-point Likert scale) → compile rating data → dichotomize ratings (1, 2 → 0; 3, 4 → 1) → calculate I-CVI for each item → calculate S-CVI/Ave and S-CVI/UA → evaluate against thresholds → retention/revision decisions.

Protocol 2: Excel-Based CVI Calculation

For researchers without specialized statistical software, Microsoft Excel provides an accessible platform for CVI calculation [2] [15].

Materials and Reagents:

  • Microsoft Excel (2016 or later)
  • Expert rating data
  • CVI calculation template

Procedure:

  • Data Setup: Create a table with questionnaire items as rows and expert ratings as columns [2]
  • Binary Conversion:
    • Select target cell for first binary conversion
    • Enter formula: =IF(original_cell>=3,1,0)
    • Drag formula to complete binary matrix for all experts and items [2]
  • Expert Agreement Count:
    • Select cell for first agreement count
    • Enter formula: =COUNTIF(binary_range,">=1")
    • Drag formula to calculate for all items [2]
  • I-CVI Calculation:
    • Select cell for first I-CVI value
    • Enter formula: =agreement_cell/total_experts
    • Drag formula to calculate for all items [2]
  • Item Categorization:
    • Select cell for validity categorization
    • Enter formula: =IF(I-CVI_cell>=0.8,"Valid","Invalid")
    • Drag formula to categorize all items [2]
  • S-CVI Calculation:
    • Calculate S-CVI/Ave: =AVERAGE(I-CVI_range)
    • Calculate proportion of items with I-CVI meeting threshold [2]

Validation:

  • Verify formula accuracy with test data
  • Cross-validate manual calculation for subset of items
  • Ensure consistent formatting throughout the worksheet

Research Reagent Solutions for CVI Studies

Table 3: Essential Methodological Components for CVI Assessment

Research Component Function in CVI Assessment Implementation Examples
Expert Panel Provides content domain expertise Clinical specialists, methodological experts, patient representatives [10] [11]
Rating Scale Standardizes relevance assessment 4-point Likert scale (1=not relevant to 4=highly relevant) [2] [10]
Calculation Framework Quantifies expert agreement I-CVI, S-CVI/Ave, S-CVI/UA formulas [2] [12]
Statistical Software Automates CVI computation Microsoft Excel, SPSS, R, specialized validation packages [2] [15]
Decision Rules Guides item retention/revision Threshold application (I-CVI ≥ 0.78, S-CVI/Ave ≥ 0.80) [2] [12]

Application in Pharmaceutical and Clinical Research

In drug development contexts, establishing content validity is particularly crucial for clinical outcome assessments (COAs), patient-reported outcomes (PROs), and other measurement instruments used in clinical trials [10] [16]. Regulatory agencies increasingly require demonstrated content validity for instruments supporting labeling claims.

Recent applications in healthcare research demonstrate the utility of rigorous CVI assessment. In nursing research, the Nursing Process Evaluation Tool (NPET) achieved an S-CVI/Ave of 0.88, confirming its validity for assessing AI-generated nursing care plans [10]. Similarly, a medication self-management instrument for older adults maintained adequate content validity, with 83% of items achieving CVI scores above 0.80 for relevance [11].

The CVI methodology has also been successfully applied in specialized clinical contexts. A drug clinical trial participation feelings questionnaire for cancer patients underwent rigorous content validation during development [16], while a shave biopsy training checklist demonstrated a content validity index of 0.76, surpassing the required threshold of 0.62 [17].

CVI Decision Flow: for each item, if I-CVI ≥ 0.78 the item is retained; otherwise it is revised or eliminated. At the scale level, if S-CVI/Ave < 0.80 the instrument requires substantial revision; if S-CVI/Ave ≥ 0.80, check S-CVI/UA next: a value ≥ 0.80 indicates adequate content validity, while a lower value signals the need for further revision.

Establishing acceptable CVI thresholds is fundamental to scientific rigor in instrument development. The standards outlined in this document—I-CVI ≥ 0.78 and S-CVI/Ave ≥ 0.80—provide empirically supported benchmarks for content validity assessment across research contexts. The protocols and methodologies presented enable consistent application of these standards, particularly in pharmaceutical and clinical research where measurement precision directly impacts development decisions and patient outcomes. As measurement science evolves, these CVI thresholds and methodologies provide a foundation for developing valid, reliable instruments that generate trustworthy scientific evidence.

A Practical Framework for CVI Calculation and Implementation

Content Validity Index (CVI) is a critical psychometric metric used to quantify the degree to which an instrument, such as a questionnaire or survey, adequately measures the construct it intends to assess. Establishing content validity is a fundamental step in instrument development, particularly in pharmaceutical and clinical research where measurement accuracy directly impacts study outcomes and decision-making. The CVI methodology systematically incorporates expert judgment to evaluate item relevance and representativeness, providing quantitative evidence that the instrument's content reflects the targeted domain [18].

This protocol details a standardized approach for calculating CVI, specifically focusing on the transformation of expert Likert scale ratings into binary scores and the subsequent computation of item-level and scale-level validity indices. This process is essential for researchers developing instruments to measure complex constructs in drug development, clinical outcomes assessment, and healthcare research where valid measurement tools are prerequisites for generating reliable data [3].

Theoretical Framework and Key Concepts

Defining Content Validity

Content validity provides evidence about the degree to which elements of an assessment instrument are relevant to and representative of the targeted construct for a particular assessment purpose [18]. Unlike other forms of validity that focus on test scores, content validity evaluates the actual content of the instrument itself [18]. This distinction is particularly important when measuring complex, multi-dimensional constructs common in healthcare research and pharmaceutical development.

Four essential components comprise content validity:

  • Domain definition: How the concept measured by an instrument is operationally defined
  • Domain representation: The extent to which instrument items cover the entire content domain
  • Domain relevance: How pertinent each item is to the target construct
  • Appropriateness of test construction: The technical quality of item development and formatting [18]

Content Validity Index (CVI) and Its Components

The CVI methodology quantifies content validity through two primary levels of analysis:

  • Item-Level CVI (I-CVI): The proportion of content experts giving an item a relevance rating of 3 or 4 on a 4-point Likert scale [2] [19]
  • Scale-Level CVI (S-CVI): The overall validity of the entire instrument, calculated through two approaches:
    • S-CVI/Ave: The average of all I-CVIs for all items on the scale [2]
    • S-CVI/UA: The proportion of items on an instrument that achieve a relevance rating of 3 or 4 by all experts [2]

Table 1: Content Validity Index Types and Interpretation

Validity Index Definition Calculation Method Acceptability Threshold
I-CVI Content validity index for individual items Number of experts rating 3-4 / Total number of experts ≥0.78 for 6+ experts; 1.00 for 3-5 experts [3]
S-CVI/Ave Average of all I-CVIs Sum of all I-CVIs / Total number of items ≥0.90 for excellent validity [18]
S-CVI/UA Universal agreement on all items Number of items rated 3-4 by all experts / Total items ≥0.80 for adequate validity [2]

Materials and Research Reagents

Essential Research Materials

Table 2: Essential Materials for CVI Assessment

Material/Resource Specification Purpose/Function
Expert Panel 3-10+ content experts with domain knowledge Provide relevance ratings based on subject matter expertise
Rating Instrument 4-point Likert scale (1=not relevant to 4=highly relevant) Collect expert judgments on item relevance
Data Collection Tool Electronic survey platform or paper forms Systematically gather expert ratings
Statistical Software Microsoft Excel, SPSS, R Calculate CVI indices and analyze results
Validation Instrument Questionnaire, scale, or assessment tool Target of the content validity evaluation

Expert Panel Selection and Composition

The quality of CVI assessment depends heavily on appropriate expert panel selection. For instrument development in pharmaceutical and clinical research, experts should possess:

  • Advanced qualifications (PhD, MD, or equivalent) in the relevant field
  • Minimum of 5 years of experience in the target domain [2]
  • Research or clinical expertise specifically related to the construct being measured
  • Diversity of perspectives to minimize bias

The number of experts should balance practical constraints with methodological rigor. While some researchers recommend 2-3 experts for initial screening [2], others suggest 5-10 experts for robust validation [3]. For high-stakes instruments in drug development, larger panels (8-12 experts) may be warranted to ensure comprehensive evaluation.

Experimental Protocol

The following diagram illustrates the complete CVI calculation workflow from expert rating to final validity determination:

CVI Calculation Workflow: start CVI assessment → set up expert rating table → convert Likert ratings to binary scores → count expert agreement → calculate I-CVI → calculate S-CVI → make validity decision → final validity assessment.

Step-by-Step Calculation Methodology

Step 1: Setting Up the Data Structure

Create a structured data table in Excel or statistical software with the following organization:

  • Rows: Individual items from the instrument being validated
  • Columns: Expert ratings using the 4-point Likert scale
  • Rating Scale:
    • 1 = Not relevant
    • 2 = Somewhat relevant
    • 3 = Quite relevant
    • 4 = Highly relevant [2] [19]

Table 3: Example Data Structure for Expert Ratings

Item Number Expert 1 Expert 2 Expert 3 Expert 4
Item 1 4 3 4 3
Item 2 2 3 1 2
Item 3 3 4 4 4
Item 4 4 4 3 4

Step 2: Converting Likert Scale Ratings to Binary Values

Transform the 4-point Likert scale ratings into dichotomous values to facilitate CVI calculation:

  • Ratings of 3 ("Quite relevant") or 4 ("Highly relevant") = 1 (Valid)
  • Ratings of 1 ("Not relevant") or 2 ("Somewhat relevant") = 0 (Not valid) [2]

Excel Implementation:

  • Position cursor in first cell of binary conversion area
  • Enter formula: =IF(B2>=3,1,0) where B2 contains the first expert rating
  • Drag formula horizontally to apply to all expert columns
  • Drag formula vertically to apply to all items [2]

Table 4: Binary Conversion of Expert Ratings

Item Number Expert 1 Binary Expert 2 Binary Expert 3 Binary Expert 4 Binary
Item 1 1 1 1 1
Item 2 0 1 0 0
Item 3 1 1 1 1
Item 4 1 1 1 1

Step 3: Counting Expert Agreement Per Item

Calculate the number of experts who rated each item as valid (score of 1):

Excel Implementation:

  • Use COUNTIF function: =COUNTIF(H2:J2,">=1")
  • Where H2:J2 represents the range of binary scores for a single item (one cell per expert; extend the range to cover every expert on the panel) [2]

Step 4: Calculating Item-Level CVI (I-CVI)

Compute the I-CVI for each item by dividing the number of experts who agreed the item was valid by the total number of experts:

Formula: I-CVI = Number of experts rating item 3 or 4 / Total number of experts [2] [19]

Excel Implementation:

  • Formula: =K2/4 (where K2 contains the agreement count and 4 is the number of experts in the worked example)
  • For a different panel size, replace 4 with the actual number of experts [2]

Table 5: I-CVI Calculation and Interpretation

Item Number Agreement Count I-CVI Value Interpretation
Item 1 4 1.00 Excellent
Item 2 1 0.25 Poor (needs revision)
Item 3 4 1.00 Excellent
Item 4 4 1.00 Excellent

Step 5: Categorizing Items as Valid or Invalid

Apply established thresholds to determine whether each item meets content validity standards:

  • For panels of 3-5 experts: I-CVI ≥ 0.8 indicates valid item [2]
  • For panels of 6+ experts: I-CVI ≥ 0.83 indicates valid item [3]

Excel Implementation:

  • Use IF function: =IF(L2>=0.8,"Valid","Invalid")
  • Where L2 contains the I-CVI value [2]

Step 6: Calculating Scale-Level CVI (S-CVI)

Compute the overall validity of the entire instrument using two approaches:

S-CVI/Ave (Average Approach):

  • Calculate average of all I-CVIs: S-CVI/Ave = Sum of all I-CVIs / Total number of items [2] [19]
  • Acceptability threshold: ≥0.90 indicates excellent overall content validity [18]

S-CVI/UA (Universal Agreement Approach):

  • Calculate proportion of items that achieve relevance ratings of 3 or 4 from all experts
  • Formula: S-CVI/UA = Number of items with I-CVI = 1.00 / Total number of items [2]
  • Acceptability threshold: ≥0.80 indicates adequate overall content validity [2]
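To make the difference between the two scale-level approaches concrete, the following sketch computes both from a binary agreement matrix (the data mirror Table 4; function and variable names are ours):

```python
def scale_cvis(binary_matrix):
    """Given per-item lists of 0/1 expert flags (1 = rated 3 or 4),
    return (S-CVI/Ave, S-CVI/UA)."""
    i_cvis = [sum(row) / len(row) for row in binary_matrix]
    s_ave = sum(i_cvis) / len(i_cvis)                        # mean of I-CVIs
    s_ua = sum(1 for v in i_cvis if v == 1.0) / len(i_cvis)  # universal agreement
    return s_ave, s_ua

binary = [  # four items x four experts, as in Table 4
    [1, 1, 1, 1],
    [0, 1, 0, 0],
    [1, 1, 1, 1],
    [1, 1, 1, 1],
]
s_cvi_ave, s_cvi_ua = scale_cvis(binary)
```

For these data S-CVI/Ave = (1.00 + 0.25 + 1.00 + 1.00) / 4 = 0.8125, while S-CVI/UA = 3/4 = 0.75, illustrating why the universal-agreement index is the more conservative of the two.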

Application in Pharmaceutical and Clinical Research

The CVI methodology has significant applications in drug development and clinical research. In one study validating pediatric pain knowledge instruments in Ghana, researchers calculated I-CVIs ranging from 0.62 to 1.00 for relevance and 0.69 to 1.00 for clarity, with S-CVI/Ave values of 0.87 and 0.89, leading to revision of 5 items and retention of 37 items [19]. This demonstrates how CVI analysis directly informs instrument refinement in multicultural research contexts.

In another example from cancer communication research, investigators developed a patient-centered communication instrument through rigorous content validation. From an initial 188 items, content validity analysis identified seven dimensions with an overall S-CVI/Ave of 0.93, indicating excellent content validity despite a low S-CVI/UA, which the authors attributed to the large number of content experts making universal agreement difficult to achieve [3].

Technical Considerations and Best Practices

Critical Values and Statistical Significance

When using the Content Validity Ratio (CVR) approach, which employs a different calculation method, researchers should consult critical value tables to determine statistical significance. The CVR formula is: CVR = (nₑ - N/2) / (N/2), where nₑ is the number of panelists indicating "essential" and N is the total number of panelists [20]. The minimum acceptable CVR values vary by panel size:

Table 6: Critical Values for Content Validity Ratio (CVR)

Number of Panelists Minimum CVR
5 0.99
6 0.99
7 0.99
8 0.75
9 0.78
10 0.62
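The CVR formula and Table 6's critical values translate directly into code (a sketch; the function names are ours, and the dictionary holds only the panel sizes listed in the table):

```python
def cvr(n_essential: int, n_total: int) -> float:
    """Lawshe's Content Validity Ratio: (n_e - N/2) / (N/2)."""
    half = n_total / 2
    return (n_essential - half) / half

# Minimum CVR values for the panel sizes in Table 6
CVR_CRITICAL = {5: 0.99, 6: 0.99, 7: 0.99, 8: 0.75, 9: 0.78, 10: 0.62}

def item_essential(n_essential: int, n_total: int) -> bool:
    """True if the item's CVR meets the critical value for this panel size."""
    return cvr(n_essential, n_total) >= CVR_CRITICAL[n_total]
```

With 10 panelists, 9 "essential" ratings give CVR = (9 - 5)/5 = 0.80, above the 0.62 cutoff; 8 ratings give 0.60 and fall just short.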

Common Methodological Challenges and Solutions

  • Expert disagreement: Use modified kappa statistics to account for chance agreement [18]
  • Panel size limitations: Combine quantitative CVI with qualitative expert feedback
  • Cross-cultural adaptation: Assess both relevance and clarity separately, as done in the Ghana pediatric pain study [19]
  • Complex constructs: Implement multi-stage validation with iterative item refinement
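For the expert-disagreement challenge, one published adjustment is the modified kappa (k*) of Polit, Beck and Owen (2007), which discounts the probability of chance agreement from the I-CVI; a minimal sketch (the function name is ours):

```python
from math import comb

def modified_kappa(n_agree: int, n_experts: int) -> float:
    """Chance-adjusted item validity: Pc = C(N, A) * 0.5**N,
    k* = (I-CVI - Pc) / (1 - Pc), per Polit, Beck & Owen (2007)."""
    i_cvi = n_agree / n_experts
    p_chance = comb(n_experts, n_agree) * 0.5 ** n_experts
    return (i_cvi - p_chance) / (1 - p_chance)
```

For example, 4 of 5 experts agreeing gives I-CVI = 0.80 but k* of roughly 0.76 once chance agreement is discounted.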

This protocol provides a comprehensive framework for calculating Content Validity Index through systematic transformation of Likert scale ratings to binary scoring. The step-by-step methodology enables researchers in pharmaceutical development and clinical research to quantitatively assess whether their instruments adequately measure target constructs. Proper implementation of CVI analysis strengthens instrument development, enhances measurement validity, and ultimately contributes to more reliable research outcomes in drug development and healthcare assessment.

By following this standardized approach, researchers can generate robust validity evidence for their instruments, meeting the rigorous methodological standards required in regulatory submissions, clinical trial endpoints, and patient-reported outcome measures.

In research disciplines, including pharmaceutical sciences and clinical outcome assessment development, the Content Validity Index (CVI) is a crucial psychometric measure used to quantify the degree to which a survey or assessment instrument adequately represents the construct it is intended to measure [21]. Establishing content validity provides the foundational evidence that scale items are relevant and representative of the target domain, thereby ensuring that subsequent data collection yields meaningful and interpretable results [2] [21]. The process systematically captures expert judgment, transforming qualitative feedback into quantitative metrics suitable for rigorous scientific evaluation. This article provides researchers with detailed application notes and protocols for calculating CVI using Microsoft Excel, enhancing efficiency, reproducibility, and accuracy in instrument development.

Core CVI Concepts and Quantitative Standards

The evaluation of content validity operates at two primary levels: the Item-level CVI (I-CVI), which assesses individual items, and the Scale-level CVI (S-CVI), which evaluates the entire instrument [2] [21]. The calculation of these indices relies on expert ratings of item relevance, typically collected using a 4-point Likert scale (e.g., 1 = Not relevant, 2 = Somewhat relevant, 3 = Quite relevant, 4 = Highly relevant) [2]. Ratings are subsequently dichotomized, with ratings of 3 and 4 considered "relevant," and 1 and 2 considered "not relevant" for validity calculations [2].

Acceptability thresholds for CVI values are not arbitrary but are based on established scientific consensus and adjust for the number of experts involved [2]. The following table summarizes the key CVI indices and their corresponding benchmarks.

Table 1: Key Content Validity Indices and Acceptability Thresholds

Index Name Definition Calculation Acceptability Standard
Item-CVI (I-CVI) Proportion of experts agreeing on an item's relevance [21]. Number of experts rating item 3 or 4 / Total number of experts 3-5 experts: 1.00 [2]; 6-10 experts: ≥ 0.78 [21]; ≥ 9 experts: ≥ 0.78 [2]
S-CVI/Ave The average of all I-CVI scores for items on the scale [2] [21]. Sum of all I-CVIs / Total number of items ≥ 0.90 is considered excellent [21].
S-CVI/UA (Universal Agreement) The proportion of items on the scale that achieved a relevance rating of 3 or 4 from all experts [2] [21]. Number of items with I-CVI = 1.00 / Total number of items A more conservative measure; no universal threshold, but higher values indicate stronger agreement.
Content Validity Ratio (CVR) Assesses whether an item is deemed "essential" [22] [23]. (n_e - N/2) / (N/2), where n_e = number of experts rating "essential," N = total experts [22]. Must exceed a critical value based on the number of experts (see Table 2) [22].

Table 2: Lawshe's CVR Critical Values Table [22]

Number of Experts Minimum CVR Value
5 0.99
6 0.99
7 0.99
8 0.75
9 0.78
10 0.62
15 0.49
20 0.42
30 0.33
39 0.33

Experimental Protocol for CVI Assessment

Stage 1: Expert Panel Preparation and Data Collection

  • Define the Construct and Domain: Clearly articulate the theoretical construct and its boundaries. This operational definition guides the entire validation process [21].
  • Assemble an Expert Panel: Engage a panel of 5-10 Subject Matter Experts (SMEs). Experts should have extensive experience in the field related to the construct [22] [23]. Document their credentials and expertise.
  • Develop the Rating Instrument: Create a survey where experts rate each item on a 4-point relevance scale (1=Not relevant, 2=Somewhat relevant, 3=Quite relevant, 4=Highly relevant) [2]. Include space for qualitative feedback on item clarity, wording, and comprehensiveness.
  • Collect Ratings: Distribute the rating instrument to the expert panel. Ensure ratings are completed independently so that experts do not influence one another's judgments.

Stage 2: Excel Data Setup and CVI Computation

  • Data Entry: In a new Excel workbook, set up a table where rows represent questionnaire items and columns represent expert ratings. Enter the raw 4-point scale ratings.
  • Dichotomize Ratings: Convert the 4-point scale ratings to binary values (1 for relevant, 0 for not relevant).
    • Excel Formula: In the first cell of a new "Expert 1 Binary" column, enter: =IF(Original_Rating_Cell>=3,1,0). Drag the fill handle to apply this formula to all items and experts [2].
  • Calculate I-CVI:
    • Step 1 - Count Agreements: Use COUNTIF to sum the number of experts who rated each item as relevant.
      • Excel Formula: =COUNTIF(Binary_Range, ">=1") where Binary_Range is the range of binary cells for a single item [2].
    • Step 2 - Compute Proportion: Divide the agreement count by the total number of experts.
      • Excel Formula: =Agreement_Count_Cell / Total_Number_of_Experts [2].
  • Categorize Item Validity: Flag items as "Valid" or "Invalid" based on the I-CVI threshold.
    • Excel Formula: =IF(I-CVI_Cell>=0.8, "Valid", "Invalid") [2]. The 0.8 threshold can be adjusted based on the number of experts (see Table 1).
  • Calculate S-CVI/Ave: Compute the average of all I-CVI values.
    • Excel Formula: =AVERAGE(I-CVI_Range) [2].
  • Calculate S-CVI/UA:
    • Step 1 - Flag Universal Agreement: For each item, check if all experts rated it as relevant.
      • Excel Formula: =IF(SUM(Binary_Range)=Total_Number_of_Experts, 1, 0) [2].
    • Step 2 - Compute Proportion: Sum the universal agreement flags and divide by the total number of items.
      • Excel Formula: =SUM(UA_Flag_Range) / Total_Number_of_Items.
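The spreadsheet steps above can be cross-validated with a short Python script that mirrors each formula (a sketch with invented ratings; comments map each line to its Excel counterpart):

```python
# Hypothetical raw ratings: items in rows, experts in columns
raw = [
    [4, 3, 4, 3],
    [2, 3, 1, 2],
    [3, 4, 4, 4],
]
n_experts = len(raw[0])

# Dichotomize: Excel =IF(cell>=3,1,0)
binary = [[1 if r >= 3 else 0 for r in row] for row in raw]

# Agreement count per item: Excel =COUNTIF(range,">=1")
agree = [sum(row) for row in binary]

# I-CVI per item: agreement count / number of experts
i_cvis = [a / n_experts for a in agree]

# S-CVI/Ave: Excel =AVERAGE(I-CVI range)
s_cvi_ave = sum(i_cvis) / len(i_cvis)

# S-CVI/UA: proportion of items that every expert endorsed
s_cvi_ua = sum(1 for row in binary if sum(row) == n_experts) / len(binary)
```

Running both the spreadsheet and the script on the same ratings should yield identical indices, providing the cross-validation recommended in this protocol.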

The following workflow diagram illustrates the complete CVI computation protocol in Excel.

CVI Computation Workflow. Stage 1 (expert preparation and data collection): define construct and domain → assemble expert panel → collect 4-point scale ratings. Stage 2 (Excel computation): enter ratings into Excel → dichotomize ratings (IF formula) → calculate I-CVI per item (COUNTIF and division) → categorize item validity (IF formula) → calculate S-CVI/Ave (AVERAGE formula) → calculate S-CVI/UA (SUM and IF formulas) → report CVI results.

For researchers executing a CVI study, the key "reagents" are not only computational tools but also the human and methodological components.

Table 3: Essential Research Reagents for CVI Studies

| Tool/Resource | Function/Role in CVI Protocol |
| --- | --- |
| Subject Matter Experts (SMEs) | Provide critical judgments on item relevance and representativeness; the primary source of validation data [22] [23]. |
| 4-Point Relevance Scale | The standardized metric (1-4) for collecting expert judgments on each item, later dichotomized for analysis [2]. |
| Microsoft Excel | The computational platform for data organization, dichotomization, and calculation of I-CVI, S-CVI/Ave, and S-CVI/UA using built-in functions [2]. |
| Structured Rating Form | The instrument (e.g., digital survey) used to present items to experts and collect their ratings in a consistent, organized manner. |
| CVI Threshold Tables | Reference standards (e.g., Lawshe's table, Polit & Beck thresholds) used to make objective retain/reject decisions for items and the overall scale [2] [22]. |

Manually computing CVI is prone to error and inconsistency. Leveraging Microsoft Excel with a structured protocol, as outlined in this article, provides a systematic, efficient, and reproducible method for establishing content validity. By implementing the specific formulas and workflows—such as IF for dichotomization, COUNTIF for aggregating expert agreement, and AVERAGE for calculating scale-level indices—researchers can ensure the rigor and defensibility of their survey instruments. This robust quantitative foundation is essential for developing high-quality measures that yield trustworthy data in critical research and development fields.

Content validity evidence is a critical component in the development of surveys and measurement instruments, ensuring that items adequately represent the construct domain being measured. The Content Validity Index (CVI) has emerged as the most widely utilized quantitative method for evaluating this psychometric property [24]. The CVI system operates at two distinct levels: the Item-Level CVI (I-CVI), which assesses individual items, and the Scale-Level CVI (S-CVI), which evaluates the entire instrument [25] [24]. Within S-CVI, two primary approaches exist: S-CVI/UA (Universal Agreement) and S-CVI/Ave (Average) [2]. Proper interpretation of these metrics is essential for researchers, scientists, and drug development professionals to make informed decisions about instrument refinement and validation. This application note provides a comprehensive framework for interpreting I-CVI and S-CVI/Ave scores within the context of survey development research.

Quantitative Interpretation Guidelines

Established Standards and Thresholds

The table below summarizes the widely accepted quantitative standards for interpreting I-CVI and S-CVI/Ave scores in instrument development:

Table 1: Interpretation Guidelines for CVI Metrics

| Metric | Score Range | Interpretation | Action Implied | Key References |
| --- | --- | --- | --- | --- |
| I-CVI | ≥ 0.78 | Excellent | Item should be retained | [24] |
| I-CVI | 0.70 - 0.78 | Requires revision | Item needs modification | [25] |
| I-CVI | < 0.70 | Unacceptable | Item should be eliminated | [25] |
| S-CVI/Ave | ≥ 0.90 | Excellent | Scale has excellent content validity | [24] |
| S-CVI/Ave | 0.80 - 0.89 | Good | May require minor revisions | [2] |
| S-CVI/Ave | < 0.80 | Questionable | Requires significant revision | [2] |

These thresholds provide a systematic approach for evaluating both individual items and the overall instrument. For newly developed instruments, a more conservative approach is often recommended, with I-CVI values of ≥ 0.80 considered necessary to confirm that items possess high, clear, and relevant content validity [2].
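Encoded as simple lookup functions, the thresholds above can be applied automatically during analysis. The following Python sketch uses the cut-offs tabulated in Table 1; the function names are illustrative.

```python
def interpret_i_cvi(score: float) -> str:
    """Map an I-CVI value to its Table 1 interpretation and action."""
    if score >= 0.78:
        return "Excellent - retain item"
    if score >= 0.70:
        return "Requires revision - modify item"
    return "Unacceptable - eliminate item"

def interpret_s_cvi_ave(score: float) -> str:
    """Map an S-CVI/Ave value to its Table 1 interpretation."""
    if score >= 0.90:
        return "Excellent content validity"
    if score >= 0.80:
        return "Good - may require minor revisions"
    return "Questionable - requires significant revision"

print(interpret_i_cvi(0.83))      # item-level verdict
print(interpret_s_cvi_ave(0.91))  # scale-level verdict
```

For newly developed instruments following the more conservative ≥ 0.80 standard, the first boundary can simply be raised accordingly.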

Comparative Analysis from Recent Studies

Recent validation studies demonstrate the practical application of these standards across various research domains:

Table 2: CVI Values in Recent Instrument Validation Studies

| Instrument | Field | I-CVI Range | S-CVI/Ave | Expert Panel Size | Reference |
| --- | --- | --- | --- | --- | --- |
| Personalized Exercise Questionnaire (PEQ) | Musculoskeletal Disorders | 0.50 - 1.00 | 0.91 | 42 | [25] |
| Musculoskeletal Self-Management Questionnaire (MSK-SMQ) | Persistent MSK Conditions | 0.91 - 1.00 | 0.96 | 91 (3 panels) | [26] |
| Tele-Primary Care Oral Health CIS | Digital Health | N/R | 0.90 | 10 | [27] |

The variability in I-CVI ranges observed in these studies highlights the importance of both item-level and scale-level analysis. For instance, the PEQ development study retained some items with I-CVI as low as 0.50 while achieving an acceptable S-CVI/Ave of 0.91, demonstrating how instruments with varying item-level performance can still achieve adequate overall content validity [25].

Experimental Protocols for CVI Evaluation

Judgmental Validation Protocol

The judgmental validation process involves rigorous evaluation by content experts through the following methodology:

  • Expert Panel Selection: Recruit 3-10 subject matter experts with demonstrated expertise in the construct domain [25] [2]. For highly specialized domains (e.g., drug development), include professionals with diverse but relevant backgrounds (clinicians, researchers, methodologists).

  • Rating Procedure: Present experts with the instrument items and a 4-point Likert scale for rating item relevance: 1 = "not relevant," 2 = "somewhat relevant," 3 = "quite relevant," and 4 = "highly relevant" [2] [27].

  • Data Collection: Utilize structured forms that allow experts to rate each item and provide qualitative feedback on clarity, wording, and appropriateness [25].

  • Binary Conversion: Convert Likert ratings to binary values (0 or 1), where ratings of 3 or 4 are converted to 1 (relevant), and ratings of 1 or 2 are converted to 0 (not relevant) for CVI calculation [2].

Quantitative Calculation Protocol

The following workflow illustrates the systematic process for calculating and interpreting CVI metrics:

Collect expert ratings on the 4-point Likert scale → convert to binary values (ratings 3-4 → 1; ratings 1-2 → 0) → calculate I-CVI for each item (number of experts rating 3 or 4 ÷ total experts) → calculate S-CVI/Ave (average of all I-CVIs) → interpret I-CVI scores (≥ 0.78 excellent; 0.70-0.78 revise; < 0.70 eliminate) → interpret S-CVI/Ave scores (≥ 0.90 excellent; 0.80-0.89 good; < 0.80 questionable) → refine the instrument based on the results.

Diagram 1: CVI Calculation and Interpretation Workflow

I-CVI Calculation Methodology: For each item, I-CVI is calculated as the number of experts giving a relevance rating of 3 ("quite relevant") or 4 ("highly relevant") divided by the total number of experts [25]. The formula is expressed as:

I-CVI = (Number of experts rating item as 3 or 4) / (Total number of experts) [2]

S-CVI/Ave Calculation Methodology: S-CVI/Ave is computed as the average of all I-CVI values in the instrument [24] [2]. The formula is expressed as:

S-CVI/Ave = Σ(I-CVI) / (Number of items) [2]
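As a worked example of these two formulas, with hypothetical numbers chosen for illustration:

```python
# Worked example of the I-CVI and S-CVI/Ave formulas (hypothetical data).
experts = 9
relevant_votes = 7                      # experts rating the item 3 or 4

i_cvi = relevant_votes / experts        # item-level index
print(round(i_cvi, 3))                  # prints 0.778

# S-CVI/Ave for a hypothetical 5-item scale: the mean of the item I-CVIs.
all_i_cvis = [1.0, 0.889, 0.778, 1.0, 0.889]
s_cvi_ave = sum(all_i_cvis) / len(all_i_cvis)
print(round(s_cvi_ave, 3))              # prints 0.911
```

Note that 7 of 9 experts agreeing yields an I-CVI of about 0.78, which corresponds to the widely cited panel-size-dependent cut-off attributed to Lynn.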

Excel-Based Computational Protocol

For researchers with minimal statistical expertise, Microsoft Excel provides an accessible platform for CVI calculation:

  • Data Setup: Create a table with items as rows and expert ratings as columns [2].
  • Binary Conversion Formula: Use Excel formula =IF(B2>=3,1,0) to convert Likert ratings to binary values [2].
  • I-CVI Calculation: Apply a formula such as =COUNTIF(H2:J2,">=1")/3 to calculate I-CVI for each item; here H2:J2 holds the binary values from a three-expert panel, so adjust the range and divisor to match your panel size [2].
  • S-CVI/Ave Computation: Use =AVERAGE(L2:L11) to calculate the scale-level index from all I-CVI values [2].
  • Categorization: Implement formula =IF(L2>=0.8,"Valid","Invalid") to automatically flag items requiring revision [2].

The Scientist's Toolkit: Research Reagent Solutions

Table 3: Essential Reagents for Content Validation Research

| Research Reagent | Function/Application | Implementation Considerations |
| --- | --- | --- |
| Expert Panel | Provides judgmental evidence of content relevance | Select 3-10 experts with verified domain expertise; ensure diversity of perspectives [25] |
| Structured Rating Form | Standardizes expert evaluation process | Include 4-point Likert scale for relevance ratings; provide space for qualitative feedback [2] |
| CVI Calculation Template | Automates quantitative validity computation | Excel-based templates with pre-programmed formulas enhance accuracy and efficiency [2] |
| Content Validity Index (CVI) | Quantifies item- and scale-level validity | Calculate both I-CVI and S-CVI/Ave for comprehensive assessment [24] |
| Qualitative Feedback Framework | Captures expert insights for item refinement | Implement cognitive interviewing methods to understand expert interpretations [25] |

Advanced Interpretation Considerations

Integration of Quantitative and Qualitative Evidence

Sophisticated interpretation of CVI results requires integrating quantitative metrics with qualitative insights:

  • Cognitive Interviewing: Supplement CVI scores with cognitive interviews to understand how experts interpret items and identify potential issues with wording or clarity [25].
  • Iterative Refinement: Use the combination of quantitative scores and qualitative feedback in an iterative refinement process, particularly for items with I-CVI scores between 0.70-0.78 [25].
  • Domain Analysis: Examine patterns across domains or subscales to identify systematic content gaps, even when overall S-CVI/Ave meets thresholds [25].

Statistical Refinements

For advanced applications, researchers should consider:

  • Chance Agreement Adjustment: Compute modified kappa statistics (K*) to adjust I-CVI for chance agreement, particularly with smaller expert panels [24].
  • Panel Composition Effects: Recognize that S-CVI/UA (universal agreement) tends to produce more conservative estimates than S-CVI/Ave, especially with larger panels [25].
  • Essentiality Assessment: Complement relevance assessment with essentiality ratings using the Content Validity Ratio (CVR) for more comprehensive content validation [26].
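The chance-agreement adjustment mentioned above can be sketched as follows. This assumes the binomial formulation of the modified kappa commonly attributed to Polit and colleagues (pc = C(N, A) * 0.5^N; k* = (I-CVI - pc) / (1 - pc)); verify the exact formula against the source you cite.

```python
from math import comb

def modified_kappa(n_experts: int, n_agree: int) -> float:
    """Adjust I-CVI for chance agreement (modified kappa, k*).

    Assumes the binomial chance-probability formulation:
    pc = C(N, A) * 0.5**N, and k* = (I-CVI - pc) / (1 - pc).
    """
    i_cvi = n_agree / n_experts
    # Probability that A of N experts agree purely by chance (p = 0.5).
    pc = comb(n_experts, n_agree) * 0.5 ** n_experts
    return (i_cvi - pc) / (1 - pc)

# With a small panel of 5 experts, 4 of whom rate the item relevant:
print(round(modified_kappa(5, 4), 3))  # prints 0.763
```

Because pc shrinks rapidly as panels grow, the adjustment matters most for the small panels (3-5 experts) where chance agreement is most likely.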

Proper interpretation of I-CVI and S-CVI/Ave scores requires both adherence to established psychometric thresholds and thoughtful consideration of contextual factors in instrument development. The protocols and guidelines presented in this application note provide researchers, scientists, and drug development professionals with a systematic framework for evaluating content validity evidence. By implementing these methodologies and utilizing the provided research reagents, professionals can enhance the rigor of their survey development processes and ensure their instruments adequately represent the constructs of interest. The integration of quantitative metrics with qualitative insights remains paramount for sophisticated content validity assessment in research and applied settings.

The Content Validity Index (CVI) is a critical quantitative measure in psychometric instrument development, ensuring questionnaire items accurately reflect the construct being measured [2]. This case study details the application of CVI methodology in developing and validating the Drug Clinical Trial Participation Feelings Questionnaire (DCTPFQ) for cancer patients [28]. The process demonstrates rigorous content validation within a broader research framework on survey development, providing a model for researchers and drug development professionals.

Methodological Framework

Theoretical Foundation and Item Generation

The DCTPFQ was developed using a structured, multi-phase methodology combining qualitative and quantitative approaches [28]. The initial phase established a robust theoretical foundation using Meleis's transitions theory and the Roper-Logan-Tierney model to conceptualize the patient experience during clinical trial participation [28].

  • Qualitative Data Collection: Researchers conducted semi-structured, open-ended interviews with 10 cancer patients from a clinical trial center to capture authentic experiences [28]. Interviews explored four domains: participative cognition, healthcare resources, subjective experience, and support from relatives and friends [28].
  • Initial Item Pool: Through literature review, theoretical framework application, and patient interviews, researchers generated 44 initial items for the questionnaire [28].

Content Validation Procedure

The content validation followed a rigorous multi-step process consistent with established CVI methodology [2] [29].

Table: Content Validation Procedure for DCTPFQ

| Step | Procedure | Participants | Output |
| --- | --- | --- | --- |
| 1. Delphi Expert Consultation | Expert rating of item relevance using 4-point Likert scale | Panel of content experts | Initial item reduction and refinement |
| 2. Pilot Testing | Preliminary testing of questionnaire | Target patient population | Assessment of readability and comprehension |
| 3. First CVI Assessment | Calculation of I-CVI and S-CVI/Ave | Expert panel | Identification of non-valid items (I-CVI < 0.78) |
| 4. Questionnaire Modification | Removal, addition, and modification of items | Research team | Improved questionnaire version |
| 5. Second CVI Assessment | Re-calculation of CVI values | Same expert panel | Final validation of all items |

CVI Calculation Protocol

The CVI calculation followed established quantitative methods essential for content validity assessment [2] [29]:

  • Expert Panel: A panel of content experts rated each item's relevance using a 4-point Likert scale (1 = not relevant, 2 = somewhat relevant, 3 = relevant, 4 = very relevant) [29].
  • Item-Level CVI (I-CVI): Calculated for each item by dividing the number of experts rating it 3 or 4 by the total number of experts [2] [29]. The accepted threshold for item retention was I-CVI ≥ 0.78 [29].
  • Scale-Level CVI (S-CVI/Ave): Computed as the average of all I-CVI values, with the acceptable standard set at ≥ 0.90 [29].

For studies with 3-5 experts, some methodologies recommend an I-CVI of 1.00, while with larger panels (≥6 experts), the threshold is typically ≥ 0.83 [2].
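These panel-size-dependent cut-offs can be captured in a small helper. The sketch below encodes the thresholds quoted in this article (Polit & Beck for small panels, Lynn for larger ones); the function name is illustrative.

```python
def i_cvi_threshold(n_experts: int) -> float:
    """Return the I-CVI retention threshold for a given panel size.

    Cut-offs follow the guidance quoted in this article:
    3-5 experts -> 1.00 (Polit & Beck); 6-8 -> 0.83 (Lynn);
    9 or more -> 0.78 (Lynn).
    """
    if n_experts < 3:
        raise ValueError("CVI panels should include at least 3 experts")
    if n_experts <= 5:
        return 1.00
    if n_experts <= 8:
        return 0.83
    return 0.78

print(i_cvi_threshold(4))   # small panel: unanimous agreement required
print(i_cvi_threshold(10))  # larger panel: 0.78 suffices
```

Automating the threshold lookup avoids the common mistake of applying the 0.78 standard to panels that are too small to justify it.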

Results and Quantitative Analysis

Content Validity Outcomes

The application of CVI methodology to the DCTPFQ yielded strong content validity evidence:

Table: CVI Results for DCTPFQ Development

| Validation Metric | Initial Pool | After Delphi & Pilot | Final Questionnaire |
| --- | --- | --- | --- |
| Number of Items | 44 | 36 | 21 |
| I-CVI Range | Not specified | Not specified | All items ≥ 0.78 |
| S-CVI/Ave | Not specified | Not specified | 0.934 |
| Test-Retest Reliability | - | - | 0.840 |
| Cronbach's Alpha | - | - | 0.934 |

The validation process refined the questionnaire from 44 initial items to a final 21-item instrument structured across four key factors: cognitive engagement, subjective experience, medical resources, and relatives and friends' support [28]. The final questionnaire demonstrated excellent psychometric properties with a Cronbach's alpha of 0.934 and test-retest reliability of 0.840 [28].

Construct Validity Correlations

The DCTPFQ showed significant correlations with established measures, confirming construct validity:

  • Fear of Progression Questionnaire—short form (r = 0.731, p < 0.05) [28]
  • Mishel's Uncertainty in Illness Scale (r = 0.714, p < 0.05) [28]

Experimental Protocols

Workflow Visualization

Study conceptualization → theoretical framework (transitions theory and RLT model) → generate initial item pool (44 items) → Delphi expert consultation → pilot testing → first CVI assessment (I-CVI and S-CVI/Ave) → modify questionnaire → second CVI assessment → exploratory factor analysis (21 items, 4 factors) → confirmatory factor analysis → final validation (reliability and validity testing) → validated DCTPFQ.

CVI Calculation Protocol

Expert panel recruitment (minimum 3 content experts) → experts rate items on a 4-point Likert scale → convert ratings to binary (scores 3-4 = 1; scores 1-2 = 0) → calculate I-CVI for each item (number of 1s ÷ total experts) → apply the I-CVI threshold (≥ 0.78 for item retention) → calculate S-CVI/Ave (average of all I-CVI values) → final CVI validation (S-CVI/Ave ≥ 0.90).

The Scientist's Toolkit

Table: Essential Research Reagents for CVI Survey Validation

| Tool/Resource | Function in CVI Validation | Application Example |
| --- | --- | --- |
| Expert Panel | Content experts with domain-specific knowledge to assess item relevance | 12 orthodontic specialists validated a functional appliance questionnaire [29] |
| 4-Point Likert Scale | Rating system for expert evaluation of item relevance (1 = not relevant; 4 = very relevant) | Used in DCTPFQ development for expert ratings [28] |
| I-CVI Calculator | Computational tool to calculate the Item-Level Content Validity Index | Excel formulas (=COUNTIF, =AVERAGE) automate CVI calculation [2] |
| S-CVI/Ave Calculator | Tool to compute the Scale-Level Content Validity Index as the average of I-CVI values | Determines overall questionnaire content validity [2] [29] |
| Delphi Method Protocol | Structured communication technique for gathering expert consensus | Multiple rounds of expert consultation refine questionnaire items [28] |
| Statistical Software (SPSS) | Analyzes reliability and validity metrics beyond content validation | Cronbach's alpha calculation for internal consistency [29] |

This case study demonstrates the systematic application of CVI methodology in developing the DCTPFQ, highlighting its essential role in ensuring content validity for clinical trial assessment tools. The rigorous process—from theoretical foundation and expert validation to quantitative CVI assessment—provides a validated, reliable instrument measuring cancer patients' clinical trial participation experiences across four key dimensions. This methodology offers researchers and drug development professionals a replicable framework for developing psychometrically sound questionnaires, ultimately contributing to improved understanding of patient experiences in clinical trials and supporting the development of more patient-centric drug development practices.

Navigating Common Challenges and Enhancing CVI Outcomes

Within the rigorous framework of content validity index (CVI) survey development research, the Item-Level Content Validity Index (I-CVI) serves as a fundamental quantitative metric for evaluating individual instrument items. Defined as the proportion of experts who rate an item as quite relevant or very relevant (typically ratings of 3 or 4 on a four-point Likert scale), the I-CVI provides critical insight into the relevance and representativeness of each item in measuring the intended construct [21] [30]. Establishing robust content validity is a necessary condition for the accuracy and credibility of research findings, particularly in high-stakes fields like drug development where measurement error can have significant consequences [30].

The acceptable threshold for I-CVI is not absolute but varies based on the number of content experts involved in the validation process. For newly developed instruments, a common standard requires an I-CVI value of ≥ 0.8 to confirm that items possess high, clear, and relevant content validity [15] [2]. However, more stringent requirements exist, with some methodologies recommending a perfect score of 1.0 when working with smaller panels of 3-5 experts [2] [21]. When items fall below these critical thresholds, researchers must engage in systematic revision and refinement to strengthen the instrument's conceptual foundation and measurement properties, thereby ensuring the tool adequately represents the target domain [21] [23].

Assessment and Diagnostic Framework

Quantitative Diagnostic Thresholds

A comprehensive diagnostic approach to low I-CVI begins with understanding the statistical thresholds that determine item acceptability. The following table synthesizes expert recommendations for I-CVI cut-off scores based on varying numbers of content experts:

Table 1: I-CVI Acceptability Thresholds Based on Number of Experts

| Number of Experts | Acceptable I-CVI Value | Source of Recommendation |
| --- | --- | --- |
| 2 | At least 0.8 | Davis (1992) |
| 3 to 5 | Should be 1 | Polit & Beck (2006); Polit et al. (2007) |
| At least 6 | At least 0.83 | Polit & Beck (2006); Polit et al. (2007) |
| 6 to 8 | At least 0.83 | Lynn (1986) |
| At least 9 | At least 0.78 | Lynn (1986) |

Adapted from research on content validation methodologies [2].

Beyond these universal agreement standards, researchers should consider the practical significance of marginal failures. For instance, an I-CVI of 0.75 with four experts (where three rated the item as relevant) may require different refinement strategies than an item with an I-CVI of 0.2 with five experts (where only one expert deemed it relevant) [2]. This diagnostic phase should also include examination of qualitative expert feedback, which often provides crucial insights into the conceptual or methodological weaknesses contributing to low ratings [21] [30].

Qualitative Feedback Analysis

The quantitative I-CVI score identifies problematic items, but qualitative feedback from subject matter experts (SMEs) reveals the underlying causes. Content experts typically provide comments regarding an item's clarity, specificity, relevance, and representation of the target construct [21] [23]. Systematic analysis of this feedback should categorize concerns into thematic areas such as terminology issues, conceptual misalignment, response format problems, or contextual inappropriateness. This analysis forms the foundation for targeted revision strategies outlined in the following section [30].

Systematic Revision Strategies

Strategic Approaches for Item Improvement

When addressing low I-CVI scores, researchers should implement a structured approach to item revision that corresponds to the specific deficiencies identified through expert feedback. The following strategies have demonstrated efficacy in improving content validity across research contexts:

  • Conceptual Realignment: When experts indicate an item does not adequately reflect the target construct, revisit the theoretical foundation and domain definition [21]. Reframe the item to more directly capture the essential attributes of the construct, ensuring it aligns with the operational definition established during domain identification [21].

  • Linguistic Precision Enhancement: Modify ambiguous terminology, eliminate jargon, and simplify complex syntax that may hinder consistent interpretation [21]. Experts recommend ensuring items are "simple, unambiguous" and "follow normal conversation" while still maintaining scientific rigor [21].

  • Contextual Adaptation: Tailor items to the specific experiences and characteristics of the target population [21]. As emphasized in scale development guidelines, items "should be able to capture the lived experience of the target population" to enhance relevance and validity [21].

  • Response Scale Optimization: Evaluate whether the response format (e.g., Likert scale, dichotomous, frequency) appropriately captures the intended dimension of measurement [21]. Research indicates that "items with at least five anchor points are more reliable" than those with fewer response options [21].

  • Content Expansion or Reduction: For items lacking comprehensiveness, consider dividing complex items into multiple focused questions. Conversely, consolidate overlapping items that may cause expert fatigue or redundancy concerns [30].

Expert Re-engagement Protocol

Following initial revisions, researchers should implement a structured re-engagement process with content experts to validate improvement strategies:

Table 2: Expert Re-engagement Protocol for Revised Items

| Phase | Action | Deliverable |
| --- | --- | --- |
| 1. Feedback Synthesis | Compile and categorize all expert comments for each low-performing item | Item-specific revision roadmap |
| 2. Draft Revision | Implement linguistic, conceptual, and structural modifications based on expert feedback | Revised item pool with documentation of changes |
| 3. Expert Communication | Provide experts with a summary of revisions made in response to their feedback | Transparency report linking comments to modifications |
| 4. Limited Re-Rating | Request targeted re-evaluation of previously problematic items | Post-revision I-CVI scores |
| 5. Improvement Quantification | Calculate delta between original and revised I-CVI values | Quantitative evidence of improvement |

This protocol emphasizes transparency and documentation, creating an audit trail that strengthens the validity argument for the refined instrument [21] [30].

Experimental Protocol for I-CVI Improvement

Workflow for Systematic Item Refinement

The following experimental protocol provides a standardized workflow for addressing low I-CVI scores, from initial identification through final validation:

Identify low I-CVI items (I-CVI < threshold) → analyze qualitative expert feedback → categorize deficiency types (conceptual misalignment, linguistic ambiguity, contextual mismatch, response format issues) → implement targeted revisions based on deficiency type → develop a revised item pool with revision documentation → conduct cognitive interviews with the target population → finalize revised items incorporating user feedback → re-engage experts for targeted re-rating → calculate post-revision I-CVI and compare to baseline → document the revision process and validity evidence.

Diagram 1: I-CVI Improvement Workflow

Materials and Reagent Solutions

Table 3: Essential Research Reagents for I-CVI Improvement Studies

| Research Reagent | Function/Application | Implementation Considerations |
| --- | --- | --- |
| Subject Matter Expert (SME) Panel | Provides qualitative and quantitative evaluation of item relevance | Select 3-10 experts with a minimum of 5 years of field experience; ensure representation across relevant subdisciplines [2] [21] |
| Four-Point Likert Scale | Standardized rating system for item relevance assessment | Use anchors: 1 = Not relevant; 2 = Somewhat relevant; 3 = Quite relevant; 4 = Highly relevant; prevents neutral responses [2] [30] |
| Digital Analysis Tool (Excel/SPSS) | Quantitative calculation of I-CVI values and expert agreement | Implement formulas: =COUNTIF() for agreement counting; =AVERAGE() for I-CVI computation; enables efficient re-calculation post-revision [15] [2] |
| Structured Feedback Form | Captures qualitative expert comments on item deficiencies | Include dedicated fields for terminology issues, conceptual concerns, contextual relevance, and improvement suggestions [21] [30] |
| Cognitive Interview Protocol | Elicits target-population interpretation of revised items | Identify misunderstanding patterns, emotional reactions, and contextual applicability before expert re-engagement [21] |
| Revision Documentation Template | Tracks changes from original to revised items | Creates an audit trail linking specific expert feedback to implemented modifications [21] |

Validation and Reporting Standards

Post-Revision Validation Methodology

Following item refinement, researchers must implement a rigorous validation protocol to demonstrate improvement in content validity. This involves both quantitative reassessment and qualitative validation:

  • Quantitative Re-evaluation: Recalculate I-CVI for all revised items using the same expert panel or an independent cohort [2]. Apply the same statistical thresholds for acceptability while documenting improvement magnitude. Calculate both I-CVI and Scale-Level CVI (S-CVI) using average and universal agreement methods to present comprehensive validity evidence [21] [30].

  • Comparative Statistical Analysis: Perform pre-post analyses to determine the statistical significance of I-CVI improvements using appropriate tests (e.g., McNemar's test for paired proportions) [30]. Report effect sizes to communicate the practical significance of revisions.

  • Qualitative Validation: Synthesize expert comments on revised items to identify residual concerns or emerging issues [21]. Document how feedback was incorporated to demonstrate responsiveness to expert input.

Reporting Framework for Methodological Transparency

Comprehensive reporting of I-CVI refinement methodologies is essential for research reproducibility and scientific rigor. The following elements should be documented:

Table 4: Essential Reporting Elements for I-CVI Refinement Studies

| Reporting Element | Content Description | Rationale |
| --- | --- | --- |
| Initial I-CVI Values | Pre-revision scores for all items, including those below threshold | Establishes baseline measurement and identifies problem areas |
| Expert Qualifications | Number of experts, selection criteria, years of experience, relevant expertise | Supports credibility of content validity assessment [2] [21] |
| Revision Rationale | Specific expert comments that motivated each modification | Creates a transparent link between feedback and revision |
| Post-Revision I-CVI | Updated scores for refined items with comparison to baseline | Demonstrates efficacy of refinement strategies |
| S-CVI Calculations | Scale-Level Content Validity Index using both averaging and universal agreement methods | Provides comprehensive instrument-level validity evidence [21] |
| Limitations | Constraints of the revision process, including expert availability and resource limitations | Supports appropriate interpretation and identifies needs for future research |

This structured approach to addressing low I-CVI scores ensures methodological rigor in survey development while providing a transparent framework for continuous instrument improvement. By implementing these systematic revision and refinement strategies, researchers in drug development and related fields can enhance the content validity of their measurement instruments, thereby strengthening the scientific evidence generated through their application.

Optimizing Expert Feedback Integration for Item Clarity and Relevance

Content validity is a fundamental component of measurement validity, ensuring that an instrument's items adequately represent all facets of the theoretical construct being measured [20] [31]. In survey development, the Content Validity Index (CVI) provides a quantitative measure of expert consensus on item relevance and clarity [3] [31]. This protocol details rigorous methodologies for integrating expert feedback to optimize item quality during content validation, a critical process for developing psychometrically sound instruments in clinical and translational research [32] [3]. The systematic approach outlined here enables researchers to minimize measurement error and enhance the clinical relevance of surveys used in drug development and health outcomes research [32].

Theoretical Framework: Content Validity in Instrument Development

Content validity assesses the degree to which elements of a measurement instrument are relevant to and representative of the targeted construct for a particular assessment purpose [31]. For constructs that cannot be measured directly—such as patient-centered communication, evidence-based practice propensity, or functional vision—achieving content validity requires systematic expert evaluation to ensure the item pool sufficiently samples the entire content domain [3] [20].

The process involves both qualitative and quantitative assessments. Qualitatively, experts evaluate item clarity, appropriateness, and comprehensiveness. Quantitatively, the Content Validity Ratio (CVR) and Content Validity Index (CVI) provide standardized metrics for evaluating individual items and the overall instrument [3] [20]. Research demonstrates that robust content validation practices reduce measurement error and enhance instrument quality, ultimately supporting more valid research conclusions [32] [3].

Quantitative Assessment Protocols

Expert Panel Recruitment and Composition

The validity of content assessment depends heavily on expert panel composition. The panel should include both content experts (professionals with research or clinical experience in the field) and lay experts (representatives from the target population) [3]. For clinical outcome assessments, this typically means including clinicians, researchers, and patient representatives.

  • Panel Size: While literature recommends 5-10 experts [3], larger panels (up to 20) reduce chance agreement [20]
  • Qualifications: Content experts should have professional or research experience with the construct; lay experts should represent the target population [3]
  • Training: Provide clear instructions and definitions to ensure consistent understanding of evaluation criteria [32]
Content Validity Ratio (CVR) Calculation

The Content Validity Ratio measures whether panelists consider items essential to the construct. Experts rate each item using a 3-point scale: (1) "not necessary," (2) "useful but not essential," or (3) "essential" [3] [20].

Calculate CVR using the formula: CVR = (nₑ - N/2) / (N/2) where:

  • nₑ = number of panelists indicating "essential"
  • N = total number of panelists [3] [20]

CVR values range from -1 to +1, with positive values indicating at least half the panelists consider the item essential [20]. The statistical significance of CVR values depends on panel size, with higher critical values required for smaller panels (Table 1).

Table 1: Critical Values for Content Validity Ratio (CVR)

| Number of Panelists | Minimum CVR Value |
| --- | --- |
| 5 | 0.99 |
| 6 | 0.99 |
| 7 | 0.99 |
| 8 | 0.75 |
| 9 | 0.78 |
| 10 | 0.62 |
| 11 | 0.59 |
| 12 | 0.56 |
| 20 | 0.42 |
| 30 | 0.33 |
| 40 | 0.29 |

Source: Adapted from [20]
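The CVR formula and threshold check translate directly into code. The sketch below (Python; function names are illustrative, and the critical values are transcribed from Table 1) flags whether an item meets the panel-size-specific cutoff:

```python
def content_validity_ratio(n_essential, n_experts):
    """Lawshe's CVR: (n_e - N/2) / (N/2)."""
    return (n_essential - n_experts / 2) / (n_experts / 2)

# Critical values for selected panel sizes, transcribed from Table 1.
CVR_CRITICAL = {5: 0.99, 6: 0.99, 7: 0.99, 8: 0.75, 9: 0.78,
                10: 0.62, 11: 0.59, 12: 0.56, 20: 0.42, 30: 0.33, 40: 0.29}

def item_is_essential(n_essential, n_experts):
    """True if the item's CVR meets the critical value for this panel size."""
    return content_validity_ratio(n_essential, n_experts) >= CVR_CRITICAL[n_experts]

# Example: 9 of 10 panelists rate an item "essential" -> CVR = 0.8 >= 0.62
print(content_validity_ratio(9, 10))   # 0.8
print(item_is_essential(9, 10))        # True
print(item_is_essential(7, 10))        # False (CVR = 0.4 < 0.62)
```

Note how the same raw agreement (e.g., 7 of 10 "essential") can pass or fail depending on panel size, which is why the critical-value table matters.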

Content Validity Index (CVI) Determination

The Content Validity Index provides an overall measure of instrument content validity. Calculate CVI using two approaches:

  • Item-Level CVI (I-CVI): Proportion of experts giving a rating of 3 ("essential") for each item [31]
  • Scale-Level CVI (S-CVI): Average of all I-CVIs across the instrument [3] [31]

For rigorous instrument development, S-CVI should exceed 0.80 [31]. In practice, an I-CVI of 1.00 is ideal for each item, but values below the critical CVR threshold (Table 1) indicate items requiring revision or elimination [20].
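As a minimal illustration of these definitions, the following Python sketch computes each I-CVI as the proportion of "essential" ratings on the 3-point scale and averages them into the S-CVI (the rating matrix is invented example data):

```python
# Ratings: rows = items, columns = experts, on the 3-point scale
# (1 = not necessary, 2 = useful but not essential, 3 = essential).
ratings = [
    [3, 3, 3, 3, 3, 3],   # item 1: unanimous "essential"
    [3, 3, 3, 2, 3, 3],   # item 2
    [3, 2, 1, 3, 2, 3],   # item 3: likely needs revision
]

def i_cvi(item_ratings):
    """Proportion of experts rating the item 'essential' (3)."""
    return sum(r == 3 for r in item_ratings) / len(item_ratings)

i_cvis = [i_cvi(row) for row in ratings]
s_cvi_ave = sum(i_cvis) / len(i_cvis)

print([round(v, 2) for v in i_cvis])   # [1.0, 0.83, 0.5]
print(round(s_cvi_ave, 2))             # 0.78 -> below the 0.80 target
```

In this toy example item 3 drags the S-CVI below 0.80, so it would be revised or eliminated and the scale re-evaluated.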

Qualitative Refinement Protocols

Focus Group Methodology

Structured focus groups with experts facilitate collective item refinement. The process should include:

  • Pre-Session Preparation: Distribute items with study aims 1 week before session [32]
  • Structured Format:
    • Welcome and overview of study
    • Objectives and instructions with rewriting examples
    • Breakout rooms for small group item review (5 minutes per item)
    • Collective item rewriting using shared screen [32]
  • Moderator Guidance: Probe with specific questions about wording clarity, difficulty, and comprehensiveness [32]
  • Linguistic Equivalence: For bilingual instruments, discuss English and French versions simultaneously to ensure conceptual and semantic equivalence [32]
Cognitive Interviewing Techniques

Cognitive interviews identify problematic items through verbal probing with potential respondents. The process involves:

  • Participant Recruitment: Native speakers representing target population; continue until no changes are suggested by three consecutive participants [32]
  • Interview Structure: 15-30 minute sessions conducted via telephone or videoconference [32]
  • Verbal Probing: Ask specific questions about interpretation, clarity, and suggested improvements (Table 2) [32]
  • Iterative Revision: Review comments after each interview, revise problematic items, and test revisions with subsequent participants [32]

Table 2: Cognitive Interview Probing Questions for Item Clarity Assessment

| Assessment Area | Sample Probing Questions |
| --- | --- |
| Item Comprehension | "What does this statement mean to you?" "In your own words, what do you think this statement is saying?" |
| Clarity Evaluation | "Were these statements easy to understand?" "Are there any words that are not clear or do not work well?" |
| Improvement Suggestions | "How would you change the wording to make it clearer?" |
| Response Options | "What do you think about these response options?" "How would you make the options clearer?" |
| Overall Impression | "Do you have any comments on the measure as a whole?" "Is there anything that you would change?" |

Source: Adapted from [32]

Integrated Workflow for Expert Feedback Integration

The following diagram illustrates the sequential process for integrating quantitative and qualitative expert feedback to optimize item clarity and relevance:

Initial item pool generation → recruit expert panel (5-10 content and lay experts) → quantitative review (CVR calculation and analysis) → qualitative review (focus groups and cognitive interviews) → item revision and rewriting based on expert feedback → content validity assessment (I-CVI and S-CVI calculation) → decision: CVI ≥ 0.80? If yes, finalize the instrument; if no, repeat the validation cycle from the quantitative review.

Expert Feedback Integration Workflow

Research Reagent Solutions

Table 3: Essential Materials for Content Validation Studies

| Research Reagent | Function/Purpose | Implementation Example |
| --- | --- | --- |
| Expert Panel Recruitment Materials | Identify and enroll qualified content experts | Purposive sampling from professional networks; social media recruitment for lay experts [32] |
| 3-Point Essentiality Rating Scale | Quantitative assessment of item necessity | "Not necessary, useful but not essential, essential" [3] [20] |
| CVR/CVI Calculation Spreadsheet | Quantitative analysis of expert ratings | Automated templates for calculating item CVR and scale-level CVI [3] [20] |
| Structured Focus Group Guide | Facilitate collective item refinement | Moderator guide with breakout room activities and probing questions [32] |
| Cognitive Interview Protocol | Elicit participant interpretation of items | Verbal probing questions about comprehension and suggested improvements [32] |
| Digital Collaboration Platform | Enable remote expert collaboration | Zoom for virtual focus groups; shared documents for collective rewriting [32] |

Case Application: Rehabilitation Research Evidence Integration Index

A recent study developing the Propensity to Integrate Research Evidence into Clinical Decision-Making Index (PIRE-CDMI) demonstrates this integrated approach [32]. Researchers conducted:

  • Focus Group: Seven participants (occupational therapists, physical therapists, researchers) reviewed prototype items for clarity and consistency
  • Cognitive Interviews: 27 clinicians elaborated on item interpretation and appropriateness
  • Iterative Revision: The index underwent 12 revisions with substantial modifications to "use of research evidence" and "attitudes" items [32]

This process enhanced clinical relevance and reduced measurement error through systematic expert feedback integration, ultimately producing a brief, multidimensional index with improved content validity [32].

Optimizing expert feedback integration through structured protocols ensures the development of content-valid instruments with strong psychometric properties. The combined quantitative (CVR/CVI) and qualitative (focus groups, cognitive interviews) approach provides comprehensive evidence for item clarity and relevance. This methodology is particularly crucial in clinical research and drug development, where measurement accuracy directly impacts treatment evaluation and health outcomes assessment.

In the development of surveys and research instruments, the Content Validity Index (CVI) serves as a critical quantitative measure for ensuring that items adequately represent the construct domain [3]. The process of establishing content validity fundamentally relies on expert judgment, where the composition of the expert panel directly influences the credibility and accuracy of the validation [6]. A robust CVI study requires a deliberate balance between subject matter experts who provide insights on content relevance and representativeness, and methodological experts who ensure psychometric rigor and procedural validity [33] [3]. This protocol outlines detailed methodologies for forming and managing such balanced panels within the context of CVI survey development for drug development and healthcare research.

Theoretical Foundation and Core Concepts

The Role of Expert Judgment in Content Validity

Content validity is defined as the degree to which an instrument's items comprehensively represent the target construct [3]. Unlike other forms of validity, content validity is established not through statistical analysis of scores but through systematic expert evaluation of the instrument's content [6]. This process determines whether the items adequately cover all relevant domains and dimensions of the construct, making expert judgment the cornerstone of content validation [6] [3].

The process requires panelists to evaluate how well each item reflects the intended concept, often using structured rating scales. This expert assessment provides the foundational evidence that the instrument measures what it purports to measure before proceeding to field testing and other psychometric evaluations [6].

Quantitative Metrics: Content Validity Index (CVI)

The CVI provides a quantitative measure of content validity, with two primary levels of analysis [2]:

  • Item-Level CVI (I-CVI): The proportion of experts agreeing on an item's relevance (e.g., ratings of 3 or 4 on a 4-point relevance scale).
  • Scale-Level CVI (S-CVI): The overall validity of the entire instrument, calculated either as the average of all I-CVIs (S-CVI/Ave) or as the proportion of items achieving universal agreement (S-CVI/UA) [2].

Table 1: Standard CVI Thresholds for Instrument Validation

| Metric | Acceptable Threshold | Number of Experts | Key Reference |
| --- | --- | --- | --- |
| I-CVI | ≥ 0.78 | 5+ | Polit & Beck [2] |
| S-CVI/Ave | ≥ 0.90 | 5+ | Polit & Beck [2] |
| S-CVI/UA | ≥ 0.80 | 5+ | Polit & Beck [2] |

Protocol for Balanced Expert Panel Composition

Defining Expertise Requirements

The selection of panel members should be based on objective, predefined criteria related to the research problem [33]. A balanced panel requires two distinct expertise profiles:

Subject Matter Experts (SMEs) provide domain-specific knowledge essential for evaluating content relevance and comprehensiveness. In drug development research, this includes:

  • Clinical specialists with direct patient care experience
  • Basic scientists with expertise in disease pathophysiology
  • Regulatory affairs professionals understanding approval requirements
  • Patients or patient advocates for patient-reported outcomes [3]

Methodological Experts contribute research methodology knowledge critical for instrument design:

  • Psychometricians with expertise in measurement theory
  • Survey methodologists familiar with item construction
  • Statisticians proficient in validation techniques
  • Qualitative researchers for concept exploration [33]

Panel Recruitment and Size

Panel composition requires strategic consideration of both size and diversity. While no universal standard exists, research indicates that panels of approximately 10-15 members typically provide sufficient rigor while maintaining logistical feasibility [33]. For heterogeneous panels addressing complex constructs, larger panels may be necessary to capture diverse perspectives.

Table 2: Expert Panel Composition Framework

| Expertise Type | Recommended Proportion | Selection Criteria | Primary Contribution |
| --- | --- | --- | --- |
| Subject Matter Experts | 60-70% | Minimum 5 years field experience; publications in relevant field; clinical/practical experience | Content relevance, domain coverage, clinical utility |
| Methodological Experts | 30-40% | Instrument development experience; publications on measurement; statistical expertise | Item construction, rating quality, analytic approach |
| End-Users/Patients | 1-2 representatives | Lived experience with construct; ability to provide feedback | Real-world relevance, comprehensibility |

Recruitment should target experts who can actively engage throughout the multi-round Delphi process, with consideration of geographical distribution and disciplinary diversity to minimize regional and professional biases [33]. The electronic Delphi (e-Delphi) approach facilitates broader geographical representation through online survey platforms [33].

The Modified Delphi Process for CVI Studies

The modified Delphi technique provides a structured framework for achieving consensus while balancing different expert perspectives [34] [33]. This process maintains anonymity to prevent dominance by individual panelists, incorporates iterative rounds with controlled feedback, and uses predefined consensus criteria [33].

Define instrument concept and generate initial items → comprehensive literature review → develop preliminary item pool → select balanced expert panel → Round 1: initial rating (item relevance ratings, qualitative feedback) → analyze ratings and feedback (calculate initial I-CVI, review comments) → revise instrument (modify problematic items, remove low-CVI items) → Round 2: revised rating (rate modified items, review group feedback) → final analysis (calculate final I-CVI/S-CVI, assess consensus stability) → final instrument version.

Diagram 1: CVI Expert Panel Workflow

Experimental Protocols and Methodologies

Protocol 1: Expert Rating Process

This protocol details the structured process for collecting expert ratings to compute CVI values.

Materials and Reagents:

  • 4-Point Likert Scale Forms: Physical or digital forms with relevance scale (1=Not relevant, 2=Somewhat relevant, 3=Quite relevant, 4=Highly relevant) [2]
  • Item Evaluation Spreadsheet: Digital template for compiling expert ratings (Microsoft Excel or compatible software) [2]
  • Concept Definition Document: Detailed description of the construct being measured and its domains [6]
  • Demographic Questionnaire: Brief survey to document expert characteristics and expertise areas

Experimental Steps:

  • Preparation Phase:

    • Distribute the concept definition document and preliminary item pool to all panel members
    • Provide clear instructions for using the 4-point relevance scale
    • Establish inter-rater reliability through training examples if needed
  • Rating Phase:

    • Experts independently rate each item's relevance to the target construct
    • Collect additional qualitative feedback on item clarity, wording, and coverage
    • Allow 2-3 weeks for completion with one reminder at the midpoint
  • Data Compilation:

    • Create a master spreadsheet with items as rows and expert ratings as columns
    • Convert ordinal ratings to binary values (1-2 = 0 "Not valid"; 3-4 = 1 "Valid") using Excel formula: =IF(B2>=3,1,0) [2]
    • Calculate agreement count for each item using: =COUNTIF(H2:J2,">=1") [2]
  • CVI Calculation:

    • Compute I-CVI for each item by dividing the agreement count by the number of experts (three in this example): =K2/3 (where K2 contains the agreement count) [2]
    • Categorize items as "Valid" (I-CVI ≥ 0.78) or "Invalid" (I-CVI < 0.78) using: =IF(L2>=0.78,"Valid","Invalid") [2]
    • Calculate S-CVI/Ave as the average of all I-CVI values
    • Calculate S-CVI/UA as the proportion of items achieving I-CVI = 1.0 [2]
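The spreadsheet steps above translate directly into code. This Python sketch mirrors the Excel formulas for a hypothetical three-expert panel (the rating data are illustrative):

```python
# 4-point relevance ratings from three experts (columns H-J in the
# spreadsheet example), dichotomized as 3-4 -> 1 "valid", 1-2 -> 0.
ratings = {
    "item_1": [4, 4, 3],
    "item_2": [4, 3, 2],
    "item_3": [2, 1, 3],
}

results = {}
for item, scores in ratings.items():
    binary = [1 if s >= 3 else 0 for s in scores]   # =IF(B2>=3,1,0)
    agreement = sum(binary)                          # =COUNTIF(H2:J2,">=1")
    results[item] = agreement / len(scores)          # =K2/3

s_cvi_ave = sum(results.values()) / len(results)                 # average of I-CVIs
s_cvi_ua = sum(v == 1.0 for v in results.values()) / len(results)  # universal agreement

print({k: round(v, 2) for k, v in results.items()})
# {'item_1': 1.0, 'item_2': 0.67, 'item_3': 0.33}
print(round(s_cvi_ave, 2), round(s_cvi_ua, 2))  # 0.67 0.33
```

Note that a three-expert panel is used here only to match the spreadsheet layout; the I-CVI ≥ 0.78 retention threshold assumes five or more experts.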

Protocol 2: Iterative Delphi Round Management

This protocol manages the multi-round process for refining instruments based on expert feedback.

Materials and Reagents:

  • Controlled Feedback Reports: Summarized group responses with individual and group statistics [33]
  • Modified Instrument Versions: Updated item pools incorporating previous round feedback
  • Consensus Criteria Documentation: Predefined thresholds for determining when additional rounds are unnecessary
  • Stability Assessment Tools: Statistical measures to compare between rounds

Experimental Steps:

  • Round 1 Administration:

    • Distribute initial instrument to all panel members
    • Collect relevance ratings and qualitative suggestions
    • Analyze results to identify items requiring modification
  • Controlled Feedback Preparation:

    • Prepare summary reports showing the distribution of ratings for each item
    • Include anonymous qualitative comments grouped by theme
    • Calculate descriptive statistics (medians, means, measures of dispersion)
    • Highlight areas of disagreement for special attention [33]
  • Subsequent Rounds:

    • Present revised items alongside their previous ratings and feedback
    • Ask experts to re-rate items considering group perspectives
    • Continue until predefined consensus criteria are met or stability is achieved
    • Typically require 2-3 rounds for adequate consensus development [34]
  • Closing Criteria Evaluation:

    • Predefine consensus thresholds (e.g., I-CVI ≥ 0.78 for all items)
    • Assess stability between rounds using statistical measures (e.g., <5% change in I-CVI values)
    • Document rationale for concluding the Delphi process [33]
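The stability criterion can be checked programmatically. The sketch below reads "<5% change" as an absolute change of less than 0.05 in each item's I-CVI between rounds — one possible interpretation; researchers should predefine whichever reading they adopt:

```python
def stable(prev_icvis, curr_icvis, tolerance=0.05):
    """True when every item's I-CVI changed by less than `tolerance`
    between rounds (absolute-change reading of the '<5%' criterion)."""
    return all(abs(c - p) < tolerance for p, c in zip(prev_icvis, curr_icvis))

round1 = [0.70, 0.80, 0.90]
round2 = [0.72, 0.80, 0.88]
round3 = [0.85, 0.82, 0.90]

print(stable(round1, round2))  # True  -> consensus stable, rounds may close
print(stable(round2, round3))  # False -> item 1 still shifting, continue
```

Stability between rounds, rather than consensus alone, is the safer closing criterion: it distinguishes genuine agreement from ratings that are still drifting toward the group mean.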

The Scientist's Toolkit: Essential Research Reagents

Table 3: Key Research Reagents for CVI Studies

| Tool/Reagent | Primary Function | Application Notes |
| --- | --- | --- |
| 4-Point Relevance Scale | Standardized expert rating of item relevance | Prevents neutral responses; forces directional assessment [2] |
| CVI Calculation Spreadsheet | Automated computation of validity indices | Uses Excel functions (COUNTIF, IF) to minimize manual errors [2] |
| Expert Demographic Form | Documentation of panelist expertise | Verifies balanced representation of subject and methodological expertise |
| Concept Definition Document | Detailed construct operationalization | Ensures common understanding of the domain being measured [6] |
| Delphi Platform | Virtual environment for iterative rounds | Facilitates anonymous rating and controlled feedback distribution [33] |
| Consensus Criteria Checklist | Objective standards for round termination | Prevents arbitrary closure of the validation process [33] |

Analysis and Interpretation

Quantitative Analysis of CVI Results

The statistical analysis of CVI studies involves both item-level and scale-level evaluation. For each item, the I-CVI is calculated as the proportion of experts rating the item as content valid (3 or 4 on the relevance scale). The standard threshold for retaining items is I-CVI ≥ 0.78 when using 5 or more experts [2].

At the scale level, the S-CVI/Ave should achieve ≥ 0.90 for the entire instrument to be considered content valid [2]. The more stringent S-CVI/UA (universal agreement) is often lower, particularly with larger panels, and may not be feasible as a primary criterion [3].

Qualitative Analysis of Expert Feedback

Beyond quantitative ratings, the qualitative feedback from experts provides crucial insights for item refinement [33]. Thematic analysis of expert comments should identify:

  • Wording ambiguities that affect item clarity
  • Conceptual gaps in domain coverage
  • Terminology issues specific to the target population
  • Suggestions for new items to improve comprehensiveness

This qualitative component is particularly valuable for revising items that received intermediate I-CVI scores (0.5-0.75), where expert comments can guide specific improvements.

Balancing subject matter and methodological expertise in CVI expert panels is not merely a methodological consideration but a fundamental requirement for developing valid research instruments. The structured protocols outlined in this document provide a roadmap for constructing panels that leverage both content knowledge and methodological rigor. By implementing the modified Delphi process with careful attention to panel composition, rating procedures, and iterative feedback, researchers can maximize the credibility and utility of their content validation studies. The balanced integration of diverse expert perspectives ultimately produces instruments that are both scientifically sound and clinically meaningful in drug development and healthcare research.

In the rigorous field of survey development research, particularly for high-stakes environments like drug development, establishing content validity is a foundational requirement. The Content Validity Index (CVI) serves as a crucial quantitative metric for evaluating how well questionnaire items represent the intended construct, based on expert ratings of relevance and clarity [35]. However, to ensure a comprehensive assessment of a survey instrument's quality and readiness, CVI should not be used in isolation. An integrated validation approach combines CVI with other complementary methods, such as the Content Validity Ratio (CVR) and Face Validity Index (FVI), to evaluate different facets of validity—from essentiality and relevance to user comprehension and practicality [36] [35]. This multi-method framework provides a robust defense against measurement error, which is paramount when developing instruments for scientific and clinical research.

The integration of these methods allows researchers to make more informed decisions about item retention, revision, or deletion. While CVI focuses on the relevance and clarity of an item from an expert's perspective, CVR assesses its essentiality to the core construct, and FVI evaluates its clarity and comprehensibility from the end-user's viewpoint [36] [35]. This protocol details the systematic application of this integrated framework, providing researchers with a structured pathway to enhance the credibility and operational soundness of their survey instruments.

Theoretical Foundation and Key Concepts

Core Validity Metrics and Their Interrelationships

The integrated validation framework rests on three primary quantitative metrics, each serving a distinct purpose:

  • Content Validity Index (CVI): This index measures the proportion of experts agreeing on an item's relevance and clarity. Experts typically rate items on a 4-point scale (e.g., 1=Not relevant, 4=Highly relevant). Ratings of 3 or 4 are considered valid, and the I-CVI (Item-level CVI) is calculated for each item. The Scale-level CVI (S-CVI) can be computed as the average of all I-CVIs (S-CVI/Ave) or as universal agreement (S-CVI/UA) [36]. An I-CVI of ≥ 0.79 is generally considered acceptable [37] [35].

  • Content Validity Ratio (CVR): Developed by Lawshe, this ratio assesses the essentiality of an item. Experts classify items as "Not necessary," "Useful but not essential," or "Essential." The CVR is calculated using the formula: CVR = (nₑ - N/2) / (N/2), where nₑ is the number of experts rating the item as "Essential," and N is the total number of experts [36] [35]. The resulting value must exceed a critical threshold based on the number of experts [35].

  • Face Validity Index (FVI): This measures the target population's perception of the instrument's clarity and relevance. Participants rate items based on understandability, and the I-FVI (item-level) and S-FVI (scale-level) are calculated similarly to CVI [36].

The logical relationship between these concepts in a validation workflow can be summarized as follows:

  • Item pool generation → expert panel review, which feeds two parallel analyses: CVI calculation (relevance/clarity) and CVR calculation (essentiality)
  • Items with I-CVI < 0.79 or CVR below Lawshe's threshold go to item revision/elimination; revised items and items meeting both thresholds proceed to target population feedback
  • Target population feedback → FVI calculation (comprehensibility): items with I-FVI < 0.83 return to revision, while items with I-FVI ≥ 0.83 enter the final item pool

Quantitative Standards for Validity Assessment

The table below outlines the critical thresholds for each metric, guiding researchers in their interpretation of results.

Table 1: Quantitative Standards for Integrated Validity Assessment

| Metric | Purpose | Rating Scale | Calculation | Acceptance Threshold | Key Reference |
| --- | --- | --- | --- | --- | --- |
| I-CVI | Item relevance & clarity | 4-point (1-4) | Proportion of experts rating 3 or 4 | ≥ 0.79 | Polit & Beck (2007) [36] |
| S-CVI/Ave | Overall scale relevance | Average of I-CVIs | Sum of I-CVIs / Total items | ≥ 0.90 | Polit & Beck (2007) [36] |
| CVR | Item essentiality | 3-point (Not necessary, Useful, Essential) | (nₑ - N/2) / (N/2) | Lawshe's Table | Lawshe (1975) [35] |
| I-FVI | Item comprehensibility | 4-point (1-4) | Proportion of target users rating 3 or 4 | ≥ 0.83 | Yusoff et al. (2023) [36] |

For CVR, the specific thresholds based on the number of experts (N) are as follows [35]:

Table 2: Lawshe's CVR Critical Values

| Number of Experts (N) | Minimum CVR |
| --- | --- |
| 5 | 0.99 |
| 6 | 0.99 |
| 7 | 0.99 |
| 8 | 0.75 |
| 9 | 0.78 |
| 10 | 0.62 |
| 15 | 0.49 |
| 20 | 0.42 |
| 30 | 0.33 |
| 40 | 0.29 |

Integrated Experimental Protocol

Phase 1: Instrument Development and Expert Review

Objective: To develop the initial item pool and obtain expert evaluations for content validity assessment.

Materials and Reagents:

Table 3: Research Reagent Solutions for Integrated Validation

| Item | Function/Description | Application in Protocol |
| --- | --- | --- |
| Expert Panel | 5-20 subject matter experts with minimum 5 years' field experience [35] | Evaluate item relevance, clarity, and essentiality |
| Target Population Sample | 10+ participants representing the intended survey audience [36] | Provide feedback on face validity (comprehensibility) |
| 4-Point Likert Scale | Rating instrument: 1=Not relevant, 2=Somewhat relevant, 3=Quite relevant, 4=Highly relevant [2] | Standardized assessment of item relevance and clarity |
| 3-Point Essentiality Scale | Rating instrument: Not necessary, Useful but not essential, Essential [35] | Assessment of item necessity for construct measurement |
| Microsoft Excel | Spreadsheet software with COUNTIF, AVERAGE functions [2] | Quantitative calculation of CVI, CVR, and FVI metrics |
| Digital Survey Platform | Online tool for distributing instruments to experts and participants | Efficient data collection and management |

Methodology:

  • Item Generation: Develop an initial item pool through comprehensive literature review and focus group discussions with subject matter experts [36]. Create the first draft of the instrument.

  • Expert Recruitment: Convene a panel of 5-20 experts with a minimum of five years of experience in the relevant field [2] [35]. Ensure representation from all necessary disciplinary perspectives.

  • Expert Evaluation: Provide experts with the survey instrument and evaluation forms. Request them to rate each item on:

    • Relevance using the 4-point Likert scale for CVI calculation [2]
    • Essentiality using the 3-point scale for CVR calculation [35]
    • Clarity using the 4-point Likert scale for CVI calculation [36]
  • Data Compilation: Create an Excel spreadsheet with items as rows and expert ratings as columns. The following workflow illustrates this phase:

  • Literature review and focus groups → initial item pool → expert recruitment (5-20 SMEs) → develop rating forms (4-point and 3-point scales) → collect expert ratings → data entry into Excel
  • From the entered data, two parallel analyses follow: conversion to binary values (ratings 3/4 = 1, else 0) leading to I-CVI and S-CVI/Ave calculation, and calculation of CVR for each item

Phase 2: Quantitative Analysis and Item Refinement

Objective: To calculate validity metrics and identify items requiring revision or elimination.

Methodology:

  • Data Transformation: Convert Likert-scale ratings to binary values for CVI calculation. For the 4-point relevance scale, ratings of 3 or 4 become "1" (valid), while ratings of 1 or 2 become "0" (not valid) [2]. In Excel, use the formula: =IF(B2>=3,1,0).

  • I-CVI Calculation: For each item, count the binary "valid" flags produced in the previous step (e.g., =COUNTIF(H2:J2,">=1") applied to the binary columns), then divide by the total number of experts (e.g., =K2/3 for a three-expert layout) [2].

  • CVR Calculation: For each item, count the number of experts rating it as "Essential." Apply Lawshe's formula: CVR = (nₑ - N/2) / (N/2) [35].

  • Item Decision Matrix: Apply the following decision rules to each item:

    • Retain: I-CVI ≥ 0.79 AND CVR meets Lawshe's threshold
    • Revise: I-CVI between 0.70-0.79 OR CVR slightly below threshold
    • Eliminate: I-CVI < 0.70 OR CVR significantly below threshold [36] [37]
  • Scale-Level Evaluation: Calculate S-CVI/Ave by averaging all I-CVIs. A value ≥ 0.90 indicates excellent overall content validity [36].
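The decision matrix above can be expressed as a small function. In this sketch, "slightly below threshold" is operationalized as within 0.10 of Lawshe's critical value — an illustrative assumption, since the protocol leaves the margin unspecified:

```python
def item_decision(icvi, cvr, cvr_threshold, slack=0.10):
    """Apply the retain/revise/eliminate rules; `slack` defines
    'slightly below threshold' (an assumed margin, not from the protocol)."""
    if icvi >= 0.79 and cvr >= cvr_threshold:
        return "retain"
    if icvi < 0.70 or cvr < cvr_threshold - slack:
        return "eliminate"
    return "revise"

# 10-expert panel -> Lawshe critical CVR = 0.62 (Table 2)
print(item_decision(0.90, 0.80, 0.62))  # retain
print(item_decision(0.75, 0.60, 0.62))  # revise
print(item_decision(0.60, 0.20, 0.62))  # eliminate
```

Encoding the rules once, rather than applying them item by item, makes the retention decisions auditable and reproducible across Delphi rounds.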

Phase 3: Face Validation and Final Refinement

Objective: To assess the target population's comprehension and perception of the revised instrument.

Methodology:

  • Participant Recruitment: Select 10+ participants from the target population using purposive sampling to ensure they represent the intended survey audience [36].

  • Face Validity Assessment: Administer the revised survey and ask participants to rate each item for clarity and comprehensibility using a 4-point scale (e.g., 1=Not clear, 4=Very clear).

  • FVI Calculation: Calculate I-FVI for each item as the proportion of participants rating it 3 or 4. Compute S-FVI/Ave as the average of all I-FVIs [36].

  • Final Refinement: Revise or eliminate items with I-FVI < 0.83. The following workflow illustrates the complete integrated process:

Expert ratings (relevance and essentiality) → calculate I-CVI and CVR → compare against thresholds (I-CVI ≥ 0.79; CVR ≥ Lawshe's value) → categorize items as retain, revise, or eliminate → revised item pool → face validity assessment with target population → calculate I-FVI and S-FVI → check I-FVI ≥ 0.83 → final validated instrument.
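The FVI computation parallels the CVI calculation, using target-population clarity ratings in place of expert relevance ratings. A minimal Python sketch with invented ratings from ten participants:

```python
# Clarity ratings on a 4-point scale (1 = not clear ... 4 = very clear);
# ratings of 3 or 4 count as "clear".
clarity = {
    "item_1": [4, 4, 3, 4, 3, 4, 4, 3, 4, 4],
    "item_2": [2, 3, 2, 4, 3, 2, 3, 2, 3, 2],  # likely below the 0.83 cut-off
}

def i_fvi(item_ratings):
    """Proportion of participants rating the item 3 or 4."""
    return sum(r >= 3 for r in item_ratings) / len(item_ratings)

i_fvis = {k: i_fvi(v) for k, v in clarity.items()}
s_fvi_ave = sum(i_fvis.values()) / len(i_fvis)
flagged = [k for k, v in i_fvis.items() if v < 0.83]

print(i_fvis)      # {'item_1': 1.0, 'item_2': 0.5}
print(s_fvi_ave)   # 0.75
print(flagged)     # ['item_2']
```

Flagged items are candidates for rewording and re-testing with the target population rather than automatic deletion.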

Application Case Study: MUAPHQ C-19 Development

A recent study developing the Malay version of the Understanding, Attitude, Practice and Health Literacy Questionnaire on COVID-19 (MUAPHQ C-19) exemplifies the integrated validation approach [36]. The researchers employed a two-stage process: Stage I involved item generation through literature review and expert discussions, resulting in a 54-item instrument across four domains. Stage II implemented the judgement and quantification phase with a panel of six experts and ten target-population participants.

The study demonstrated the complementary nature of different validity metrics. While the overall content validity was excellent (S-CVI/Ave = 0.96 for relevance), the CVR analysis identified one item in the health literacy domain that fell below the acceptable threshold. Similarly, face validity assessment revealed nine items with I-FVI below the 0.83 cut-off point. This multi-method approach enabled precise instrument refinement: ten items were revised for clarity, and two items were deleted—one due to low CVR and another due to redundancy. The process culminated in a 50-item MUAPHQ C-19 (Version 3.0) ready for further psychometric testing [36].

This case study underscores the practical value of integrating CVI with CVR and FVI. Had the researchers relied solely on CVI, they might have retained items that, while relevant, were not essential or were poorly understood by the target population. The combined approach provided a more comprehensive validity assessment, leading to a more robust and precise measurement instrument.

Beyond CVI: Integrating Content Validity into a Comprehensive Validation Framework

Linking Content Validity with Construct and Criterion Validity

Within the framework of psychometric test development, establishing validity is fundamental to ensuring that an instrument accurately measures what it purports to measure. While various forms of validity exist, content validity serves as a critical foundational element upon which other validity evidence is built [38]. This article delineates the conceptual and methodological linkages between content validity, construct validity, and criterion validity, with a specific focus on the context of Content Validity Index (CVI) survey development research. For researchers, scientists, and drug development professionals, understanding these interrelationships is essential for developing robust measurement instruments that yield trustworthy and meaningful data in both clinical and research settings.

Theoretical Framework: Interrelationships of Validity Types

Validity is a unitary concept representing the degree to which evidence and theory support the interpretations of test scores. The traditional "trinitarian" view separates validity into content, criterion, and construct types, but modern understanding often positions construct validity as the overarching concept [39] [40]. Content validity is frequently considered a form of translation validity, assessing how well a construct is translated into its operationalized form [39].

The relationship between these validity types is hierarchical and sequential. Content validity provides the foundational evidence that an instrument adequately samples the domain of interest. This thorough domain coverage is a prerequisite for establishing construct validity, which examines how well the instrument represents the theoretical construct [41] [38]. Similarly, without adequate content validity, an instrument's correlation with an external criterion (criterion validity) may be spurious or misleading [42]. The following diagram illustrates this integrative relationship.

[Diagram: Theoretical Construct → Content Validity (domain definition) → Instrument/Operationalization (item sampling) → Criterion Validity (prediction/association); content and criterion evidence both feed the overarching Construct Validity.]

Logical Relationships Between Validity Types. This diagram shows how a theoretical construct is operationalized into a measurable instrument through content validity. The instrument's performance is then evaluated through criterion validity. Both content and criterion validity provide essential evidence for the overarching construct validity.

Comparative Analysis of Validity Types

Table 1: Definitions and Key Characteristics of Validity Types

| Validity Type | Core Question | Primary Focus | Key Methodological Approach |
| --- | --- | --- | --- |
| Content Validity | Is the test fully representative of what it aims to measure? [41] | Domain coverage and item relevance [41] [38] | Expert judgment and systematic rating of items [2] [26] |
| Construct Validity | Does the test measure the concept that it is intended to measure? [41] | Theoretical alignment and conceptual integrity [41] [42] | Correlation with related/unrelated measures (MTMM, factor analysis) [42] [39] |
| Criterion Validity | Do the results correspond to a concrete outcome? [41] | Predictive accuracy and concurrent association [41] [42] | Correlation with a "gold standard" or future outcome [42] [40] |

Quantitative Protocols for Content Validity Assessment

The Content Validity Index (CVI) is the most widely accepted quantitative method for assessing content validity. The following protocol provides a step-by-step methodology for its computation, which can be implemented using spreadsheet software like Microsoft Excel [2].

Experimental Protocol: Calculating the Content Validity Index (CVI)

Objective: To quantitatively assess the content validity of a newly developed survey instrument by calculating Item-level CVI (I-CVI) and Scale-level CVI (S-CVI).

Materials and Reagents:

  • Expert Panel: A panel of 3-10 experts (minimum of 3) with extensive knowledge in the construct domain [2] [26].
  • Rating Instrument: A four-point Likert scale for rating item relevance (1 = Not relevant, 2 = Somewhat relevant, 3 = Quite relevant, 4 = Highly relevant) [2].
  • Data Analysis Tool: Microsoft Excel or statistical software.

Procedure:

  • Expert Rating: Provide the panel of experts with the survey items and the definition of the construct. Each expert independently rates each item for its relevance using the four-point Likert scale [2].
  • Data Compilation: Create a table in Excel with rows representing questionnaire items and columns representing expert ratings.
  • Binary Conversion: Convert the Likert scale ratings to binary values (0 or 1). Ratings of 3 or 4 are converted to "1" (Valid), while ratings of 1 or 2 are converted to "0" (Not Valid). This can be automated in Excel using the formula: =IF(B2>=3,1,0) [2].
  • Calculate I-CVI: For each item, count the number of experts whose converted (binary) value is "1". The I-CVI is the proportion of experts rating that item as relevant, calculated as: I-CVI = (Number of experts with a binary value of 1) / (Total number of experts). In Excel, use =COUNTIF(H2:J2,1)/3 for a 3-expert panel [2] [26].
  • Determine Item Acceptability: Based on the number of experts, establish a cut-off score for I-CVI. For a panel of 3-5 experts, an I-CVI of 1.0 is ideal. For larger panels (≥6 experts), a cut-off of 0.83 is acceptable [2]. Items below the threshold should be revised or discarded.
  • Calculate S-CVI: Compute the Scale-Level Content Validity Index using one of two methods:
    • S-CVI/Ave (Average): The average of all I-CVI scores. This is calculated in Excel using the AVERAGE function across all I-CVI values. A value of ≥ 0.90 is considered excellent [2] [27].
    • S-CVI/UA (Universal Agreement): The proportion of items that achieved a rating of "1" by all experts. This is a more stringent measure [2].

The workflow for this quantitative assessment is detailed below.

[Workflow diagram: 1. Expert Panel Rating (4-point Likert scale) → 2. Data Compilation (Excel spreadsheet) → 3. Binary Conversion (ratings 3/4 → 1, ratings 1/2 → 0) → 4. Calculate I-CVI → 5. Item Decision (revise if I-CVI < threshold) → 6. Calculate S-CVI → Output: Validated Instrument]

CVI Calculation Workflow. This protocol outlines the step-by-step process for quantifying content validity, from initial expert ratings to the final calculation of scale-level validity indices.
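The same calculation is easy to script outside Excel. The following is a minimal Python sketch (the ratings matrix and the 0.83 cut-off are illustrative): it performs the binary conversion, computes I-CVI per item, and derives both S-CVI variants described above.

```python
def cvi(ratings, cutoff=0.83):
    """Compute I-CVI per item and both S-CVI variants from 4-point relevance ratings.

    ratings: one list per item, each containing the experts' 1-4 Likert ratings.
    Ratings of 3 or 4 count as "relevant" (binary 1), per the protocol above.
    Returns (I-CVIs, S-CVI/Ave, S-CVI/UA, indices of items below the cut-off).
    """
    # Binary conversion + item-level proportion of agreement (I-CVI)
    i_cvi = [sum(1 for r in item if r >= 3) / len(item) for item in ratings]
    # S-CVI/Ave: average of the I-CVIs
    s_cvi_ave = sum(i_cvi) / len(i_cvi)
    # S-CVI/UA: proportion of items with universal agreement (I-CVI = 1.0)
    s_cvi_ua = sum(1 for v in i_cvi if v == 1.0) / len(i_cvi)
    # Items falling below the chosen threshold, flagged for revision/deletion
    flagged = [i for i, v in enumerate(i_cvi) if v < cutoff]
    return i_cvi, s_cvi_ave, s_cvi_ua, flagged

# Six experts rating three items on the 4-point scale (illustrative data)
ratings = [
    [4, 4, 3, 4, 3, 4],  # item 1: all relevant -> I-CVI = 1.0
    [3, 4, 2, 4, 3, 4],  # item 2: 5/6 relevant -> I-CVI ~ 0.83
    [2, 3, 2, 4, 1, 3],  # item 3: 3/6 relevant -> I-CVI = 0.5, flagged
]
i_cvi, ave, ua, flagged = cvi(ratings)
```

With a ≥6-expert panel the 0.83 cut-off applies, so only the third item is flagged; with a 3-5 expert panel the cut-off would instead be 1.0, as noted in the protocol.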

Research Reagent Solutions for Validity Assessment

Table 2: Essential Methodological Components for Validation Studies

| Research "Reagent" | Function in Validation | Implementation Example |
| --- | --- | --- |
| Expert Panel | To provide informed judgments on item relevance and representativeness [2] [26]. | Select 5-10 experts with a minimum of 5 years of experience in the field relevant to the construct [2]. |
| Structured Rating Scale | To standardize expert evaluations and allow for quantitative analysis [2]. | Use a 4-point Likert scale (1=Not relevant to 4=Highly relevant) to force a decision and avoid neutral midpoints [2]. |
| Gold Standard Measure | To serve as an external criterion for validating a new instrument [42] [40]. | Use an established, widely accepted tool (e.g., SCID-5 for diagnosing depression) to test the new instrument against [42]. |
| Statistical Correlation Analysis | To quantify the strength of the relationship between measures for criterion and construct validity [42]. | Use Pearson's correlation coefficient for continuous variables and the Phi coefficient for dichotomous variables [42]. |

Linking Content Validity to Construct and Criterion Validity

Content validity is not an isolated property but is intrinsically linked to the instrument's subsequent performance in construct and criterion validation. A tool with poor content validity cannot adequately represent the theoretical construct it is intended to measure, thereby undermining construct validity [38]. Similarly, if an instrument does not comprehensively cover the content domain, its ability to predict or correlate with an external criterion is compromised [42].

From Content to Construct Validity

The link between content and construct validity is established through the instrument's performance in relation to theoretical expectations. Once content validity is established via CVI, researchers can investigate construct validity through:

  • Factor Analysis: A multivariate statistical method, such as Exploratory Factor Analysis (EFA) or Confirmatory Factor Analysis (CFA), which tests whether the items cluster into factors consistent with the underlying theory [42]. This directly assesses if the structure of the instrument reflects the construct.
  • Multitrait-Multimethod Matrix (MTMM): A technique that evaluates convergent and discriminant validity simultaneously. It assesses whether measures of the same construct are highly correlated (convergent validity) and whether measures of different constructs are not (discriminant validity) [42] [39].

From Content to Criterion Validity

Content validity ensures that the instrument encompasses all facets of the construct that are relevant to the criterion. The linkage is tested through:

  • Concurrent Validity: The new instrument and the criterion measure (the "gold standard") are administered at the same time, and the correlation between their scores is calculated. A high correlation indicates good concurrent validity [42] [40].
  • Predictive Validity: The instrument's scores are used to predict a future outcome or behavior that is measured later. The correlation between the initial score and the future outcome indicates the tool's predictive power [42] [40].

Table 3: Statistical Methods for Establishing Criterion and Construct Validity

| Validity Type | Subtype | Purpose | Statistical Method |
| --- | --- | --- | --- |
| Criterion Validity | Concurrent | To determine the relationship with a criterion administered simultaneously. | Pearson's correlation; sensitivity/specificity; Phi coefficient; ROC/AUC [42] |
| Criterion Validity | Predictive | To examine whether scores predict future outcomes. | Pearson's correlation; sensitivity/specificity; Phi coefficient; ROC/AUC [42] |
| Construct Validity | Convergent | To check correlation with measures of the same/similar constructs. | Pearson's correlation; MTMM analysis [42] [39] |
| Construct Validity | Discriminant (divergent) | To check lack of correlation with measures of unrelated constructs. | Pearson's correlation; MTMM analysis [42] [39] |
| Construct Validity | Factorial | To determine the underlying factor structure of the items. | Exploratory or confirmatory factor analysis [42] |

Integrated Validation Protocol for Survey Development

The following protocol provides a comprehensive framework for linking content validity with other validity types in survey development research.

Objective: To develop and validate a new survey instrument by sequentially establishing content, construct, and criterion validity.

Procedure:

Phase 1: Content Validity Establishment

  • Item Generation: Create a pool of items based on a comprehensive literature review and theoretical framework.
  • Expert Panel Review: Engage a panel of 5-10 content experts.
  • CVI Calculation: Follow the CVI Experimental Protocol outlined in Section 3.1 to calculate I-CVI and S-CVI.
  • Item Revision: Revise or eliminate items with low I-CVI scores based on expert feedback.

Phase 2: Construct Validation

  • Pilot Testing: Administer the revised survey to a pilot sample.
  • Factor Analysis: Perform Exploratory Factor Analysis (EFA) to identify the underlying factor structure. Use Principal Component Analysis and assess sampling adequacy with the Kaiser-Meyer-Olkin (KMO) measure [42] [27].
  • Reliability Assessment: Calculate internal consistency reliability using Cronbach's alpha (α), with a value above 0.7 generally deemed acceptable [40].
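The internal consistency check in the last step uses Cronbach's alpha, α = k/(k-1) × (1 − Σ item variances / variance of total scores). A minimal Python sketch (the toy response matrix is illustrative) computes it from a respondents-by-items table:

```python
def cronbach_alpha(scores):
    """Cronbach's alpha for a respondents x items score matrix.

    alpha = k/(k-1) * (1 - sum(item variances) / variance(total scores)),
    using population (n-denominator) variances throughout.
    """
    k = len(scores[0])  # number of items

    def var(xs):  # population variance
        m = sum(xs) / len(xs)
        return sum((x - m) ** 2 for x in xs) / len(xs)

    item_vars = [var([row[j] for row in scores]) for j in range(k)]
    total_var = var([sum(row) for row in scores])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)

# Five pilot respondents answering a three-item scale (illustrative data)
data = [
    [3, 4, 3],
    [2, 2, 3],
    [4, 4, 4],
    [1, 2, 1],
    [3, 3, 4],
]
alpha = cronbach_alpha(data)  # exceeds the 0.7 acceptability rule of thumb here
```

Note that alpha is sensitive to the variance convention; sample (n−1) variances in both numerator and denominator give the same result, but mixing conventions does not.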

Phase 3: Criterion Validity Assessment

  • Gold Standard Comparison: Administer the new survey and the chosen "gold standard" measure to the same group of participants concurrently.
  • Statistical Correlation: Calculate the correlation (e.g., Pearson's r) between the scores of the new instrument and the gold standard. A strong, significant correlation supports criterion validity [42].

A robust validation process for any research instrument requires an integrated approach that explicitly links content validity with construct and criterion validity. The calculation of the Content Validity Index (CVI) is not an endpoint but a critical first step that provides the necessary foundation for all subsequent validity evidence. By following the structured protocols and methodologies outlined in this article—from expert-driven CVI calculation to statistical validation via factor analysis and criterion correlation—researchers in drug development and other scientific fields can ensure their measurement tools are both conceptually sound and empirically rigorous, thereby generating reliable and actionable data.

Validity is a fundamental concept in research, referring to the degree to which evidence and theory support the interpretation of test scores for their intended use [43]. It is not an inherent property of an instrument itself, but rather of the interpretation of the scores derived from that instrument for a specific purpose and population [43] [44]. In contemporary validation frameworks, validity is understood as a unitary concept supported by multiple forms of evidence, with content validity representing one crucial source of such evidence [44].

Content validity specifically assesses how well the items in an instrument represent the entire domain of the construct being measured [20] [41]. This methodological note provides a comparative analysis of content validity assessment methods, particularly focusing on the Content Validity Index (CVI), against other validity assessment approaches, framed within the context of survey development research for pharmaceutical and healthcare applications.

Theoretical Foundations of Validity Types

Content Validity

Content validity evaluates the degree to which elements of an assessment instrument are relevant to and representative of the target construct for a particular assessment purpose [20] [41]. It is primarily concerned with the adequacy of sampling of the content domain and is often considered a prerequisite for other forms of validity [3]. Without adequate content validity, it is impossible to establish the reliability or other validity forms for an instrument [3]. The quantification of content validity typically involves expert judgment and can be measured through indices such as the Content Validity Ratio (CVR) and Content Validity Index (CVI) [3] [20].

Other Major Validity Types

  • Construct Validity: Evaluates whether an instrument actually measures the theoretical construct it purports to measure [41]. It includes convergent validity (correlation with measures of similar constructs) and discriminant validity (lack of correlation with measures of dissimilar constructs) [20] [41].

  • Criterion Validity: Assesses how well test results correlate with other criteria known to measure the same construct, either concurrently (compared to an existing standard administered simultaneously) or predictively (ability to predict future outcomes) [41].

  • Face Validity: A superficial assessment of whether an instrument appears to measure what it claims to, based on informal evaluation [41]. While not technically a form of validity evidence, it contributes to the perceived credibility of an instrument.

Table 1: Comparison of Major Validity Types

| Validity Type | Primary Focus | Assessment Methods | Stage of Development |
| --- | --- | --- | --- |
| Content Validity | Domain representation and item relevance | Expert panels, CVI, CVR | Early stage |
| Construct Validity | Theoretical construct measurement | Factor analysis, MTMM, correlation studies | Middle/late stage |
| Criterion Validity | Correlation with "gold standard" | Correlation coefficients, ROC analysis | Late stage |
| Face Validity | Surface appearance of appropriateness | Informal review, stakeholder feedback | Early stage |

Quantitative Assessment of Content Validity

Content Validity Ratio (CVR)

The Content Validity Ratio quantifies the extent to which experts rate an item as "essential" to the construct being measured [3] [20]. The formula for CVR is:

CVR = (nₑ - N/2) / (N/2)

Where:

  • nₑ = number of panelists indicating "essential"
  • N = total number of panelists [3] [20]

CVR values range from -1 to +1, with higher positive values indicating greater agreement among experts about an item's essentiality. The statistical significance of CVR values depends on the number of experts, with critical values established for different panel sizes [20].

Table 2: Critical Values for Content Validity Ratio (CVR)

| Number of Panelists | Minimum CVR Value |
| --- | --- |
| 5 | 0.99 |
| 6 | 0.99 |
| 7 | 0.99 |
| 8 | 0.75 |
| 9 | 0.78 |
| 10 | 0.62 |
| 11 | 0.59 |
| 12 | 0.56 |
| 20 | 0.42 |
| 30 | 0.33 |
| 40 | 0.29 |
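The CVR formula and the critical-value check can be combined in a small helper. The Python sketch below hard-codes the panel sizes listed in Table 2; other panel sizes would require Lawshe's full published table:

```python
# Critical values reproduced from Table 2 (Lawshe-derived); panel sizes
# not listed here would need the full published table.
CRITICAL_CVR = {5: 0.99, 6: 0.99, 7: 0.99, 8: 0.75, 9: 0.78, 10: 0.62,
                11: 0.59, 12: 0.56, 20: 0.42, 30: 0.33, 40: 0.29}

def cvr(n_essential, n_total):
    """Content Validity Ratio: (n_e - N/2) / (N/2), ranging from -1 to +1."""
    return (n_essential - n_total / 2) / (n_total / 2)

def retain_item(n_essential, n_total):
    """True if the item's CVR meets the critical value for this panel size."""
    return cvr(n_essential, n_total) >= CRITICAL_CVR[n_total]

# 9 of 10 panelists rate an item "essential": CVR = (9 - 5)/5 = 0.8,
# which exceeds the 0.62 critical value for N = 10, so the item is retained.
value = cvr(9, 10)
```

The sign of the CVR is informative on its own: a negative value means fewer than half of the panel judged the item essential.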

Content Validity Index (CVI)

The Content Validity Index summarizes expert relevance ratings across all items in an instrument [20]. There are two primary approaches:

  • Item-Level CVI (I-CVI): The proportion of experts giving a relevance rating of 3 or 4 on a 4-point scale for each individual item [25]. Items with I-CVI values above 0.79 are considered relevant, between 0.70-0.79 need revision, and below 0.70 should be eliminated [25].

  • Scale-Level CVI (S-CVI): The proportion of total items judged content valid across all experts. This can be calculated as universal agreement (S-CVI/UA) or as an average of I-CVIs (S-CVI/Ave) [3] [25]. For a newly developed instrument, an S-CVI/Ave of 0.90 or higher is considered excellent content validity [3].

Comparative Analysis: CVI Versus Other Methods

Methodological Distinctions

The CVI method differs significantly from other validity assessment approaches in its reliance on a priori expert judgment rather than empirical testing with participant data [3] [20]. While construct and criterion validity are assessed through statistical analysis of collected data, content validity is established before full-scale instrument deployment [43] [41].

Content validity assessment through CVI is particularly crucial for measuring complex constructs that cannot be directly observed, such as patient-centered communication, depression, or quality of life [3] [20] [41]. In these cases, the construct must be operationalized through careful item generation and expert validation to ensure comprehensive domain coverage.

Application in Healthcare Research

In pharmaceutical and clinical research, CVI plays a critical role in patient-reported outcome (PRO) measure development. For instance, in the development of the Personalized Exercise Questionnaire (PEQ) for people with osteoporosis, researchers demonstrated high content validity with I-CVI ranges from 0.50 to 1.00 and S-CVI/Ave of 0.91 [25]. Similarly, in cerebral visual impairment (CVI) research, the CVI Range-CR assessment demonstrated high internal consistency (Cronbach's α=0.96) alongside its content validation [45].

Table 3: Comparison of Validity Assessment Methods in Healthcare Research

| Method | Data Source | Statistical Approach | Research Phase | Example Application |
| --- | --- | --- | --- | --- |
| CVI | Expert panel | CVR calculation, I-CVI/S-CVI | Instrument development | PEQ for osteoporosis [25] |
| Construct Validity | Participant responses | Factor analysis, correlation matrices | Instrument validation | CVI Range-CR [45] |
| Criterion Validity | Participant responses & gold standard | Correlation coefficients, ROC curves | Instrument validation | Comparison with visual behavior scale [45] |
| Internal Consistency | Participant responses | Cronbach's alpha, item-total correlation | Reliability testing | CVI Range-CR (α=0.96) [45] |

Experimental Protocols for CVI Assessment

Protocol 1: Expert Panel Recruitment and Management

Purpose: To establish a qualified expert panel for content validity assessment.

Procedures:

  • Identify Expert Criteria: Define qualifications for subject matter experts (SMEs), including clinical, research, and methodological expertise relevant to the construct [3] [46].
  • Recruit Panelists: Aim for 5-10 experts, as this range provides sufficient control over chance agreement without unnecessary redundancy [3].
  • Develop Rating Materials: Create clear instructions and a 3-4 point rating scale (e.g., "not necessary," "useful but not essential," "essential") for evaluating item relevance [3] [20].
  • Conduct Orientation: Provide training to ensure consistent understanding of the construct, domains, and rating criteria.
  • Administer Ratings: Collect independent ratings from all experts without group consultation to prevent dominance effects.

Validation Parameters: Expert qualifications, representation of relevant disciplines, independence of ratings.

Protocol 2: Quantitative CVI Calculation

Purpose: To compute content validity indices from expert ratings.

Procedures:

  • Compile Ratings: Collect completed rating forms from all experts.
  • Calculate Item-Level CVR: For each item, compute CVR using the formula: (nₑ - N/2)/(N/2), where nₑ is the number of experts rating the item "essential" and N is the total number of experts [3] [20].
  • Determine Item Retention: Compare each item's CVR to the critical value for the panel size. Retain items meeting or exceeding the critical value [20].
  • Calculate I-CVI: For each item, compute the proportion of experts giving a relevance rating of 3 or 4 on a 4-point scale [25].
  • Compute S-CVI: Calculate both S-CVI/UA (proportion of items rated content valid by all experts) and S-CVI/Ave (average of I-CVIs across all items) [3] [25].

Validation Parameters: CVR critical values, I-CVI ≥0.79, S-CVI/Ave ≥0.90.
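The validation parameters above, together with the I-CVI bands given earlier (≥0.79 retain, 0.70-0.79 revise, <0.70 eliminate), amount to a simple decision rule. A Python sketch with illustrative scores:

```python
def classify_item(i_cvi):
    """Item-level decision using the bands from this protocol:
    I-CVI >= 0.79 -> retain, 0.70-0.79 -> revise, < 0.70 -> eliminate."""
    if i_cvi >= 0.79:
        return "retain"
    if i_cvi >= 0.70:
        return "revise"
    return "eliminate"

def scale_verdict(i_cvis):
    """Check the scale-level criterion S-CVI/Ave >= 0.90.
    Returns (S-CVI/Ave, whether the criterion is met)."""
    s_cvi_ave = sum(i_cvis) / len(i_cvis)
    return s_cvi_ave, s_cvi_ave >= 0.90

# Four illustrative I-CVI values from an expert round
i_cvis = [1.0, 0.75, 0.60, 0.92]
decisions = [classify_item(v) for v in i_cvis]
ave, meets_criterion = scale_verdict(i_cvis)  # one item to revise, one to drop
```

In practice the scale-level check is rerun after revising or eliminating flagged items, since dropping a low-scoring item raises S-CVI/Ave.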

Protocol 3: Integration with Qualitative Content Validation

Purpose: To enhance quantitative CVI assessment with qualitative expert feedback.

Procedures:

  • Cognitive Interviewing: Conduct structured interviews to explore how experts interpret items and identify potential problems with wording, clarity, or comprehensiveness [25].
  • Open-Ended Feedback: Solicit suggestions for item improvement, addition, or deletion beyond the quantitative ratings.
  • Domain Representation Review: Ask experts to evaluate whether the item set adequately covers all relevant domains of the construct.
  • Iterative Refinement: Revise items based on qualitative feedback and re-administer ratings if substantial changes are made.
  • Saturation Assessment: Continue cycles of revision and re-evaluation until no new substantive changes are suggested [25].

Validation Parameters: Clarity of items, comprehensiveness of domain coverage, saturation of feedback.

Visual Representation of Validity Assessment Relationships

[Workflow diagram: Research Objective → Instrument Development → Content Validity (Expert Panel → CVI Calculation → Item Revision) → Data Collection, which feeds three parallel streams: Construct Validity (Factor Analysis, Correlation Analysis), Criterion Validity (Gold Standard Comparison, Predictive Validation), and Reliability Assessment (Internal Consistency, Test-Retest); all three converge on the Validated Instrument.]

Figure 1: Relationship between different validity assessment methods in instrument development workflow, showing how Content Validity serves as a foundational step.

The Scientist's Toolkit: Research Reagent Solutions

Table 4: Essential Methodological Components for Validity Assessment

| Component | Function | Implementation Considerations |
| --- | --- | --- |
| Expert Panel | Provides judgment on item relevance and representativeness | Include 5-10 experts with diverse perspectives; ensure content and methodological expertise [3] |
| Rating Scale | Standardizes relevance assessments | Use a 3-4 point scale (e.g., not relevant, somewhat relevant, quite relevant, highly relevant) with clear anchors [20] |
| CVR/CVI Formulas | Quantify expert agreement on item essentiality | Calculate CVR for each item; compute I-CVI and S-CVI for the overall instrument [3] [20] |
| Critical Value Table | Determines statistical significance of CVR | Reference established critical values based on the number of panelists [20] |
| Cognitive Interview Guide | Elicits qualitative feedback on item clarity | Develop a protocol to assess comprehension, retrieval, judgment, and response processes [25] |
| Factor Analysis | Assesses construct validity through dimensional structure | Use EFA for initial validation, CFA for confirmatory studies; evaluate factor loadings and model fit [46] |
| Reliability Statistics | Measure internal consistency of the scale | Calculate Cronbach's alpha; aim for α≥0.70 for group comparisons, ≥0.90 for individual assessment [43] |

The Content Validity Index method provides a systematic, quantitative approach to establishing the content validity of research instruments, particularly during early development stages. While CVI focuses on expert assessment of item relevance and representativeness, it complements other validity approaches that evaluate empirical relationships with external criteria (criterion validity) or internal factor structure (construct validity). A comprehensive validation strategy should incorporate multiple sources of validity evidence, with CVI serving as the foundational step to ensure adequate domain coverage before progressing to more complex statistical validation methods. For researchers in pharmaceutical development and healthcare assessment, rigorous application of CVI protocols provides essential evidence that an instrument adequately captures the target construct before investing resources in large-scale validation studies.

Within the rigorous framework of Content Validity Index (CVI) survey development, establishing face validity represents a critical first step in ensuring that patient-reported outcome (PRO) instruments are perceived as relevant, appropriate, and comprehensible by their intended respondents. Contrary to historical practices where content validity was primarily determined by researcher and clinician judgment, modern instrument development emphasizes the indispensable role of patient perspectives in determining whether items appear to measure what they intend to measure from the viewpoint of the end-user [47]. This shift recognizes that even psychometrically sound instruments fail in practice if target populations find them irrelevant, confusing, or unacceptable.

Face validity is formally defined as whether the items of each domain are sensible, appropriate, and relevant to the people who use the measure on a day-to-day basis [47]. Within the validation sequencing for PRO development, face validation typically occurs after item generation and before psychometric testing, serving as a crucial filter to identify problematic items that could compromise data quality, respondent engagement, and ultimately, the validity of the collected data [47] [48]. This application note provides detailed protocols for systematically incorporating patient and end-user perspectives to establish robust face validity within CVI survey development research.

Theoretical Foundation and Key Concepts

Distinguishing Face Validity from Content Validity

While often discussed together, face validity and content validity represent distinct concepts in instrument development:

  • Face Validity: Concerns the appearance of appropriateness and relevance from the end-user's perspective. It assesses whether items look relevant, are easily understood, and can be answered with minimal burden, particularly when respondents are distressed [47].
  • Content Validity: Encompasses the methodological assessment of whether items adequately cover all relevant aspects of the construct being measured, typically evaluated by subject matter experts [47] [49].

The following table summarizes the key distinctions:

Table 1: Distinguishing Face Validity from Content Validity

| Aspect | Face Validity | Content Validity |
| --- | --- | --- |
| Primary Focus | Surface appearance and acceptability | Comprehensive domain coverage |
| Key Evaluators | Patients, end-users, target population | Content experts, clinicians, researchers |
| Evaluation Criteria | Relevance, comprehensibility, answerability | Completeness, representativeness, technical accuracy |
| Methodology | Cognitive interviews, focus groups, usability testing | Expert panels, CVI calculations, Delphi techniques |
| Outcome | Instrument acceptability and feasibility | Content relevance and comprehensiveness |

The Argument-Based Approach to Validation

Modern validity theory emphasizes that validity refers not to the properties of an instrument itself, but to the appropriateness of inferences drawn from its scores [49]. Kane's argument-based approach to validation provides a framework where face validity contributes essential evidence to the "scoring inference" - connecting observed responses to the construct being measured by ensuring items are interpreted as intended by respondents [49]. Within this framework, patient input provides critical evidence that the instrument elicits responses that accurately reflect their experiences rather than being influenced by confusing wording, irrelevant content, or inappropriate response formats.

Experimental Protocols for Establishing Face Validity

Protocol 1: Cognitive Interviewing for Item Refinement

Purpose: To identify problems with item comprehension, retrieval of information, judgment formation, and response selection through structured one-on-one interviews.

Materials and Reagents:

  • Draft instrument items
  • Audio recording equipment
  • Interview guide with standardized probes
  • Demographic data collection form
  • Consent documents

Participant Recruitment:

  • Recruit 10-15 participants representing the target population [50]
  • Employ purposive sampling to ensure diversity in disease severity, demographics, and health literacy levels [47]
  • Include participants from different clinical settings (e.g., inpatient, outpatient, community) when applicable

Procedure:

  • Obtain informed consent and demographic information
  • Present items one by one using a "think-aloud" protocol where participants verbalize their thought process while answering
  • Employ targeted probes:
    • "What does this question mean to you in your own words?"
    • "How did you arrive at your answer?"
    • "What were you thinking about when you chose that response?"
    • "Is there any part of this question that is confusing or difficult to answer?" [50] [8]
  • Audio record sessions (with permission) for subsequent analysis
  • Document specific issues with wording, comprehension, and response selection

Analysis:

  • Transcribe relevant portions of interviews
  • Code responses for common themes and specific problems
  • Categorize issues by type (comprehension, recall, judgment, response)
  • Modify items based on recurrent problems and retest when necessary

Protocol 2: Focus Groups for Thematic Evaluation

Purpose: To identify broader themes of acceptability and relevance through group discussion and interaction.

Materials and Reagents:

  • Focus group guide with open-ended questions
  • Moderator and note-taker
  • Audio recording equipment
  • Consent documents
  • Facility with appropriate seating arrangement

Procedure:

  • Conduct 2-3 focus groups with 5-8 participants each [47]
  • Begin with general questions about the construct being measured
  • Present draft items and solicit feedback on:
    • Overall relevance and importance
    • Potential omissions
    • Wording and phrasing concerns
    • Emotional impact of items
    • Appropriateness of response options
  • Encourage discussion and contrasting viewpoints
  • Explore specific concerns about items causing potential distress

Analysis:

  • Thematic analysis of transcribed discussions
  • Identification of items perceived as irrelevant, ambiguous, or judgmental
  • Documentation of suggestions for improvement
  • Note patterns of agreement and disagreement across groups

Protocol 3: Cross-Cultural Face Validation

Purpose: To ensure face validity across different linguistic and cultural contexts when adapting instruments.

Materials and Reagents:

  • Forward and backward translations of the instrument
  • Expert panel of bilingual healthcare professionals
  • Cognitive interview participants from target cultural groups

Procedure:

  • Assemble a panel of 6-8 experts proficient in both languages and familiar with both cultures [8]
  • Conduct independent ratings of cultural relevance and translation equivalence using a 4-point scale (1=not relevant, 4=highly relevant)
  • Calculate Item-CVI (I-CVI) for cultural relevance as the proportion of experts rating items as 3 or 4
  • Identify problematic items with I-CVI <0.78 for further refinement [8]
  • Conduct cognitive interviews with native speakers (n=4-6) to refine problematic items
  • Iterate until acceptable cross-cultural validity is achieved
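Because Protocol 3 uses small panels, the 0.78 I-CVI cutoff effectively demands near-unanimous agreement. A quick Python sketch (panel sizes taken from the protocol above; purely illustrative) tabulates the minimum number of experts who must rate an item 3 or 4 for it to pass:

```python
# Minimum number of "relevant" (3 or 4) ratings needed to reach I-CVI >= 0.78,
# for the small panel sizes typical of cross-cultural validation.
for n_experts in (6, 7, 8):
    min_agree = min(k for k in range(n_experts + 1) if k / n_experts >= 0.78)
    print(f"{n_experts} experts: at least {min_agree} must agree "
          f"(I-CVI = {min_agree / n_experts:.2f})")
```

With seven experts, for instance, six must agree (6/7 ≈ 0.86), since five of seven (0.71) falls below the cutoff.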

Analysis:

  • Quantitative analysis of I-CVI and Scale-CVI scores
  • Qualitative analysis of cognitive interviews
  • Documentation of cultural adaptations made

Data Analysis and Interpretation

Quantitative Metrics for Face Validity

While face validity is primarily established through qualitative methods, several quantitative metrics can supplement these findings:

Table 2: Quantitative Metrics for Face Validity Assessment

| Metric | Calculation | Interpretation | Threshold |
|---|---|---|---|
| Item Understanding Ratio | Number of correct interpretations / Total number of participants | Proportion of respondents who correctly understand item intent | ≥90% |
| Missing Data Rate | Number of missing responses / Total possible responses | Indicator of items that are difficult or undesirable to answer | <5% per item |
| End-User CVI | Proportion of end-users rating item as "quite" or "highly" relevant | Patient perspective on item relevance | ≥0.80 |
| Response Distribution | Pattern of responses across available options | Identifies floor/ceiling effects or limited variability | No extreme clustering |
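The first two metrics in Table 2 are simple proportions that can be computed directly from cognitive-interview records. A minimal Python sketch, using an entirely hypothetical 10-participant sample:

```python
# Hypothetical cognitive-interview results for one item (n = 10 participants):
# whether each participant interpreted the item as intended, and their response
# on the item's scale (None = item left blank).
interpreted_correctly = [True, True, True, False, True, True, True, True, True, True]
responses = [3, 4, None, 2, 3, 3, 4, 1, 3, 2]

# Item Understanding Ratio: correct interpretations / total participants
understanding_ratio = sum(interpreted_correctly) / len(interpreted_correctly)
# Missing Data Rate: missing responses / total possible responses
missing_rate = sum(r is None for r in responses) / len(responses)

print(f"Item understanding ratio: {understanding_ratio:.0%}")
print(f"Missing data rate: {missing_rate:.0%}")
```

Here nine of ten participants interpret the item correctly (90%, meeting the ≥90% threshold), but one of ten responses is missing (10%, exceeding the <5% criterion), so the item would still warrant review.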

Qualitative Analysis Framework

Thematic analysis of qualitative data from interviews and focus groups should identify:

  • Comprehension issues: Words, phrases, or concepts that are misunderstood
  • Retrieval problems: Questions that require recall of information that is unavailable or difficult to access
  • Judgment formation difficulties: Challenges in applying response options to personal experiences
  • Response selection problems: Mismatch between experienced phenomena and available answers
  • Emotional reactions: Items that cause distress, annoyance, or disengagement
  • Contextual inadequacies: Questions that fail to account for important situational factors

Case Studies and Applications

Case Study: Recovering Quality of Life (ReQoL) Measure

The development of the ReQoL measure for mental health populations provides an exemplary model of systematic face validation [47]. The research team engaged 59 adult service users and 17 young adults through individual interviews and focus groups to evaluate candidate items. Analysis revealed five key themes essential for face validity in this population:

  • Relevance and meaningfulness: Items must connect to participants' lived experiences
  • Unambiguous wording: Clear, straightforward language without jargon
  • Ease of answering when distressed: Recognition that symptoms may impair cognitive function
  • Avoidance of further upset: Sensitivity to vulnerable mental states
  • Non-judgmental framing: Language that doesn't imply blame or deficiency

This process led to significant item modification and demonstrated that service user endorsement of items was directly associated with their willingness and ability to respond accurately and honestly [47].

Case Study: EMPOWER-UP Questionnaire

The EMPOWER-UP questionnaire development involved cognitive interviews with 29 adults diagnosed with diabetes, cancer, or schizophrenia to evaluate face validity [50]. The process identified specific problems with item wording, response options, and conceptual clarity across different health conditions. Through iterative refinement, the team strengthened the generic potential of the measure while maintaining face validity across diverse patient populations. The final 36-item questionnaire demonstrated good face and content validity, supporting its use for evaluating empowerment in user-provider interactions across different health contexts [50].

The Scientist's Toolkit: Essential Research Reagents

Table 3: Essential Materials for Face Validity Research

| Research Reagent | Function | Application Notes |
|---|---|---|
| Semi-Structured Interview Guide | Ensures consistent data collection while allowing flexibility to explore emergent themes | Should include standardized probes and follow-up questions; pre-test with 1-2 participants |
| Digital Recorder | Captures verbal data for accurate analysis | Ensure sufficient battery life; test audio quality; always obtain participant consent for recording |
| Demographic Questionnaire | Characterizes participant sample and ensures diversity | Include variables relevant to the construct (e.g., disease severity, treatment history, health literacy) |
| Item Rating Form | Quantifies patient perceptions of relevance | Use 4-point Likert scale (1=not relevant to 4=highly relevant); include space for qualitative comments |
| Data Management System | Organizes and stores qualitative and quantitative data | Maintain separate files for raw data, coded data, and analysis; ensure secure storage for confidentiality |

Workflow Visualization

Face Validation Workflow in CVI Survey Development: Item Generation & Initial Review → Participant Recruitment → three parallel methods (Cognitive Interviews, Focus Groups, Cross-Cultural Validation) → Qualitative & Quantitative Analysis → Item Modification Decision. Items with identified problems are modified and re-tested through further cognitive interviews; items demonstrating acceptable face validity proceed to psychometric testing.

Establishing face validity through systematic engagement with patients and end-users is not merely an optional preliminary step in CVI survey development, but a fundamental component that strengthens the entire validation argument. By implementing the protocols outlined in this application note, researchers can ensure that PRO instruments are not only psychometrically sound but also practically meaningful to the populations they are designed to serve. The integration of patient perspectives ultimately enhances data quality, respondent engagement, and the validity of inferences drawn from PRO data in both research and clinical decision-making contexts.

Content validity is a fundamental psychometric property ensuring that an instrument's items adequately represent the entire domain of the construct being measured. In medical instrument development, establishing robust content validity is a critical first step before proceeding to psychometric testing and clinical application. The Content Validity Index (CVI) has emerged as the most widely used quantitative method for this purpose, providing systematic, transparent evaluation of item relevance by expert panels [24].

The development of medical instruments—including patient-reported outcome measures, clinical assessment scales, and diagnostic questionnaires—requires rigorous validation to ensure measurement accuracy, reliability, and clinical utility. Within a broader thesis on CVI survey development research, this article explores the practical application of content validation methodologies through structured protocols and case studies, providing researchers with implementable frameworks for instrument development.

Theoretical Framework: Content Validity Index (CVI) Methodology

Core Concepts and Calculation Methods

The Content Validity Index provides quantitative assessment through two primary levels: item-level evaluation (I-CVI) and scale-level evaluation (S-CVI) [24]. Researchers must understand both components to properly validate medical instruments.

Table 1: Content Validity Index (CVI) Types and Calculation Methods

| CVI Type | Definition | Calculation Method | Acceptance Threshold |
|---|---|---|---|
| I-CVI (Item-CVI) | Content validity index for individual items | Proportion of experts giving a rating of 3 or 4 on a 4-point relevance scale | ≥ 0.78 [24] |
| S-CVI/Ave (Scale-Level CVI/Average) | Average of all I-CVIs | Mean of I-CVI values across all items | ≥ 0.90 [24] |
| S-CVI/UA (Scale-Level CVI/Universal Agreement) | Proportion of items rated relevant by all experts | Number of items achieving relevance consensus divided by total items | ≥ 0.80 [24] |

For instruments with excellent content validity, researchers should aim for I-CVI values of 0.78 or higher, an S-CVI/UA of 0.80 or higher, and an S-CVI/Ave of 0.90 or higher [24]. A modified kappa statistic (K*) can be computed to adjust the I-CVI for chance agreement, providing a more robust evaluation of item-level validity [24].
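The modified kappa adjusts each I-CVI for the probability that the observed agreement occurred by chance. A minimal Python sketch of this adjustment, using a hypothetical eight-expert panel; it assumes the commonly cited definition Pc = C(N, A) × 0.5^N, where A of N experts rate the item relevant:

```python
from math import comb

def i_cvi(ratings):
    """Proportion of experts rating the item 3 or 4 on the 4-point scale."""
    return sum(r >= 3 for r in ratings) / len(ratings)

def modified_kappa(icvi, n_experts):
    """Chance-corrected I-CVI: K* = (I-CVI - Pc) / (1 - Pc), where
    Pc = C(N, A) * 0.5**N is the probability of chance agreement."""
    a = round(icvi * n_experts)            # experts who rated the item relevant
    pc = comb(n_experts, a) * 0.5 ** n_experts
    return (icvi - pc) / (1 - pc)

# Ratings from a hypothetical 8-expert panel for one item (4-point scale)
ratings = [4, 3, 4, 4, 2, 3, 4, 3]
icvi = i_cvi(ratings)                      # 7 of 8 experts -> 0.875
kappa = modified_kappa(icvi, len(ratings))
print(f"I-CVI = {icvi:.3f}, K* = {kappa:.3f}")
```

For this item, seven of eight experts agree (I-CVI = 0.875); after chance correction, K* ≈ 0.87, a value commonly interpreted as excellent item-level validity.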

Expert Panel Selection and Qualifications

The characteristics and qualifications of the expert panel significantly impact CVI results. Panel composition should include:

  • Content experts with domain-specific knowledge
  • Methodological experts in instrument development
  • Clinical experts with practical experience in the target setting
  • Target population representatives when developing patient-reported outcomes

The evaluation process and main results of content validity assessment should be comprehensively reported in all scale-related manuscripts to ensure transparency and reproducibility [24].

Application Notes: CVI Calculation Protocol for Medical Instruments

Practical Implementation Framework

CVI Calculation Workflow for Medical Instruments: Define Construct & Develop Item Pool → Recruit Expert Panel (5-10 members) → Expert Rating on 4-Point Relevance Scale → Calculate I-CVI for Each Item → Calculate S-CVI/Ave & S-CVI/UA → Modify/Remove Items Below Threshold. If revisions are needed, the workflow returns to item development; once all thresholds are met, the result is the final validated instrument.

Excel-Based Calculation Methodology

For researchers with minimal statistical expertise, Microsoft Excel provides an accessible platform for CVI calculation. Key Excel functions automate the validation process:

  • COUNTIF function to tally expert ratings of 3 or 4 for each item
  • AVERAGE function to compute S-CVI/Ave across all items
  • Conditional formatting to visually identify items below acceptance thresholds

A step-by-step tutorial demonstrates computing the I-CVI by counting the relevant ratings (values of 3-4 on a 4-point Likert scale) and dividing by the total number of experts [15]. For example, if 8 out of 10 experts rate an item as relevant, the I-CVI is 0.80. The S-CVI/Ave is then calculated as the average of all I-CVI values [15]. This approach enables researchers to validate instruments systematically, ensuring rigor and reliability in questionnaire development.

Table 2: Excel Formula Application for CVI Calculation

| Calculation Step | Excel Formula Example | Output Interpretation |
|---|---|---|
| I-CVI Calculation | =COUNTIF(B2:K2,">=3")/COUNT(B2:K2) | Returns decimal value (0-1) representing item relevance |
| S-CVI/Ave Calculation | =AVERAGE(L2:L20) | Mean of all I-CVI values in column L |
| Threshold Flagging | =IF(M2<0.78,"REVIEW","ACCEPT") | Automatically identifies suboptimal items |

In practical applications, while an overall S-CVI/Average might exceed the threshold of 0.8, individual I-CVI values often reveal specific areas for refinement, highlighting the importance of conducting both item-level and scale-level analysis [15].
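The same spreadsheet logic is easy to script for larger panels or automated reporting. A minimal pure-Python sketch, using a hypothetical 4-item, 5-expert rating matrix (the COUNTIF, AVERAGE, and IF analogues are noted in comments):

```python
# Hypothetical 4-item x 5-expert ratings matrix on the 4-point relevance scale,
# one row per item (analogous to the spreadsheet layout described above).
ratings = {
    "item_1": [4, 4, 3, 4, 3],
    "item_2": [3, 4, 2, 4, 4],
    "item_3": [2, 3, 2, 4, 3],
    "item_4": [4, 4, 4, 3, 4],
}

# I-CVI: proportion of experts rating the item 3 or 4 (COUNTIF analogue)
i_cvi = {item: sum(r >= 3 for r in rs) / len(rs) for item, rs in ratings.items()}

# S-CVI/Ave: mean of all I-CVIs (AVERAGE analogue);
# S-CVI/UA: share of items rated relevant by every expert
s_cvi_ave = sum(i_cvi.values()) / len(i_cvi)
s_cvi_ua = sum(v == 1.0 for v in i_cvi.values()) / len(i_cvi)

# Threshold flagging (IF analogue)
flags = {item: "ACCEPT" if v >= 0.78 else "REVIEW" for item, v in i_cvi.items()}

print(i_cvi)          # {'item_1': 1.0, 'item_2': 0.8, 'item_3': 0.6, 'item_4': 1.0}
print(round(s_cvi_ave, 2), round(s_cvi_ua, 2))   # 0.85 0.5
print(flags)
```

Here the S-CVI/Ave of 0.85 clears the 0.80 threshold, yet item_3 (I-CVI = 0.60) is flagged for review, illustrating why both item-level and scale-level analysis are needed.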

Experimental Protocol: Establishing Content Validity for Medical Instruments

Comprehensive Validation Procedure

Protocol Title: Content Validation of Medical Instruments Using CVI Methodology

Objective: To establish content validity through systematic expert evaluation and quantitative assessment using Content Validity Index metrics.

Materials and Reagents:

  • Expert Panel Database: Repository of qualified content experts
  • 4-Point Likert Scale: Relevance assessment instrument (1=not relevant, 2=somewhat relevant, 3=quite relevant, 4=highly relevant)
  • Data Collection Platform: Electronic survey system (e.g., Qualtrics, REDCap)
  • Statistical Software: Excel, SPSS, or R with appropriate validation packages

Procedure:

  • Item Pool Generation

    • Develop comprehensive item pool through literature review and target population input
    • Formulate items with clear, unambiguous language appropriate for intended respondents
    • Organize items into logical domains with clear construct definitions
  • Expert Panel Recruitment

    • Identify 5-10 experts representing relevant disciplines and perspectives
    • Provide comprehensive information about construct definition and target population
    • Obtain informed consent for participation in the validation process
  • Expert Evaluation

    • Distribute evaluation package with items and 4-point relevance scale
    • Include instructions for rating and soliciting suggestions for item modification
    • Allow 2-3 weeks for completion, with one reminder sent at the midpoint
  • Data Analysis

    • Calculate I-CVI for each item as proportion of experts giving rating of 3 or 4
    • Compute S-CVI/Ave as mean of all I-CVI values
    • Calculate S-CVI/UA as proportion of items achieving relevance consensus
    • Compute modified kappa statistic for chance-corrected agreement
  • Item Revision and Re-evaluation

    • Identify items with I-CVI below 0.78 for revision or elimination
    • Incorporate expert qualitative feedback into item modifications
    • Re-evaluate revised items with original or new expert panel if substantial changes made

Quality Control:

  • Maintain documentation of all expert comments and rationales for item changes
  • Ensure blinding of experts to each other's ratings during initial evaluation
  • Establish inter-expert communication protocol for resolving discrepancies

Research Reagent Solutions for CVI Studies

Table 3: Essential Materials for Content Validation Studies

| Reagent/Resource | Function | Implementation Considerations |
|---|---|---|
| Expert Panel Database | Provides qualified evaluators for content relevance assessment | Should include diverse expertise: clinicians, methodologists, patient representatives |
| 4-Point Relevance Scale | Standardized metric for expert rating of item relevance | Prevents neutral responses; forces directional assessment |
| Digital Survey Platform | Facilitates data collection and management | Should allow anonymous rating, qualitative feedback, and automated reminders |
| CVI Calculation Template | Spreadsheet with pre-programmed formulas for validity indices | Reduces computational errors; standardizes reporting metrics |
| Modified Kappa Calculator | Statistical adjustment for chance agreement in I-CVI | Provides more robust validity estimates than raw proportion agreement |

Case Study: Validation of a Diagnostic Assessment Tool

Practical Application in Clinical Instrument Development

Background: Development of a novel clinician-reported outcome measure for assessing disease severity in chronic neurological conditions.

Validation Approach:

  • Initial item pool: 32 items across 4 domains (motor function, cognitive assessment, daily living activities, quality of life)
  • Expert panel: 8 members (3 neurologists, 2 rehabilitation specialists, 2 methodologists, 1 physical therapist)
  • Evaluation method: Electronic survey with 4-point relevance scale plus open-ended feedback

Results:

  • Initial I-CVI range: 0.50-1.00 (5 items below 0.78 threshold)
  • Initial S-CVI/Ave: 0.85 (below target of 0.90)
  • Initial S-CVI/UA: 0.72 (below target of 0.80)

Revision Process:

  • Items with I-CVI < 0.78 underwent substantive revision based on expert feedback
  • 2 items eliminated due to conceptual overlap and persistent low relevance ratings
  • Revised instrument: 30 items with improved clarity and domain alignment

Post-Revision Outcomes:

  • Final I-CVI range: 0.88-1.00 (all items above 0.78 threshold)
  • Final S-CVI/Ave: 0.95 (exceeds target of 0.90)
  • Final S-CVI/UA: 0.87 (exceeds target of 0.80)

This case demonstrates the iterative nature of content validation, where quantitative CVI metrics guide systematic instrument refinement, substantially improving content validity before proceeding to psychometric testing.

Integration with Clinical Trial Protocols and Regulatory Requirements

Alignment with Clinical Research Standards

The validation of medical instruments must align with broader clinical research frameworks, including clinical trial protocols and regulatory requirements. The updated SPIRIT 2025 statement provides guidance on items to address in trial protocols, reflecting methodological advances and emphasizing transparent reporting [51]. For medical device trials specifically, protocols must clearly describe the device and its intended use while outlining safety monitoring procedures [52].

Medical device clinical trials differ significantly from pharmaceutical trials, with greater emphasis on real-world function, usability, and technical performance [53]. Unlike fixed chemical compounds, devices often undergo iterative testing cycles, requiring flexible validation approaches that accommodate design modifications while maintaining scientific rigor [53].

Medical Instrument Validation Pathway: Concept Definition & Item Generation → Content Validation (CVI Methodology) → Psychometric Validation (Reliability, Construct Validity) → Feasibility Study (First-in-Human Trials) → Pivotal Clinical Trial (Regulatory Approval) → Post-Market Clinical Follow-Up (PMCF).

Regulatory Landscape for Medical Devices

In the European Union, the Medical Device Regulation (EU MDR 2017/745) has significantly raised evidence requirements, demanding robust clinical data even for legacy devices [53]. Similarly, the U.S. Food and Drug Administration (FDA) requires rigorous validation through either 510(k) clearance, De Novo classification, or Premarket Approval (PMA) pathways, depending on device risk classification [53].

Content validation represents the foundational evidence layer within this comprehensive regulatory framework. Establishing robust content validity through systematic CVI methodology provides the evidentiary basis for subsequent validation stages and regulatory submissions.

The Content Validity Index provides a robust, quantitative methodology for establishing the foundational validity of medical instruments. Through systematic expert evaluation and iterative refinement, researchers can ensure that instruments adequately represent target constructs before progressing to resource-intensive psychometric validation and clinical implementation.

The integration of CVI methodology within broader clinical research frameworks—including clinical trial protocols and regulatory requirements—ensures that medical instrument development meets evolving standards for scientific rigor and evidence generation. As medical devices and digital health technologies continue to advance, with innovations like wearable devices, AI-powered diagnostics, and software as a medical device (SaMD) on the rise [53], established validation methodologies like CVI remain essential for ensuring measurement quality and patient safety.

This structured approach to content validation, incorporating both quantitative metrics and qualitative expert feedback, provides researchers with a comprehensive framework for developing medically relevant, psychometrically sound assessment tools that generate reliable evidence for clinical research and practice.

Conclusion

The Content Validity Index is an indispensable, scientifically rigorous tool for ensuring that survey instruments accurately measure their intended constructs in biomedical and clinical research. By systematically applying CVI methodology—from expert panel selection to quantitative analysis—researchers can significantly enhance the quality and trustworthiness of their data. Future directions should focus on the integration of CVI with advanced statistical validation techniques, the development of standardized reporting guidelines for content validity, and the adaptation of these methods for digital health tools and patient-reported outcome measures. Mastering CVI application empowers drug development professionals to create more valid, reliable, and impactful research instruments that ultimately contribute to higher-quality scientific evidence and improved healthcare outcomes.

References