The psychometric rigor (or empirical quality) and ease of use of gender equality and empowerment measures are assessed by EMERGE staff, and the results are provided on each measure page to inform users' measure selection. However, the scores presented on our site serve neither as a recommendation nor an endorsement of a given measure by the EMERGE team; the utility of a measure should be determined by the user.
Information on scoring, statistical adequacy guidelines, and citation frequency is summarized in the following sections.
SCORING
Formative research, expert input, reliability, and validity are used to assess the psychometric soundness of measures (see How to Create Scientifically Valid Social and Behavioral Measures on Gender Equality and Empowerment Report on our Resources page for more information on psychometrics).
Psychometrics
Our psychometric scoring methodology is based on the DeVellis (2017), Essaylamba (2019), and COSMIN (2011) approaches. Following these approaches, we assess reliability in terms of internal consistency, test-retest reliability, and inter-rater reliability, and validity in terms of content and face validity, criterion (gold standard) validity, and construct validity. However, our methodology differs from these sources in a few important ways: we do not require that measures use Item Response Theory or Classical Test Theory methods, we do not directly evaluate aspects of study design (missingness, sample size, etc.) or assess cross-cultural validation efforts, and we are not restrictive in the type of statistical tests used.
Our decision to apply a more inclusive scoring methodology enables us to evaluate measure quality across a wide array of disciplines and subject matters and keeps the focus on unique gender equality and empowerment-related measures rather than translated iterations of the same measure. Finally, to ensure scores are assigned in a consistent, unambiguous manner, our scoring methodology uses stated statistical "adequacy" guidelines. These guidelines are built into our scoring protocol to limit subjectivity in how evaluations are made, allowing us to robustly evaluate the psychometric rigor of numerous measures across multiple disciplines.
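For illustration, one of the reliability criteria named above, internal consistency, is commonly reported as Cronbach's alpha. Below is a minimal sketch in Python, assuming item-level response data are available; the function, the toy data, and the α ≥ 0.70 rule of thumb noted in the comment are illustrative conventions from the scale-development literature, not EMERGE's stated adequacy guidelines.

```python
import numpy as np

def cronbachs_alpha(items: np.ndarray) -> float:
    """Cronbach's alpha for an (n_respondents x n_items) score matrix."""
    items = np.asarray(items, dtype=float)
    k = items.shape[1]                         # number of items in the scale
    item_vars = items.var(axis=0, ddof=1)      # variance of each item
    total_var = items.sum(axis=1).var(ddof=1)  # variance of summed scale scores
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

# Hypothetical 5-item scale answered by 6 respondents (Likert 1-5).
responses = np.array([
    [4, 5, 4, 4, 5],
    [2, 2, 3, 2, 2],
    [5, 4, 5, 5, 4],
    [3, 3, 3, 2, 3],
    [1, 2, 1, 2, 1],
    [4, 4, 5, 4, 4],
])
alpha = cronbachs_alpha(responses)
print(f"alpha = {alpha:.2f}")  # ~0.97 for this toy data; >= 0.70 is a common rule of thumb
```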
Scoring Procedure and Rubric
Trained EMERGE team members review each measure in terms of formative research, reliability testing, and validity testing; measures are then reviewed by a second EMERGE scorer for quality assurance. Scores are assigned as "Adequate", "Limited", "Not available", or "Not applicable" using a priori statistical adequacy guidelines.
The rubric used to score the psychometric soundness of published measures is displayed below. The total possible score varies between 7 and 10 points, depending on whether inter-rater reliability, criterion validity, and internal reliability are applicable to a given measure. Final scores are presented as aggregated groupings: "Low" (≤33.3%), "Med" (Medium; 33.4%-66.6%), "High" (≥66.7%), or "No Data" (a sketch of this aggregation follows the rubric below).
Measures that could not be scored (i.e., had no psychometric information) are classified as “No Data”.
Rubric:
Preliminary Measure Development:
Formative Research and theory to develop items
____ Presence of qualitative research (Adequate = 1pt)
____ Mention of existing literature, theoretical framework (Adequate = 1/2pt)
Expert Input on developed items
____ Field expert input (Adequate = 1/2pt)
____ Cognitive Interviews/pilot testing (Adequate = 1pt)
Formal Assessment of Psychometric Properties:
Reliability
____ Internal reliability (Adequate = 1pt; Limited = 1/2pt)
____ Test-retest reliability (Adequate = 1pt; Limited = 1/2pt)
____ Inter-rater reliability (Adequate = 1pt; Limited = 1/2pt)
Validity
____ Content validity (Adequate = 1/2pt)
____ Face validity (Adequate = 1/2pt)
____ Criterion (gold standard) validity (Adequate = 1pt; Limited = 1/2pt)
____ Construct validity (Adequate = 2pts; Limited = 1pt)
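To make the aggregation referenced above concrete, here is a minimal sketch of how a measure's awarded points might be summed and banded. The point values mirror the rubric; the function, the data structure, and the treatment of "Not available" items (scored 0 but kept in the denominator, while "Not applicable" items are dropped from it) reflect our illustrative reading rather than EMERGE's actual scoring code.

```python
# Maximum points per rubric item (sums to 10 when all items are applicable).
MAX_POINTS = {
    "qualitative_research": 1.0, "literature_theory": 0.5,
    "field_expert_input": 0.5, "cognitive_interviews_pilot": 1.0,
    "internal_reliability": 1.0, "test_retest_reliability": 1.0,
    "inter_rater_reliability": 1.0,
    "content_validity": 0.5, "face_validity": 0.5,
    "criterion_validity": 1.0, "construct_validity": 2.0,
}

def aggregate(awarded: dict[str, float | None]) -> str:
    """Map awarded points to a Low/Med/High/No Data grouping.

    `awarded` holds points earned per rubric item; None marks an item
    scored "Not applicable", which is dropped from the denominator.
    Band edges follow the stated percentages (<=33.3, 33.4-66.6, >=66.7).
    """
    applicable = {k: v for k, v in awarded.items() if v is not None}
    if not applicable:
        return "No Data"  # no psychometric information could be scored
    total = sum(applicable.values())
    possible = sum(MAX_POINTS[k] for k in applicable)
    pct = 100 * total / possible
    if pct <= 33.3:
        return "Low"
    if pct <= 66.6:
        return "Med"
    return "High"

# Example: inter-rater reliability and criterion validity not applicable,
# so the maximum possible score drops from 10 to 8 points.
scores = {"qualitative_research": 1.0, "literature_theory": 0.5,
          "field_expert_input": 0.0, "cognitive_interviews_pilot": 0.0,
          "internal_reliability": 1.0, "test_retest_reliability": 0.5,
          "inter_rater_reliability": None,
          "content_validity": 0.5, "face_validity": 0.0,
          "criterion_validity": None, "construct_validity": 1.0}
print(aggregate(scores))  # "Med": 4.5 of 8.0 applicable points = 56.25%
```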
Viewing Measure Scoring Details
To view a measure’s scoring details, click the Psychometric Score, Ease of Use Score, or Citation Frequency button on the right-hand side of a measure page; each button displays additional scoring details for that measure.
Ease of Use
The ease of use of a given measure is a critical aspect of how widely it can practically be used. Our ease of use scoring methodology is based on the Lewis (2021) and Glasgow (2013) approaches, simplified and adapted for relevance across settings. We assess ease of use in terms of readability, scoring clarity, and length. Readability assesses how clearly a measure is written based on sentence and word length, as measured by the Flesch-Kincaid Grade Level score. Scoring clarity assesses the guidance provided on how to score a measure and interpret its scores. Measure length assesses the number of items in a measure.
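For reference, the Flesch-Kincaid Grade Level combines average sentence length and average syllables per word using the published coefficients (0.39 and 11.8, with a constant of 15.59). The sketch below uses a simple vowel-group heuristic in place of a true syllable counter, and the example item is invented for illustration.

```python
import re

def count_syllables(word: str) -> int:
    """Rough syllable count: runs of vowels, minimum one per word."""
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def fk_grade_level(text: str) -> float:
    """Flesch-Kincaid Grade Level:
    0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59
    """
    sentences = max(1, len(re.findall(r"[.!?]+", text)))
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    return 0.39 * (len(words) / sentences) + 11.8 * (syllables / len(words)) - 15.59

item = "I feel confident making decisions about my own health care."
print(f"FKGL = {fk_grade_level(item):.1f}")  # ~8.4 with this rough heuristic
```

Published readability tools differ mainly in how they count syllables, so exact grade levels can vary slightly across implementations.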