Step to validity | Aims | Essential questions |
---|---|---|
0. Define the measurement context and target group | To make the context (e.g. learning objectives, VET system) of the study explicit and to understand the characteristics of the context and the target group | What is the context of the study? What are the characteristics of the context? Who is the target group? Do special requirements have to be considered when testing the sample? |
1. Define the construct | To make the underlying theoretical concepts and their functioning or relations transparent | How is the construct defined? On which theoretical considerations or model is the construct based? Which dimensions of the construct should be measured? How does the construct operate (with regard to other related constructs)? |
2. Make explicit the intended interpretation, use and decision(s) | To formulate which decisions are taken on the basis of the test results, e.g., entrance or final examination, determination of status quo, selection or qualification procedure | What are the test results used for? What should the test results not be used for? What decisions are made based on the test results? Who receives the test results and in what form? |
3. Define the interpretation-use argument, and prioritize needed validity evidence | To formulate comprehensive assumptions regarding the test construction, which will be tested in Step 5 using appropriate methods. Different frameworks can be used for evidence collection; AERA et al. (2014) propose test content, response processes, internal structure, relations to other variables and consequences | Which is the most important validity evidence? **Test content:** What is the test expected to measure? **Response processes:** What are participants expected to think or understand when completing the test? **Internal structure:** What is expected with regard to the dimensionality of the test content (one-dimensional, multi-dimensional)? How reliable is the test? **Relations to other variables:** How are the achieved test scores expected to relate to the outcomes of other constructs (e.g. intelligence, prior knowledge, motivation)? **Consequences:** What intended and unintended consequences (impact, decisions, actions) does the test entail? |
4. Identify candidate instruments and/or create/adapt a new instrument | To check already available instruments with regard to the fit of the above formulated intentions and assumptions To adapt or develop a new instrument with regard to the requirements | What kind of assessment tool is the best to measure the construct? Are there already validated instruments available that meet the purpose of the intended measurement in part and are adaptable to the context and target group? What are the requirements of test construction with regard to the previous considerations (e.g. context, target group, construct definition, internal structure)? |
5. Appraise existing evidence and collect new evidence as needed | To appraise existing evidence and to collect new evidence according to the interpretation-use argument (Step 3) | Is there existing evidence for the validity of items/tasks? And if so, is more evidence needed, for example to fit the target group? What kind of method is the best to gather the evidence? In what form will the evidence be documented, analysed and reported? |
6. Formulate/synthesize the validity argument in relation to the interpretation-use argument | To compare the proposed assumptions with the evidence found | Was it possible to collect sufficient evidence for all assumptions? Is there a need to review the instrument further, to change or revise some tasks? Is there a need to gather more evidence? If so, what evidence and by what method? Are evidence gaps recognizable? |
7. Keep track of practical issues including cost | To report on resources required for test administration, costs and technical aspects | What resources are needed to implement the instrument? How much does it cost to develop the instrument (e.g. programming, media, personnel)? How much does it cost to gather evidence? What kind of problems might arise during the implementation? What is helpful to implement the instrument? How much does it cost to apply the test? What kind of equipment is needed? Do test administrators need special knowledge and skills? |
8. Make a judgment: does the evidence support the intended use? | To make a final judgment on whether the available evidence is in accordance with the intended use of the test | Is there enough evidence available to support the intended test use? How can evidence gaps be tackled/approached? What are next steps? |
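The internal-structure questions in Step 3 (dimensionality, reliability) are typically answered with psychometric statistics collected in Step 5. As a minimal illustration only, the sketch below computes Cronbach's alpha, a common reliability coefficient; the function name and the item-score data are hypothetical and not part of the framework itself.

```python
def cronbach_alpha(scores):
    """Cronbach's alpha for a list of score rows (one row per test taker,
    one column per item), using population variances."""
    k = len(scores[0])  # number of items
    def var(xs):
        m = sum(xs) / len(xs)
        return sum((x - m) ** 2 for x in xs) / len(xs)
    # Variance of each item's scores across test takers
    item_vars = [var([row[i] for row in scores]) for i in range(k)]
    # Variance of each test taker's total score
    total_var = var([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)

# Hypothetical responses of five test takers to four items (0-3 points each)
data = [
    [3, 2, 3, 3],
    [2, 2, 2, 1],
    [1, 0, 1, 1],
    [3, 3, 2, 3],
    [0, 1, 0, 1],
]
print(round(cronbach_alpha(data), 2))  # → 0.93
```

A high alpha alone does not establish validity; under the framework above it is one piece of internal-structure evidence to be weighed against the interpretation-use argument in Steps 6 and 8.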