A common challenge in processing naturalistic driving data, across many driving research interests and applications, is that humans may need to categorize large volumes of recorded visual information until automated algorithms can be trained to do so alone.
Using the online platform CrowdFlower, this study investigated the potential of crowdsourcing to categorize driving scene features (e.g., presence of another vehicle, straight road segments) at a greater scale than a single person or a small team of researchers could manage. The validity and reliability of CrowdFlower results were examined both with and without a set of randomly embedded controlled questions (Gold Test Questions) intermixed with experimental questions (Work Mode). In total, 200 workers from 46 countries participated in this study, and data collection lasted one and a half days.
With Gold Test Questions, external workers gave significantly more accurate and consistent responses when identifying common driving scene elements (e.g., position and behavior of other vehicles, road and signage characteristics) at both a smaller and a larger scale of video segment categorization. In terms of validity, at the small scale, average accuracy on paired items was 91% with the controlled questions compared to 78% without. The direction of error bias also differed: without Gold Test Questions, external workers returned more false positives than false negatives, whereas with Gold Test Questions the opposite held. At the large scale (which made use of the controlled questions), a random subset of categorizations showed a similar level of accuracy (95%) and a similar pattern of error bias. In terms of reliability, at the small scale, where each segment was rated in triplicate, the percentage of unanimous agreement was significantly higher with controlled questions (90%) than without them (65%). Across the internally validated answers of the small scale, more than two-thirds of correct categorizations were returned unanimously and 86% or more were returned by a majority vote. At the large scale, where validating every response for accuracy would be infeasible, similar voting reliability results were found.
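As an illustration only, the validity and reliability measures described above (accuracy against validated answers, false positive and false negative counts, unanimous agreement, and majority vote) could be computed from triplicate ratings roughly as follows. This is a minimal sketch and not the analysis code used in the study: the function name `summarize`, the data layout, and the example values are hypothetical, and it assumes each segment receives a yes/no judgment from three workers and has a researcher-validated answer.

```python
from collections import Counter

def summarize(ratings, truth):
    """Summarize crowdsourced yes/no categorizations.

    ratings: dict mapping a segment id to a list of worker judgments (bool).
    truth:   dict mapping a segment id to the validated answer (bool).
    Both structures are hypothetical, chosen for illustration.
    """
    unanimous = 0          # segments where every worker gave the same answer
    majority_correct = 0   # segments where the majority answer matches truth
    correct = 0            # individual judgments matching truth
    false_pos = 0          # worker said yes, validated answer is no
    false_neg = 0          # worker said no, validated answer is yes
    total = 0

    for seg, votes in ratings.items():
        top_label, top_count = Counter(votes).most_common(1)[0]
        if top_count == len(votes):
            unanimous += 1
        # With an odd number of yes/no votes, the top label is a strict majority.
        if top_label == truth[seg]:
            majority_correct += 1
        for vote in votes:
            total += 1
            if vote == truth[seg]:
                correct += 1
            elif vote:
                false_pos += 1
            else:
                false_neg += 1

    n = len(ratings)
    return {
        "judgment_accuracy": correct / total,
        "false_positives": false_pos,
        "false_negatives": false_neg,
        "unanimous_rate": unanimous / n,
        "majority_accuracy": majority_correct / n,
    }

# Made-up example: four segments, each rated by three workers.
ratings = {
    "seg1": [True, True, True],
    "seg2": [True, False, True],
    "seg3": [False, False, True],
    "seg4": [False, False, False],
}
truth = {"seg1": True, "seg2": True, "seg3": False, "seg4": False}

result = summarize(ratings, truth)
print(result)
```

Whether accuracy is best scored per individual judgment or per majority answer is a design choice; the sketch reports both so the two can be compared.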
Overall, the results provide compelling evidence that CrowdFlower can yield valid and reliable crowdsourced categorizations of naturalistic driving scene contents in a short period of time, making it a potentially powerful and as yet under-utilized resource in the toolbox of driving research and driving automation development.