Crowdsourcing

More than 20 years after its creation, Wikipedia illustrates the benefits and pitfalls of crowdsourced information. Although it started as a site that most people distrusted, it managed to create a model for open collaboration that has gradually earned a healthy measure of trust among the public and even the scholarly community. One reason why many of us have come to appreciate Wikipedia, despite its many flaws, is that it has worked hard to bolster what my teachers used to call its “critical apparatus.” That is, it has created the means for other scholars to critique the content and structure of the material presented in its many pages.

As a young student of history, I was taught that the citations in my papers constituted the “critical apparatus.” Among scholars, it is through citations and references that one is able to support one’s claims. The critical apparatus created through these tools constitutes the main source of authority for any given piece of scholarship. However, the public’s reliance on reference works, such as encyclopedias, is contingent on their reputation, which is itself built upon their editorial processes and the people they employ. Wikipedia’s model of open collaboration encourages the use of citations and references, but it also allows the public to participate in the editorial process, to see what changes are being proposed for individual articles, and to discuss those changes with others involved in editing a page. This opening-up of the editorial process adds, in my opinion, to the critical apparatus of Wikipedia articles.

The “Digital Humanities” article in Wikipedia serves as a good illustration of how tools such as the “Revision History” or the “Talk History” expand our ability to evaluate the quality of a page. A cursory read of the article shows a reasonably well-organized piece with numerous references, a bibliography, and recommendations for further reading. Given what we have learned about the field of Digital Humanities, it is reasonable to ask how complete and up-to-date the article is, how recently and how often the page has been edited, and to what degree. When we looked up a term or topic in a traditional encyclopedia, we would accept that many topics would be out-of-date, given the time lag between when an article was written and published and when it was read. The level of authority of any given piece decreases as time passes. However, as my father used to say about national constitutions, the authority of any set of statements requires a balance between stability and flexibility. I would argue that the authority of an encyclopedia article is also contingent on how stable and how flexible it is. The fact that an article can be updated as new information becomes available, without changing radically from one version to the next, is a factor that adds to its authority.

The “Digital Humanities” article was last edited in October 2023, but for the past two years the edits have been relatively minor. One can appreciate this by looking at the revision history tool, where it is possible to see when the page was first created and to examine every single version of the article. The revision history shows the date when an edit was made, how many bytes were added, how many words were added or deleted, and a brief description of the changes. One can also compare two versions, although I found this particular feature difficult to use because the changes appear out of the context of the larger page. I found it easier to open different versions individually and then choose which ones I wanted to explore further.

Other important information that can be ascertained from the revision history is included in the Statistics information. Here, one can find how many times a particular page has been edited and at what rate. In the case of the “Digital Humanities” article, we can see that since 2006, when the page was originally created, it has been edited 1009 times and 459 editors have made contributions. On average, each user has made 2.2 edits, and the page has been edited every 6.4 days. However, most of the substantive edits took place in the earlier years of the page. During the past 365 days, only 7 edits have taken place, and these have been relatively minor.
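The averages quoted above can be checked with a little arithmetic. What follows is only a minimal Python sketch and not part of Wikipedia's own tooling: the function names are hypothetical, and the inputs are simply the figures reported on the Statistics page as quoted above.

```python
# Sanity check of the Statistics figures quoted for the "Digital Humanities" page.
# Function names are hypothetical; the numbers are the ones quoted in the text.

def edits_per_editor(total_edits: int, editors: int) -> float:
    """Average number of edits made by each contributor."""
    return total_edits / editors


def implied_page_age_years(total_edits: int, days_between_edits: float) -> float:
    """Page age implied by the edit count and the average gap between edits."""
    return total_edits * days_between_edits / 365.25


TOTAL_EDITS = 1009        # edits since the page was created
EDITORS = 459             # distinct contributors
DAYS_BETWEEN_EDITS = 6.4  # reported average interval between edits

print(round(edits_per_editor(TOTAL_EDITS, EDITORS), 1))                   # 2.2
print(round(implied_page_age_years(TOTAL_EDITS, DAYS_BETWEEN_EDITS), 1))  # 17.7
```

Both results agree with the reported figures: roughly 2.2 edits per editor, and an implied page age of about 17.7 years, which is in line with a page created in 2006 and consulted in late 2023.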

As a reader, one can take some confidence from the fact that the page seems to have reached some stability and that further changes to it seem to be of a minor nature. As a student of the Digital Humanities, one could wonder whether this means that debates about the definition, history, and scope of the field have been resolved. For this, the Talk History tool may prove very useful. Here, one can see which issues have preoccupied the editors, how they explain an edit, and when they seek comments or advice on a particular change. In the case of the “Digital Humanities” article, we can see that the “Concerns” section of the talk history includes a lot of comments about the history and the definition of Digital Humanities. There are also questions about balancing the different types of tools that are included in the Methods and Projects of Digital Humanities.

Traditional encyclopedias presented themselves as sources of authority due to their editorial processes. Some were open about who the authors of their articles were; others kept them anonymous. In an attempt to foster participation, while also maintaining transparency and accountability, Wikipedia offers contributors both options. One can add an edit to a page without disclosing a name or creating a user profile. In this case, the edit is recorded under an IP address. But many contributors do create a user name and a profile, and this can be used as another means to read and evaluate a particular entry. By looking at the Statistics page, one can determine which contributors have made the most contributions and when. From there one can also see their profiles, if they have created one. Of the ten most prolific contributors to the “Digital Humanities” article (at least in terms of words), two are identified only by their IP address, and two have profile names but no information under their profile. Two more are students participating in a Wikipedia-related curriculum. Only about five of them have been active within the last five years, which again speaks to how relatively stable the page has become. Only one of the contributors, its creator in fact, identifies himself as a scholar and professional in the digital humanities. A couple of contributors identify themselves as scholars in computing or the humanities.

Learning about the identities, and maybe even the qualifications, of contributors may be more relevant to students and scholars than to the casual visitor. The latter can rely on the general Assessment of a page, which can also be found on the Statistics page and is explained on the Assessments page. But for scholars, students, or other experts, learning about the contributors to a page is very useful for contextualizing and understanding the criteria they use when explaining and justifying their editorial decisions. The revision history and the talk history enable us to do a historiographical analysis of any given article; the more we know about the authors of these versions, the more informed our analysis and evaluation will be.

As an organization, Wikipedia has made substantial investments in building a critical apparatus that supports its reputation as an authoritative source of knowledge; not because Wikipedia articles are perfect, but because anyone can identify their sources and follow the process by which they have been created. In contrast, ChatGPT has stayed closer to a notion of “the wisdom of crowds” in that its source of authority is contingent upon the vast volumes of information used to teach the chatbot. In the case of Wikipedia, the crowds involved in creating its articles are people engaged in an ongoing discussion of what to add, delete, or change. ChatGPT uses much of this knowledge to produce elegantly explained answers in a way that seems very simple and clear to the reader. However, we do not get any sense of the sources that were used or the criteria used to select information. In this sense, ChatGPT lacks a “critical apparatus” on which to rest its authority. It relies purely on a crowd of anonymous sources. ChatGPT is able to synthesize large bodies of knowledge in the clearest and simplest way. However, sources like Wikipedia demonstrate that even a work of synthesis needs to make choices, and it is important for readers to understand the justifications for these choices. The questions of accuracy that we used to have about Wikipedia are multiplied in the case of ChatGPT, since we have no mechanism to assess its accuracy, completeness, or judgements. ChatGPT can produce very clear explanations and narratives, but in this case, clarity may obscure the inherent complexity and messiness of knowledge production.