
Plagiarism Detection Tools for Textual Analysis by Monica Bulger


 

 

 

Research Report: Plagiarism Detection Tools for Textual Analysis

 

By Monica Bulger, Close Reading Re-visited Team

 

Abstract

With the emergence of digital texts and the movement to digitize existing ones, a complementary effort to develop digital means of textual analysis is gaining momentum. Digital textual analysis tools can draw on large databases of texts or apply multiple analysis methods to a single text in minutes, so digital textual analysis potentially offers the opportunity to examine large bodies of text in very little time (Moretti, 2007). Although designed for the punitive function of identifying plagiarism, detection tools such as Pairwise, CopyCatch, and Turnitin compare multiple texts against one another, identify similarities between them, and generate visual representations of these matches, making them strong candidates for digital textual analysis.

 

Description

Within a given text lie many opportunities for interpretation or analysis. Whether reading literary canons or student papers, much can be learned about context, meaning, or use of language within the text. A fundamental element of textual interpretation is close reading, a process by which texts are carefully examined for connections or disconnections in content, form, language, or context. Textual analysis, an element of close reading, is described by Rockwell (2003) as exploring “the question of the relationship between how we represent texts, how we see them, and our theories of textuality” (p. 209). Generally, textual analysis seeks to identify patterns within the text, such as concordance or unity (Rockwell, 2003), meaning (Samuels & McGann, 1999), truth (Brooks, 1947), or rhetorical strategy (Bazerman & Prior, 2004).

 

With the emergence of digital texts and the movement to digitize existing ones, a complementary effort to develop digital means of textual analysis is gaining momentum. These tools can draw on large databases of texts or apply multiple analysis methods to a single text in minutes, so digital textual analysis potentially offers the opportunity to examine large bodies of text in very little time (Moretti, 2007). Moreover, digital textual analysis tools offer visualization possibilities beyond those of the printed text. For example, they can graph word occurrences, map those occurrences in a concordance, or remove every occurrence of a word to show how frequently it appears in a text.
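The Python sketch below gives a rough idea of the word-occurrence counts and concordance views mentioned above. It counts word frequencies and produces keyword-in-context (KWIC) lines. The simple regex tokenizer and the function names `word_frequencies` and `concordance` are illustrative assumptions. They are not features of any tool discussed in this report.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens (letters and apostrophes only)."""
    return re.findall(r"[a-z']+", text.lower())


def word_frequencies(text):
    """Count how often each word occurs in the text."""
    return Counter(tokenize(text))


def concordance(text, keyword, width=3):
    """Keyword-in-context lines: each occurrence of `keyword` with up to `width` words on each side."""
    tokens = tokenize(text)
    lines = []
    for i, tok in enumerate(tokens):
        if tok == keyword:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append(f"{left} [{tok}] {right}")
    return lines


# Illustrative sample text, not drawn from any real corpus.
sample = "The urn is well wrought. The poem, like the urn, is a made thing."
print(word_frequencies(sample).most_common(3))  # [('the', 3), ('urn', 2), ('is', 2)]
for line in concordance(sample, "urn"):
    print(line)
```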

 

Although not developed for textual analysis, plagiarism detection tools such as Pairwise, CopyCatch, and Turnitin could serve as effective means of doing just that. Designed for the punitive function of identifying plagiarism, these tools compare multiple texts against one another, identify similarities between them, and generate visual representations of these matches, making them well suited to the analysis of student academic texts.

 

 

Plagiarism detection tools

 

Pairwise is a free, open-source plagiarism detection tool developed by Allan Knight, a computer science graduate student at UCSB. Designed to deter plagiarism, Pairwise compares student papers against a database of current and past papers for a given course. The system’s strength lies in its ability to compare a single paper against multiple papers, as well as against the Internet. Pairwise looks for verbatim matches of six or more consecutive words. Once it identifies a match, Pairwise presents a side-by-side visualization of the papers in which similar phrases are highlighted and numbered (see Figure 1). The tool shows exactly which parts of a student paper are potentially plagiarized and how closely the student text matches other student texts or original source texts, depending on the matches Pairwise identifies.

 

Figure 1: Pairwise side-by-side visualization of text match
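The six-word matching rule can be approximated in a few lines of Python. This is a minimal sketch under stated assumptions, not Pairwise's actual code. It assumes a simple lowercase word tokenizer and treats every shared six-word sequence as a match. Overlapping or adjacent matches are merged into highlighted spans. The names `shared_spans` and `highlight` and the sample sentences are hypothetical.

```python
import re


def tokenize(text):
    """Lowercase word tokens; ignoring punctuation is an assumption, not Pairwise's documented rule."""
    return re.findall(r"[a-z0-9']+", text.lower())


def shared_spans(text_a, text_b, n=6):
    """Word-index spans (start, end) in text_a covered by n-word sequences that also occur in text_b.

    Overlapping or adjacent matching sequences are merged into one span; `end` is exclusive.
    """
    a, b = tokenize(text_a), tokenize(text_b)
    b_grams = {tuple(b[i:i + n]) for i in range(len(b) - n + 1)}
    spans = []
    for i in range(len(a) - n + 1):
        if tuple(a[i:i + n]) in b_grams:
            if spans and i <= spans[-1][1]:
                spans[-1] = (spans[-1][0], i + n)  # extend the current span
            else:
                spans.append((i, i + n))  # start a new span
    return spans


def highlight(text, spans):
    """Rebuild the tokenized text with each matched span wrapped in [[ ]], like a highlighted phrase."""
    words = tokenize(text)
    out, pos = [], 0
    for start, end in spans:
        out.extend(words[pos:start])
        out.append("[[" + " ".join(words[start:end]) + "]]")
        pos = end
    out.extend(words[pos:])
    return " ".join(out)


# Hypothetical student sentence and source sentence.
student = "As Brooks argues, the language of poetry is the language of paradox, and this shapes his reading."
source = "Few of us are prepared to accept the statement that the language of poetry is the language of paradox."
spans = shared_spans(student, source)
print(spans)  # [(3, 12)]
print(highlight(student, spans))
```

Working on word tuples means punctuation and capitalization differences do not break a match. That is one plausible design choice. Whether Pairwise normalizes text this way is not stated in this report.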

 

 

For each text run through the system, Pairwise provides a linked list of matching texts along with a similarity index expressed as a percentage (see Figure 2). Teachers and TAs can thus identify potential trends in similar texts. Often, the matches between student texts are benign, such as quotes from class lectures or assigned readings. In these cases, Pairwise performs an additional pedagogical function: visualizing the extent to which students incorporate course resources into their papers.

 

Figure 2: Pairwise similarity index
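This report does not specify how Pairwise computes its similarity index. The sketch below therefore assumes one plausible definition: the percentage of a paper's words that fall inside a six-word sequence shared with another text. Ranking the scores mimics the linked list of matching texts described above. `similarity_percent`, `ranked_matches` and the toy database are hypothetical.

```python
import re


def tokenize(text):
    """Lowercase word tokens; punctuation is ignored."""
    return re.findall(r"[a-z0-9']+", text.lower())


def similarity_percent(paper, other, n=6):
    """Assumed index: percent of the paper's words inside an n-word sequence that also occurs in `other`."""
    a, b = tokenize(paper), tokenize(other)
    if len(a) < n:
        return 0.0
    b_grams = {tuple(b[i:i + n]) for i in range(len(b) - n + 1)}
    covered = set()
    for i in range(len(a) - n + 1):
        if tuple(a[i:i + n]) in b_grams:
            covered.update(range(i, i + n))
    return 100.0 * len(covered) / len(a)


def ranked_matches(paper, database, n=6):
    """Score the paper against each text in a {name: text} database; keep nonzero scores, highest first."""
    scores = [(name, similarity_percent(paper, text, n)) for name, text in database.items()]
    return sorted((s for s in scores if s[1] > 0), key=lambda s: s[1], reverse=True)


# Hypothetical paper and course database.
paper = "In lecture we learned that close reading is a careful examination of form and language."
database = {
    "lecture_notes": "Close reading is a careful examination of form and language in a text.",
    "other_paper": "Students rarely cite the assigned readings in their first drafts.",
}
for name, score in ranked_matches(paper, database):
    print(f"{name}: {score:.1f}%")  # lecture_notes: 66.7%
```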

 

Currently, the text visualization displays only two texts at a time. While the report list identifies all potentially plagiarized phrases in the document, the visual representation shows only the similarities between the two texts displayed. This visualization is a powerful means of viewing similarities between two texts; however, it could be enhanced by an option to view multiple matches between a selected student paper and the texts in the Pairwise database.
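The suggested multi-match view could work by labeling every word of the selected paper with all the database texts it matches. Consecutive words that share the same set of sources would then be grouped together. The sketch below illustrates that idea. It is hypothetical and not an existing Pairwise feature. It reuses the assumed six-word matching rule, and the toy texts and source names `A` and `B` are invented.

```python
import re


def tokenize(text):
    """Lowercase word tokens; punctuation is ignored."""
    return re.findall(r"[a-z0-9']+", text.lower())


def source_labels(paper, sources, n=6):
    """For each word of `paper`, the set of source names sharing an n-word sequence that covers it."""
    words = tokenize(paper)
    labels = [set() for _ in words]
    for name, text in sources.items():
        toks = tokenize(text)
        grams = {tuple(toks[i:i + n]) for i in range(len(toks) - n + 1)}
        for i in range(len(words) - n + 1):
            if tuple(words[i:i + n]) in grams:
                for j in range(i, i + n):
                    labels[j].add(name)
    return words, labels


def render(words, labels):
    """Hypothetical multi-match view: group runs of words with identical source sets as [SOURCES: words]."""
    parts, i = [], 0
    while i < len(words):
        j = i
        while j < len(words) and labels[j] == labels[i]:
            j += 1
        chunk = " ".join(words[i:j])
        parts.append(f"[{'+'.join(sorted(labels[i]))}: {chunk}]" if labels[i] else chunk)
        i = j
    return " ".join(parts)


sources = {
    "A": "one two three four five six seven",
    "B": "five six seven eight nine ten eleven",
}
paper = "one two three four five six seven eight nine ten eleven twelve"
print(render(*source_labels(paper, sources)))
# [A: one two three four] [A+B: five six seven] [B: eight nine ten eleven] twelve
```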

 

Pairwise is not yet available as an online tool; it can be downloaded and installed on a server, though installation requires advanced technical expertise.

 

Another plagiarism detection tool, CopyCatch, developed in the UK, claims to find text similarities beyond verbatim matches, including “changes in word order, insertions, or deletions” (http://www.copycatchgold.com). Like Pairwise, this system reports the percentage of similarity matches and highlights instances of matching text. Its text visualization, however, is limited to a small box of four to five lines and appears to extract the matching text rather than preserve the context of the document. This method of visualization therefore does not allow a view of the entire paper at once, and it would not be useful for identifying trends in outside source use beyond similarity percentages.
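This report does not describe how CopyCatch detects reworded matches, and the sketch below does not reproduce its method. It instead uses two generic measures from Python's standard library to show why changes in word order, insertions, or deletions defeat a strict verbatim rule. The first is a Jaccard overlap of the two vocabularies, which ignores word order. The second is `difflib.SequenceMatcher`'s ratio over word lists, which tolerates insertions and deletions. The function names and sample phrases are hypothetical.

```python
import re
from difflib import SequenceMatcher


def tokenize(text):
    """Lowercase word tokens; punctuation is ignored."""
    return re.findall(r"[a-z0-9']+", text.lower())


def vocabulary_overlap(a, b):
    """Generic Jaccard overlap of the two word sets (0 to 1), not CopyCatch's method; ignores order."""
    sa, sb = set(tokenize(a)), set(tokenize(b))
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0


def sequence_similarity(a, b):
    """difflib ratio over word lists (0 to 1); tolerates insertions and deletions, penalizes reordering."""
    return SequenceMatcher(None, tokenize(a), tokenize(b)).ratio()


source = "the language of poetry is the language of paradox"
reordered = "the language of paradox is the language of poetry"
inserted = "the language of poetry is, as Brooks says, the language of paradox"

for label, text in [("reordered", reordered), ("inserted", inserted)]:
    print(label, round(vocabulary_overlap(source, text), 2), round(sequence_similarity(source, text), 2))
```

Neither variant shares a run of six consecutive words with the source, so a strict six-word verbatim rule would flag neither. Each measure still registers the relationship. The reordered version keeps the source's entire vocabulary, and the inserted version keeps most of its word sequence.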

 

While the CopyCatch software certainly sounds interesting, it was not tested for this paper because the company website does not offer a free demo download and the software costs about £250 per year, according to the University of Michigan’s Instructor resource website (Gaither, 2007).

 

Notorious among students and instructors alike, Turnitin is a widely used plagiarism detection tool that compares student papers against an immense database that includes previously submitted student papers. According to its website, similarity checks include “exhaustive searches of billions of pages from both current and archived instances of the internet, millions of student papers previously submitted to Turnitin, and commercial databases of journal articles and periodicals” (http://www.turnitin.com). Like Pairwise and CopyCatch, Turnitin reports the percentage of similarity matches between two texts and visualizes these matches with side-by-side, color-coded comparisons. In addition to plagiarism detection, Turnitin offers a “Digital Assessment Suite” with tools for online peer review.

 

Although it offers a suite of options for online feedback on student papers, Turnitin has been criticized for using student papers for profit without compensating the student authors (Glod, 2007). Also, like CopyCatch, Turnitin.com is not a free service: licensing fees vary depending on whether the system will be used by an individual, a department, or an entire campus.

 

Commentary

Potential for educational research

 

Beyond simply detecting plagiarism, tools like Pairwise, CopyCatch, and Turnitin.com potentially provide an innovative means of studying student composing processes. Typically, these processes are studied using qualitative measures such as think-aloud protocols (Flower & Hayes, 1981; Coiro & Dobler, 2007), observation, surveys, or interviews (Kellogg, 1994). In combination with these measures, plagiarism detection tools could quantify the ways in which students use original source material and other course resources when composing academic texts. For example, in a think-aloud study, a researcher usually sits beside the student while she composes a text and asks her to describe her process as she engages in it. In addition to the transcripts produced by this type of study, the researcher could add the original source texts used by the student to the Pairwise database. The researcher could then compare those sources with the student's texts to measure the extent to which she draws on source materials when completing the writing task under study. Pairwise could thus provide a quantitative measure to complement the qualitative think-aloud data.
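The quantitative measure described here could be sketched as a "source-use profile." For one student text, the profile gives the percentage of words drawn verbatim from each source the researcher adds, plus the percentage drawn from any source. Per-source figures can overlap, so the combined figure uses the union of matched words rather than a sum of percentages. This is a hypothetical sketch that assumes the same six-word matching rule described for Pairwise. The function names, source names and toy texts are invented.

```python
import re


def tokenize(text):
    """Lowercase word tokens; punctuation is ignored."""
    return re.findall(r"[a-z0-9']+", text.lower())


def covered_positions(words, source, n=6):
    """Indices in `words` that lie inside an n-word sequence also present in `source`."""
    toks = tokenize(source)
    grams = {tuple(toks[i:i + n]) for i in range(len(toks) - n + 1)}
    hits = set()
    for i in range(len(words) - n + 1):
        if tuple(words[i:i + n]) in grams:
            hits.update(range(i, i + n))
    return hits


def source_use_profile(student_text, sources, n=6):
    """Hypothetical profile: percent of the student text drawn verbatim from each source, and from all combined."""
    words = tokenize(student_text)
    if not words:
        return {}
    profile, combined = {}, set()
    for name, text in sources.items():
        hits = covered_positions(words, text, n)
        profile[name] = 100.0 * len(hits) / len(words)
        combined |= hits  # union, so words matched by two sources count once
    profile["all sources"] = 100.0 * len(combined) / len(words)
    return profile


sources = {
    "assigned_reading": "one two three four five six seven",
    "lecture": "five six seven eight nine ten eleven",
}
student_text = "one two three four five six seven eight nine ten eleven twelve"
for name, pct in source_use_profile(student_text, sources).items():
    print(f"{name}: {pct:.1f}%")
```

In this toy case, each source covers 7 of the 12 words (about 58%). Together they cover 11 words (about 92%), not the 117% a naive sum of the two percentages would suggest.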

 

Qualitative measures, such as the think-aloud protocol described above, generally address the why and how of student composing, e.g., why a student uses a specific text first or how she organizes her document. Plagiarism detection tools, on the other hand, can measure what is produced, for example, the overall make-up of a student text in comparison with its source texts. By using tools such as Pairwise to compare student texts against source texts, educational researchers can begin to see patterns in the effective use of source materials. In Graphs, Maps, Trees, Moretti (2007) describes the benefits of reading texts from a distance. Traditional methods of textual analysis generally involve micro-level examination of rhetorical moves; digital visualization tools, however, offer the potential to view textual patterns or trends at a macro level. Rhetorical moves can thus be viewed at the class level rather than the student level, or at the paper level rather than the sentence level.

 

Imagine a researcher viewing the side-by-side comparison of student text and source text in Pairwise and zooming out so that the highlights remain visible but the view covers the whole document rather than single pages. The percentage of source text used, and the patterns in which that text appears in the student text, would then be immediately apparent (see Figure 3).

 

Figure 3: Comparison of multiple papers using Pairwise
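A crude, text-only stand-in for the zoomed-out view is to compress each paper into a fixed-width strip. In the strip, `#` marks stretches containing matched words and `.` marks unmatched stretches. Stacking the strips for several papers shows both the amount and the position of borrowed text across a cohort. The sketch is hypothetical and assumes the six-word matching rule. `match_mask`, `overview_strip` and the toy papers are illustrative names, not Pairwise functions.

```python
import re


def tokenize(text):
    """Lowercase word tokens; punctuation is ignored."""
    return re.findall(r"[a-z0-9']+", text.lower())


def match_mask(paper, source, n=6):
    """True for each word of `paper` covered by an n-word sequence shared with `source`."""
    words, toks = tokenize(paper), tokenize(source)
    grams = {tuple(toks[i:i + n]) for i in range(len(toks) - n + 1)}
    mask = [False] * len(words)
    for i in range(len(words) - n + 1):
        if tuple(words[i:i + n]) in grams:
            for j in range(i, i + n):
                mask[j] = True
    return mask


def overview_strip(mask, width=10):
    """Hypothetical zoomed-out view: squeeze a word mask into `width` cells, '#' if any word matched."""
    if not mask:
        return ""
    cells = []
    for c in range(width):
        lo = c * len(mask) // width
        hi = max(lo + 1, (c + 1) * len(mask) // width)  # every cell covers at least one word
        cells.append("#" if any(mask[lo:hi]) else ".")
    return "".join(cells)


source = "one two three four five six seven eight"
papers = {
    "paper_1": "one two three four five six seven eight nine ten eleven twelve",
    "paper_2": "nine ten one two three four five six eleven twelve",
}
for name, text in papers.items():
    print(f"{name}  {overview_strip(match_mask(text, source))}")
```

Each strip has the same width regardless of paper length. Strips can therefore be compared by position, such as borrowing at the opening versus the middle, as well as by overall density.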

 

 

This visual representation potentially enhances textual analysis capabilities because it enables the researcher to compare individual student patterns with larger trends within the cohort, thus fostering a deeper understanding of how students use source text in their academic writing. Ironically, this understanding could lead to more effective instruction in using and attributing sources in academic writing and thus reduce incidents of plagiarism (Townley & Parsell, 2004).

 

Resources for further study

Bazerman, C. & Prior, P. (2004). What writing does and how it does it. Mahwah, New Jersey: Lawrence Erlbaum Associates.

 

Brooks, C. (1947). The well wrought urn: Studies in the structure of poetry. Orlando, Florida: Harcourt Brace & Company.

 

Coiro, J. & Dobler, E. (2007). Exploring the online reading comprehension strategies used by sixth-grade skilled readers to search for and locate information on the Internet. Reading Research Quarterly, 42, 214-257.

 

CopyCatch. (2007). Retrieved March 12, 2008, from http://www.copycatchgold.com

 

Flower, L. & Hayes, J.R. (1981). A cognitive process theory of writing. College Composition and Communication, 32, 365-387.

 

Gaither, R. (2007). Plagiarism Detection Services. Retrieved March 11, 2008, from University of Michigan, Shapiro Undergraduate Library Web site: http://www.lib.umich.edu/acadintegrity/instructors/violations/detection.htm

 

Glod, M. (2007, March 29). McLean students sue anti-cheating service. The Washington Post, p. B05.

 

Kellogg, R.T. (1994). The psychology of writing. New York: Oxford University Press.

 

Moretti, F. (2007). Graphs, maps, trees. London: Verso.

 

Pairwise. (2005). Retrieved February 28, 2008, from University of California, Santa Barbara, Center for Information Technology & Society Web site: http://www.pairwise.cits.ucsb.edu/

 

Rockwell, G. (2003). What is text analysis, really? Literary and Linguistic Computing, 18, 209-219.

Samuels, L. & McGann, J. (1999). Deformance and interpretation. New Literary History, 30, 25-56.

Townley, C. & Parsell, M. (2004). Technology and academic virtue: Student plagiarism through the looking glass. Ethics and Information Technology, 6, 271-277.

Turnitin. (2008). Retrieved March 20, 2008, from http://www.turnitin.com/static/home.html

 
