This function checks, for each word in a text, how frequently it occurs in a given language. This is useful for eliminating rare words to make a text more accessible to an audience with limited vocabulary. htmlParse and xpathSApply from the XML package are used to process HTML files, if necessary. textToWords is a helper function that simply breaks down a character vector into a vector of words.
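As an illustration of what such a word-splitting helper might look like, here is a minimal base R sketch. The function name splitIntoWords and the splitting rule (any run of non-letter characters separates words) are assumptions made for this example; the actual textToWords implementation may differ.

```r
# Hypothetical helper for illustration; not the package's textToWords.
splitIntoWords <- function(characterVector) {
  # Split every element on runs of non-letter characters
  # (spaces, punctuation, digits).
  words <- unlist(strsplit(characterVector, "[^[:alpha:]]+"))
  # Drop the empty strings that a leading separator produces.
  words[nzchar(words)]
}

exampleWords <- splitIntoWords(c("Dit is een tekst,", " om de werking te tonen."))
print(exampleWords)
```

A function with this shape (character vector in, vector of single words out) is the kind of function that could be named in the textToWordsFunction argument.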
detectRareWords(textFile = NULL, wordFrequencyFile = "Dutch", output = c("file", "show", "return"), outputFile = NULL, wordCol = "Word", freqCol = "FREQlemma", textToWordsFunction = "textToWords", encoding = "ASCII", xPathSelector = "/text()", silent = FALSE)

textToWords(characterVector)
Argument | Description |
---|---|
textFile | If NULL, a dialog will be shown that enables users to select a file. If not NULL, this has to be either a filename or a character vector. An HTML file can be provided; this will be parsed using |
wordFrequencyFile | The file with word frequencies to use. If 'Dutch' or 'Polish', files from the Center for Reading Research (http://crr.ugent.be/) are downloaded. |
output | How to provide the output, as a character vector. If |
outputFile | The name of the file to store the output in. |
wordCol | The name of the column in the |
freqCol | The name of the column in the |
textToWordsFunction | The function to use to split a character vector, where each element contains one or more words, into a vector where each element is a word. |
encoding | The encoding used to read and write files. |
xPathSelector | If the file provided is an HTML file, |
silent | Whether to suppress detailed feedback about the process. |
characterVector | A character vector, the elements of which are to be broken down into words. |
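To make the role of wordCol and freqCol more concrete, the following sketch looks up each word in a small frequency table. The table contents, the helper name lookupFrequencies, the case-insensitive matching, and the threshold of 50 are all made up for illustration. Only the column names "Word" and "FREQlemma" come from the defaults documented above, and the real function may work differently.

```r
# Made-up frequency table; column names mirror the documented defaults.
wordFrequencies <- data.frame(
  Word = c("de", "een", "is", "tekst"),
  FREQlemma = c(1000, 800, 600, 40),
  stringsAsFactors = FALSE
)

# Hypothetical helper: attach a frequency to every word.
lookupFrequencies <- function(words, frequencies,
                              wordCol = "Word", freqCol = "FREQlemma") {
  # match() yields NA for words that are absent from the table.
  rowIndex <- match(tolower(words), tolower(frequencies[[wordCol]]))
  data.frame(word = words,
             frequency = frequencies[[freqCol]][rowIndex],
             stringsAsFactors = FALSE)
}

result <- lookupFrequencies(c("Dit", "is", "een", "tekst"), wordFrequencies)

# Treat words that are missing or below an arbitrary threshold as rare.
rareWords <- result$word[is.na(result$frequency) | result$frequency < 50]
print(rareWords)
```

Words that do not occur in the frequency table get NA, so the sketch treats them as rare as well; whether that is desirable depends on how complete the frequency file is.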
detectRareWords returns a data frame (invisibly) if output contains "return". Otherwise, NULL is returned (invisibly), but the output is printed and/or written to a file, depending on the value of output.

textToWords returns a vector of words.
# NOT RUN {
detectRareWords(paste('Dit is een tekst om de',
                      'werking van de detectRareWords',
                      'functie te demonstreren.'),
                output = 'show');
# }