dc_train.Rd
Train document classifier.
dc_train(model, lang, data)
Argument | Description |
---|---|
model | Full path to the output model file. |
lang | Language being processed, e.g. "en". |
data | A data.frame of classified documents; see details and examples. |
data is a data.frame of 2 columns:
class - the document class
document - the document
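The two-column layout described above can be sketched in base R as follows. This is only an illustration: check_training_data is a hypothetical helper, not part of the package, and the column names simply follow the description above (the example further down names its second column doc, so the checks here rely on column position rather than on names).

```r
# Hypothetical helper (not part of the package): basic shape checks
# on a training data.frame before it is handed to dc_train().
check_training_data <- function(data) {
  stopifnot(is.data.frame(data), ncol(data) == 2)
  # Coerce both columns to character so factors do not sneak in
  data[] <- lapply(data, as.character)
  # Keep only rows where both the class and the document are present
  keep <- !is.na(data[[1]]) & !is.na(data[[2]]) &
    nzchar(trimws(data[[1]])) & nzchar(trimws(data[[2]]))
  data[keep, , drop = FALSE]
}

docs <- data.frame(
  class = c("Sport", "Business", NA),
  document = c("Tennis, golf and football.",
               "Finance and marketing.",
               "Text without a class."),
  stringsAsFactors = FALSE
)

# The row with the missing class is dropped
clean <- check_training_data(docs)
nrow(clean)
```

Running a check like this first makes it easier to tell a malformed input apart from a genuine training failure.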
Note that you need 5,000 classified documents to train a decent model. The examples below only demonstrate how to run the code.
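Before training, it can help to see how many labelled documents are actually available. The sketch below uses base R only. summarise_corpus and its min_docs argument are hypothetical names introduced for illustration, and the default of 5000 is read here as a total across all classes, which is an assumption since the note above does not say whether the figure is per class or overall.

```r
# Hypothetical helper (not part of the package): summarise how many
# labelled documents are available, overall and per class.
summarise_corpus <- function(data, min_docs = 5000) {
  counts <- table(data[[1]])          # documents per class (first column)
  list(
    total = nrow(data),
    per_class = counts,
    enough = nrow(data) >= min_docs   # min_docs is treated as an assumed total
  )
}

toy <- data.frame(
  class = c("Sport", "Business", "Sport", "Politics"),
  document = c("Tennis and golf.", "Finance.", "Football.", "Elections."),
  stringsAsFactors = FALSE
)

s <- summarise_corpus(toy)
s$total      # number of labelled documents
s$per_class  # breakdown by class
s$enough     # FALSE for this toy corpus
```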
# NOT RUN {
# get working directory
# need to pass full path
wd <- getwd()

data <- data.frame(
  class = c("Sport", "Business", "Sport", "Sport",
    "Business", "Politics", "Politics", "Politics"),
  doc = c("Football, tennis, golf and, bowling and, score.",
    "Marketing, Finance, Legal and, Administration.",
    "Tennis, Ski, Golf and, gym and, match.",
    "football, climbing and gym.",
    "Marketing, Business, Money and, Management.",
    "This document talks politics and Donal Trump.",
    "Donald Trump is the President of the US, sadly.",
    "Article about politics and president Trump.")
)

# Error not enough data
# model <- dc_train(model = paste0(wd, "/model.bin"), data = data, lang = "en")

# repeat data 50 times (each repetition draws 4 random rows)
# Obviously do not do that in the real world
data <- do.call("rbind", replicate(50, data[sample(nrow(data), 4),], simplify = FALSE))

# train model
model <- dc_train(model = paste0(wd, "/model.bin"), data = data, lang = "en")
# }