Reputation: 2800
Let's start with a reproducible example, a data frame called key with 8 columns and 3 rows:
key <- structure(c("Make Professional Maps with QGIS and Inkscape",
"Gain the skills to produce original, professional, and aesthetically pleasing maps using free software",
"English", "Inkscape 101 for Beginners - Design Vector Graphics",
"Learn how to create and design vector graphics for free!", "English",
"Design & Create Vector Graphics With Inkscape 2016", "The Beginners Guide to designing and creating Vector Graphics with Inkscape. No Experience needed!",
"English", "Design a Logo for Free in Inkscape", "Learn from an award winning, published logo design professional!",
"English", "Inkscape - Beginner to Pro", "If you want to have a decent learning curve, you are new to the program or even in design, this course is for you.",
"English", "Creating 2D Textures in Inkscape", "A guide to creating colorful and interesting textures in inkscape.",
"English", "Vector Art in Inkscape - Icon Design | Make Vector Graphics",
"Learn Icon Design by creating Vector Graphics using the .SVG and PNG format with the Free Software Inkscape!",
"English", "Inkscape and Bootstrap 3 -> Responsive Web Design!",
"Design responsive websites using Free tools Inkscape and Bootstrap 3! Mood Boards and Style Tiles to Mobile First!",
"English"), .Dim = c(3L, 8L), .Dimnames = list(c("Title", "Short_Description",
"Language"), c("1", "2", "4", "5", "6", "9", "13", "15")))
I would like to extract keywords from every column independently. For this purpose, I use the udpipe package in R.
Since I want to run the functions on every column, I use a for loop.
Before starting, we create the model with English as reference (see this link for more info):
library(udpipe)
ud_model <- udpipe_download_model(language = "english")
ud_model <- udpipe_load_model(ud_model$file_model)
Ideally, my final output would be a data frame with 8 columns and as many rows as keywords were extracted.
I tried two methods:
Method 1: dplyr
library(dplyr)
keywords <- list()
for(i in ncol(keywords_en_t)){
keywords[[i]] <- keywords_en_t %>%
udpipe_annotate(ud_model,s)
as.data.frame()
}
Method 2:
key <- list()
stats <- list()
for(i in ncol(keywords_en_t)){
key[[i]] <- as.data.frame(udpipe_annotate(ud_model, x = keywords_en_t[,i]))
stats[[i]] <- subset(key[[i]], upos %in% "NOUN")
stats <- txt_freq(x = stats$lemma)
}
In both cases, I either get errors or the output is not what I expected.
As said, the output I expect is a data frame with 8 columns, with the keywords in the rows.
Any ideas?
Upvotes: 0
Views: 255
Reputation: 23608
Unfortunately your code contains a lot of mistakes. Your loops don't run from 1 to the number of columns: because ncol returns the single value 8, they run only once, with i equal to 8. Use either 1:ncol or seq_along.
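To see why the loops only run once, here is a minimal sketch on a toy matrix. The name `m` and its contents are made up for illustration, and `seq_len(ncol(m))` is used as one safe way to write the column index.

```r
# Toy 3 x 8 matrix standing in for the real data (hypothetical name `m`)
m <- matrix(letters[1:24], nrow = 3, ncol = 8)

# ncol(m) is the single number 8, so this loop body runs exactly once
visited_wrong <- integer(0)
for (i in ncol(m)) {
  visited_wrong <- c(visited_wrong, i)
}

# seq_len(ncol(m)) is 1, 2, ..., 8, so every column is visited
visited_right <- integer(0)
for (i in seq_len(ncol(m))) {
  visited_right <- c(visited_right, i)
}

print(visited_wrong)  # [1] 8
print(visited_right)  # [1] 1 2 3 4 5 6 7 8
```

`seq_len(ncol(m))` gives the same indices as `1:ncol(m)` here, but it also yields an empty sequence when there are zero columns, whereas `1:0` would count down from 1 to 0.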
Your key data is a matrix, not a data.frame. You need to supply udpipe_annotate with a character vector. If you just supply key[, 8], you are also passing the dimnames to udpipe_annotate, which might generate keywords you don't need. In method 1 you use udpipe_annotate(ud_model,s), but there is no s defined. In method 2 you assign to stats[[i]] and immediately afterwards overwrite the whole list by assigning to stats.
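The matrix and dimnames points can be checked without udpipe. The sketch below uses a shortened stand-in for the data (the name `m` and the abbreviated texts are assumptions, not the real input) and shows that a column taken from the matrix keeps its row names, while `unname()` or a data.frame column does not.

```r
# Small stand-in for the matrix in the question (hypothetical name `m`)
m <- matrix(
  c("Make Maps", "Gain skills", "English",
    "Inkscape 101", "Learn vector graphics", "English"),
  nrow = 3,
  dimnames = list(c("Title", "Short_Description", "Language"), c("1", "2"))
)

# structure(..., .Dim = ...) builds a matrix, not a data.frame
print(is.data.frame(m))  # [1] FALSE

# A column of the matrix is a character vector that keeps the row names
col_named <- m[, 1]
print(names(col_named))

# unname() keeps the text and drops the names
col_plain <- unname(m[, 1])
print(names(col_plain))  # NULL

# A column of a data.frame comes back without names as well
df <- as.data.frame(m, stringsAsFactors = FALSE)
col_df <- df[, 1]
```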
To correct some things, I first transformed the data into a data.frame. Next, I ran a loop to create a list of vectors containing the keywords. After this, I created a data.frame of the keywords. This part of the code takes into account the different lengths of the vectors.
You might want to check how you get your data, because it is more logical and tidy to have 3 columns ("Title", "Short_Description", "Language") and lots of rows.
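As a rough sketch of that tidier layout, the matrix can be transposed so that each course becomes one row. The stand-in matrix `m` and the name `courses` below are hypothetical; only `t()` and `as.data.frame()` from base R are used.

```r
# Small stand-in for the 3 x 8 matrix in the question (hypothetical name `m`)
m <- matrix(
  c("Make Maps", "Gain skills", "English",
    "Inkscape 101", "Learn vector graphics", "English"),
  nrow = 3,
  dimnames = list(c("Title", "Short_Description", "Language"), c("1", "2"))
)

# t() swaps rows and columns, so each course becomes one row
courses <- as.data.frame(t(m), stringsAsFactors = FALSE)

print(dim(courses))
print(names(courses))
print(courses$Title)
```

With this shape, a whole column such as `courses$Short_Description` is already a plain character vector, which is the kind of input udpipe_annotate expects.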
# Transform key into a data.frame. Now it is a matrix.
key <- as.data.frame(key, stringsAsFactors = FALSE)
library(udpipe)
# prevent downloading ud model if it already exists in the working directory
ud_model <- udpipe_download_model(language = "english", overwrite = FALSE)
ud_model <- udpipe_load_model(ud_model$file_model)
# prepare list with correct length
keywords <- vector(mode = "list", length = ncol(key))
for(i in 1:ncol(key)){
temp <- as.data.frame(udpipe_annotate(ud_model, x = key[, i]))
keywords[[i]] <- temp$lemma[temp$upos == "NOUN"]
}
# Transform the list of vectors into a data.frame.
# Indexing each vector up to the maximum length pads the shorter ones with NA.
keywords <- as.data.frame(sapply(keywords, '[', seq(max(lengths(keywords)))), stringsAsFactors = FALSE)
keywords
        V1        V2         V3     V4       V5       V6     V7      V8
1    skill beginners  beginners   logo learning       2d Design     web
2      map    design      guide  award    curve  Texture format  design
3 software    Vector experience   logo  program    guide   <NA>  design
4     <NA>  graphics       <NA> design   design  texture   <NA> website
5     <NA>    vector       <NA>   <NA>   course inkscape   <NA>    tool
6     <NA>   graphic       <NA>   <NA>     <NA>     <NA>   <NA>    <NA>
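The padding step in the last line of the code can be tried in isolation. The sketch below uses a made-up list `kw` instead of real udpipe output and relies on the fact that indexing a vector past its end returns NA.

```r
# Toy list of character vectors with different lengths (made-up values)
kw <- list(c("skill", "map", "software"), c("logo", "award"), "web")

# The longest vector decides the number of rows
n <- max(lengths(kw))

# Indexing past the end of a vector returns NA, so every vector is
# stretched to length n before sapply() binds them into a matrix
padded <- sapply(kw, `[`, seq_len(n))

# One column per original vector, with NA filling the gaps
result <- as.data.frame(padded, stringsAsFactors = FALSE)
print(result)
```

Because every padded vector has the same length, `sapply()` can simplify the result to a matrix with one column per list element, which `as.data.frame()` then turns into columns V1, V2 and so on.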
Upvotes: 1