Reputation: 43
I have a data frame from which I've created another data frame. Somewhere along the line things got messed up, but I'm not sure where or how to fix it.
The code worked on the first dataframe, so I assume it's some sort of type mismatch? Do I need to convert the fields back to string somehow?
##creating the second data frame
adat2 <- data.frame(id=character(), Title=character(), Domain=character(), lemtext1=character(), Language=character(), day=character())
##copying from the first one, whilst splitting rows into multiple rows based on lemtext
for (row in 1:nrow(adat1)) {
  splitlines <- strsplit(adat1$lemtext[row], ", |\\. |: |; ")[[1]]
  for (row2 in 1:NROW(splitlines)) {
    adat2 <- add_row(adat2, id=adat1$id[row], Title=adat1$Title[row], Domain=adat1$Domain[row], lemtext1=splitlines[row2], Language=adat1$Language[row], day=adat1$day[row])
  }
}
##trying to work with the new dataframe
tokens <- space_tokenizer(adat2$`lemtext2`[which(((adat2$Domain=="index.hu") |
(adat2$Domain=="hvg.hu") | (adat1$Domain=="24.hu") | (adat1$Domain=="444.hu")) &
(adat2$day>=as.Date("2018-10-13")) & (adat1$day<=as.Date("2019-10-13")))])
adat1 dput output:
https://www.pastiebin.com/5df253f6b79aa
Upvotes: 0
Views: 32
Reputation: 1619
In adat2 everything is a factor. This has to do with how you created adat2. You need to add stringsAsFactors = FALSE to the data.frame() call.
adat2 <- data.frame(id = character(),
                    Title = character(),
                    Domain = character(),
                    lemtext1 = character(),
                    Language = character(),
                    day = character(),
                    stringsAsFactors = FALSE)
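As for converting the fields back to strings: if the data frame has already been built with factor columns, they can be converted in place instead of rebuilding it. The following is only a sketch on a toy data frame (df and its values are made up for illustration, not taken from adat2). It passes stringsAsFactors = TRUE explicitly because the default changed to FALSE in R 4.0.0, so on newer versions the columns would otherwise already be character.

```r
# Toy data frame standing in for adat2 (hypothetical values).
# stringsAsFactors = TRUE is set explicitly so the example behaves the same
# on R >= 4.0.0, where the default is FALSE.
df <- data.frame(id = c("a1", "a2"),
                 Domain = c("index.hu", "hvg.hu"),
                 stringsAsFactors = TRUE)

# Find the factor columns and convert only those to character.
is_fac <- vapply(df, is.factor, logical(1))
df[is_fac] <- lapply(df[is_fac], as.character)
```

After this, df$Domain == "index.hu" compares plain strings rather than factor levels.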
If you want to know what kind of columns you have, use str(adat2), or check a single column with e.g. class(adat2$id).
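A quick way to check all columns at once is sketched below. The data frame adat2_demo is a cut-down, hypothetical stand-in for adat2 with only two of its columns.

```r
# Cut-down stand-in for adat2 (hypothetical; only two of the columns).
adat2_demo <- data.frame(id = character(),
                         day = character(),
                         stringsAsFactors = FALSE)

# First class of every column, returned as a named character vector.
col_classes <- vapply(adat2_demo, function(col) class(col)[1], character(1))
print(col_classes)

# str() gives the same information plus a preview of the values.
str(adat2_demo)
```

If any entry of col_classes says "factor", that column was created without stringsAsFactors = FALSE.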
Upvotes: 1