user3616457

R type int / char confusion in a dataframe

I have a data frame from which I've created another data frame. Somewhere along the way something went wrong, but I'm not sure where or how to fix it.

The code worked on the first data frame, so I assume it's some sort of type mismatch. Do I need to convert the fields back to strings somehow?

##creating the second data frame

adat2 <- data.frame(id=character(), Title=character(), Domain=character(), lemtext1=character(), Language=character(), day=character())


##copying from the first one, whilst splitting rows into multiple rows based on lemtext

for (row in 1:nrow(adat1)) {
    splitlines <- strsplit(adat1$lemtext[row], ", |\\. |: |; ")[[1]]
    for (row2 in 1:NROW(splitlines)){
        adat2 <- add_row(adat2, id=adat1$id[row], Title=adat1$Title[row], Domain=adat1$Domain[row], lemtext1=splitlines[row2], Language=adat1$Language[row], day=adat1$day[row])
    }
}

##trying to work with the new dataframe

tokens <- space_tokenizer(adat2$`lemtext2`[which(((adat2$Domain=="index.hu") |
                                                   (adat2$Domain=="hvg.hu") | (adat1$Domain=="24.hu") | (adat1$Domain=="444.hu")) & 
                                                   (adat2$day>=as.Date("2018-10-13")) & (adat1$day<=as.Date("2019-10-13")))])

I'm getting error messages in RStudio.

dput() output of adat1:

https://www.pastiebin.com/5df253f6b79aa


Answers (1)

ricoderks

In adat2 every column is a factor. That comes from how you created adat2: in R versions before 4.0.0, data.frame() converts character vectors to factors by default. Add stringsAsFactors = FALSE to the data.frame() call:

adat2 <- data.frame(id = character(),
                    Title = character(),
                    Domain = character(),
                    lemtext1 = character(),
                    Language = character(),
                    day = character(),
                    stringsAsFactors = FALSE)
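
A quick way to see what that argument changes is to build a small empty template both ways and check a column's class. This is a minimal sketch; the names with_factors and without_factors are only for illustration, and stringsAsFactors = TRUE is written out explicitly so the comparison gives the same result on R versions before and after 4.0.0.

```r
# Same empty template, built with and without factor conversion.
# TRUE is spelled out so the result does not depend on the R version's default.
with_factors <- data.frame(id = character(),
                           Domain = character(),
                           stringsAsFactors = TRUE)
without_factors <- data.frame(id = character(),
                              Domain = character(),
                              stringsAsFactors = FALSE)

print(class(with_factors$Domain))     # [1] "factor"
print(class(without_factors$Domain))  # [1] "character"
```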

To check what type each column has, use str(adat2), or check a single column with e.g. class(adat2$id).
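
The same checks can be run on the split data before filtering it. This sketch uses a few invented rows in place of the real adat2. It also assumes two things about the filter in the question that look like typos: two of its conditions test adat1 instead of adat2, so the logical vectors being combined have different lengths and no longer line up with adat2's rows, and it reads a column lemtext2, which adat2 does not have (the template defines lemtext1), so adat2$`lemtext2` returns NULL. The resulting selected_text is what would then be passed to space_tokenizer() as in the question.

```r
# Invented rows standing in for adat2 (the real data is only linked above).
# Assumes the filter was meant to use adat2 throughout and the column lemtext1.
adat2 <- data.frame(id       = c("1", "1", "2", "3"),
                    Domain   = c("index.hu", "index.hu", "444.hu", "example.com"),
                    lemtext1 = c("first part", "second part", "other text", "skip me"),
                    day      = c("2018-11-01", "2018-11-01", "2019-05-20", "2019-01-01"),
                    stringsAsFactors = FALSE)

# Inspect column types: str() gives an overview, sapply() + class() one per column
str(adat2)
print(sapply(adat2, class))

# day is stored as text; convert it so the comparisons are between Date values
adat2$day <- as.Date(adat2$day)

# Every condition refers to adat2, so all logical vectors have the same length
keep <- adat2$Domain %in% c("index.hu", "hvg.hu", "24.hu", "444.hu") &
  adat2$day >= as.Date("2018-10-13") &
  adat2$day <= as.Date("2019-10-13")

selected_text <- adat2$lemtext1[keep]
print(selected_text)
```

Using %in% instead of a chain of == comparisons keeps the domain list in one place, and because day is converted once up front, the date bounds are compared as dates rather than as text or factor codes.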
