Irwin

Reputation: 422

When I try to melt my data frame with mixed data types, I get NAs. How can I best resolve this?

My goal and context

I have a data frame in R that I want to melt using the reshape2 library. There are two reasons.

  1. I want to plot the score for each user for each question in a bar chart using ggplot.

  2. I want to put this data into Excel so I can see, per user, their sentiment, score, and mixed for motivated, attitudeBefore, etc. My intention was to use melt, then cast to put the data into wide format for easy Excel importing.

My problem

When I try to run melt, I get a warning and end up with NAs in my resulting molten data frame.

Warning messages:
1: In `[<-.factor`(`*tmp*`, ri, value = c(0.148024, 0.244452, -0.00421,  :
invalid factor level, NAs generated
2: In `[<-.factor`(`*tmp*`, ri, value = c(0L, 0L, 0L, 0L, 0L, 0L, 0L,  :
invalid factor level, NAs generated

I think the NAs appear because I'm mixing character and numeric values in the same column.

My questions

I have two questions as a result.

Question 1: Is there a workaround for this in R?

Question 2: Is there a better way for me to structure my data to avoid this problem?

Code

Here's my code for creating the data frame.

words <- data.frame(read.delim("sentiments-test-subset-no-text.txt", header=FALSE))
names(words) <- c("level", "question", "user", "sentiment", "score", "mixed")
words$user <- as.factor(words$user)
words.m <- melt(words, id.vars=c("user", "level"), measure.vars=c("sentiment", "score", "mixed"))

I'm pretty new to reshape and melt but I think that's what I want in the last line.

Data

The data in human-readable format looks like this.

experimental    motivated   1   positive    0.148024    0
experimental    motivated   2   positive    0.244452    0
experimental    motivated   3   negative       -0.004210    0
experimental    motivated   4   unknown         0.000000    0
experimental    attitudeBefore  1   negative       -0.241500    0
experimental    attitudeBefore  2   neutral         0.000000    0
experimental    attitudeBefore  3   neutral         0.000000    0
experimental    attitudeBefore  4   unknown         0.000000    0

dput dump

dput below.

structure(list(level = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L), .Label = "experimental", class = "factor"), question = structure(c(2L, 
2L, 2L, 2L, 1L, 1L, 1L, 1L), .Label = c("attitudeBefore", "motivated"
), class = "factor"), user = structure(c(1L, 2L, 3L, 4L, 1L, 
2L, 3L, 4L), .Label = c("1", "2", "3", "4"), class = "factor"), 
sentiment = structure(c(3L, 3L, 1L, 4L, 1L, 2L, 2L, 4L), .Label = c("negative", 
"neutral", "positive", "unknown"), class = "factor"), score = c(0.148024, 
0.244452, -0.00421, 0, -0.2415, 0, 0, 0), mixed = c(0L, 0L, 
0L, 0L, 0L, 0L, 0L, 0L)), .Names = c("level", "question", 
"user", "sentiment", "score", "mixed"), row.names = c(NA, -8L
), class = "data.frame")

Upvotes: 5


Answers (1)

Ricardo Saporta

Reputation: 55420

It looks like you might simply be using the wrong library. reshape and reshape2 are not the same package: both export a function named melt, and whichever was attached last is the one that gets called.

library(reshape2)
words.m <- melt(words, id.vars=c("user", "level"), measure.vars=c("sentiment", "score", "mixed"))
# no problem

detach(package:reshape2)

# using reshape instead of reshape2
library(reshape)
words.m <- melt(words, id.vars=c("user", "level"), measure.vars=c("sentiment", "score", "mixed"))
# Warning messages:
# 1: In `[<-.factor`(`*tmp*`, ri, value = c(3L, 3L, 1L, 4L, 1L, 2L, 2L,  :
#   invalid factor level, NAs generated
# 2: In `[<-.factor`(`*tmp*`, ri, value = c(3L, 3L, 1L, 4L, 1L, 2L, 2L,  :
#   invalid factor level, NAs generated

If reshape2 is not available on your system, you can install it from CRAN:

 install.packages("reshape2")
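
If both packages might end up attached in the same session, one way to avoid the mix-up is to call melt through its namespace. The following is a minimal sketch, not the only way to do it: it rebuilds the sample data from the question's dput output with data.frame() and assumes reshape2 is installed.

```r
# Requires the reshape2 package: install.packages("reshape2")

# Sample data, rebuilt from the dput output in the question
words <- data.frame(
  level     = factor(rep("experimental", 8)),
  question  = factor(rep(c("motivated", "attitudeBefore"), each = 4)),
  user      = factor(rep(1:4, times = 2)),
  sentiment = factor(c("positive", "positive", "negative", "unknown",
                       "negative", "neutral", "neutral", "unknown")),
  score     = c(0.148024, 0.244452, -0.00421, 0, -0.2415, 0, 0, 0),
  mixed     = rep(0L, 8)
)

# The reshape2:: prefix selects reshape2's melt regardless of which
# packages are attached or in what order they were loaded.
# Recent reshape2 versions may warn that attributes differ across the
# measure variables; that is separate from the "invalid factor level"
# warning shown in the question.
words.m <- reshape2::melt(
  words,
  id.vars      = c("user", "level"),
  measure.vars = c("sentiment", "score", "mixed")
)

# Check whether any values were lost
anyNA(words.m$value)
```

Because the measure variables mix a factor with numbers, the value column comes back as a single common type, so numeric columns such as score need converting back (for example with as.numeric) before plotting. To see which package an unqualified melt currently resolves to, environmentName(environment(melt)) returns the package name.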

Upvotes: 4
