Reputation: 375
I hava a dataframe like this:
name, value
stockA,Google
stockA,Yahoo
stockB,NA
stockC,Google
I would like to convert the values of rows of the second column to columns and keep the first one and in other have a numeric value to 0 and 1 if not exist or exist the value. Here an example of the expected output:
name,Google,Yahoo
stockA,1,1
stockB,0,0
stockC,1,0
I tried this:
library(reshape2)
df2 <- dcast(melt(df, 1:2, na.rm = TRUE), df + name ~ value, length)
and the error it gives me is this:
Using value as value column: use value.var to override.
Error in `[.data.frame`(x, i) : undefined columns selected
Any idea for the error?
An example in which the previous code works. Data (df):
name,nam2,value
stockA,sth1,Yahoo
stockA,sth2,NA
stockB,sth3,Google
and this works:
df2 <- dcast(melt(df, 1:2, na.rm = TRUE), name + nam2 ~ value, length)
Upvotes: 0
Views: 74
Reputation: 42544
The OP has asked to get an explanation for the error caused by
dcast(melt(df, 1:2, na.rm = TRUE), df + name ~ value, length)
(I'm quite astonished that no one so far has tried to improve the OP's reshape2
approach to return exactly the expected answer).
There are several issues with OP's code:
df
appears in the dcast()
formula.melt()
is 1:2
which means that all columns are used as id.vars
. It should read 1
.df
already is in long format and doesn't need to be reshaped.So, df
can be used directly in dcast()
:
library(reshape2)
dcast(df[!is.na(df$value), ], name ~ value, length, drop = FALSE)
# name Google Yahoo
#1 stockA 1 1
#2 stockB 0 0
#3 stockC 1 0
In order to avoid a third NA
column appearing in the result, the NA
rows have to be filtered out of df
before reshaping. On the other hand, drop = FALSE
is required to ensure stockB
is included in the result.
df <- data.frame(name = c("stockA", "stockA", "stockB", "stockC"),
value = c("Google", "Yahoo", NA, "Google"))
df
# name value
#1 stockA Google
#2 stockA Yahoo
#3 stockB <NA>
#4 stockC Google
Upvotes: 1
Reputation: 12559
You can do it also with base R:
df <- read.table(header=TRUE, sep=',', text=
'name, value
stockA,Google
stockA,Yahoo
stockB,NA
stockC,Google')
xtabs(~., data=df)
# value
#name Google Yahoo
# stockA 1 1
# stockB 0 0
# stockC 1 0
Upvotes: 0
Reputation: 534
You can do that with spread
from the tidyr
package.
df <- data.frame(name = c("stockA", "stockA", "stockB", "stockC"),
value = c("Google", "Yahoo", NA, "Google"))
df$row <- 1
df %>%
spread(value, row, fill = 0) %>%
select(-`<NA>`)
Upvotes: 1
Reputation: 704
Try df2 <- dcast(melt(df, 1:2, na.rm = TRUE), name ~ value, length)
Just remove df +
from the equation.
Though this will give you an extra column for NA values, which makes me think the na.rm
argument isn't working properly in your formulation.
Upvotes: 0