Keri
Keri

Reputation: 375

Grouping with numeric variables

I hava a dataframe like this:

name, value
stockA,Google
stockA,Yahoo
stockB,NA
stockC,Google

I would like to convert the values of rows of the second column to columns and keep the first one and in other have a numeric value to 0 and 1 if not exist or exist the value. Here an example of the expected output:

name,Google,Yahoo
stockA,1,1
stockB,0,0
stockC,1,0

I tried this:

library(reshape2)
df2 <- dcast(melt(df, 1:2, na.rm = TRUE), df + name ~ value, length)

and the error it gives me is this:

Using value as value column: use value.var to override.
Error in `[.data.frame`(x, i) : undefined columns selected

Any idea for the error?

An example in which the previous code works. Data (df):

name,nam2,value
stockA,sth1,Yahoo
stockA,sth2,NA
stockB,sth3,Google

and this works:

df2 <- dcast(melt(df, 1:2, na.rm = TRUE), name + nam2 ~ value, length)

Upvotes: 0

Views: 74

Answers (4)

Uwe
Uwe

Reputation: 42544

The OP has asked to get an explanation for the error caused by

dcast(melt(df, 1:2, na.rm = TRUE), df + name ~ value, length)

(I'm quite astonished that no one so far has tried to improve the OP's reshape2 approach to return exactly the expected answer).

There are several issues with OP's code:

  1. df appears in the dcast() formula.
  2. The second parameter to melt() is 1:2 which means that all columns are used as id.vars. It should read 1.
  3. But the most crucial point is that the data.frame df already is in long format and doesn't need to be reshaped.

So, df can be used directly in dcast():

library(reshape2)
dcast(df[!is.na(df$value), ], name ~ value, length, drop = FALSE)
#    name Google Yahoo
#1 stockA      1     1
#2 stockB      0     0
#3 stockC      1     0

In order to avoid a third NA column appearing in the result, the NA rows have to be filtered out of df before reshaping. On the other hand, drop = FALSE is required to ensure stockB is included in the result.

Data

df <- data.frame(name = c("stockA", "stockA", "stockB", "stockC"), 
                 value = c("Google", "Yahoo", NA, "Google"))
df
#    name  value
#1 stockA Google
#2 stockA  Yahoo
#3 stockB   <NA>
#4 stockC Google

Upvotes: 1

jogo
jogo

Reputation: 12559

You can do it also with base R:

df <- read.table(header=TRUE, sep=',', text=
'name, value
stockA,Google
stockA,Yahoo
stockB,NA
stockC,Google')
xtabs(~., data=df)
#        value
#name     Google Yahoo
#  stockA      1     1
#  stockB      0     0
#  stockC      1     0

Upvotes: 0

jess
jess

Reputation: 534

You can do that with spread from the tidyr package.

df <- data.frame(name = c("stockA", "stockA", "stockB", "stockC"),
                 value = c("Google", "Yahoo", NA, "Google"))
df$row <- 1
df %>% 
  spread(value, row, fill = 0) %>% 
  select(-`<NA>`)

Upvotes: 1

Patrick Williams
Patrick Williams

Reputation: 704

Try df2 <- dcast(melt(df, 1:2, na.rm = TRUE), name ~ value, length)

Just remove df + from the equation.

Though this will give you an extra column for NA values, which makes me think the na.rm argument isn't working properly in your formulation.

Upvotes: 0

Related Questions