AGUY
AGUY

Reputation: 425

Create a new data frame that will act as a dictionary with key and value pairs

I was playing with some data and trying to create a new data frame that contains key and value pairs that could be a dictionary. Here's some sample data and a quick manual solution.

df = data.frame(col1 = c("one", "one", "two", "two", "one"),
                col2 = c("AG", "AB", "AC", "AG", "AB"),
                col3 = c("F3", "F1", "F2", "F3", "F2") )
df
d1 = data.frame(vals = unique(df$col1))
d2 = data.frame(vals = unique(df$col2))
d3 = data.frame(vals = unique(df$col3))
d1 
d2 
d3
d1$name = "col1"
d2$name = "col2"
d3$name = "col3"
d1
d2
d3
rbind(d1,d2,d3)

Of course, this is a simple use case so real data is going to be a bit more mundane. For that reason, I was looking for a loop that could go through and set the key value pairs in a dictionary.

Most of my attempts have resulted in failure. Here's the format for my solution but I'm not sure how to dynamically create the new_df dictionary. Any suggestions?

new_df=data.frame()
prod.cols = c("col1", "col2", "col3")
for(col in prod.cols){
  if(col %in% colnames(df)){
    ## solution in here
  }
}

new_df

Upvotes: 0

Views: 4358

Answers (3)

Jota
Jota

Reputation: 17611

alistaire's answer is quite elegant and readable. Just for fun, here's a base R approach. Not that efficiency is particularly important here, but this scales relatively well as more rows and columns are added:

My second and third approaches are nicer than my first, so I'm moving them to the top of the answer:

Approach # 2, implementing thelatemail's comment for a nice, efficient one-liner:

stack(lapply(df, function(ii) as.character(unique(ii))))

What's nice about this solution is that it first reduces the columns using unique, which makes less work for as.character and then for stack.


Approach # 3: more concise and more efficient version of approach 2 that avoids the need for unique and character conversion by using levels to deal with the factor columns:

stack(lapply(df, levels))

First approach:

Reduce(rbind,
    lapply(seq_along(df), 
        function(ii) data.frame(vals = unique(df[, ii]), name = names(df)[ii])
    )
)

# vals name
#1 one col1
#2 two col1
#3  AG col2
#4  AB col2
#5  AC col2
#6  F3 col3
#7  F1 col3
#8  F2 col3

Using do.call instead of Reduce is roughly equivalent here:

do.call(rbind,
    lapply(seq_along(df), 
        function(ii) data.frame(vals = unique(df[, ii]), name = names(df)[ii])
    )
) 

Upvotes: 4

akrun
akrun

Reputation: 887128

We can also do

library(reshape2)
unique(melt(as.matrix(df))[-1])

Upvotes: 1

alistaire
alistaire

Reputation: 43334

tidyr makes this easy:

library(tidyr)

df %>% gather(name, vals) %>% unique()

#    name vals
# 1  col1  one
# 3  col1  two
# 6  col2   AG
# 7  col2   AB
# 8  col2   AC
# 11 col3   F3
# 12 col3   F1
# 13 col3   F2

Upvotes: 5

Related Questions