Reputation: 425
I was playing with some data and trying to create a new data frame that contains key and value pairs that could be a dictionary. Here's some sample data and a quick manual solution.
df = data.frame(col1 = c("one", "one", "two", "two", "one"),
col2 = c("AG", "AB", "AC", "AG", "AB"),
col3 = c("F3", "F1", "F2", "F3", "F2") )
df
d1 = data.frame(vals = unique(df$col1))
d2 = data.frame(vals = unique(df$col2))
d3 = data.frame(vals = unique(df$col3))
d1
d2
d3
d1$name = "col1"
d2$name = "col2"
d3$name = "col3"
d1
d2
d3
rbind(d1,d2,d3)
Of course, this is a simple use case so real data is going to be a bit more mundane. For that reason, I was looking for a loop that could go through and set the key value pairs in a dictionary.
Most of my attempts have resulted in failure. Here's the format for my solution but I'm not sure how to dynamically create the new_df dictionary. Any suggestions?
new_df=data.frame()
prod.cols = c("col1", "col2", "col3")
for(col in prod.cols){
if(col %in% colnames(df)){
## solution in here
}
}
new_df
Upvotes: 0
Views: 4358
Reputation: 17611
alistaire's answer is quite elegant and readable. Just for fun, here's a base R approach. Not that efficiency is particularly important here, but this scales relatively well as more rows and columns are added:
My second and third approaches are nicer than my first, so I'm moving them to the top of the answer:
Approach # 2, implementing thelatemail's comment for a nice, efficient one-liner:
stack(lapply(df, function(ii) as.character(unique(ii))))
What's nice about this solution is that it first reduces the columns using unique
, which makes less work for as.character
and then for stack
.
Approach # 3: more concise and more efficient version of approach 2 that avoids the need for unique
and character conversion by using levels
to deal with the factor columns:
stack(lapply(df, levels))
First approach:
Reduce(rbind,
lapply(seq_along(df),
function(ii) data.frame(vals = unique(df[, ii]), name = names(df)[ii])
)
)
# vals name
#1 one col1
#2 two col1
#3 AG col2
#4 AB col2
#5 AC col2
#6 F3 col3
#7 F1 col3
#8 F2 col3
Using do.call
instead of Reduce
is roughly equivalent here:
do.call(rbind,
lapply(seq_along(df),
function(ii) data.frame(vals = unique(df[, ii]), name = names(df)[ii])
)
)
Upvotes: 4