Reputation: 25
I have a dataframe with two columns: an ID column and a character column containing key-value pairs delimited by semicolons.
ID | KeyValPairs
1 | "zx=1; ds=4; xx=6"
2 | "qw=5; df=2"
. | ....
I want to turn this into a dataframe with three columns:
ID | Key | Val
1 | zx | 1
1 | ds | 4
1 | xx | 6
2 | qw | 5
2 | df | 2
There is no fixed number of key value pairs in the KeyValPairs column, and no closed set of possible keys. I have been goofing around with solutions that involve looping and reinserting into an empty dataframe, but it is not working properly and I am told I should avoid loops in R.
Upvotes: 2
Views: 2528
Reputation: 28461
Two approaches: one using tidyr with reshape2, the other using tidyr with dplyr.
tidyr
library(tidyr)
library(reshape2)
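# `into` must be a character vector; three columns assumes at most three pairs per row
s <- separate(df, KeyValPairs, paste0("V", 1:3), sep=";", fill="right")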
m <- melt(s, id.vars="ID")
out <- separate(m, value, c("Key", "Val"), sep="=")
na.omit(out[order(out$ID),][-2])
# ID Key Val
# 1 1 zx 1
# 3 1 ds 4
# 5 1 xx 6
# 2 2 qw 5
# 4 2 df 2
dplyrish
library(tidyr)
library(dplyr)
df %>%
  mutate(KeyValPairs = strsplit(as.character(KeyValPairs), "; ")) %>%
  unnest(KeyValPairs) %>%
  separate(KeyValPairs, into = c("key", "val"), sep = "=")
# courtesy of @jeremycg
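As a possible variant of the above, tidyr's `separate_rows()` can do the split-and-unnest step in one call. This is only a sketch: the example data frame below is my own stand-in for the question's data (a plain character column with no leading whitespace), and the column names `Key` and `Val` are chosen to match the desired output.

```r
library(tidyr)

# Stand-in data mirroring the question (assumed: character column, no stray whitespace)
df <- data.frame(
  ID = c(1, 2),
  KeyValPairs = c("zx=1; ds=4; xx=6", "qw=5; df=2"),
  stringsAsFactors = FALSE
)

# One row per pair: split on ";" plus any whitespace that follows it
long <- separate_rows(df, KeyValPairs, sep = ";\\s*")

# Split each pair into key and value at "="
out <- separate(long, KeyValPairs, into = c("Key", "Val"), sep = "=")
out
```

Because the separator is a regular expression, the whitespace after each semicolon is consumed during the split, so the keys should not need trimming afterwards. `Val` stays character unless you pass `convert = TRUE`.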
Data
df <- structure(list(ID = c(1, 2), KeyValPairs = structure(c(2L, 1L
), .Label = c(" qw=5; df=2", " zx=1; ds=4; xx=6"), class = "factor")), .Names = c("ID",
"KeyValPairs"), class = "data.frame", row.names = c(NA, -2L))
Upvotes: 6
Reputation: 54247
Maybe also a case for {splitstackshape} from @AnandaMahto:
df <- read.table(sep = "|", header = TRUE, text = '
ID | KeyValPairs
1 | "zx=1; ds=4; xx=6"
2 | "qw=5; df=2"')
library(splitstackshape)
setNames(
  cSplit(cSplit(df, 2, ";", "long"), 2, "="),
  c("id", "key", "val")
)
# id key val
# 1: 1 zx 1
# 2: 1 ds 4
# 3: 1 xx 6
# 4: 2 qw 5
# 5: 2 df 2
Upvotes: 2
Reputation: 24510
A data.table solution, just to use tstrsplit:
library(data.table) # V 1.9.6+
setDT(df)[, .(key = unlist(strsplit(as.character(KeyValPairs), ";"))), by = ID
          ][, c("Key", "Val") := tstrsplit(key, "=")
            ][, key := NULL][]
# ID Key Val
#1: 1 zx 1
#2: 1 ds 4
#3: 1 xx 6
#4: 2 qw 5
#5: 2 df 2
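For anyone unfamiliar with `tstrsplit`, here is a small sketch of what it does on its own. The input vector `pairs` is made up for illustration. It splits each string and transposes the result, so you get one vector per piece position rather than one vector per input string, which is what makes it convenient on the right-hand side of `:=`.

```r
library(data.table)

# Hypothetical input: three "key=value" strings
pairs <- c("zx=1", "ds=4", "xx=6")

# strsplit() alone would return one element per input string;
# tstrsplit() transposes that into one element per piece position
res <- tstrsplit(pairs, "=")

res[[1]]  # the keys
res[[2]]  # the values (still character)
```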
Upvotes: 3