Reputation: 43
I have a very large data frame like:
df = data.frame(nr = c(3,3,4), dependeny = c("6/3/1", "9/3/1",
"5/4/4/1"), token=c("Trotz des Rückgangs",
"Trotz meherer Anfragen", "Trotz des ärgerlichen Unentschiedens"))
  nr dependeny                                token
1  3     6/3/1                  Trotz des Rückgangs
2  3     9/3/1               Trotz meherer Anfragen
3  4   5/4/4/1 Trotz des ärgerlichen Unentschiedens
I would like to add a 4th column containing an extract from "token" that depends on the values in "nr" and "dependency". More precisely, I want the elements of "token" at the positions where the elements of "dependency" equal "nr".
Examples: Row 1: I want "des", because "nr" is 3, and 3 is the second element in "dependency". The second element in "token" is "des".
Row 3: I want "des ärgerlichen", because "nr" is 4, and 4 is the second and third element in "dependency". The second and third elements in "token" are "des ärgerlichen".
I've tried with split and str_split, but do not know how to address the resulting elements.
Upvotes: 3
Views: 115
Reputation: 193527
One option is to split the data into a "long" form. There are several ways to do this, one of which is to use cSplit from my "splitstackshape" package.
library(splitstackshape)
cSplit(as.data.table(df)[, rn := .I],
c("dependeny", "token"), c("/", " "), "long")[nr == dependeny]
#    nr dependeny       token rn
# 1:  3         3         des  1
# 2:  3         3     meherer  2
# 3:  4         4         des  3
# 4:  4         4 ärgerlichen  3
Note that I've added in the row numbers. That allows us to paste things back together, if desired:
cSplit(as.data.table(df)[, rn := .I], ## Adds row numbers
c("dependeny", "token"), c("/", " "), "long")[ ## Splits the data into rows
nr == dependeny][ ## Selects the values of interest
, paste(token, collapse = " "), by = rn] ## Pastes the token values together
#    rn              V1
# 1:  1             des
# 2:  2         meherer
# 3:  3 des ärgerlichen
Upvotes: 1
Reputation: 887128
We can use base R methods to create the 4th column.
unlist(Map(function(x,y,z) paste(z[x==y], collapse=' '),
df$nr,strsplit(as.character(df$dependeny), '/'),
strsplit(as.character(df$token), ' ')))
#[1] "des" "meherer" "des ärgerlichen"
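The call above returns a character vector rather than a column. As a minimal sketch of attaching it to the data frame, the same idea can be written with mapply and assigned directly. The column name "extract" is a hypothetical choice, and stringsAsFactors = FALSE is assumed so that as.character() is not needed:

```r
# Sample data from the question (the column name "dependeny" is kept as posted).
df <- data.frame(
  nr = c(3, 3, 4),
  dependeny = c("6/3/1", "9/3/1", "5/4/4/1"),
  token = c("Trotz des Rückgangs", "Trotz meherer Anfragen",
            "Trotz des ärgerlichen Unentschiedens"),
  stringsAsFactors = FALSE
)

# Split both columns into one vector of pieces per row.
deps <- strsplit(df$dependeny, "/", fixed = TRUE)
toks <- strsplit(df$token, " ", fixed = TRUE)

# For each row, keep the tokens at the positions where "dependeny" equals "nr".
# "extract" is a hypothetical column name, not one used in the question.
df$extract <- mapply(
  function(n, d, t) paste(t[as.numeric(d) == n], collapse = " "),
  df$nr, deps, toks
)

df$extract
#[1] "des"             "meherer"         "des ärgerlichen"
```

This assumes every row has the same number of "/"-separated values as space-separated tokens; rows where that does not hold would need to be checked separately.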
Upvotes: 1