Reputation: 106
I would like to transform data in the following format:
Col_A        Col_B
textA textB  10
textB textC  20
textC textD  30
textD textE  40
textE textF  20
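into the following, where ColB is how many rows contain each word and ColC adds up the Col_B values of those rows:
ColA ColB(Frequency) ColC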
textA 1 10
textB 2 10+20
textC 2 20+30
textD 2 30+40
textE 2 40+20
textF 1 20
That is, the final result should be:
ColA ColB(Frequency) ColC
textA 1 10
textB 2 30
textC 2 50
textD 2 70
textE 2 60
textF 1 20
Currently I am using dfm from quanteda:
library(quanteda)
k <- dfm(A2$Query, ngrams = 1, concatenator = " ", verbose = FALSE)
k <- colSums(k)
k <- as.data.frame(k)
This gives me the frequency column. How can I get ColC?
Upvotes: 1
Reputation: 887721
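Here is another option with separate/gather from tidyr:
library(dplyr)
library(tidyr)
# split Col_A into two columns, one per word
separate(df1, Col_A, into = c("Col_A1", "Col_A2")) %>%
  # stack both word columns into a single ColA column, keeping Col_B
  gather(Var, ColA, -Col_B) %>%
  group_by(ColA) %>%
  # count the rows and sum Col_B for each word
  summarise(Freq = n(), Col_C = sum(Col_B))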
#    ColA  Freq Col_C
#   (chr) (int) (int)
#1  textA     1    10
#2  textB     2    30
#3  textC     2    50
#4  textD     2    70
#5  textE     2    60
#6  textF     1    20
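Or with base R: split 'Col_A' on spaces, replicate 'Col_B' by the lengths of the resulting list ('lst') to create a data.frame, and then use aggregate to get the length and sum of 'Col_B' for each word.
# as.character() guards against Col_A being a factor, which strsplit() rejects
lst <- strsplit(as.character(df1$Col_A), " ")
# one row per word, with the Col_B value of the row it came from
d1 <- data.frame(Col_A = unlist(lst), Col_C = rep(df1$Col_B, lengths(lst)))
# aggregate() returns a matrix column; do.call(data.frame, ...) flattens it
do.call(data.frame, aggregate(. ~ Col_A, d1, function(x) c(length(x), sum(x))))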
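To make the split-and-replicate step easier to follow, here is a minimal self-contained sketch. It assumes 'df1' has the layout shown in the question (it is rebuilt here by hand), and it swaps aggregate for table and tapply so the two result columns get plain names. The names 'freq', 'total' and 'res' are only illustrative.

```r
# Input assumed to match the question's layout, with Col_A as character.
df1 <- data.frame(
  Col_A = c("textA textB", "textB textC", "textC textD",
            "textD textE", "textE textF"),
  Col_B = c(10L, 20L, 30L, 40L, 20L),
  stringsAsFactors = FALSE
)

# One row per word, carrying along the Col_B value of its source row.
lst <- strsplit(df1$Col_A, " ")
d1 <- data.frame(Col_A = unlist(lst), Col_C = rep(df1$Col_B, lengths(lst)))

# Count the rows and sum the values for each word.
freq  <- table(d1$Col_A)
total <- tapply(d1$Col_C, d1$Col_A, sum)

res <- data.frame(
  ColA = names(freq),
  Freq = as.integer(freq),
  ColC = as.integer(total[names(freq)])
)
print(res)
```

Printing 'd1' before the tabulation shows ten rows, two per input row, which is why each interior word ends up with a frequency of 2.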
Upvotes: 1
Reputation: 24198
We could use cSplit() from the splitstackshape package in combination with dplyr.
library(splitstackshape)
library(dplyr)
cSplit(df, "Col_A", sep = " ", direction = "long") %>%
  group_by(Col_A) %>%
  summarise(Freq = n(), ColC = sum(Col_B))
#    Col_A  Freq  ColC
#   (fctr) (int) (int)
#1   textA     1    10
#2   textB     2    30
#3   textC     2    50
#4   textD     2    70
#5   textE     2    60
#6   textF     1    20
Data
df <- structure(list(Col_A = structure(1:5, .Label = c("textA textB",
"textB textC", "textC textD", "textD textE", "textE textF"), class = "factor"),
Col_B = c(10L, 20L, 30L, 40L, 20L)), .Names = c("Col_A",
"Col_B"), class = "data.frame", row.names = c(NA, -5L))
Upvotes: 4