Reputation: 1572
How to count frequencies occurring in two columns ? Sample datas :
> sample <- dput(df)
structure(list(Nom_xp = c("A1FAA", "A1FAJ", "A1FBB", "A1FJA",
"A1FJR", "A1FRJ"), GB05.x = c(100L, 98L, NA, 100L, 102L, 98L),
GB05.1.x = c(100L, 106L, NA, 100L, 102L, 98L), GB18.x = c(175L,
173L, 177L, 177L, 173L, 177L), GB18.1.x = c(177L, 175L, 177L,
177L, 177L, 177L)), .Names = c("Nom_xp", "GB05.x", "GB05.1.x",
"GB18.x", "GB18.1.x"), row.names = c(NA, 6L), class = "data.frame")
Counting frequencies :
apply(sample[,2:5],2,table)
Now, how to combine the count by prefix of columns, or by every two columns ? The expected output, for the four first columns would be a list:
$GB05
98 100 102 106
3 4 2 1
$GB18
173 175 177
2 2 8
One way to get the count for the first two columns :
table(c(apply(sample[,2:3],2,rbind)))
98 100 102 106
3 4 2 1
But how to apply this to a whole data.frame ?
Upvotes: 1
Views: 467
Reputation: 193687
Here is another answer that is sort of a hybrid between Anthony's answer and juba's answer.
The first step is to convert the data.frame
into a "long" data.frame
. I generally use stack
when I can, but you can also do library(reshape2); df2 <- melt(df)
to get output similar to my df2
object.
df2 <- data.frame(df[1], stack(df[-1]))
head(df2)
# Nom_xp values ind
# 1 A1FAA 100 GB05.x
# 2 A1FAJ 98 GB05.x
# 3 A1FBB NA GB05.x
# 4 A1FJA 100 GB05.x
# 5 A1FJR 102 GB05.x
# 6 A1FRJ 98 GB05.x
Next, we need to know the unique values of ind
. juba did that with substr
, but I've done it here with gsub
and a regular expression. We don't need to add that into our data.frame
; we can call it directly in our other functions. The two functions which immediately come to mind are by
and tapply
, and both give you the output you are looking for.
by(df2$values,
list(ind = gsub("([A-Z0-9]+)\\..*", "\\1", df2$ind)),
FUN=table)
# ind: GB05
#
# 98 100 102 106
# 3 4 2 1
# ------------------------------------------------------------------------------
# ind: GB18
#
# 173 175 177
# 2 2 8
tapply(df2$values, gsub("([A-Z0-9]+)\\..*", "\\1", df2$ind), FUN = table)
# $GB05
#
# 98 100 102 106
# 3 4 2 1
#
# $GB18
#
# 173 175 177
# 2 2 8
Upvotes: 1
Reputation: 49073
If you want to apply table
to your whole data frame, you can use :
table(unlist(sample[,-1]))
Which gives :
98 100 102 106 173 175 177
3 4 2 1 2 2 8
If you want to group by column name prefix, for example the 4th first characters, you can do something like this :
cols <- names(sample)[-1]
groups <- unique(substr(cols,0,4))
sapply(groups, function(name) table(unlist(sample[,grepl(paste0("^",name),names(sample))])))
Which gives :
$GB05
98 100 102 106
3 4 2 1
$GB18
173 175 177
2 2 8
Upvotes: 2
Reputation: 6124
i would've said juba's answer was correct, but given you're looking for something else, perhaps it's this?
library(reshape2)
x <- melt( sample[ , 2:5 ] )
table( x[ , c( 'variable' , 'value' ) ] )
which gives
value
variable 98 100 102 106 173 175 177
GB05.x 2 2 1 0 0 0 0
GB05.1.x 1 2 1 1 0 0 0
GB18.x 0 0 0 0 2 1 3
GB18.1.x 0 0 0 0 0 1 5
please provide an example of your desired output structure :)
Upvotes: 1