Syd Amerikaner
Syd Amerikaner

Reputation: 105

Getting corresponding values from data.frame

my problem is that I can't really get my problem down in words which makes it hard to google it, so I am forced to ask you. I hope you will shed light on my issue:

I got a data.frame like this:

6 4
5 2
3 6
0 7
0 2
1 3
6 0
1 1

As you noticed, in the first column I got 0 repeating two times, 1 two times and so one. What I would like to do is get get all the corresponging values for one number, say 0, in the second columns (in this example 7 and 2). Preferably in data.frame.

I know the attempt with df$V2[which(df$V1==0)], however since the first column might have over 100 rows I can't really use this. Do you guys have a good solution?

Maybe some words regarding the background of this question: I need to process this data, i.e. get the mean of the second column for all 0's in the first columns, or get min/max values.

Regards

Upvotes: 1

Views: 92

Answers (5)

akrun
akrun

Reputation: 886948

Using data.table

library(data.table)
setDT(df)[, list(mean=mean(V2), max= max(V2), min=min(V2)), by = V1]

Upvotes: 1

Syd Amerikaner
Syd Amerikaner

Reputation: 105

Thanks all for your replies. I decided to go for the dplyr solution posted by teucer and eipi10. Since I have a third (and maybe even a fourth) column, this solution seems to be pretty easy to use (just adding V3 to group_by).

Since some are asking what's wrong with df$V2[which(df$V1==0)]: I maybe was a bit unclear when saying "rows", was I actually meant was "values". Let's assume I had n distinct values in the first column, I would have to use the command n times for all distinct values and store the n resulting vectors.

Upvotes: 0

Gavin Simpson
Gavin Simpson

Reputation: 174788

Using your data (with some temporary names attached)

txt <- "6 4
5 2
3 6
0 7
0 2
1 3
6 0
1 1"
df <- read.table(text = txt)
names(df) <- paste0("Var", seq_len(ncol(df)))

Coerce the first column to be a factor

df <- transform(df, Var1 = factor(Var1))

Then you can use aggregate() with a nice formula interface

aggregate(Var2 ~ Var1, data = df, mean)
aggregate(Var2 ~ Var1, data = df, max)
aggregate(Var2 ~ Var1, data = df, min)

(eg:

> aggregate(Var2 ~ Var1, data = df, mean)
  Var1 Var2
1    0  4.5
2    1  2.0
3    3  6.0
4    5  2.0
5    6  2.0

) or using the default interface

with(df, aggregate(Var2, list(Var1), FUN = mean))

> with(df, aggregate(Var2, list(Var1), FUN = mean))
  Group.1   x
1       0 4.5
2       1 2.0
3       3 6.0
4       5 2.0
5       6 2.0

But the output is nicer from the formula interface.

Upvotes: 1

user295691
user295691

Reputation: 7248

First, what exactly is the issue with the solution you suggest? Is it a question of efficiency? Frankly the code you present is close to optimal [1].

For the general case, you're probably looking at a split-apply-combine action, to apply a function to subsets of the data based on some differentiator. As @teucer points out, dplyr (and it's ancestor, plyr) are designed for exactly this, as is data.tables. In vanilla R, you would tend to use by or aggregate (or split and sapply for more advanced usage) for the same task. For example, to compute group means, you would do

by(df$V2, df$V1, mean)

or

aggregate(df, list(type=df$V1), mean)

Or even

sapply(split(df$V2, df$V1), mean)

[1] The code can be simplified to df$V2[df$V1 == 0] or df[df$V1 == 0,] as well.

Upvotes: 0

teucer
teucer

Reputation: 6238

Here a solution using dplyr

df %>% group_by(V1) %>% summarize(ME=mean(V2))

Upvotes: 2

Related Questions