chen

Reputation: 371

R: Find the average of all rows and create a new data frame for plotting

I am new to R. I want to sum up all rows and create a new data frame. The new data frame will be used for a line chart. For example, if I have source data like this:

Date       ID  hour0 hour1 hour2 ... hour24
2015-01-01 X1  20    30    40        100
2015-01-01 X1  30    40    50        400
...
2015-12-31 X1  40    50    60        400

I want to find the average of all rows (except the Date and ID columns). So, in my example, the result will be a new data frame of (30, 40, 50, ..., 300). Is there a way to do the conversion?

After the conversion, I want to plot the numbers in a line chart, where the x axis can be just 0, 1, 2, 3, 4, 5, etc.

Can I get some help? Thanks!

Upvotes: 0

Views: 1115

Answers (2)

Gregor Thomas

Reputation: 145775

It seems like you want the sum/average of each column, not each row. That is, you want the average of the hour0 column, of the hour1 column, etc.

Here's a good solution:

# special functions for this purpose
colSums(df[, -(1:2)]) # sum all columns except the first two
colMeans(df[, -(1:2)]) # average all columns except the first two

# general purpose, works with any function
sapply(df[, -(1:2)], sum) # sum all columns except the first two
sapply(df[, -(1:2)], mean) # average all columns except the first two
sapply(df[, -(1:2)], sd) # standard deviation of all columns except the first two
             # because colSds() isn't a built-in function like colMeans or colSums

To plot any of these, assign the result (give it a name, say, my_sum <- ...), and then you can do plot(my_sum, type = "l") to generate a simple line plot.
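As a minimal end-to-end sketch of the steps above, the following builds a small data frame from the three rows shown in the question (only hour0 to hour2 are included, since the remaining columns are elided there), averages the hour columns, and stores the result in a new data frame. The names `my_mean`, `plot_df`, `hour`, and `avg` are illustrative choices, not anything from the question.

```r
# Sample data taken from the three rows shown in the question
# (only hour0..hour2; the other hour columns are elided in the question)
df <- data.frame(
  Date  = as.Date(c("2015-01-01", "2015-01-01", "2015-12-31")),
  ID    = c("X1", "X1", "X1"),
  hour0 = c(20, 30, 40),
  hour1 = c(30, 40, 50),
  hour2 = c(40, 50, 60)
)

# Average every column except the first two (Date and ID)
my_mean <- colMeans(df[, -(1:2)])

# New data frame for plotting: hour index starting at 0, plus the average
plot_df <- data.frame(hour = seq_along(my_mean) - 1, avg = unname(my_mean))

# Simple line plot with the hour index on the x axis
plot(plot_df$hour, plot_df$avg, type = "l", xlab = "Hour", ylab = "Average")
```

With this sample, the averages come out as 30, 40, and 50, matching the first three values expected in the question.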

Upvotes: 0

Bill O'Brien

Reputation: 872

Here's a solution with a small simulated data frame. I'm not sure why you'd create a new data frame, but here is a way to create a new (mean value) column from the existing columns. If you truly want a new data frame, just change the target of the last assignment (dfNew$rowMean <- ...).

set.seed(0)
df <- data.frame(hour0 = runif(n=5), hour1 = runif(n=5), hour2 = runif(n=5))

# vector of all columns whose name contains 'hour'
cols <- names(df)[grepl('hour', names(df))]

df$rowMean <- rowMeans(df[, cols])
df

> df
      hour0     hour1      hour2   rowMean
1 0.8966972 0.2016819 0.06178627 0.3867218
2 0.2655087 0.8983897 0.20597457 0.4566243
3 0.3721239 0.9446753 0.17655675 0.4977853
4 0.5728534 0.6607978 0.68702285 0.6402247
5 0.9082078 0.6291140 0.38410372 0.6404752
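If a separate data frame is preferred over a new column, one possible variant is sketched below. It assumes the same simulated data; `dfNew` follows the name hinted at above, and the anchored pattern `^hour` is an optional tightening so that only names starting with 'hour' are matched.

```r
set.seed(0)
df <- data.frame(hour0 = runif(n = 5), hour1 = runif(n = 5), hour2 = runif(n = 5))

# Names of all columns that start with 'hour'
cols <- grep("^hour", names(df), value = TRUE)

# Store the row means in a new data frame; df itself is left unchanged
dfNew <- data.frame(rowMean = rowMeans(df[, cols]))
dfNew
```

Selecting columns by name rather than by position keeps the code working if columns such as Date or ID are added or reordered.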

Upvotes: 2
