user42372

Average variable by another variable in R

This may be a naive question, but I am running a regression of one variable on a set of other variables.

However, each country has several observations, and the original regression uses the pooled sample. Now I want to average each variable by country and run the regression on those averages.

For example, I have 50 countries, and each country has either 3 or 4 observations. I want the average of each variable by country, so that in the end every dependent and independent variable has 50 observations, one per country.

Right now I am using the aggregate function, but it creates a variable that contains both the country name and the average value, so I am not able to run a regression on these variables.

For example, this is what I have:

Country  / some-observation / some-other-observation / some-other-observation-2
Somalia  / 3 / 7 / ...
USA      / 7 / 8 / ...
Nigeria  / 5 / 8 / ...
Nigeria  / 9 / 2 / ...
India    / 4 / 7 / ...
India    / 7 / 9 / ...
UK       / 8 / 1 / ...
UK       / 5 / 5 / ...

etc.
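The problem described above may simply be how `aggregate` returns its result: a data frame with one row per country, where the country column sits alongside the averages. A minimal sketch, assuming the first two columns of the example are renamed to the hypothetical names `y` and `x` (hyphens are not valid in R names) and the elided third column is dropped, shows that `lm()` can use that data frame directly, because the formula picks out only the columns it names.

```r
# Example rows from the table above; "some-observation" and
# "some-other-observation" are renamed to the hypothetical names y and x
dat <- data.frame(
  country = c("Somalia", "USA", "Nigeria", "Nigeria", "India", "India", "UK", "UK"),
  y = c(3, 7, 5, 9, 4, 7, 8, 5),
  x = c(7, 8, 8, 2, 7, 9, 1, 5)
)

# Average every listed variable within country: the result has one row per
# country, a country column, and one column per averaged variable
means <- aggregate(cbind(y, x) ~ country, data = dat, FUN = mean)
print(means)

# lm() only uses the columns named in the formula, so the country column
# does not get in the way; each country contributes exactly one observation
fit <- lm(y ~ x, data = means)
coef(fit)
```

With the real data, `aggregate(. ~ country, data = dat, FUN = mean)` averages every other column at once. Note that the formula method drops any row with an `NA` in one of the variables by default (`na.action = na.omit`).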

Answers (1)

Glen_b

One very good way to compute per-country summaries like these is to use tapply.

# set up some data (read.csv(text = ...) keeps the example self-contained)
mycodat <- read.csv(text = "country,obsv
Spain,4
Spain,5
Portugal,3
Portugal,7
Venezuela,8
Zambia,2
Zambia,4
Zambia,3", header = TRUE)

# the data frame you're trying to get the country means into; sort() puts
# the countries in the same (alphabetical) order that tapply() returns
regdat <- data.frame(country = sort(unique(mycodat$country)))

At this point we have two data frames: the first has several values per country, and the second has one row per country, which is where the mean data will go. Here's how to fill it in.

# Now compute each country's mean of obsv and put it in regdat;
# tapply() returns the means ordered by the sorted country names
regdat$meanobsv <- with(mycodat, tapply(obsv, country, mean))
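The code above averages a single column. A minimal sketch extending the same `tapply` approach to several variables at once and then fitting the country-level regression might look like this; the columns `y` and `x` and their values are hypothetical.

```r
# Hypothetical data: two numeric variables, y and x, observed several times per country
mycodat <- data.frame(
  country = c("Spain", "Spain", "Portugal", "Portugal", "Venezuela", "Zambia", "Zambia", "Zambia"),
  y = c(4, 5, 3, 7, 8, 2, 4, 3),
  x = c(1, 2, 2, 4, 5, 1, 1, 2)
)

# Apply tapply() to each listed column; sapply() combines the results into a
# matrix with one row per country (sorted by name) and one column per variable
vars <- c("y", "x")
country_means <- sapply(mycodat[vars], function(v) tapply(v, mycodat$country, mean))

# sort(unique(...)) matches the row order of the tapply() results
regdat <- data.frame(country = sort(unique(mycodat$country)), country_means)

# One row per country, so each country is a single observation in the regression
fit <- lm(y ~ x, data = regdat)
coef(fit)
```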
