Nanna

Create a table from a couple of summary statistics

I'm using RStudio version 0.98.1062 on a Mac (OS X Yosemite 10.10.1). I want to create a table (preferably one I can export to Excel or PDF) from several summary statistics describing the proportion of women enrolled in different disciplines:

summary(agriculture$X2009.PROP)
  Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
0.0000  0.3333  0.4881  0.4689  0.6026  1.0000

summary(economics$X2009.PROP)
  Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's
0.0000  0.2555  0.3161  0.3218  0.3887  0.6923      29

summary(education$X2009.PROP)
  Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's
0.0000  0.2967  0.5000  0.5490  0.8571  1.0000      46

summary(law$X2009.PROP)
  Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's
0.0000  0.4250  0.5695  0.5324  0.6593  1.0000      28

Basically, I want the table to look like this:

Discipline   Min.    1st Qu.  Median   Mean     3rd Qu.  Max.
agriculture  0.0000  0.3333   0.4881   0.4689   0.6026   1.0000
economics    0.0000  0.2555   0.3161   0.3218   0.3887   0.6923
education    ....
law          ....

Could you advise me on how to write the code for that?

Answers (1)

Livius

There are two basic ways to do this: combine the data first and then summarize it, or summarize each variable first and then combine the summaries.

Some sample data, drawn at random from the uniform distribution:

x <- runif(100)
y <- runif(100)

Combine and Summarize

If you want to combine the data first, put the variables into a data frame with data.frame():

d <- data.frame(variable1=x,variable2=y)
summary(d)

which will give you output like:

   variable1         variable2      
 Min.   :0.03026   Min.   :0.01173  
 1st Qu.:0.29410   1st Qu.:0.24968  
 Median :0.48517   Median :0.47524  
 Mean   :0.51137   Mean   :0.47865  
 3rd Qu.:0.71354   3rd Qu.:0.69512  
 Max.   :0.98465   Max.   :0.980

(You can also call data.frame() without naming the columns, in which case the names of the variables are used as column names.) Reshaping this output into the layout you want takes some extra work, but a data frame is probably the better format for later analyses in R. d is now in "wide format", which is easy to convert to the standard "long format" with packages like reshape or its successor reshape2 (e.g. its melt() function).
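One way to do that reshaping in base R, as a minimal sketch with the same sample data: sapply() applies summary() to each column of d and, because every summary has the same six statistics, simplifies the result to a matrix with one column per variable; t() then transposes it into the one-row-per-variable layout from the question. This assumes that not just some of the columns contain missing values: summary() adds an extra "NA's" entry for columns that do, and sapply() then returns a list instead of a matrix.

```r
# Same sample data as above
x <- runif(100)
y <- runif(100)
d <- data.frame(variable1 = x, variable2 = y)

# sapply() calls summary() on every column; each result has the same six
# statistics, so sapply() simplifies them into a 6 x 2 matrix
stats <- sapply(d, summary)

# Transpose so that each variable is a row and each statistic a column
tab <- t(stats)
print(round(tab, 4))
```

The row names come from the column names of d, so naming the columns after the disciplines gives the first column of the desired table directly.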

As an aside, you could use cbind() (column bind) instead of data.frame(), in which case you get a matrix instead of a data frame. For purely numeric values and simple summary statistics, this makes little difference. I mention it only as a parallel to rbind() (see below); observations are typically stored in data frames rather than plain matrices, because data frames are semantically richer (for example, their columns can have different types).
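A small sketch of the cbind() variant with the same sample data. One practical difference worth knowing: sapply() over a matrix iterates over individual cells rather than columns, so to run summary() column by column on a matrix you would use apply() with MARGIN = 2 instead.

```r
# Same sample data as above
x <- runif(100)
y <- runif(100)

# cbind() returns a numeric matrix; named arguments become column names
m <- cbind(variable1 = x, variable2 = y)
is.matrix(m)    # TRUE
summary(m)      # per-column summary, printed like summary() on a data frame

# To get the summaries as a numeric matrix, apply summary() over the
# columns (MARGIN = 2); the result has one column per variable
ms <- apply(m, 2, summary)
t(ms)           # one row per variable, as with rbind() below
```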

Summarize and Combine

If you want to summarize first, you can use rbind() (row bind) to combine the summaries:

xs <- summary(x)
ys <- summary(y)

s <- rbind(xs,ys) 

print(s)

which will give you output like this:

      Min. 1st Qu. Median   Mean 3rd Qu.   Max.
xs 0.03026  0.2941 0.4852 0.5114  0.7135 0.9847
ys 0.01173  0.2497 0.4752 0.4787  0.6951 0.9803
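Applied to the data in the question, summarizing first is probably the easier route: if the four data frames have different numbers of rows, data.frame() either refuses to put their columns side by side or, when one length is a multiple of another, silently recycles the shorter column. One catch is that summary() appends an "NA's" entry only when there are missing values, so the agriculture summary has six entries and the others have seven; calling rbind() on them directly recycles the shorter row with a warning. The sketch below keeps only the six statistics. The data frames here are made-up stand-ins for the question's agriculture, economics, education and law, and the names disciplines and stat_names are hypothetical.

```r
# Hypothetical stand-ins for the question's data frames: each has a
# proportion column X2009.PROP, three of them with missing values
set.seed(1)
agriculture <- data.frame(X2009.PROP = runif(40))
economics   <- data.frame(X2009.PROP = c(runif(60), rep(NA, 5)))
education   <- data.frame(X2009.PROP = c(runif(80), rep(NA, 8)))
law         <- data.frame(X2009.PROP = c(runif(50), rep(NA, 3)))

disciplines <- list(agriculture = agriculture, economics = economics,
                    education = education, law = law)

# Keep only the six statistics; summary() adds "NA's" only when values are
# missing, so without this the rows would have different lengths
stat_names <- c("Min.", "1st Qu.", "Median", "Mean", "3rd Qu.", "Max.")
rows <- lapply(disciplines, function(df) summary(df$X2009.PROP)[stat_names])

# Stack the summaries; row names come from the names of the list
tab <- do.call(rbind, rows)
print(round(tab, 4))
```

If the NA counts are wanted as well, they could be added as a separate column with sapply(disciplines, function(df) sum(is.na(df$X2009.PROP))).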

From there, you can use R's built-in functions for writing tabular data to a file; see ?write.table (the same help page documents write.csv()). Excel can open both tab-separated and CSV files. If you want to go directly to PDF, look at exporting to LaTeX via the xtable package and/or using R Markdown to generate a report. Printing tables with those tools is well documented elsewhere online.
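A minimal sketch of the file export with base R only. The file names are placeholders, and tempdir() is used just to keep the example from writing into the working directory; in practice you would pass a path such as "summaries.csv".

```r
# Build the summary table as above
x <- runif(100)
y <- runif(100)
s <- rbind(xs = summary(x), ys = summary(y))

# CSV for Excel; row names (xs, ys) are written as the first column
out <- file.path(tempdir(), "summaries.csv")
write.csv(round(s, 4), out)

# Read it back to check what Excel would see; check.names = FALSE keeps
# headers such as "1st Qu." unchanged
back <- read.csv(out, row.names = 1, check.names = FALSE)
print(back)

# Tab-separated alternative; col.names = NA leaves a blank header cell
# above the row names, the usual spreadsheet convention
write.table(round(s, 4), file.path(tempdir(), "summaries.txt"),
            sep = "\t", col.names = NA)
```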