Reputation: 11
I'm using RStudio version 0.98.1062 on a Mac (OS X Yosemite 10.10.1). I want to create a table (preferably one I can transfer to Excel or PDF) of several summary statistics describing the proportion of women enrolled in different disciplines:
summary(agriculture$X2009.PROP)
Min. 1st Qu. Median Mean 3rd Qu. Max.
0.0000 0.3333 0.4881 0.4689 0.6026 1.0000
summary(economics$X2009.PROP)
Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
0.0000 0.2555 0.3161 0.3218 0.3887 0.6923 29
summary(education$X2009.PROP)
Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
0.0000 0.2967 0.5000 0.5490 0.8571 1.0000 46
summary(law$X2009.PROP)
Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
0.0000 0.4250 0.5695 0.5324 0.6593 1.0000 28
Basically, I want the table to look like this:
Discipline/SS Min. 1st Qu. Median Mean 3rd Qu. Max.
agriculture 0.0000 0.3333 0.4881 0.4689 0.6026 1.0000
economics 0.0000 0.2555 0.3161 0.3218 0.3887 0.6923
education ....
law ....
Could you please advise me how to write the code for that?
Upvotes: 0
Views: 5658
Reputation: 3388
There are two basic ways you can do this: combining the data beforehand or afterwards.
Some sample data, randomly taken from the uniform distribution:
x <- runif(100)
y <- runif(100)
If you want to combine the data beforehand, you can use data.frame():
d <- data.frame(variable1=x,variable2=y)
summary(d)
which will give you output like:
variable1 variable2
Min. :0.03026 Min. :0.01173
1st Qu.:0.29410 1st Qu.:0.24968
Median :0.48517 Median :0.47524
Mean :0.51137 Mean :0.47865
3rd Qu.:0.71354 3rd Qu.:0.69512
Max. :0.98465 Max. :0.980
(Note that you can also call data.frame() without specifying column names, in which case the variable names are used as column names.) It might take some work to wrangle this into the format you want, because summary(d) returns a table of formatted text rather than numbers, but the data frame itself is probably the better format for later analyses in R. (d is now in "wide format", which is not difficult to convert to the standard "long format" with packages like reshape or its successor reshape2.)
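One way to do that wrangling, sketched here with freshly simulated x and y: apply summary() to each column with sapply() and transpose the result, so each variable becomes a row of numbers. The second half uses base R's stack() as a package-free stand-in for reshape/reshape2 (a swapped-in technique, not one named above) to reach long format, where split() regroups the values by variable. The long route also copes with groups of different lengths, which data.frame() generally will not accept.

```r
set.seed(1)                      # reproducible sample data
x <- runif(100)
y <- runif(100)
d <- data.frame(variable1 = x, variable2 = y)

# Wide format: summary() of each column returns six named statistics;
# sapply() collects them as columns, and t() turns each variable into a row.
wide_tab <- t(sapply(d, summary))
print(wide_tab)

# Long format: stack() gives one row per observation, with the numbers in
# `values` and the originating column name in the factor `ind`.
long <- stack(d)
long_tab <- t(sapply(split(long$values, long$ind), summary))
print(long_tab)
```

Both routes produce the same numeric matrix, which can then be exported like any other table.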
As an aside, you could use cbind() (column bind) instead of data.frame(), in which case you get a matrix instead of a data frame. For purely numerical values and simple summary statistics, this doesn't make much difference. I mention it only as a parallel to rbind() (see below); observations are typically stored in data frames rather than plain matrices, because data frames are semantically richer (for example, they allow columns of different types).
If you want to combine the summaries afterwards, you can use rbind() (row bind):
xs <- summary(x)
ys <- summary(y)
s <- rbind(xs,ys)
print(s)
which will give you output like this:
Min. 1st Qu. Median Mean 3rd Qu. Max.
xs 0.03026 0.2941 0.4852 0.5114 0.7135 0.9847
ys 0.01173 0.2497 0.4752 0.4787 0.6951 0.9803
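With the question's data, rbind() needs one caveat: summary() only adds an NA's entry when a vector actually contains missing values, so agriculture yields six numbers while economics, education and law yield seven, and rbind() would recycle the shorter vector with a warning. A minimal sketch of one workaround, using simulated stand-in vectors: the list props and the helper summarise_prop are hypothetical names, to be replaced by agriculture$X2009.PROP and so on. It summarises only the non-missing values and appends the NA count separately, so every row has the same seven columns.

```r
set.seed(42)
# Hypothetical stand-ins for agriculture$X2009.PROP etc.: vectors of
# different lengths, some containing NAs.
props <- list(
  agriculture = runif(50),
  economics   = c(runif(60), rep(NA, 29)),
  education   = c(runif(70), rep(NA, 46)),
  law         = c(runif(40), rep(NA, 28))
)

# Summarise the non-missing values (always six statistics) and append the
# NA count, so every discipline produces the same seven named values.
summarise_prop <- function(v) {
  c(summary(v[!is.na(v)]), "NA's" = sum(is.na(v)))
}

# One row per discipline, one column per statistic.
tab <- t(sapply(props, summarise_prop))
print(round(tab, 4))
```

Rows come out in the order of the list, and the column labels are summary()'s own plus NA's.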
From there, you can use the built-in functions for writing tabular data to a file; see ?write.table (write.csv() is a convenience wrapper for CSV output). Excel can open both tab-separated and CSV files. If you want to go directly to PDF, take a look at exporting to LaTeX with the xtable package and/or using R Markdown to generate a report. Printing tables with either system is well documented online.
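A minimal sketch of the export step for a combined summary table like s: write.csv() writes the row labels as the first column, which Excel opens directly. The temporary file path is illustrative only (replace it with a real file name), and reading the file back is there just to check the round trip.

```r
set.seed(1)
x <- runif(100)
y <- runif(100)
s <- rbind(xs = summary(x), ys = summary(y))

# Write the table; row names (xs, ys) become the first, unnamed column.
out <- tempfile(fileext = ".csv")   # illustrative path, e.g. "summaries.csv"
write.csv(s, out)

# For a tab-separated file instead:
# write.table(s, "summaries.txt", sep = "\t", col.names = NA)

# Read it back to confirm the layout survived; check.names = FALSE keeps
# headers such as "1st Qu." unchanged.
back <- read.csv(out, row.names = 1, check.names = FALSE)
print(back)
```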
Upvotes: 1