Reputation: 65951
As an example, I want a function that will iterate over the columns in a dataframe and print out each column's data type (e.g., "numeric", "integer", "character", etc.).
Without a variable I know I can do class(df$MyColumn) and get the data type. How can I change it so "MyColumn" is a variable?
What I'm trying is
f <- function(df) {
    for(column in names(df)) {
        columnClass = class(df[column])
        print(columnClass)
    }
}
But this just prints out [1] "data.frame" for each column.
Upvotes: 5
Views: 7315
Reputation: 9047
Use a comma before column:
for(column in names(df)) {
    columnClass = class(df[,column])
    print(columnClass)
}
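As a minimal sketch of why the comma matters (the data frame and its column names below are made up for illustration): indexing with single brackets and no comma treats the data frame as a list and returns a one-column data frame, while df[, column] and df[[column]] return the column itself.

```r
# Hypothetical example data; the column names are illustrative only
df <- data.frame(id = 1:3,
                 score = c(1.5, 2.5, 3.5),
                 name = c("a", "b", "c"),
                 stringsAsFactors = FALSE)

column <- "score"

single_bracket <- class(df[column])    # list-style subset: still a data frame
with_comma     <- class(df[, column])  # matrix-style subset, dropped to a vector
double_bracket <- class(df[[column]])  # extracts the list element directly

print(single_bracket)
print(with_comma)
print(double_bracket)
```

Note that df[, column] only returns a vector because plain data frames drop a single selected column by default; df[[column]] does not depend on that default, so it is the safer choice if df might be a tibble.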
Upvotes: 4
Reputation: 13443
You can use the colwise function of the plyr package to turn any function into a column-wise function. It is a wrapper around lapply.
library(plyr)
colwise.print.class <- colwise(.fun = function(col) {print(class(col))})
colwise.print.class(df)
You can view the function created with
print(colwise.print.class)
Upvotes: 0
Reputation: 4939
Much as DWin suggested
apply(df,2,class)
but you say you want to do more with each column? What do you want to do? Try to avoid abstract examples. In case it helps:
apply(df,2,mean)
apply(df,2,sd)
or something more complicated
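apply(df,2,function(x){c(summary(x)["Mean"], summary(x)["Median"], sd(x))})
Note that the summary function gives you most of this functionality anyway, but this is just an example. Any function can be placed inside apply and iterated over the columns of a matrix or data frame, and that function can be as complex or as simple as you need it to be. One caveat: apply converts a data frame to a matrix first, so if the columns have mixed types, every column is seen as the same type (typically character); in that case use lapply or sapply instead.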
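A small sketch of how apply and sapply differ on a data frame with mixed column types (the data frame here is invented for illustration). apply goes through as.matrix, and a matrix can hold only one type, so the numeric column is reported as character.

```r
# Hypothetical mixed-type data frame
df <- data.frame(n = c(1.5, 2.5),
                 s = c("a", "b"),
                 stringsAsFactors = FALSE)

# apply() coerces df to a character matrix before calling class() on each column
via_apply <- apply(df, 2, class)

# sapply() iterates over the columns as list elements, so types are preserved
via_sapply <- sapply(df, class)

print(via_apply)
print(via_sapply)
```

For an all-numeric data frame the two approaches agree, which is why apply(df, 2, mean) works as expected in that case.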
Upvotes: 1
Reputation: 263331
Since a data frame is simply a list, you can loop over the columns using lapply and apply the class function to each column:
lapply(df, class)
To address the previously unspoken concerns in User's comment: if you build a function that does whatever it is that you hope to do to a column, then this will succeed:
func <- function(col) {print(class(col))}
lapply(df, func)
It's really mostly equivalent to:
for(col in names(df) ) { print(class(df[[col]]))}
And with lapply there would not be an unneeded variable such as columnClass (or the loop variable) cluttering up the .GlobalEnv.
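As a rough sketch of the same idea returning a value instead of printing (the data frame and the use of vapply are assumptions for illustration, not part of the answer above): vapply works like lapply but returns a named character vector, and class(col)[1] guards against columns that carry more than one class.

```r
# Hypothetical example data; the column names are illustrative only
df <- data.frame(id = 1:3,
                 score = c(1.5, 2.5, 3.5),
                 name = c("a", "b", "c"),
                 stringsAsFactors = FALSE)

# One class name per column, returned as a named character vector
classes <- vapply(df, function(col) class(col)[1], character(1))
print(classes)

# The equivalent explicit loop, using [[ ]] to extract each column
for (col in names(df)) {
  cat(col, ":", class(df[[col]])[1], "\n")
}
```

The vapply form is convenient when the classes are needed later, for example to select only the numeric columns with df[classes == "numeric"].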
Upvotes: 7