User

Reputation: 65951

How to index data frame column by a variable?

As an example, I want a function that will iterate over the columns in a dataframe and print out each column's data type (e.g., "numeric", "integer", "character", etc)

Without a variable I know I can do class(df$MyColumn) and get the data type. How can I change it so "MyColumn" is a variable?

What I'm trying is

f <- function(df) {

 for(column in names(df)) {
   columnClass = class(df[column])
   print(columnClass)
 }

}

But this just prints out [1] "data.frame" for each column.
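Single-bracket indexing with one name, `df[column]`, treats the data frame as a list and returns a one-column data frame. `df[[column]]` or `df$MyColumn` extracts the column itself. The sketch below uses a made-up data frame; the names `df`, `MyColumn`, and `other` are for illustration only:

```r
# Hypothetical example data frame with columns of two different types
df <- data.frame(MyColumn = c(1.5, 2.5), other = c("a", "b"),
                 stringsAsFactors = FALSE)

column <- "MyColumn"   # the column name is held in a variable

class(df[column])      # one-column data frame: "data.frame"
class(df[[column]])    # the column itself:     "numeric"
class(df$MyColumn)     # same as the line above: "numeric"
```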

Upvotes: 5

Views: 7315

Answers (4)

smu

Reputation: 9047

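Use a comma before `column`, so that `df[, column]` extracts the column as a vector instead of returning a one-column data frame: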

for (column in names(df)) {
  columnClass <- class(df[, column])
  print(columnClass)
}
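Here is the question's function with the comma fix applied. This is a minimal sketch. It assumes `df` is a base `data.frame`, where `[, column]` drops a single column to a vector. A tibble, for example, does not drop it, and `df[, column, drop = FALSE]` keeps a data frame on purpose. The example data are made up:

```r
f <- function(df) {
  for (column in names(df)) {
    # df[, column] drops to a plain vector for a base data.frame
    columnClass <- class(df[, column])
    print(columnClass)
  }
  invisible(NULL)
}

df <- data.frame(n = 1:3, x = c(0.5, 1, 2), s = c("a", "b", "c"),
                 stringsAsFactors = FALSE)
f(df)
# [1] "integer"
# [1] "numeric"
# [1] "character"
```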

Upvotes: 4

Etienne Low-Décarie

Reputation: 13443

You can use the `colwise` function from the plyr package to turn any function into a column-wise function. It is a wrapper around `lapply`.

library(plyr)

colwise.print.class <- colwise(.fun = function(col) print(class(col)))

colwise.print.class(df)

You can view the function created with

print(colwise.print.class)
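The answer describes `colwise` as a wrapper around `lapply`, so the same idea can be sketched in base R without plyr. `make_colwise` is a hypothetical helper name, not part of any package. Unlike plyr's `colwise`, which assembles its results into a data frame, this version returns a plain named list:

```r
# Hypothetical helper: turns a function of one column into a function of a data frame
make_colwise <- function(fun) {
  function(df) lapply(df, fun)
}

colwise_class <- make_colwise(class)

df <- data.frame(n = 1:2, s = c("a", "b"), stringsAsFactors = FALSE)
colwise_class(df)   # list(n = "integer", s = "character")
```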

Upvotes: 0

Davy Kavanagh

Reputation: 4939

Much as DWin suggested

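apply(df, 2, class)  # caution: apply() coerces df to a matrix first, so in a mixed-type data frame every column reports "character"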

But you say you want to do more with each column? What do you want to do? Try to avoid abstract examples. In case it helps:

apply(df,2,mean)
apply(df,2,sd)

or something more complicated

apply(df, 2, function(x) { s <- summary(x); c(s["Mean"], s["Median"], sd = sd(x)) })

Note that the `summary` function already gives you most of this functionality; this is just an example. Any function can be placed inside `apply` and iterated over the columns of a matrix or data frame, and that function can be as complex or as simple as you need it to be.
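Check one thing before using `apply` on a data frame. `apply` first converts the data frame to a matrix with `as.matrix`, and a matrix holds a single type. If the frame mixes numbers and text, every column becomes character. This sketch compares `apply` with `sapply`, which works on the columns directly (example data are made up):

```r
df <- data.frame(n = 1:2, x = c(0.5, 1.5), s = c("a", "b"),
                 stringsAsFactors = FALSE)

apply(df, 2, class)   # every column reported as "character"
sapply(df, class)     # n = "integer", x = "numeric", s = "character"
```

For an all-numeric data frame, the matrix conversion is harmless, so examples like `apply(df, 2, mean)` behave as expected there.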

Upvotes: 1

IRTFM

Reputation: 263331

Since a data frame is simply a list, you can loop over the columns using lapply and apply the class function to each column:

lapply(df, class)
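`lapply` returns a list. If a named character vector is more convenient, you can use `vapply` and declare the result type. `class()` can return more than one string; an ordered factor, for instance, has class `c("ordered", "factor")`. This sketch therefore keeps only the first element. That choice is an assumption about what you want to report:

```r
df <- data.frame(n = 1:2, o = ordered(c("lo", "hi")))

# Exactly one string per column; class(col)[1] handles multi-element classes
col_classes <- vapply(df, function(col) class(col)[1], character(1))
col_classes   # n = "integer", o = "ordered"
```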

To address the concerns raised in User's comment: if you write a function that does whatever you want to a single column, then this will work:

func <- function(col) {print(class(col))}
lapply(df, func)

It's really mostly equivalent to:

 for(col in names(df) ) { print(class(df[[col]]))} 

And there is no need for the intermediate 'columnClass' variable cluttering up the .GlobalEnv.
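The question also asks to print each column's data type. The sketch below prints each column name next to its class, using `[[` with the name stored in a variable. `print_column_classes` is a hypothetical name, and the output format is an arbitrary choice. Because the loop runs inside a function, the loop variable `col` stays local and does not end up in .GlobalEnv:

```r
print_column_classes <- function(df) {
  for (col in names(df)) {
    # df[[col]] extracts the column itself, so class() sees the vector
    cat(col, ": ", paste(class(df[[col]]), collapse = "/"), "\n", sep = "")
  }
  invisible(df)
}

df <- data.frame(id = 1:2, score = c(9.5, 7), name = c("a", "b"),
                 stringsAsFactors = FALSE)
print_column_classes(df)
# id: integer
# score: numeric
# name: character
```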

Upvotes: 7
