Tan Jian Sean

Reputation: 35

Getting a column name using iloc in a DataFrame

Is there a way to get the column name as a value using iloc or other functions?

I have a for loop here:

for i in range(0,18):
    coef, pval = pearsonr(x.iloc[:,i],y)
    print('pval of ',x.iloc[?,i], ' and allStar: ', pval)

where I want to print 'pval of column_name and allStar: pval'.

Is there a value I can replace ? with so that it fetches the column name for each of the columns? Or do I have to use another function?

Upvotes: 0

Views: 2850

Answers (2)

Yaniv

Reputation: 839

The short answer to your direct question is to use x.columns, indexed by the same position i:

for i in range(0,18):
    coef, pval = pearsonr(x.iloc[:,i],y)
    print('pval of ',x.columns[i], ' and allStar: ', pval)

A cleaner approach would be to simply iterate over the columns:

for c in x.columns:
    coef, pval = pearsonr(x[c], y)
    print('pval of ',c, ' and allStar: ', pval)
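For reference, here is a self-contained sketch of both loops. The DataFrame contents and the column names points and assists are made up for illustration; only x, y, pearsonr, and the allStar label come from the question. It uses x.shape[1] rather than a hard-coded 18 so the loop follows the actual number of columns.

```python
import pandas as pd
from scipy.stats import pearsonr

# Hypothetical data standing in for the asker's x and y.
x = pd.DataFrame({
    "points": [10.0, 12.0, 15.0, 9.0, 20.0, 18.0],
    "assists": [3.0, 4.0, 2.0, 5.0, 7.0, 6.0],
})
y = pd.Series([0, 0, 1, 0, 1, 1], name="allStar")

# Position-based: x.columns[i] is the label of the column x.iloc[:, i].
names_by_position = [x.columns[i] for i in range(x.shape[1])]

# Label-based: iterate over the column labels directly.
pvals = {}
for c in x.columns:
    coef, pval = pearsonr(x[c], y)
    pvals[c] = pval
    print(f"pval of {c} and allStar: {pval}")
```

Collecting the results in a dict keyed by column name, as above, makes it easy to sort or filter them afterwards.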

Bonus notes (mainly to avoid the loop):

  • To get the correlation coefficient (but not yet the p-value) of each column with y, use corrwith:
r = x.corrwith(pd.Series(y), axis=0)
  • To obtain the p-values that correspond to those Pearson coefficients, compute them directly from the beta distribution that the scipy pearsonr documentation gives for r:
dist = scipy.stats.beta(n/2 - 1, n/2 - 1, loc=-1, scale=2)  # n == len(y)
p = 2*dist.cdf(-abs(r))  # two-sided p-values
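The two bonus notes can be combined into a loop-free sketch like the one below. The random data and the column names a, b, c are assumptions for illustration. Note that corrwith aligns on the index, so y should share x's index. The final loop is only there to compare the result against pearsonr.

```python
import numpy as np
import pandas as pd
from scipy import stats

# Hypothetical data: 30 rows, three made-up columns.
rng = np.random.default_rng(0)
x = pd.DataFrame(rng.normal(size=(30, 3)), columns=["a", "b", "c"])
y = pd.Series(rng.normal(size=30))  # same default index as x

n = len(y)

# Pearson r of every column with y, indexed by column name.
r = x.corrwith(y, axis=0)

# Two-sided p-values from the beta distribution of r on [-1, 1].
dist = stats.beta(n / 2 - 1, n / 2 - 1, loc=-1, scale=2)
p = pd.Series(2 * dist.cdf(-np.abs(r.to_numpy())), index=r.index)

# Cross-check against the per-column loop.
p_loop = {}
for c in x.columns:
    _, pv = stats.pearsonr(x[c], y)
    p_loop[c] = pv
p_loop = pd.Series(p_loop)

print(p)
```

Because r and p are both Series indexed by column name, the column labels come along for free and no positional lookup is needed.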

Upvotes: 1

ipj

Reputation: 3598

If x is your DataFrame, try converting the column name to a column index:

col_idx = x.columns.get_loc('column_name')

This index can then be passed to iloc.
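A small sketch of the round trip between name and position, using a made-up DataFrame (the column names here are hypothetical). It also shows that a column selected with iloc is a Series whose name attribute holds the column label, which is another way to recover the name from a position.

```python
import pandas as pd

# Hypothetical DataFrame for illustration.
x = pd.DataFrame({"points": [10, 12], "assists": [3, 4], "rebounds": [7, 5]})

# Name -> position.
col_idx = x.columns.get_loc("assists")

# Position -> column data via iloc.
col = x.iloc[:, col_idx]

# Position -> name, either through x.columns or the selected Series.
name_from_columns = x.columns[col_idx]
name_from_series = col.name

print(col_idx, name_from_columns, name_from_series)
```

get_loc returns a single integer only when the column label is unique; with duplicate labels it can return a slice or a boolean mask instead.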

Upvotes: 2
