Rebuilding Column Names in Pandas Dataframe

Question

Suppose I have a dataframe like this:

   Height  Speed
0     4.0   39.0
1     7.8   24.0
2     8.9   80.5
3     4.2   60.0

Then, through some feature extraction, I get this:

However, I want the result to be a DataFrame that still has its column index (the column name). How would I get the following?

   Speed
0   39.0
1   24.0
2   80.5
3   60.0

I am looking for an answer that compares the original with the new column and determines that the new column must be named Speed. In other words, it shouldn't just rename the new column 'Speed'.

Here is the feature extraction, where X is the original DataFrame and X1 is the returned array, which lacks column names:

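    from sklearn.feature_selection import RFECV
    from sklearn.model_selection import StratifiedKFold
    from sklearn.svm import SVC

    svc = SVC(kernel="linear")
    rfecv = RFECV(estimator=svc, step=1, cv=StratifiedKFold(2), scoring="accuracy")
    # X1 is a bare NumPy array: the column labels of X are not carried over
    X1 = rfecv.fit_transform(X, y)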

Thanks

EDIT:

In response to the comments, let me clarify. As I understand it, the feature extraction method above accepts a DataFrame or a Series/array and returns an array. I am passing it a DataFrame that contains both the column labels and the data, but what comes back is an array with no column names.

Another caveat is that this must work in general. I cannot name my columns explicitly, because the selected columns will change in my program: it could return two columns, four columns, and so on. I am looking for a method that compares the original DataFrame with the array(s) returned by the feature extraction, recognizes that the new array is a "subset" of the original DataFrame, and then labels it with the original column name(s). Let me know your thoughts! Sorry for the confusion, and thank you for your help.
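The value comparison described in the edit could be sketched as follows. This is a minimal illustration under two assumptions: every column of the original DataFrame is numeric, and the feature extraction only selects columns without changing their values. `relabel_by_values` is a hypothetical helper (not part of pandas or scikit-learn), and `X1` here is a hand-built stand-in rather than real RFECV output.

```python
import numpy as np
import pandas as pd


def relabel_by_values(original: pd.DataFrame, arr) -> pd.DataFrame:
    """Hypothetical helper: name each column of `arr` after the single
    column of `original` whose values match it (within float tolerance).
    Assumes all columns of `original` are numeric."""
    arr = np.asarray(arr)
    if arr.ndim == 1:
        # A single selected feature may come back as a 1-D array.
        arr = arr.reshape(-1, 1)
    names = []
    for j in range(arr.shape[1]):
        # Compare column j of the array against every original column.
        matches = [
            col for col in original.columns
            if np.allclose(original[col].to_numpy(dtype=float), arr[:, j],
                           equal_nan=True)
        ]
        if len(matches) != 1:
            # Zero or several candidates: refuse to guess a name.
            raise ValueError(
                f"array column {j} matches {len(matches)} original columns: {matches}"
            )
        names.append(matches[0])
    # Reuse the original row index so rows stay aligned.
    return pd.DataFrame(arr, columns=names, index=original.index)


X = pd.DataFrame({"Height": [4.0, 7.8, 8.9, 4.2],
                  "Speed": [39.0, 24.0, 80.5, 60.0]})
X1 = X[["Speed"]].to_numpy()  # stand-in for rfecv.fit_transform(X, y)
print(relabel_by_values(X, X1))
```

This should reproduce the `Speed` frame shown in the question. The approach is fragile when two original columns hold identical values (the helper raises rather than guessing), so a selection mask reported by the transformer itself, where one is available, is a more robust source of names.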

Michele Tonutti · Accepted Answer

After being fit, RFECV has an attribute called support_, which is a boolean mask of the selected features (one entry per original column). You can obtain the names of the chosen features by using it to index the original columns:

    selected_cols = original_df.columns[rfecv.support_]

Easy peasy!
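A minimal end-to-end sketch of this approach, including rebuilding the DataFrame, is below. To stay self-contained it does not run RFECV: `support` is a hand-written stand-in for `rfecv.support_` (an assumption for illustration), and `X1` is built by applying that mask to the original data, which mirrors what `fit_transform` returns since the kept columns stay in their original order. On a fitted selector, `rfecv.get_support()` returns the same mask.

```python
import numpy as np
import pandas as pd

# Original frame from the question.
original_df = pd.DataFrame({"Height": [4.0, 7.8, 8.9, 4.2],
                            "Speed": [39.0, 24.0, 80.5, 60.0]})

# Stand-in for rfecv.support_ after fitting: one boolean per original
# column, True where the feature was kept (here only "Speed").
support = np.array([False, True])

# Stand-in for X1 = rfecv.fit_transform(X, y): the kept columns as a bare array.
X1 = original_df.to_numpy()[:, support]

# Mask the original column labels, then rebuild the DataFrame,
# reusing the original row index so rows stay aligned.
selected_cols = original_df.columns[support]
X1_df = pd.DataFrame(X1, columns=selected_cols, index=original_df.index)
print(X1_df)
```

Because the names come from the selector's own mask rather than from comparing values, this works however many columns are kept and is not confused by columns that happen to contain identical data.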
