Reputation: 691
Suppose I have a dataframe like this:
   Height  Speed
0     4.0   39.0
1     7.8   24.0
2     8.9   80.5
3     4.2   60.0
Then, through some feature extraction, I get this:
0 39.0
1 24.0
2 80.5
3 60.0
However, I want it to be a dataframe where the column index is still there. How would you get the following?
   Speed
0   39.0
1   24.0
2   80.5
3   60.0
I am looking for an answer that compares the original with the new column and determines that the new column must be named Speed. In other words, it shouldn't just rename the new column 'Speed'.
Here is the feature extraction: Let X be the original dataframe and X1 be the returned array that lacks a column name.
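from sklearn.feature_selection import RFECV
from sklearn.model_selection import StratifiedKFold
from sklearn.svm import SVC

svc = SVC(kernel="linear")
rfecv = RFECV(estimator=svc, step=1, cv=StratifiedKFold(2),
              scoring='accuracy')
X1 = rfecv.fit_transform(X, y)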
Thanks
EDIT:
To address the comments I am receiving, let me clarify. As far as I understand, the feature extraction method above accepts a dataframe or a series/array and returns an array. I am passing in a dataframe, which holds both the column labels and the data, but what comes back is an array without column names. Another caveat is that the solution must be generic: I cannot hard-code the column names because the columns will change in my program, and the selection could return two columns, four columns, ... I am looking for a method that compares the original dataframe with the array returned by the feature extraction, recognizes that the new array is a "subset" of the original dataframe, and labels it with the original column name(s). Let me know your thoughts on that! Sorry guys and thank you for your help.
Upvotes: 0
Views: 100
Reputation: 4348
RFECV, after being fit, has an attribute called support_, which is a boolean mask of selected features. You can obtain the names of the chosen features by doing:
selected_cols = original_df.columns[rfecv.support_]
Easy peasey!
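To get back a dataframe with the original labels, as the question asks, the mask can be combined with the transformed array. Here is a minimal sketch, assuming toy data I made up for illustration (the values of X and y, and the name X1_df, are not from the question); it reuses the question's X, y, X1 and rfecv names:

```python
import numpy as np
import pandas as pd
from sklearn.feature_selection import RFECV
from sklearn.model_selection import StratifiedKFold
from sklearn.svm import SVC

# Hypothetical toy data standing in for the question's X and y.
rng = np.random.default_rng(0)
X = pd.DataFrame({
    "Height": rng.normal(size=40),
    "Speed": np.r_[rng.normal(-3, 1, 20), rng.normal(3, 1, 20)],
})
y = np.r_[np.zeros(20, dtype=int), np.ones(20, dtype=int)]

svc = SVC(kernel="linear")
rfecv = RFECV(estimator=svc, step=1, cv=StratifiedKFold(2),
              scoring="accuracy")
X1 = rfecv.fit_transform(X, y)  # array without column names

# support_ is a boolean mask over the original columns, so indexing
# X.columns with it yields the names of the features that were kept.
selected_cols = X.columns[rfecv.support_]

# Rebuild a labelled dataframe from the returned array.
X1_df = pd.DataFrame(np.asarray(X1), columns=selected_cols, index=X.index)
print(X1_df.head())
```

Nothing here hard-codes a column name, so it should keep working when the set of selected columns changes. If you only need the labelled subset and not the array itself, X.loc[:, rfecv.support_] should give the same columns straight from the original dataframe.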
Upvotes: 1