Reputation: 10937
I have a pandas dataframe with several columns. Most of the column names follow a pattern, so I can generate them in a loop. I have built a list of the column names like this:
ycols = ['{}_{}d pred'.format(ticker, i) for i in range(hm_days)]
Now I want to make a new pandas dataframe that contains only these columns and keeps the index of the parent dataframe. How can I do this?
Upvotes: 3
Views: 3139
Reputation: 3852
OK, so you want to create a new dataframe with new column names and the existing index of the original dataframe.
For some dataframe:
import pandas as pd
old_df = pd.DataFrame({'x': [0, 1, 2, 3], 'y': [10, 9, 8, 7]})
>>>
   x   y
0  0  10
1  1   9
2  2   8
3  3   7
columns = list(old_df)
>>>
['x', 'y']
You can specify your new columns by doing:
y_cols = ['x_pred','y_pred']
>>> ['x_pred','y_pred']
Here, y_cols is the list of your new column names. In your code, you would replace this step with ycols = ['{}_{}d pred'.format(ticker, i) for i in range(hm_days)].
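As a quick illustration of what that list comprehension produces, here is a minimal sketch. The values of ticker and hm_days are made up for the example, since the question does not say what they are.

```python
# Hypothetical values; the question does not state what ticker or hm_days hold.
ticker = "AAPL"
hm_days = 3

# One column name per day offset, starting at 0.
ycols = ['{}_{}d pred'.format(ticker, i) for i in range(hm_days)]
print(ycols)  # ['AAPL_0d pred', 'AAPL_1d pred', 'AAPL_2d pred']
```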
To get the new columns, add each one to the old dataframe with a placeholder value (in this case 0, as it looks like you are using numeric data). Each new column automatically takes the same index as your old dataframe:
# Iterate over all column names in y_cols and add each one, filled with 0
for col in y_cols:
    old_df[col] = 0
>>> old_df:
   x   y  x_pred  y_pred
0  0  10       0       0
1  1   9       0       0
2  2   8       0       0
3  3   7       0       0
Finally, select those columns from your dataframe to get a new dataframe with the new column names and the index of the old dataframe:
df_new = old_df[y_cols]
>>>
   x_pred  y_pred
0       0       0
1       0       0
2       0       0
3       0       0
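If you would rather not add the placeholder columns to old_df at all, one alternative (not part of the approach above, just a sketch using the same example names) is to build the new dataframe directly from the old index:

```python
import pandas as pd

old_df = pd.DataFrame({'x': [0, 1, 2, 3], 'y': [10, 9, 8, 7]})
y_cols = ['x_pred', 'y_pred']

# The scalar 0 is broadcast to every cell; the index is taken from old_df,
# and old_df itself is left unchanged.
df_new = pd.DataFrame(0, index=old_df.index, columns=y_cols)
print(df_new)
```

This gives the same shape and index as selecting old_df[y_cols], without modifying the parent dataframe.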
This works even if you have a named index:
      x   y  x_pred  y_pred
Date
0     0  10       0       0
1     1   9       0       0
2     2   8       0       0
3     3   7       0       0
df_new = old_df[y_cols]
      x_pred  y_pred
Date
0          0       0
1          0       0
2          0       0
3          0       0
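The question can also be read as saying that the columns in ycols already exist in the parent dataframe. Under that assumption, selecting them is enough, and the index (including its name) carries over. The following is a rough sketch: parent, its data, and the values of ticker and hm_days are all invented for illustration.

```python
import pandas as pd

# Hypothetical values and data; none of these come from the question.
ticker, hm_days = "AAPL", 2
ycols = ['{}_{}d pred'.format(ticker, i) for i in range(hm_days)]

parent = pd.DataFrame(
    {
        "close": [1.0, 2.0, 3.0],
        "AAPL_0d pred": [1.1, 2.1, 3.1],
        "AAPL_1d pred": [1.2, 2.2, 3.2],
    },
    index=pd.Index(["2020-01-01", "2020-01-02", "2020-01-03"], name="Date"),
)

# Select only the wanted columns; .copy() makes the result independent
# of parent, so later assignments to it do not touch the original.
subset = parent[ycols].copy()
print(subset)
```

If some names in ycols might be missing from the parent, parent.reindex(columns=ycols) is an option instead: it fills the missing columns with NaN rather than raising a KeyError.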
Upvotes: 1