Reputation: 101
I am creating small dataframes from a larger dataframe. From the larger one I grab the columns whose names contain a certain string, say 'aa'. In the smaller df I then want to create a new column for each of those: for each 'aa' column I want to add a column with the suffix '_goal', so aa2 and aa7 get aa2_goal and aa7_goal for scoring. The solution has to be generic, since it will be applied to many smaller dfs with many different column names, all of which contain a certain string.
df before--
name area aa2 ab1 aa7 ac3 time type
CAN 11 0.5 1.2 0.4 2.1 7:21 H
SPA 22 0.4 1.4 0.5 2.5 6:45 M
USP 21 0.7 1.1 0.6 2.5 3:14 G
COM 13 0.1 1.9 0.2 2.2 8:22 D
MAP 16 0.3 1.8 0.1 2.4 3:11 S
df after
name area aa2 ab1 aa7 ac3 time type aa2_new aa7_new
CAN 11 0.5 1.2 0.4 2.1 7:21 H
SPA 22 0.4 1.4 0.5 2.5 6:45 M
USP 21 0.7 1.1 0.6 2.5 3:14 G
COM 13 0.1 1.9 0.2 2.2 8:22 D
MAP 16 0.3 1.8 0.1 2.4 3:11 S
--my attempt
for col in df:
    if 'aa' in df.columns:
        df[col+'_new']
print df
--the next step will then be to import values into these _goal columns from a different df as well. Thanks
Upvotes: 3
Views: 70
Reputation: 164623
You can avoid explicit for loops by filtering for the necessary columns and then using pd.DataFrame.join to join an empty dataframe:
new_cols = df.columns[df.columns.str.startswith('aa')] + '_new'
df = df.join(pd.DataFrame(columns=new_cols))
print(df)
name area aa2 ab1 aa7 ac3 time type aa2_new aa7_new
0 CAN 11 0.5 1.2 0.4 2.1 7:21 H NaN NaN
1 SPA 22 0.4 1.4 0.5 2.5 6:45 M NaN NaN
2 USP 21 0.7 1.1 0.6 2.5 3:14 G NaN NaN
3 COM 13 0.1 1.9 0.2 2.2 8:22 D NaN NaN
4 MAP 16 0.3 1.8 0.1 2.4 3:11 S NaN NaN
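Note that str.startswith only matches names that begin with 'aa'. If the string can appear anywhere in the name, as the question's wording suggests, a variant using str.contains and reindex might look like the sketch below. The sample frame is a cut-down copy of the question's data, and the '_new' suffix is simply the one used in the output above.

```python
import pandas as pd

# Cut-down copy of the question's data (illustrative only)
df = pd.DataFrame({
    "name": ["CAN", "SPA"],
    "aa2": [0.5, 0.4],
    "ab1": [1.2, 1.4],
    "aa7": [0.4, 0.5],
})

# Select every column whose name contains 'aa' (regex=False: plain substring match)
matches = df.columns[df.columns.str.contains("aa", regex=False)]
new_cols = matches + "_new"

# reindex adds the labels that do not exist yet as columns filled with NaN
df = df.reindex(columns=df.columns.append(new_cols))
print(df.columns.tolist())
```

Both this and the join version leave the new columns filled with NaN until values are assigned.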
The problem with your code is that you never assign a value to the series, and assignment is what tells pandas to create a new column.
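To make that concrete, here is a small sketch on a cut-down copy of the question's data. Indexing with a name that does not exist yet is only a lookup and raises KeyError, whereas assigning to that name creates the column. It also shows that the condition 'aa' in df.columns tests for a column named exactly 'aa', not for names that contain it.

```python
import pandas as pd

# Cut-down copy of the question's data (illustrative only)
df = pd.DataFrame({"name": ["CAN", "SPA"], "aa2": [0.5, 0.4], "ab1": [1.2, 1.4]})

# Membership on df.columns is an exact-label test, so this is False here
exact_match = "aa" in df.columns

# Reading a missing column is a lookup, not a creation: it raises KeyError
try:
    df["aa2_new"]
    lookup_failed = False
except KeyError:
    lookup_failed = True

# Assigning a value is what actually creates the column
df["aa2_new"] = float("nan")
print(exact_match, lookup_failed, "aa2_new" in df.columns)
```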
Your subsequent question should be asked separately, if it hasn't already been answered elsewhere.
Upvotes: 2
Reputation: 29635
To create columns depending on whether their names contain a substring like 'aa', you can do:
for col in df.columns:  # iterate over the column names
    if 'aa' in col:
        df[col+'_goal'] = None  # fill the column with None
        # or df[col+'_goal'] = '' if you want an empty string in the column you create
As for what you call the next step, it is too broad to give an answer, but you can do something like df['aa2_goal'] = another_df['another_col']
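One thing to keep in mind with that kind of assignment is that pandas aligns the two objects on their index, so it only gives sensible results when both frames share the same index. The sketch below shows one possible way to fill the _goal columns by matching on a key column instead. Everything about the second frame is an assumption made for illustration: the name goals, its name key and its values are hypothetical, since the question does not say how the two frames relate.

```python
import pandas as pd

# Cut-down copy of the question's data (illustrative only)
df = pd.DataFrame({"name": ["CAN", "SPA", "USP"], "aa2": [0.5, 0.4, 0.7]})

# Hypothetical second frame holding one goal value per name, in a different row order
goals = pd.DataFrame({"name": ["SPA", "CAN"], "aa2": [0.45, 0.55]})

# Look up each row's goal by name rather than by row position
lookup = goals.set_index("name")
for col in list(df.columns):  # copy the names, since columns are added inside the loop
    if "aa" in col and col in lookup.columns:
        df[col + "_goal"] = df["name"].map(lookup[col])

print(df)
```

Names that have no entry in the second frame (USP here) end up with NaN in the _goal column.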
Upvotes: 0