Reputation: 670
I have a dataframe df
df = pd.DataFrame({'A':['-a',1,'a'],
                   'B':['a',np.nan,'c'],
                   'ID':[1,2,2],
                   't':[pd.Timestamp.now(),pd.Timestamp.now(),
                        np.nan]})
I added a new column:
df['YearMonth'] = df['t'].map(lambda x: 100*x.year + x.month)
Now I want to write a function or macro that does a date comparison, creates a new dataframe, and adds a new column to that dataframe.
I tried this, but it seems I am going wrong:
def test(df,ym):
    df_new=df
    if(ym <= df['YearMonth']):
        df_new+"_"+ym=df_new
    return df_new+"_"+ym
    df_new+"_"+ym['new_col']=ym
When I call the test function, I want a new dataframe named df_new_201612 to be created. This new dataframe should have one more column, named new_col, that holds the value of ym for all rows.
test(df,201612)
The expected new dataframe is:
df_new_201612
A   B    ID  t                           YearMonth  new_col
-a  a    1   2016-12-05 12:37:56.374620  201612     201612
1   NaN  2   2016-12-05 12:37:56.374644  201208     201612
a   c    2   NaT                         NaN        201612
Upvotes: 9
Views: 59681
Reputation: 21401
There is an easier way to accomplish this using exec. The following steps create a dataframe at runtime.
1. Create the source dataframe with some sample values.
import numpy as np
import pandas as pd
df = pd.DataFrame({'A':['-a',1,'a'],
                   'B':['a',np.nan,'c'],
                   'ID':[1,2,2]})
2. Assign the new dataframe's name to a variable. You can also pass this value in as a parameter or generate it in a loop.
new_df_name = 'df_201612'
3. Use exec to copy the source dataframe into a new dataframe with that name, then assign a value to the new column on the next line.
exec(f'{new_df_name} = df.copy()')
exec(f'{new_df_name}["new_col"] = 123')
4. The dataframe df_201612 is now available in memory; you can verify this by calling print together with eval.
print(eval(new_df_name))
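One caveat with the steps above: inside a function, names created by exec typically do not become usable local variables. A minimal sketch of one way around that is to pass exec an explicit dictionary as its namespace and read the dataframe back from it. The name namespace is an assumption made for this sketch and is not part of the steps above.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': ['-a', 1, 'a'],
                   'B': ['a', np.nan, 'c'],
                   'ID': [1, 2, 2]})

new_df_name = 'df_201612'

# Hypothetical explicit namespace: exec reads `df` from this dict and
# writes the dynamically named dataframe into it, not into globals().
namespace = {'df': df}
exec(f'{new_df_name} = df.copy()', namespace)
exec(f'{new_df_name}["new_col"] = 123', namespace)

# Look the new dataframe up by its generated name.
result = namespace[new_df_name]
print(result)
```

The source dataframe df is left unchanged because the new dataframe is a copy.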
Upvotes: 1
Reputation: 7506
Creating variables with dynamic names is typically bad practice.
I think the best solution for your problem is to store your dataframes in a dictionary and dynamically generate the key used to access each dataframe.
import copy
dict_of_df = {}
for ym in [201511, 201612, 201710]:
    key_name = 'df_new_'+str(ym)
    dict_of_df[key_name] = copy.deepcopy(df)
    to_change = df['YearMonth'] < ym
    dict_of_df[key_name].loc[to_change, 'new_col'] = ym
dict_of_df.keys()
Out[36]: ['df_new_201710', 'df_new_201612', 'df_new_201511']
dict_of_df
Out[37]:
{'df_new_201511':     A    B  ID                       t  YearMonth  new_col
 0  -a    a   1 2016-12-05 07:53:35.943     201612   201612
 1   1  NaN   2 2016-12-05 07:53:35.943     201612   201612
 2   a    c   2 2016-12-05 07:53:35.943     201612   201612,
 'df_new_201612':     A    B  ID                       t  YearMonth  new_col
 0  -a    a   1 2016-12-05 07:53:35.943     201612   201612
 1   1  NaN   2 2016-12-05 07:53:35.943     201612   201612
 2   a    c   2 2016-12-05 07:53:35.943     201612   201612,
 'df_new_201710':     A    B  ID                       t  YearMonth  new_col
 0  -a    a   1 2016-12-05 07:53:35.943     201612   201710
 1   1  NaN   2 2016-12-05 07:53:35.943     201612   201710
 2   a    c   2 2016-12-05 07:53:35.943     201612   201710}
# Extract a single dataframe
df_2015 = dict_of_df['df_new_201511']
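If you want this wrapped in a function like the test(df, ym) from the question, a minimal sketch could look like the following. The function name add_frame and the fixed timestamps are assumptions for illustration. The sketch sets new_col on every row, as the expected output in the question shows, and leaves out the date comparison.

```python
import numpy as np
import pandas as pd

def add_frame(frames, df, ym):
    """Store a copy of df under 'df_new_<ym>' with new_col set to ym on every row."""
    key_name = 'df_new_' + str(ym)
    df_new = df.copy()          # copy so the source dataframe is untouched
    df_new['new_col'] = ym      # scalar is broadcast to all rows
    frames[key_name] = df_new
    return df_new

# Sample data modelled on the question; timestamps are fixed for repeatability.
df = pd.DataFrame({'A': ['-a', 1, 'a'],
                   'B': ['a', np.nan, 'c'],
                   'ID': [1, 2, 2],
                   't': [pd.Timestamp('2016-12-05 12:37:56'),
                         pd.Timestamp('2016-12-05 12:37:56'),
                         pd.NaT]})
df['YearMonth'] = 100 * df['t'].dt.year + df['t'].dt.month

dict_of_df = {}
add_frame(dict_of_df, df, 201612)
print(dict_of_df['df_new_201612'])
```

The caller then reaches each dataframe through its key, for example dict_of_df['df_new_201612'], instead of through a dynamically named variable.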
Upvotes: 24