BLuta

Reputation: 247

Having trouble appending dataframes

I would like to access and edit individual dataframes after creating them in a for loop.

import pandas as pd

#Let's get those files!!!
bdc_files = {'nwhl': 'https://raw.githubusercontent.com/bigdatacup/Big-Data-Cup-2021/main/hackathon_nwhl.csv',
 'olympics': 'https://raw.githubusercontent.com/bigdatacup/Big-Data-Cup-2021/main/hackathon_womens.csv',
 'erie': 'https://raw.githubusercontent.com/bigdatacup/Big-Data-Cup-2021/main/hackathon_scouting.csv'}

df_list = []
for (a,b) in bdc_files.items():
 #Grab csv file
 c = pd.read_csv(b)
 c.name = a
 #a = a.append(c)

 #Manipulating the data as we please
 c['Game_ID'] = c['game_date'] + c['Home Team'] + c['Away Team']
 c['Detail 3'] = c['Detail 3'].replace('t', 'with traffic')
 c['Detail 3'] = c['Detail 3'].replace('f', 'without traffic')
 c['Detail 4'] = c['Detail 4'].replace('t', 'one-timer')
 c['Detail 4'] = c['Detail 4'].replace('f', 'not one-timer')
 c['Details'] = c['Detail 1'].astype(str).add(' ').add(c['Detail 2'].astype(str)).add(' ').add(c['Detail 3'].astype(str)).add(' ').add(c['Detail 4'].astype(str))

 c['is_goal'] = 0
 c['is_shot'] = 0
 c.loc[c['Event'] == 'Shot', 'is_shot'] = 1
 c.loc[c['Event'] == 'Goal', 'is_goal'] = 1
 c['Goal Differential'] = c['Home Team Goals'] - c['Away Team Goals']
 c['Clock'] = pd.to_datetime(c['Clock'], format = '%M:%S')
 c['Seconds Remaining'] = ((c['Clock'].dt.minute)*60) + (c['Clock'].dt.second)
 df_list.append(a)

 #Printing Datasheet info
 Title = "The sample of games from the {}".format(c.name) 
 print(c.name)
 print(Title + " dataset is:", len(list(c['Game_ID'].value_counts())))
 print(c['Event'].value_counts())
 print(c.columns.values)
 print(c.loc[c['Event'] == 'Shot', 'Details'].value_counts())
 print(c.head())
 print(c.info())

print(df_list)
print(nwhl)

However, when I try to print the nwhl dataframe, I get the following output:

Empty DataFrame
Columns: []
Index: []

And if I use the commented-out append (a = a.append(c)), I get this error:

AttributeError: 'str' object has no attribute 'append'

Long story short, given the code I have, how can I print and perform other tasks with the dataframes outside of the for loop? Any assistance is truly appreciated.

Upvotes: 1

Views: 28

Answers (1)

Scott Boston

Reputation: 153480

Use a dictionary of dataframes, df_dict. Create it before the loop and store each dataframe under its key inside the loop:

df_dict = {}
...
for (a,b) in bdc_files.items():
 #Grab csv file
 c = pd.read_csv(b)
 c.name = a
 # Add this line to build dictionary
 df_dict[a] = c

Then, after the loop, retrieve any dataframe by its key:

df_dict['nwhl']
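
For reference, here is a minimal, self-contained sketch of the dictionary approach. It is only an illustration: the sample_files dictionary and its tiny CSV contents are made-up stand-ins for the remote files, so the example runs without network access, and only two of the question's manipulations are reproduced. It also shows why the original attempt fails: the loop variable holding the key is a plain string such as 'nwhl', so calling .append on it raises the AttributeError, and appending it to a list stores only the name, not the dataframe.

```python
import io
import pandas as pd

# Hypothetical in-memory stand-ins for the remote CSV files (assumption:
# only a few of the real columns are included, with invented rows).
sample_files = {
    "nwhl": "Event,Home Team Goals,Away Team Goals\nShot,1,0\nGoal,2,0\n",
    "olympics": "Event,Home Team Goals,Away Team Goals\nShot,0,0\n",
}

df_dict = {}
for name, csv_text in sample_files.items():
    df = pd.read_csv(io.StringIO(csv_text))

    # Per-dataframe manipulation goes here, as in the question's loop.
    df["is_shot"] = (df["Event"] == "Shot").astype(int)
    df["Goal Differential"] = df["Home Team Goals"] - df["Away Team Goals"]

    # `name` is a plain string, so it works as a dictionary key;
    # it is not a dataframe and has no .append method.
    df_dict[name] = df

# The dataframes are still available after the loop, keyed by name.
nwhl = df_dict["nwhl"]
print(nwhl)
print(list(df_dict))
```

If a standalone variable is more convenient, assigning nwhl = df_dict['nwhl'] after the loop, as in the last lines of the sketch, gives a name that refers to the same dataframe object.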

Upvotes: 1
