Reputation: 119
I don't see why this code isn't working. I am trying to iterate over a DataFrame in a for loop; in this case it has only one row. There are only two columns, and I have two loop variables to receive them. What am I missing?
print("process_list = ", process_list)
for row in process_list.itertuples():
    print("row = ", row)
df_to_date = pd.DataFrame()
try:
    print("process_list = {} and it's type {} process_list.itertuples() {} ".format(process_list, type(process_list), process_list.itertuples()))
    for file_date, file_name in process_list.itertuples():  # a whole batch of days
        file_to_process = dev_env + file_name
        print("PROCESSING BATCH: ", file_to_process)
        df = pd.read_csv(file_to_process, header=None, skiprows=22, sep=',', comment='*', converters={"Days": just_number, "Percentile": just_number, "Date": just_number}, names=column_names)
        df.insert(0, 'File_date', file_date)
        df_to_date = df_to_date.append(df)
except Exception as e:
    print("nothing to process exception = ", e)
    sys.exit(0)
When I run it, I get:
process_list = File_date File_name
94 20180507 mcmhv20180507.csv
row = Pandas(Index=94, File_date=20180507, File_name='mcmhv20180507.csv')
process_list = File_date File_name
94 20180507 mcmhv20180507.csv and it's type <class 'pandas.core.frame.DataFrame'> process_list.itertuples() <map object at 0x7f6339371e48>
nothing to process exception = too many values to unpack (expected 2)
Upvotes: 3
Views: 13098
Reputation: 164773
pd.DataFrame.itertuples returns an iterable of namedtuples that, by default, include the index as the first element. With two columns, each tuple therefore holds three values, not two.
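A small sketch of why two loop variables are not enough. The one-row frame below is rebuilt from the printed output in the question, so its exact contents are an assumption; the point is only that each tuple has one more field than the frame has columns.

```python
import pandas as pd

# One-row frame rebuilt from the output shown in the question (assumed contents).
process_list = pd.DataFrame(
    {"File_date": [20180507], "File_name": ["mcmhv20180507.csv"]},
    index=[94],
)

# Each tuple carries the index first, then one field per column.
row = next(process_list.itertuples())
print(len(row))  # 3

# Unpacking three values into two names raises ValueError.
error_message = None
try:
    for file_date, file_name in process_list.itertuples():
        pass
except ValueError as e:
    error_message = str(e)
    print(error_message)
```

The message printed at the end should match the one reported in the question.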
There are two options to account for this.
Option 1
Unpack 3 items instead of 2, the first of which you do not use.
Here is a minimal example:
import pandas as pd

df = pd.DataFrame([[10, 20], [30, 40], [50, 60]],
                  columns=['A', 'B'])

for idx, a, b in df.itertuples():
    print(idx, a, b)
0 10 20
1 30 40
2 50 60
In your case, a good convention is to mark the unused variable with _:
for _, file_date, file_name in process_list[['File_date', 'File_name']].itertuples():
    pass  # do something
Option 2
Use the index=False argument and unpack 2 elements:
for file_date, file_name in process_list[['File_date', 'File_name']].itertuples(index=False):
    pass  # do something
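For comparison with the minimal example under Option 1, here is the same three-row frame iterated with index=False. The list named pairs is only there to collect the results for illustration.

```python
import pandas as pd

df = pd.DataFrame([[10, 20], [30, 40], [50, 60]],
                  columns=['A', 'B'])

pairs = []  # illustrative only: collects what the loop yields
for a, b in df.itertuples(index=False):
    pairs.append((a, b))
    print(a, b)
# prints:
# 10 20
# 30 40
# 50 60
```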
The behaviour is indicated in the documentation:
DataFrame.itertuples(index=True, name='Pandas')
Iterate over DataFrame rows as namedtuples, with index value as first element of the tuple.
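Because the rows are namedtuples, a third possibility is to skip positional unpacking and read the fields by name. This is a sketch rather than a recommendation: the frame is rebuilt from the question's output, and the tuple name "FileRow" is a made-up value for the name parameter.

```python
import pandas as pd

# One-row frame rebuilt from the output shown in the question (assumed contents).
process_list = pd.DataFrame(
    {"File_date": [20180507], "File_name": ["mcmhv20180507.csv"]},
    index=[94],
)

seen = []  # illustrative only
# "FileRow" is a hypothetical name; the default is "Pandas".
for row in process_list.itertuples(index=False, name="FileRow"):
    seen.append((row.File_date, row.File_name))
    print(row.File_date, row.File_name)
```

Field access by name only works when the column names are valid Python identifiers, which is the case for File_date and File_name.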
Upvotes: 5