Reputation: 609
I have a folder containing 30 files, each of them containing thousands of rows. I would like to loop through the files, building a dataframe that holds every 10th row of each file. The resulting dataframe would contain rows 10, 20, 30, 40, etc. from the first file; rows 10, 20, 30, 40, etc. from the second file; and so on.
For the moment I have:
import glob
import pandas as pd

all_files = glob.glob("DK_Frequency/*.csv")
li = []
for filename in all_files:
    df = pd.read_csv(filename, index_col=None, header=0)
    li.append(df)
This appends the dataframe for each file in the folder to a list, but I don't know how to go further. Any ideas? Thank you in advance.
Upvotes: 1
Views: 785
Reputation: 149075
Pandas read_csv allows keeping only every 10th line with skiprows. So you could use:
all_files = glob.glob("DK_Frequency/*.csv")
li = []
for filename in all_files:
    df = pd.read_csv(filename, index_col=None, header=0,
                     skiprows=lambda x: x % 10 != 0)
    li.append(df)
global_df = pd.concat(li, ignore_index=True)
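As a sanity check on how the skiprows callable interacts with header=0: the callable sees raw file line numbers, and line 0 (kept) then becomes the header, so the data rows kept are file lines 10, 20, 30, and so on. A minimal self-contained sketch (the in-memory CSV below is made up for illustration):

import io
import pandas as pd

# a stand-in for one DK_Frequency file: a header line plus 25 data rows
csv_text = "value\n" + "\n".join(str(i) for i in range(1, 26))

df = pd.read_csv(io.StringIO(csv_text), header=0,
                 skiprows=lambda x: x % 10 != 0)
print(df)   # keeps file lines 10 and 20, i.e. the values 10 and 20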
Upvotes: 1
Reputation: 23
Assuming that all the csv files have the same structure, you could do as follows:
# -*- coding: utf-8 -*-
import glob
import pandas as pd

all_files = glob.glob("DK_Frequency/*.csv")
# cols_to_take is the list of column headers
cols_to_take = pd.read_csv(all_files[0]).columns
## create an empty dataframe with those columns
big_df = pd.DataFrame(columns=cols_to_take)
for csv in all_files:
    df = pd.read_csv(csv)
    # keep the rows whose (default RangeIndex) label is a multiple of 10
    indices = list(filter(lambda x: x % 10 == 0, df.index))
    df = df.loc[indices].reset_index(drop=True)
    ## append df to big_df (pd.concat replaces DataFrame.append,
    ## which was removed in pandas 2.0)
    big_df = pd.concat([big_df, df], ignore_index=True)
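As a side note on the pattern above: growing big_df inside the loop copies all previously accumulated rows on each iteration. A minimal alternative sketch (same DK_Frequency folder, every 10th row taken by position) collects the per-file slices first and concatenates once:

import glob
import pandas as pd

all_files = glob.glob("DK_Frequency/*.csv")
# one every-10th-row slice per file, concatenated in a single pass
parts = [pd.read_csv(csv).iloc[::10] for csv in all_files]
big_df = pd.concat(parts, ignore_index=True)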
Upvotes: 0
Reputation: 1440
This will slice the df, taking every 10th row using iloc, and then append it to final_df. At the end of the loop, final_df should contain all the necessary rows.
all_files = glob.glob("DK_Frequency/*.csv")
final_df = pd.DataFrame()
for filename in all_files:
    df = pd.read_csv(filename, index_col=None, header=0)
    # iloc[::10] takes every 10th row by position; append returns a
    # new dataframe, so assign the result back (in pandas >= 2.0,
    # use pd.concat([final_df, df.iloc[::10]]) instead)
    final_df = final_df.append(df.iloc[::10])
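To sanity-check the result after the loop (the row counts here are illustrative assumptions, not figures from the post): with 30 files of a few thousand rows each, final_df should end up with roughly one tenth of the combined row count.

# illustrative check, assuming the loop above has run
print(final_df.shape)    # roughly 1/10 of the total rows across files
print(final_df.head())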
Upvotes: 2