Luca91

Reputation: 609

Read and append every nth row from CSV files in a folder (Python)

I have a folder containing 30 files, each with thousands of rows. I would like to loop through the files and build a single dataframe containing every 10th row of each file. The resulting dataframe would contain rows 10, 20, 30, 40, etc. from the first file, then rows 10, 20, 30, 40, etc. from the second file, and so on.

For the moment I have:

import glob
import pandas as pd

all_files = glob.glob("DK_Frequency/*.csv")
li = []
for filename in all_files:
    df = pd.read_csv(filename, index_col=None, header=0)
    li.append(df)

This reads every file in the folder and appends the resulting dataframes to a list, but I don't know how to go further.

Any ideas? Thank you in advance.

Upvotes: 1

Views: 785

Answers (3)

Serge Ballesta

Reputation: 149075

pandas' read_csv can keep only every 10th line through its skiprows parameter, which accepts a callable that returns True for the line numbers to skip. So you could use:

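import glob
import pandas as pd

all_files = glob.glob("DK_Frequency/*.csv")
li = []
for filename in all_files:
    # Keep file lines 0 (the header), 10, 20, ...; skip every other line
    df = pd.read_csv(filename, index_col=None, header=0, skiprows=lambda x: x % 10 != 0)
    li.append(df)
# Stack the per-file samples into one dataframe with a fresh 0..n-1 index
global_df = pd.concat(li, ignore_index=True)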
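
To check which rows the skiprows callable actually keeps, here is a minimal sketch on an in-memory CSV. The column names and the 35 made-up rows are only assumptions for illustration, not the asker's data:

```python
import io

import pandas as pd

# Made-up CSV: one header line followed by data rows 1..35 (hypothetical data).
csv_text = "row,value\n" + "\n".join(f"{i},{i * i}" for i in range(1, 36))

# File line 0 is the header, so data row n sits on file line n.
# The callable returns True for lines to skip; lines 0, 10, 20, 30 survive.
sample = pd.read_csv(
    io.StringIO(csv_text), header=0, skiprows=lambda x: x % 10 != 0
)

print(sample["row"].tolist())  # [10, 20, 30]
```

Because the skipped lines are never parsed into the dataframe, this should also use less memory than reading each file in full and slicing afterwards.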

Upvotes: 1

Malyaj

Reputation: 23

Assuming that all the csv files have the same structure, you could do as follows:

import glob
import pandas as pd

all_files = glob.glob("DK_Frequency/*.csv")

## collect the sampled rows of each file in a list
frames = []

for path in all_files:
    df = pd.read_csv(path)
    ## index labels 0, 10, 20, ... (the 1st, 11th, 21st, ... data rows)
    indices = [i for i in df.index if i % 10 == 0]
    frames.append(df.loc[indices])

## concatenate once at the end; DataFrame.append no longer exists in pandas 2.x
big_df = pd.concat(frames, ignore_index=True)

Upvotes: 0

razdi

Reputation: 1440

This takes every 10th row of each file's dataframe with iloc and collects the slices in a list; after the loop they are concatenated, so final_df contains all the necessary rows. Note that iloc[::10] starts at position 0, so it picks the 1st, 11th, 21st, ... data rows of each file.

import glob
import pandas as pd

all_files = glob.glob("DK_Frequency/*.csv")
li = []
for filename in all_files:
    df = pd.read_csv(filename, index_col=None, header=0)
    li.append(df.iloc[::10])
final_df = pd.concat(li, ignore_index=True)
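
Which rows a step-10 slice returns depends on where it starts. A small sketch on a made-up frame (the column name and the 35 rows are assumptions for illustration) compares the two starting offsets:

```python
import pandas as pd

# Made-up frame whose data rows are numbered 1..35 (hypothetical data).
df = pd.DataFrame({"row": range(1, 36)})

first_of_ten = df.iloc[::10]   # positions 0, 10, 20, 30
tenth_of_ten = df.iloc[9::10]  # positions 9, 19, 29

print(first_of_ten["row"].tolist())  # [1, 11, 21, 31]
print(tenth_of_ten["row"].tolist())  # [10, 20, 30]
```

If "row 10" in the question means the 10th data row counting from 1, starting the slice at position 9 would be the variant to use.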

Upvotes: 2
