Harry Spratt
Harry Spratt

Reputation: 305

How to amend specific csv files using pandas

At the moment I have 133 csv files and I want to try and combine them together into catagories according to certain variables such as percentage and substrates. I have been trying to get it so that if the substrate string and the percentage string are in the name and if so create a csv relating to those particular percentages and substrates. Below is the code

df_a = pd.DataFrame()

percentages = ['50', '60', '70', '80', '90', '100']
substrates = ['PS(3.0)', 'CA(1.5)', 'CA(3.0)', 'CA(4.5)',
              'BCP(3.0)', 'PVK(3.0)', 'PVP(2.0)']

for csv in files:
    for per in range(0, len(percentages)):
        percentage = percentages[per]
        for x in range(0, len(substrates)):
            substrate = substrates[x]
            # percentage = str(percentage)
            # substrate = str(substrate)
            name, ext = os.path.splitext(csv)
            if ext == '.csv':
                match = 'water(' + percentages[per] + '%)-' + substrates[x]  # if percentage in name and substrate in name:
                df = pd.DataFrame()
                if match in name:
                    print(match)
                    file_path = folder_path + csv
                    print(file_path)
                    data = np.genfromtxt(file_path, delimiter=',', skip_header=1)
                    data = np.reshape(data, (1, -1))

                    data_fit = data
                    df = pd.DataFrame(data_fit,
                                      columns=['Number', 'Number of droplets',
                                               'Substance', 'Percentage', 'Substrate',
                                               'Middle of droplet', 'Frame Rate',
                                               'Total time', 'Difference in Frames',
                                               'Initial height',
                                               'Exponential decay constant',
                                               'Angular frequency',
                                               'Frequency', 'Phi offset',
                                               'The amplitude', 'Last scanned at',
                                               ])
                    df_a = df_a.append(df, ignore_index=True)
                    print("Substrate: ", substrates[x], ", Percentage: ", percentages[per])
                    data_path = "/Users/harry/Desktop/Droplet Experiment/Analysis/"
                    df_a.to_csv(data_path + percentages[per] + '% - ' + substrates[x] + ' analysis.csv')

However at the moment it merging all of them and producing csv files which are 133 lines long instead of around 10. As for each given percentage and substrate there are 10 files with those given variables in them. Does anyone know what's wrong with my code. Any help is appreciated. The 'name' would look something like the pic attached. enter image description here

Upvotes: 0

Views: 90

Answers (1)

David M.
David M.

Reputation: 4588

The problem is that you are overwriting your .csv files at each iteration of your triple nested for loop; the files you obtain correspond to the .csv file of the last iteration.

One way to get one .csv output file per percentage/substrate combination is to add a percentage/substrate column to your main data frame df_a. Then after exiting all your loops you can print your .csv files by selecting from df_a each percentage/substrate combination in turn.

Without having access to your files it is somewhat difficult to provide a definitive solution, but it could look like something like this:

df_a = pd.DataFrame()

percentages = ['50', '60', '70', '80', '90', '100']
substrates = ['PS(3.0)', 'CA(1.5)', 'CA(3.0)', 'CA(4.5)',
              'BCP(3.0)', 'PVK(3.0)', 'PVP(2.0)']

for csv in files:
    for percentage in percentages:
        for substrate in substrates:
            name, ext = os.path.splitext(csv)
            if ext == '.csv':
                match = 'water(' + percentage + '%)-' + substrate  # if percentage in name and substrate in name:
                if match in name:
                    print(match)
                    file_path = folder_path + csv
                    print(file_path)
                    data = np.genfromtxt(file_path, delimiter=',', skip_header=1)
                    data = np.reshape(data, (1, -1))
                    data_fit = data

                    print("Substrate: ", substrate, ", Percentage: ", percentage)
                    combination = percentage + '% - ' + substrate
                    df_combination = pd.DataFrame({'combination': [combination] * data_fit.shape[0]})

                    df = pd.DataFrame(data_fit,
                                      columns=['Number', 'Number of droplets',
                                               'Substance', 'Percentage', 'Substrate',
                                               'Middle of droplet', 'Frame Rate',
                                               'Total time', 'Difference in Frames',
                                               'Initial height',
                                               'Exponential decay constant',
                                               'Angular frequency',
                                               'Frequency', 'Phi offset',
                                               'The amplitude', 'Last scanned at',
                                               ])
                    df = pd.concat([df_combination, df], axis=1)
                    df_a = df_a.append(df, ignore_index=True)

data_path = "/Users/harry/Desktop/Droplet Experiment/Analysis/"
for combination in df_a['combination'].unique():
    df_subset = df_a[df_a['combination'] == combination]
    df_subset.to_csv(data_path + combination + ' analysis.csv')

Upvotes: 1

Related Questions