kiwi_kimchi
kiwi_kimchi

Reputation: 423

Extracting a row value from one file, and putting that value to another row in another file (with filename corresponds to row of the previous file)

I have a list of CSV filenames (which is in another CSV file called CSV_file_1). However, I want to put additional two columns in CSV_file_1, in which the row values will come from the thousands of individual CSV files.

CSV_file_1 contents are as follows:

1.csv
2.csv
3.csv

In the thousands of files I have in another folder, it contains values that I want to put in CSV_file_1. Example, 1.csv contains the following rows:

LATITUDE : ;13.63345
LONGITUDE : ;123.207083  

2.csv contains the following rows:

LATITUDE : ;13.11111
LONGITUDE : ;123.22222 

3.csv contains the following rows:

LATITUDE : ;13.22222
LONGITUDE : ;123.11111 

and so on.

The result that I want to have for CSV_file_1 is as follows:

FILENAME:      LATITUDE:     LONGITUDE:
    1.csv      13.63345      123.207083
    2.csv      13.11111      123.22222
    3.csv      13.22222      123.11111

I already managed to have my CSV_file_1 but without the LATITUDE AND LONGITUDE yet (which will come from individual files which are delimited as shown above).

My code is like this:

import pandas as pd

import glob

print(glob.glob("D:/2021/*.csv")) 

#list of all the filenames collated and put in CSV_file_1
CSV_file_1 = pd.DataFrame(glob.glob("D:/2021/*.csv")) 


 #creating blank columns in CSV_file_1
CSV_file_1 ['Latitude'] = ""
CSV_file_1 ['Longitude'] = ""

#here im trying to access each file in the given folder(file name must correspond to the row in CSV_file_1), extract the data (latitude and longitude) and copy it to CSV_file_1
 import csv
 with open('D:/2021/*.csv','rt')as file:
      data = csv.reader(file)
      for row in file:
            if glob.glob("D:/2021/*.csv") = CSV_file_1['FILENAME']:
                CSV_file_1.iloc[i] ['LATITUDE:'] ==file.iloc[i]
        
        
        
    CSV_file_1.to_csv('D:/2021/CSV_file_1.csv', index = False)

but I get invalid syntax.

 if glob.glob("D:/2021/*.csv") = CSV_file_1['FILENAME']:
            ^
SyntaxError: invalid syntax

I am a python newbie so I would like to seek help to fix my code.

Upvotes: 1

Views: 1355

Answers (1)

JANO
JANO

Reputation: 3076

If I understand your problem correctly I think your approach is a little bit complex. I implemented a script that is creating the desired output.

First, the CSV file with the names of the other files is read directly into the first column of the data frame. Then, the file names are used to extract the longitude and latitude from each file. For this, I created a function, which you can see in the first part of the script. In the end, I add the extracted values to the data frame and store it in a file in the desired format.

import pandas as pd
import csv

# Function that takes 
def get_lati_and_long_from_csv(csv_path):
    with open(csv_path,'rt') as file:
        # Read csv file content to list of rows
        data = list(csv.reader(file, delimiter =';'))
        
        # Take values from row zero and one
        latitude = data[0][1]
        longitude = data[1][1]
      
        
        return (latitude, longitude)

def main():      
    # Define path of first csv file
    csv_file_1_path = "CSV_file_1.csv"

    # Read data frame from csv file and create correct column name
    CSV_file_1 = pd.read_csv(csv_file_1_path, header=None)
    CSV_file_1.columns = ['FILENAME:']
    
    # Create list of files to read the coordinates
    list_of_csvs = list(CSV_file_1['FILENAME:'])

    # Define empty lists to add the coordinates
    lat_list = []
    lon_list = []
    
    # Iterate over all csv files and extract longitude and latitude
    for csv_path in list_of_csvs:
        lat, lon = get_lati_and_long_from_csv(csv_path)
        lat_list.append(lat)
        lon_list.append(lon)
        
    # Add coordinates to the data frame
    CSV_file_1['Latitude:'] = lat_list
    CSV_file_1['Longitude:'] = lon_list
 
    # Save final data frame to csv file
    CSV_file_1.to_csv(csv_file_1_path+'.out', index = False, sep='\t')
    
if __name__ == "__main__":
    main()

Test input file content:

1.csv
2.csv
3.csv

Test output file content:

FILENAME:   Latitude:   Longitude:
1.csv   13.63345    123.207083  
2.csv   13.11111    123.22222 
3.csv   13.22222    123.11111

EDIT: If your files do not contain any other data, I would suggest simplifying things and removing pandas as it is not needed. The following main() function produces the same result but uses only the CSV module.

def main():      
    # Define path of first csv file
    csv_file_1_path = "CSV_file_1.csv"

    # Read file to list containing the paths of the other csv files
    with open(csv_file_1_path,'rt') as file:
        list_of_csvs = file.read().splitlines()
        
    print(list_of_csvs)
    # Define empty lists to add the coordinates
    lat_list = []
    lon_list = []
    
    # Iterate over all csv files and extract longitude and latitude
    for csv_path in list_of_csvs:
        lat, lon = get_lati_and_long_from_csv(csv_path)
        lat_list.append(lat)
        lon_list.append(lon)
    
    # Combine the three different lists to create the rows of the new csv file
    data = list(zip(list_of_csvs, lat_list, lon_list))
    
    # Create the headers and combine them with the other rows
    rows = [['FILENAME:', 'Latitude:', 'Longitude:']]
    rows.extend(data)
    
    # Write everything to the final csv file
    with open(csv_file_1_path + '.out','w') as file:
        csv_writer = csv.writer(file, dialect='excel', delimiter='\t')
        csv_writer.writerows(rows)

Upvotes: 1

Related Questions