MSFTGirl
MSFTGirl

Reputation: 25

Reading rows in CSV file and appending a list creates a list of lists for each value

I am copying list output data from a DataCamp course so I can recreate the exercise in Visual Studio Code or Jupyter Notebook. From DataCamp Python Interactive window, I type the name of the list, highlight the output and paste it into a new file in VSCode. I use find and replace to delete all the commas and spaces and now have 142 numeric values, and I Save As life_exp.csv. Looks like this:

    43.828
    76.423
    72.301
    42.731
    75.32
    81.235
    79.829
    75.635
    64.062
    79.441

When I read the file into VSCode using either Pandas read_csv or csv.reader and use values.tolist() with Pandas or a for loop to append an existing, blank list, both cases provide me with a list of lists which then does not display the data correctly when I try to create matplotlib histograms.

I used NotePad to save the data as well as a .csv and both ways of saving the data produce the same issue.

import matplotlib.pyplot as plt 
import csv
life_exp = []

with open ('C:\data\life_exp.csv', 'rt') as life_expcsv:
    exp_read = csv.reader(life_expcsv, delimiter = '\n')
    for row in exp_read:
      life_exp.append(row)

And

import pandas as pd 
life_exp_df = pd.read_csv('c:\\data\\life_exp.csv', header = None)
life_exp = life_exp_df.values.tolist()

When you print life_exp after importing using csv, you get:

[['43.828'],
 ['76.423'],
 ['72.301'],
 ['42.731'],
 ['75.32'],
 ['81.235'],
 ['79.829'],
 ['75.635'],
 ['64.062'],
 ['79.441'],
 ['56.728'],
….

And when you print life_exp after importing using pandas read_csv, you get the same thing, but at least now it's not a string:

[[43.828],
 [76.423],
 [72.301],
 [42.731],
 [75.32],
 [81.235],
 [79.829],
 [75.635],
 [64.062],
 [79.441],
 [56.728],
…

and when you call plt.hist(life_exp) on either version of the list, you get each value as bin of 1.

I just want to read each value in the csv file and put each value into a simple Python list.

I have spent days scouring stackoverflow thinking someone has done this, but I can't seem to find an answer. I am very new to Python, so your help greatly appreciated.

Upvotes: 0

Views: 558

Answers (2)

M-M
M-M

Reputation: 450

Try:

import pandas as pd 
life_exp_df = pd.read_csv('c:\\data\\life_exp.csv', header = None)

# Select the values of your first column as a list
life_exp = life_exp_df.iloc[:, 0].tolist()

instead of:

life_exp = life_exp_df.values.tolist()

Upvotes: 1

Adrian
Adrian

Reputation: 177

With csv reader, it will parse the line into a list using the delimiter you provide. In this case, you provide \n as the delimiter but it will still take that single item and return it as a list.

When you append each row, you are essentially appending that list to another list. The simplest work-around is to index into row to extract that value

with open ('C:\data\life_exp.csv', 'rt') as life_expcsv:
    exp_read = csv.reader(life_expcsv, delimiter = '\n')
    for row in exp_read:
        life_exp.append(row[0])

However, if your data is not guaranteed to be formatted the way you have provided, you will need to handle that a bit differently:

with open ('C:\data\life_exp.csv', 'rt') as life_expcsv:
    exp_read = csv.reader(life_expcsv, delimiter = '\n')
    for row in exp_read:
        for number in row:
            life_exp.append(number)

A bit cleaner with list comprehension:

with open ('C:\data\life_exp.csv', 'rt') as life_expcsv:
    exp_read = csv.reader(life_expcsv, delimiter = '\n')
    [life_exp.append(number) for row in exp_read for number in row]

Upvotes: 0

Related Questions