diamondsndrama
diamondsndrama

Reputation: 37

Selecting columns from CSV file

Python newbie here.

I have a bunch of CSV files of stock prices. They are structured as date, opening, high, low, close, adj. close, volume. I only need the date (column 1), the opening (column 2) and close (column 5).

The task prohibits the use of pandas. How can I extract those three columns and return them?

This is my attempt, but that doesn't extract the 3 columns I need.

def read_daily_prices(daily_price):
    daily_prices = []
    with open(stocks_symbols + ".csv", "r", newline='') as csvfile:
        csvreader = csv.DictReader(csvfile, delimiter=',')
            for daily_price in csvreader:
                daily_prices.append(daily_price)

Upvotes: 0

Views: 659

Answers (2)

Pivoshenko
Pivoshenko

Reputation: 359

Hi👋🏻 Hope you are doing well!

If I understood your question correctly, you can do something like that:

import csv

path = "stocks.csv"
# stocks.csv
# date, opening, high, low, close, adj_close, volume
# 2022-01-01, opening_1, high_1, low_1, close_1, adj_close_1, volume_1
# 2022-01-02, opening_2, high_2, low_2, close_2, adj_close_2, volume_2

columns = ["date", "opening", "close"]

daily_prices = []

with open(path, newline="") as csvfile:
    reader = csv.DictReader(csvfile, delimiter=",")
    for row in reader:
        daily_prices.append([row["date"], row["opening"], row["close"]])

print(daily_prices)
# [['2022-01-01', 'opening_1', 'close_1'], ['2022-01-02', 'opening_2', 'close_2']]

daily_prices_dicts = [dict(zip(columns, row)) for row in daily_prices]

print(daily_prices_dicts)
# [{'date': '2022-01-01', 'opening': 'opening_1', 'close': 'close_1'}, {'date': '2022-01-02', 'opening': 'opening_2', 'close': 'close_2'}]

the main idea is: csv.DictReader returns list of dicts so you can select necessary columns by key. 🙂

Upvotes: 1

Azhar Khan
Azhar Khan

Reputation: 4098

Not sure about your reasons to not use pandas; but if your activity involves processing a 2-D dataframe, then it will be much easier with pandas/numpy.

If you want to use plain python APIs, then you can use built-in csv and access columns with indexes. Extract the required columns, create a new sub-list and append to result list.

import csv

def read_daily_prices():
  daily_prices = []
  with open("/content/sample_data/test.csv", "r", newline="") as csvfile:
      csvreader = csv.reader(csvfile, delimiter=",")
      for daily_price in csvreader:
          daily_prices.append([daily_price[0], daily_price[1], daily_price[4]])
  return daily_prices
#

read_daily_prices()

Sample Input:

date,opening,high,low,close,adjclose,volume
2022-12-09,100,101,99,99.5,99.5,10000000
2022-12-09,200,202,199,199,199.1,20000000
2022-12-09,300,303,299,299,299.2,30000000

Generated Output:

[['date', 'opening', 'close'],
 ['2022-12-09', '100', '99.5'],
 ['2022-12-09', '200', '199'],
 ['2022-12-09', '300', '299']]

Upvotes: 1

Related Questions