Marwan A. Sallam
Marwan A. Sallam

Reputation: 15

Searching for specific text in csv(excel format) file

CVS Sample

So I have a csv file(sample in link above) , with variable names in row 7 and values in row 8 . The Variable all have units after them, and the values are just numbers like this :

Velocity (ft/s) Volumetric (Mgal/d  Mass Flow (klb/d)   Sound Speed (ft/s)                  
.-0l.121            1.232                  1.4533434          1.233423

There are alot more variables, but basically I need some way to search in the csv file for the specefic unit groups, and then append the value associated with that in a list. For example search for text "(ft/s)", and then make a dictionary with Velocity and Sound speed as Keys, and their associated values . I am unable to do this because the csv is formatted like an excel spreadsheet, and the cells contains the whole variable name with it's unit

In the end I will have a dictionary for each unit group, and I need to do it this way because each csv file generated, the unit groups change ( ft/s becomes m/s). I also can't use excel read, because it doesn't work in IronPython.

Upvotes: 0

Views: 1298

Answers (2)

taras
taras

Reputation: 6914

You can use csv module to read the appropriate lines into lists. defaultdict is a good choice for data aggregation, while variable names and units can be easily separated by splitting on '('.

import csv
import collections 

with open(csv_file_name) as fp:
    reader = csv.feader(fp)
    for k in range(6):  # skip 6 lines
        next(reader)
    varnames = next(reader)  # 7th line
    values = next(reader)    # 8th line

groups = collections.defaultdict(dict)
for i, (col, value) in enumerate(zip(varnames, values)):
    if i < 2:
        continue
    name, units = map(str.strip, col.strip(')').split('(', 1))
    groups[units][name] = float(value)

Edit: added the code to skip first two columns

Upvotes: 1

Mark Xavier
Mark Xavier

Reputation: 46

I'll help with the part I think you're stuck on, which is trying to extract the units from the category. Given your data, your best bet may be to use regex, the following should work:

import re
f = open('data.csv')
# I assume the first row has the header you listed in your question
header = f.readline().split(',') #since you said its a csv

for item in header:
    print re.search(r'\(.+\)', item).group()
    print re.sub(r'\(.+\)', '', item)

That should print the following for you:

(ft/s)
Velocity
(Mgal/d)
Volumetric
(klb/d)
Mass Flow
(ft/s)
Sound Speed

You can modify the above to store these in a list, then iterate through them to find duplicates and merge the appropriate strings to dictionaries or whatnot.

Upvotes: 0

Related Questions