user5426405
user5426405

Reputation: 21

Joining CSV fields

I need to convert

Name | Org
a    | 5
a    | 6
b    | 5
c    | 7

into

Name | Org
a    | 5,6
b    |  5
c    |  7

my first attempt was with this code

    while i < len(nameColumn):
    if nameColumn[i] not in resultC1:
        resultC1.append(nameColumn[i])
        while l < len(nameColumn):
            if nameColumn[l] == nameColumn[i]:
                tempdata += organizationColumn[l] + ','
            l += 1
        resultC2.append(tempdata[:-1])
        tempdata = ''
        k += 1
    i += 1

which just ends up with the result

Name | Org
a    |
b    |
c    |

Any help would be greatly appreciated. I havent had luck finding anything on this yet. I'm reading the data from a .CSV file into lists and working with that data and storing results into resultC1 and resultC2

Upvotes: 0

Views: 54

Answers (5)

Open AI - Opting Out
Open AI - Opting Out

Reputation: 24153

Use OrderedDict calling setdefault with an empty list, and the csv module:

import csv
from collections import OrderedDict

organizations = OrderedDict()
with open(filename) as infile:
    for name, org in csv.reader(infile, delimiter='|'):
        organizations.setdefault(name, []).append(org)

Then you can write the dictionary:

with open(filename, 'w') as outfile:
    writer = csv.writer(outfile, delimiter='|')
    for name, orgs in organizations.items():
        writer.writerow([name, ','.join(orgs)])

Upvotes: 0

JRG
JRG

Reputation: 4187

Here is another solution that can be generic to have delimiter for input and output file.

def parseData(fileName, delimiter):
    dictionary={}
    with open(fileName, 'r') as iFile:
        for line in iFile.readlines():
            row = line.split(delimiter)
            values = []
            if (row[0] in dictionary.keys()):
                values = dictionary[row[0]]
                values.append(row[1].replace('\n',''))
            else:
                values.append(row[1].replace('\n',''))
                dictionary[row[0]] = values
        dictionary[row[0]] = values
    ## print for debugging purpose
    print(dictionary)
    return dictionary

def writeData(fileName, odelimiter, idelimiter, dictionary):
    with open(fileName, 'w') as oFile:
        for key, values in dictionary.items():
            data=""
            for value in values:
                data = data + value + idelimiter
            data=data[:-1]
            ## print for debugging purpose
            print(key, data)
            oFile.write(key + odelimiter + data + "\n")

## main
dictionary=parseData('inputPipe.txt', "|")
writeData('output.txt', "|", ",", dictionary)

inputPipe.txt

a|5
a|6
b|5
c|7

output.txt

a|5,6
b|5
c|7

Sample Run

{'a': ['5', '6'], 'b': ['5'], 'c': ['7']}
a 5,6
b 5
c 7

Upvotes: 0

RomanPerekhrest
RomanPerekhrest

Reputation: 92874

The solution using itertools.groupby() function:

import csv, itertools

with open('yourfile.csv', 'r') as f:
    reader = csv.reader(f, delimiter='|', skipinitialspace=True)
    head = next(reader)   # header line
    items = [list(g) for k,g in itertools.groupby(sorted(reader), key=lambda x: x[0])]

    fmt = '{0[0]:<5} | {0[1]:^5}'  # format spec
    print(fmt.format(head))
    for item in items:
        print(fmt.format([item[0][0], ','.join(i[1] for i in item)] if len(item) > 1 else item[0]))

The output:

Name  |  Org 
a     |  5,6 
b     |   5  
c     |   7  

Upvotes: 0

cs95
cs95

Reputation: 402844

Here's a solution using collections.OrderedDict:

import csv
from collections import OrderedDict

data = OrderedDict()
with open('test.csv') as f:
    reader = csv.reader(f)
    for i, line in enumerate(reader):
        if i == 0:
            continue

        if line[0] not in data:
            data[line[0]] = []

        data[line[0]].append(line[1])

for k, v in data.items():
    print(k, '|', ', '.join(v))

An OrderedDict retains its order. The keys are the Names, and the values are lists of all Orgs associated with it.

Output:

a | 5, 6
b | 5
c | 7

If your csv has different delimiters from comma, then you'll have to specify that delimiter. I've assumed commas in my example.


Here's a much simpler solution with pandas:

In [443]: df.head()
Out[443]: 
  Name  Org
0    a    5
1    a    6
2    b    5
3    c    7

In [445]: for k, v in df.groupby('Name').apply(lambda x: list(x['Org'])).iteritems():
     ...:     print(k, '|', ', '.join(map(str, v)))
     ...:        
a | 5, 6
b | 5
c | 7

Upvotes: 1

Turn
Turn

Reputation: 7030

Assuming you're starting with the two arrays implied in your sample code, I'd go with something like this:

from collections import defaultdict 

nameColumn = ['a', 'a', 'b', 'c']
organizationColumn = ["5", "6", "5", "7"]

merged = defaultdict(list)
for name, org in zip(nameColumn, organizationColumn):
    merged[name].append(org)

for k, v in merged.items():
    print(f'{k} | {v}'))

Upvotes: 0

Related Questions