Using Python/Pandas to merge rows that have similar values but combine data that is different to a single row

Question

I've been digging through Stack Overflow trying to solve a problem. I come close every time, but I don't get exactly what I need. This is a generic CSV file that I made up for the example (something.csv):

lastName, firstName, address, tool, description
Franks, James, 321 Hammond, hammer, "It hammers"
Franks, James, 321 Hammond, nails, "It Nails stuff"
Phiilips, Tom, 773 James St, mower, "It mows"
Phiilips, Tom, 773 James St, weed-wacker, "It whacks"

I'm trying to merge the lines into a dictionary so that it reads something like this:

Franks: [(hammer, "It hammers"), (nails, "It Nails stuff")]
Phiilips: [(mower, "It mows"),  (weed-wacker, "It whacks")]

I'm wondering if this is even possible, or I'm just making things too hard...

This is what I've tried so far

df3 = pd.read_csv("results.csv", encoding="utf-8", skipinitialspace=True)
df3.groupby("lastName")[["tool","description"]].apply(lambda g: list(map(tuple, g.values.tolist()))).to_dict()

The Results:

{Franks: [("hammer", "It hammers"), ("nails", "It Nails stuff")]}
{Franks: [("hammer", "It hammers"), ("nails", "It Nails stuff")]}
{Phiilips:[("mower", "It mows"), ("weed-wacker", "It whacks")]}
{Phiilips:[("mower", "It mows"), ("weed-wacker", "It whacks")]}

I'm not experienced enough yet to figure out why I'm getting duplicate lines, but something like this without the duplicates is what I'm aiming for.

m13op22 · Accepted Answer

You can use the csv module and its DictReader.

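import csv
from collections import defaultdict

# Map each last name to a list of (tool, description) tuples.
dd = defaultdict(list)
with open('results.csv', 'r', newline='') as fin:
    # skipinitialspace=True drops the space after each comma, so the header
    # keys are 'tool' and 'description' rather than ' tool' and ' description'.
    reader = csv.DictReader(fin, skipinitialspace=True)
    for row in reader:
        dd[row['lastName']].append((row['tool'], row['description']))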

Output:

defaultdict(list,
        {'Franks': [('hammer', 'It hammers'), ('nails', 'It Nails stuff')],
         'Phiilips': [('mower', 'It mows'), ('weed-wacker', 'It whacks')]})
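
If you would rather stay in pandas, a minimal sketch along these lines should build the same dictionary. It assumes the column names from the sample header in the question, and it reads the sample rows through io.StringIO in place of the real file so that it runs on its own. The names csv_text and result are only for illustration.

```python
import io

import pandas as pd

# Sample rows copied from the question; io.StringIO stands in for the CSV file.
csv_text = '''lastName, firstName, address, tool, description
Franks, James, 321 Hammond, hammer, "It hammers"
Franks, James, 321 Hammond, nails, "It Nails stuff"
Phiilips, Tom, 773 James St, mower, "It mows"
Phiilips, Tom, 773 James St, weed-wacker, "It whacks"
'''

# skipinitialspace=True strips the blank after each comma in headers and values.
df = pd.read_csv(io.StringIO(csv_text), skipinitialspace=True)

# One dictionary entry per last name, holding (tool, description) tuples.
result = {
    last_name: list(zip(group["tool"], group["description"]))
    for last_name, group in df.groupby("lastName")
}
print(result)
```

Iterating over the groupby object gives one (key, sub-frame) pair per distinct last name, so each name ends up as a single key in one dictionary.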
