Reputation: 21
I want to use Python to determine whether the first instance of an ID value in the "Id" column has a match on a later row in that same column. If it does, I want to take the value from the "Avail" column of each row that matches that initial "Id" value and append it to the first row's "Avail" value. Then I want to delete the rows with the duplicate Ids.
Here's my sample data. My CSV file contains rows like this:
Id,First,Last,Avail
abcdefg,John,Smith,4164667a-5dca-4ec6-a495-4be5b135d868=immediate
dgasgas,Nancy,Adams,f98a8fbd-fb88-49b9-894e-631ba2a6f369=immediate
gaytrjhu,John,Smith,e24ddf4c-c79f-4a84-a4ed-d92a10cc9e15=immediate
abcdefg,John,Smith,3ec0c158-8782-41ff-8388-5a10b9261b60=immediate
abcdefg,John,Smith,3ec0c158-8782-41ff-8388-c5dfe3b1276c=relative|7
Desired output (v1). (Please note that I don't care about the "First" or "Last" columns from the duplicate rows; I only care about the "Avail" data from those rows.)
Id,First,Last,Avail
abcdefg,John,Smith,4164667a-5dca-4ec6-a495-4be5b135d868=immediate;3ec0c158-8782-41ff-8388-5a10b9261b60=immediate;3ec0c158-8782-41ff-8388-c5dfe3b1276c=relative|7
dgasgas,Nancy,Adams,f98a8fbd-fb88-49b9-894e-631ba2a6f369=immediate
gaytrjhu,John,Smith,e24ddf4c-c79f-4a84-a4ed-d92a10cc9e15=immediate
abcdefg,Nancy,Adams,3ec0c158-8782-41ff-8388-5a10b9261b60=immediate
abcdefg,John,Smith,3ec0c158-8782-41ff-8388-c5dfe3b1276c=relative|7
Then I'd like to delete the "duplicate" rows, leaving this:
Id,First,Last,Avail
abcdefg,John,Smith,4164667a-5dca-4ec6-a495-4be5b135d868=immediate;3ec0c158-8782-41ff-8388-5a10b9261b60=immediate;3ec0c158-8782-41ff-8388-c5dfe3b1276c=relative|7
dgasgas,Nancy,Adams,f98a8fbd-fb88-49b9-894e-631ba2a6f369=immediate
gaytrjhu,John,Smith,e24ddf4c-c79f-4a84-a4ed-d92a10cc9e15=immediate
Upvotes: 2
Views: 571
Reputation: 1836
import pandas as pd

# Small example frame with a duplicated ID (rows 1 and 2 share ID 1)
df = pd.DataFrame(
    data=[
        [1, 'John', 'Smith', 'a'],
        [1, 'John', 'Smith', 'b'],
        [2, 'Kate', 'Smith', 'c'],
    ],
    columns=['ID', 'First', 'Last', 'Avail'],
)

# Collapse duplicate IDs into a single row, joining the Avail values with ';'
output = (df
          .groupby(['ID', 'First', 'Last'], as_index=False)
          .agg({'Avail': lambda x: ';'.join(x)}))

print(output)
You can use groupby, as @Sphinx suggested; the example above produces the style of output you requested.
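If you want to apply the same idea to the CSV from your question, here is a minimal sketch. It groups on "Id" only, keeps the first row's "First" and "Last", and joins all "Avail" values with semicolons; the file names input.csv and output.csv are placeholders for your actual paths.

import pandas as pd

# Read the original file (path is a placeholder)
df = pd.read_csv('input.csv')

# Group on Id only: keep the first First/Last seen and join every Avail value;
# sort=False preserves the order in which each Id first appears
output = (df
          .groupby('Id', as_index=False, sort=False)
          .agg({'First': 'first', 'Last': 'first', 'Avail': ';'.join}))

# Write the de-duplicated rows back out (path is a placeholder)
output.to_csv('output.csv', index=False)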
Upvotes: 1