Reputation: 2543
I have a pandas DataFrame object where each row represents one object in an image.
One example of a possible row would be:
{'img_filename': 'img1.txt', 'img_size':'20', 'obj_size':'5', 'obj_type':'car'}
I want to aggregate all the objects that belong to the same image, and get something whose rows would be like:
{'img_filename': 'img1.txt', 'img_size':'20', 'obj': [{'obj_size':'5', 'obj_type':'car'}, {'obj_size':'6', 'obj_type':'bus'}]}
That is, the third column holds a list of dicts, one per object in the group.
How can I do this?
EDIT:
Consider the following example.
import pandas as pd
df1 = pd.DataFrame([
{'img_filename': 'img1.txt', 'img_size':'20', 'obj_size':'5', 'obj_type':'car'},
{'img_filename': 'img1.txt', 'img_size':'20', 'obj_size':'6', 'obj_type':'bus'},
{'img_filename': 'img2.txt', 'img_size':'25', 'obj_size':'4', 'obj_type':'car'}
])
df2 = pd.DataFrame([
{'img_filename': 'img1.txt', 'img_size':'20', 'obj': [{'obj_size':'5', 'obj_type':'car'}, {'obj_size':'6', 'obj_type':'bus'}]},
{'img_filename': 'img2.txt', 'img_size':'25', 'obj': [{'obj_size':'4', 'obj_type':'car'}]}
])
I want to turn df1 into df2.
Upvotes: 1
Views: 896
Reputation: 1824
One liner.
Suppose several rows share the same img_filename but have different img_size values, and you want to join those values.
For example:
  img_filename img_size obj_size obj_type
0     img1.txt       20        5      car
1     img1.txt       22        6      bus
2     img2.txt       25        4      car
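# If you want to join the img_size values of img1.txt, e.g. "20,22".
# Note: select the columns with a list (double brackets); tuple-style selection on a groupby no longer works in recent pandas.
df2 = df1.groupby("img_filename")[["img_size", "obj_size", "obj_type"]].apply(lambda x: pd.Series({"obj": x[["obj_size", "obj_type"]].to_json(orient="records"), "img_size": ','.join(x["img_size"])})).reset_index()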
Output:
img_filename obj img_size
0 img1.txt [{"obj_size":"5","obj_type":"car"},{"obj_size"... 20,22
1 img2.txt [{"obj_size":"4","obj_type":"car"}] 25
Considering only the first value:
# If you want to keep only the first img_size per image, i.e. 20
df2 = df1.groupby("img_filename")[["img_size", "obj_size", "obj_type"]].apply(lambda x: pd.Series({"obj": x[["obj_size", "obj_type"]].to_json(orient="records"), "img_size": x["img_size"].iloc[0]})).reset_index()
Output:
img_filename obj img_size
0 img1.txt [{"obj_size":"5","obj_type":"car"},{"obj_size"... 20
1 img2.txt [{"obj_size":"4","obj_type":"car"}] 25
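Note that to_json returns a string, so the obj column above holds JSON text rather than Python lists. If lists of dicts are wanted, as in the question's df2, one option is to parse each string with json.loads. A minimal sketch, assuming the df1 from the question (the names json_per_image and obj_lists are only illustrative):

```python
import json

import pandas as pd

df1 = pd.DataFrame([
    {'img_filename': 'img1.txt', 'img_size': '20', 'obj_size': '5', 'obj_type': 'car'},
    {'img_filename': 'img1.txt', 'img_size': '20', 'obj_size': '6', 'obj_type': 'bus'},
    {'img_filename': 'img2.txt', 'img_size': '25', 'obj_size': '4', 'obj_type': 'car'},
])

# One JSON string per image, as produced by to_json(orient="records").
json_per_image = (
    df1.groupby("img_filename")[["obj_size", "obj_type"]]
       .apply(lambda g: g.to_json(orient="records"))
)

# Parse each JSON string back into a Python list of dicts.
obj_lists = json_per_image.apply(json.loads)

print(type(json_per_image["img1.txt"]))
print(obj_lists["img1.txt"])
```

The round trip through JSON is only needed if you start from to_json; calling to_dict('records') on each group gives the lists directly.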
Upvotes: 1
Reputation: 4233
One way, using to_dict:
df2 = df1.groupby('img_filename')[['obj_size','obj_type']].apply(lambda x: x.to_dict('records'))
df2 = df2.reset_index(name='obj')
# Assuming the same img file can appear with different sizes, I'm choosing the first size.
# If that is not the case, group by both columns directly and reset the index:
#df1.groupby(['img_filename', 'img_size'])[['obj_size','obj_type']].apply(lambda x: x.to_dict('records'))
df2['img_size'] = df1.groupby('img_filename')['img_size'].first().values
print (df2)
img_filename obj img_size
0 img1.txt [{'obj_size': '5', 'obj_type': 'car'}, {'obj_s... 20
1 img2.txt [{'obj_size': '4', 'obj_type': 'car'}] 25
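For the case where each image has a single img_size, the commented-out variant above can be written out in full. This is a sketch under that assumption, using the df1 from the question; the variable expected is only there to compare against the question's df2:

```python
import pandas as pd

df1 = pd.DataFrame([
    {'img_filename': 'img1.txt', 'img_size': '20', 'obj_size': '5', 'obj_type': 'car'},
    {'img_filename': 'img1.txt', 'img_size': '20', 'obj_size': '6', 'obj_type': 'bus'},
    {'img_filename': 'img2.txt', 'img_size': '25', 'obj_size': '4', 'obj_type': 'car'},
])

# Group on both image-level columns so img_size is carried along as a key,
# then collect the object-level columns of each group as a list of dicts.
df2 = (
    df1.groupby(['img_filename', 'img_size'])[['obj_size', 'obj_type']]
       .apply(lambda g: g.to_dict('records'))
       .reset_index(name='obj')
)

# The target rows from the question, for comparison.
expected = [
    {'img_filename': 'img1.txt', 'img_size': '20',
     'obj': [{'obj_size': '5', 'obj_type': 'car'}, {'obj_size': '6', 'obj_type': 'bus'}]},
    {'img_filename': 'img2.txt', 'img_size': '25',
     'obj': [{'obj_size': '4', 'obj_type': 'car'}]},
]

print(df2.to_dict('records') == expected)
```

If an image did appear with two different sizes, this version would produce one row per (img_filename, img_size) pair rather than one row per image.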
Upvotes: 1