Veejay

Reputation: 565

Extracting parts of array elements using python

I am trying to extract all integer values from a specific column (holding left, top, width and height) in a CSV file with multiple rows and columns. I have used pandas to isolate the columns I am interested in, but I'm stuck on how to use specific parts of each array element.

Let me explain: I need to use the CSV file's column with the "left", "top", "width" and "height" attributes to obtain xmin, ymin, xmax and ymax (these are the coordinates of boxes in images). An example of a row in this column looks like this:

[{"left":171,"top":0,"width":163,"height":137,"label":"styrofoam container"},{"left":222,"top":42,"width":45,"height":70,"label":"chopstick"}]

I need to extract the 171, 0, 163 and 137 to do the necessary operations for finding my xmax, xmin, ymax and ymin.

The above line is a single row in my pandas array. How do I extract the numbers I need for running my operations?

Here is the code I wrote to extract the column and this is what I have so far:

import os
import csv
import pandas
import numpy as np

csvPath = "/path/of/my/csvfile/csvfile.csv"

data = pandas.read_csv(csvPath)
csv_coords = data['Answer.annotation_data'].values  # column with the coordinates
image_name = data['Input.image_url'].values
print(csv_coords[2])

Upvotes: 2

Views: 318

Answers (2)

jezrael

Reputation: 862671

Use:

import ast
import numpy as np
import pandas as pd

d = {'Answer.annotation_data': ['[{"left":171,"top":0,"width":163,"height":137,"label":"styrofoam container"},{"left":222,"top":42,"width":45,"height":70,"label":"chopstick"}]',
                                '[{"left":170,"top":10,"width":173,"height":157,"label":"styrofoam container"},{"left":222,"top":42,"width":45,"height":70,"label":"chopstick"}]']}
df = pd.DataFrame(d)

print (df)
                              Answer.annotation_data
0  [{"left":171,"top":0,"width":163,"height":137,...
1  [{"left":170,"top":10,"width":173,"height":157...

#convert string data to list of dicts if necessary
df['Answer.annotation_data'] = df['Answer.annotation_data'].apply(ast.literal_eval)

For each name in cols, extract the matching value from every dict and return a DataFrame; finally, join the resulting DataFrames together with concat:

def get_val(val):
    comb = [[y.get(val, np.nan) for y in x] for x in df['Answer.annotation_data']]
    return pd.DataFrame(comb).add_prefix('{}_'.format(val))

cols = ['left','top','width','height']
df1 = pd.concat([get_val(x) for x in cols], axis=1)
print (df1)
   left_0  left_1  top_0  top_1  width_0  width_1  height_0  height_1
0     171     222      0     42      163       45       137        70
1     170     222     10     42      173       45       157        70
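The question ultimately asks for xmin, ymin, xmax and ymax. A minimal sketch of that last step follows, assuming the usual image convention (origin at the top-left corner, so xmax = left + width and ymax = top + height) and assuming each cell is a valid JSON string. The helper name `boxes_to_corners` and the output column names are hypothetical, chosen only for illustration, and `json.loads` is used here in place of `ast.literal_eval`.

```python
import json

import pandas as pd

# One sample cell, copied from the question
rows = [
    '[{"left":171,"top":0,"width":163,"height":137,"label":"styrofoam container"},'
    '{"left":222,"top":42,"width":45,"height":70,"label":"chopstick"}]',
]


def boxes_to_corners(raw_rows):
    """Return one output row per box with its corner coordinates."""
    records = []
    for row_idx, raw in enumerate(raw_rows):
        for box in json.loads(raw):  # each cell is a JSON list of dicts
            records.append({
                "row": row_idx,  # index of the source CSV row
                "label": box["label"],
                "xmin": box["left"],
                "ymin": box["top"],
                "xmax": box["left"] + box["width"],
                "ymax": box["top"] + box["height"],
            })
    return pd.DataFrame.from_records(records)


corners = boxes_to_corners(rows)
print(corners)
```

For the first box this gives xmin=171, ymin=0, xmax=334, ymax=137. Keeping one row per box, rather than one column per box, avoids NaN padding when rows contain different numbers of boxes.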

Upvotes: 1

tif

Reputation: 1484

To access one field in your DataFrame, use:

`data.loc[row][column]` or `data.loc[row, column]`

e.g.

`data.loc[0]['left']`

To find, e.g., the minimum of the top values globally:

`min(data['top'])`
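Note that these lookups only work once `left`, `top`, etc. exist as separate columns, which is not the case for the raw CSV column in the question. A small sketch of how that could look, assuming a single cell is parsed with `json.loads` and turned into a DataFrame with one row per box (the variable names are illustrative):

```python
import json

import pandas as pd

# One cell from the 'Answer.annotation_data' column, as shown in the question
raw = ('[{"left":171,"top":0,"width":163,"height":137,"label":"styrofoam container"},'
       '{"left":222,"top":42,"width":45,"height":70,"label":"chopstick"}]')

# A list of dicts becomes a DataFrame with one column per key
data = pd.DataFrame(json.loads(raw))

first_left = data.loc[0, "left"]  # 'left' value of the first box
min_top = data["top"].min()       # smallest 'top' value across all boxes
print(first_left, min_top)
```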

Upvotes: 0
