Reputation: 565
I am trying to extract all integer values from a specific column (holding left, top, width and height) in a CSV file with multiple rows and columns. I have used pandas to isolate the columns I am interested in, but I'm stuck on how to use specific parts of an array.
Let me explain: I need to use the CSV file's column with the "left", "top", "width" and "height" attributes to obtain xmin, ymin, xmax and ymax (these are the coordinates of boxes in images). A row in this column looks like this:
[{"left":171,"top":0,"width":163,"height":137,"label":"styrofoam container"},{"left":222,"top":42,"width":45,"height":70,"label":"chopstick"}]
I need to extract the 171, 0, 163 and 137 to do the necessary operations for finding my xmax, xmin, ymax and ymin.
The above line is a single row in my pandas array. How do I extract the numbers I need for running my operations?
Here is the code I wrote to extract the column and this is what I have so far:
import os
import csv
import pandas
import numpy as np
csvPath = "/path/of/my/csvfile/csvfile.csv"
data = pandas.read_csv(csvPath)
csv_coords = data['Answer.annotation_data'].values #column with the coordinates
image_name = data['Input.image_url'].values
print(csv_coords[2])
Upvotes: 2
Views: 318
Reputation: 862671
Use:
import ast
import numpy as np
import pandas as pd
d = {'Answer.annotation_data': ['[{"left":171,"top":0,"width":163,"height":137,"label":"styrofoam container"},{"left":222,"top":42,"width":45,"height":70,"label":"chopstick"}]',
'[{"left":170,"top":10,"width":173,"height":157,"label":"styrofoam container"},{"left":222,"top":42,"width":45,"height":70,"label":"chopstick"}]']}
df = pd.DataFrame(d)
print (df)
Answer.annotation_data
0 [{"left":171,"top":0,"width":163,"height":137,...
1 [{"left":170,"top":10,"width":173,"height":157...
#convert string data to list of dicts if necessary
df['Answer.annotation_data'] = df['Answer.annotation_data'].apply(ast.literal_eval)
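Since the cells in the question look like JSON, `json.loads` from the standard library is another way to do this conversion. This is only a sketch under that assumption: `raw`, `df_json` and `parsed` are names made up for the illustration, and whether every row of the real file is valid JSON is not something the question confirms.

```python
import json

import pandas as pd

# One cell copied from the question; it is a string, not yet a Python list.
raw = ('[{"left":171,"top":0,"width":163,"height":137,"label":"styrofoam container"},'
       '{"left":222,"top":42,"width":45,"height":70,"label":"chopstick"}]')

# Hypothetical one-row frame standing in for the real CSV column.
df_json = pd.DataFrame({'Answer.annotation_data': [raw]})

# Parse each cell into a list of dicts (assumes the cell is valid JSON).
parsed = df_json['Answer.annotation_data'].apply(json.loads)

# Pull the four numbers of the first box in the first row.
first_box = parsed[0][0]
print(first_box['left'], first_box['top'], first_box['width'], first_box['height'])
```

One practical difference: `ast.literal_eval` expects Python literals, so a cell containing JSON-only tokens such as `true` or `null` would fail with it but parse fine with `json.loads`.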
For each value in cols, extract the matching values from the dicts and return a DataFrame, then join the results together with concat:
def get_val(val):
    # one row per CSV row, one column per box; missing keys become NaN
    comb = [[y.get(val, np.nan) for y in x] for x in df['Answer.annotation_data']]
    return pd.DataFrame(comb).add_prefix('{}_'.format(val))
cols = ['left','top','width','height']
df1 = pd.concat([get_val(x) for x in cols], axis=1)
print (df1)
left_0 left_1 top_0 top_1 width_0 width_1 height_0 height_1
0 171 222 0 42 163 45 137 70
1 170 222 10 42 173 45 157 70
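To get from these values to the xmin, ymin, xmax and ymax the question asks for, one possible sketch is below. It assumes the usual image convention that (left, top) is the upper-left corner, so xmax = left + width and ymax = top + height; the question does not state the formula, so treat that as an assumption. `boxes_to_corners` is a hypothetical helper, not part of pandas.

```python
import pandas as pd

def boxes_to_corners(cell):
    """Turn one parsed cell (a list of box dicts) into a frame of corner coordinates."""
    rows = []
    for box in cell:
        rows.append({
            'label': box.get('label'),
            'xmin': box['left'],
            'ymin': box['top'],
            # Assumed convention: the far corner is the origin plus the size.
            'xmax': box['left'] + box['width'],
            'ymax': box['top'] + box['height'],
        })
    return pd.DataFrame(rows)

# One already-parsed cell, copied from the question.
cell = [
    {"left": 171, "top": 0, "width": 163, "height": 137, "label": "styrofoam container"},
    {"left": 222, "top": 42, "width": 45, "height": 70, "label": "chopstick"},
]

corners = boxes_to_corners(cell)
print(corners)  # first box: xmax = 171 + 163 = 334, ymax = 0 + 137 = 137
```

This keeps one row per box rather than one column per box, which may be easier to work with when rows contain different numbers of boxes.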
Upvotes: 1
Reputation: 1484
To access one field in your DataFrame, use
`data.loc[row][column]` or `data.loc[row, column]`
e.g.
`data.loc[0]['left']`
To find, e.g., the minimum of the top values globally:
min(data['top'])
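A small runnable illustration of both lookups, assuming the box fields already sit in their own columns (in the question's file they are still packed inside one string column, so they would need to be extracted first). The frame `boxes` and its values are made up for the example.

```python
import pandas as pd

# Hypothetical frame where each box field is already its own column.
boxes = pd.DataFrame({'left': [171, 222], 'top': [0, 42]})

# Single-cell access by row label and column name.
single = boxes.loc[0, 'left']

# Minimum over a whole column; Series.min() is the pandas counterpart of min(...).
top_min = boxes['top'].min()

print(single, top_min)
```

The `boxes.loc[row, column]` form does the lookup in one step, which is generally preferable to chaining `boxes.loc[row][column]`, especially when assigning values.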
Upvotes: 0