Reputation: 341
I have the following table (df):
shape | data |
---|---|
POLYGON | ((1280 16068.18, 1294 16059, 1297 16060, 1300 16063, 1303 16065, 1308 16066)) |
POINT | POINT ((37916311947 12769)) |
POLYGON | POLYGON ((1906.23 12983, 1908 12982, 1916 12974, 1917 12972, 1917 12970)) |
I would like to convert the table to the following format:
Desired output:
converted_data |
---|
[(1280, 16068), (1294, 16059), (1297, 16060), (1300, 16063), (1303, 16065), (1308, 16066)] |
[(37916311947, 12769)] |
[(1906, 12983), (1908, 12982), (1916, 12974), (1917, 12972), (1917, 12970)] |
I would like to change the parentheses, add commas, and remove the word POLYGON or POINT. Here is what I have tried so far:
res1 = []
for ip, geom in zip(df2['data'], df2['SHAPE']):
    if geom == 'POINT':
        st = str(ip)[8:-2]
    elif geom == 'POLYGON/SURFACE':
        st = str(ip)[10:-2]
    s = st.split(',')
    res1.append(s)
res = []
for i in res1:
    res.append([tuple(map(int, j.split())) for j in i])
data2 = df2.copy()
data2['converted_data'] = res
The script above works, but it saves the output as tuples and not ints. How do I optimize my script?
Upvotes: 0
Views: 309
Reputation: 9047
import pandas as pd

df = pd.DataFrame([['POLYGON', '((1280 16068.18, 1294 16059, 1297 16060, 1300 16063, 1303 16065, 1308 16066))'],
                   ['POINT', 'POINT ((37916311947 12769))'],
                   ['POLYGON', 'POLYGON ((1906.23 12983, 1908 12982, 1916 12974, 1917 12972, 1917 12970))']], columns=['shape', 'data'])
# findall extracts each "x y" pair; int(float(...)) drops the fractional part
df['data'] = df['data'].str.findall(r'(\d[\d.\s]+\d)').apply(lambda pairs: [tuple(int(float(n)) for n in pair.split()) for pair in pairs])
df
shape data
0 POLYGON [(1280, 16068), (1294, 16059), (1297, 16060), ...
1 POINT [(37916311947, 12769)]
2 POLYGON [(1906, 12983), (1908, 12982), (1916, 12974), ...
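If it helps to see what that one-liner does step by step, here is a minimal sketch of the same idea without pandas. The names `PAIR_PATTERN` and `to_int_pairs` are made up for illustration; only the regex itself comes from the answer above. Note that the pattern does not include a minus sign, so this assumes all coordinates are non-negative.

```python
import re

# Same pattern as above: a digit, then digits/dots/whitespace, then a digit.
# Commas and parentheses are not in the character class, so each match
# is exactly one "x y" pair.
PAIR_PATTERN = re.compile(r'(\d[\d.\s]+\d)')


def to_int_pairs(text):
    """Hypothetical helper: turn a WKT-like string into a list of int tuples."""
    pairs = []
    for match in PAIR_PATTERN.findall(text):
        # int(float(...)) truncates the fractional part, e.g. '1906.23' -> 1906
        pairs.append(tuple(int(float(token)) for token in match.split()))
    return pairs


print(to_int_pairs('POINT ((37916311947 12769))'))
# [(37916311947, 12769)]
print(to_int_pairs('POLYGON ((1906.23 12983, 1908 12982))'))
# [(1906, 12983), (1908, 12982)]
```

Calling `int()` directly on a string such as '16068.18' raises a ValueError, which is why the value goes through `float()` first.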
Upvotes: 1
Reputation: 4879
The first part of your code seems fine. In the second part, you are probably trying to split i instead of j:
x = '1280 16068.18, 1294 16059, 1297 16060, 1300 16063, 1303 16065, 1308 16066'
x_split = [tuple(map(lambda x: int(float(x)), i.strip().split())) for i in x.strip().split(',')]
#[(1280, 16068),
# (1294, 16059),
# (1297, 16060),
# (1300, 16063),
# (1303, 16065),
# (1308, 16066)]
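To use the same split-based approach on the raw strings from the question (with or without the POLYGON / POINT prefix), one option is to cut out the text between the first opening and the last closing parenthesis instead of relying on fixed slice offsets. This is only a sketch; `parse_coords` is a hypothetical helper name, and it assumes every value contains at least one pair of parentheses.

```python
def parse_coords(raw):
    """Hypothetical helper: 'POLYGON ((1 2, 3 4))' -> [(1, 2), (3, 4)]."""
    # Keep the text from the first '(' to the last ')', then strip the
    # remaining parentheses from both ends.
    inner = raw[raw.index('('):raw.rindex(')') + 1].strip('()')
    # Split into "x y" pairs on commas, then into numbers on whitespace.
    return [tuple(int(float(n)) for n in pair.split())
            for pair in inner.split(',')]


print(parse_coords('((1280 16068.18, 1294 16059))'))
# [(1280, 16068), (1294, 16059)]
print(parse_coords('POINT ((37916311947 12769))'))
# [(37916311947, 12769)]
```

With the DataFrame from the question, something like `df['converted_data'] = df['data'].apply(parse_coords)` should then fill the new column.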
Upvotes: 1