Reputation: 621
I have a DataFrame df like this one:
my_list
Index
0 [81310, 81800]
1 [82160]
2 [75001, 75002, 75003, 75004, 75005, 75006, 750...
3 [95190]
4 [38170, 38180]
5 [95240]
6 [71150]
7 [62520]
I have a list named code with at least one element:
code = ['75008', '75015']
I want to create another column in my DataFrame named my_min, containing, for each row, the minimum absolute difference between any element of the list code and any element of that row's list in df.my_list.
Here are the commands I tried:
df.loc[:, 'my_list'] = min([abs(int(x)-int(y)) for x in code for y in df.loc[:, 'my_list'].str[:]])
>>> TypeError: int() argument must be a string, a bytes-like object or a number, not 'list'
#or
df.loc[:, 'my_list'] = min([abs(int(x)-int(y)) for x in code for y in df.loc[:, 'my_list']])
>>> TypeError: int() argument must be a string, a bytes-like object or a number, not 'list'
#or
df.loc[:, 'my_list'] = min([abs(int(x)-int(y)) for x in code for y in df.loc[:, 'my_list'].tolist()])
>>> TypeError: int() argument must be a string, a bytes-like object or a number, not 'list'
#or
df.loc[:, 'my_list'] = min([abs(int(x)-int(y)) for x in code for y in z for z in df.loc[:, 'my_list'].str[:]])
>>> UnboundLocalError: local variable 'z' referenced before assignment
#or
df.loc[:, 'my_list'] = min([abs(int(x)-int(y)) for x in code for y in z for z in df.loc[:, 'my_list']])
>>> UnboundLocalError: local variable 'z' referenced before assignment
#or
df.loc[:, 'my_list'] = min([abs(int(x)-int(y)) for x in code for y in z for z in df.loc[:, 'my_list'].tolist()])
>>> UnboundLocalError: local variable 'z' referenced before assignment
Upvotes: 1
Views: 80
Reputation: 150735
If you have pandas 0.25+, you can use explode and combine it with np.min:
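import numpy as np
import pandas as pd

# sample data
df = pd.DataFrame({'my_list':
                   [[81310, 81800], [82160], [75001, 75002]]})
code = ['75008', '75015']
# flatten the lists into one Series (the row index repeats for each element)
s = df.my_list.explode().astype(int)
# convert `code` into an integer np.array
code = np.array(code, dtype=int)
# this is the output series: min over codes per element, then min per original row
# (groupby(level=0).min() replaces min(level=0), which was removed in pandas 2.0)
pd.Series(np.min(np.abs(s.values[:, None] - code), axis=1),
          index=s.index).groupby(level=0).min()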
Output:
0 6295
1 7145
2 6
dtype: int64
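To make the broadcasting step above easier to follow, here is a minimal NumPy-only sketch. The names values and codes are illustrative and stand in for the exploded Series values and the converted code array; it only shows how the (n, 1) by (m,) subtraction yields one row of differences per list element.

```python
import numpy as np

# Illustrative stand-ins: three exploded list elements and two codes
values = np.array([81310, 81800, 82160])
codes = np.array([75008, 75015])

# values[:, None] has shape (3, 1); subtracting codes (shape (2,))
# broadcasts to shape (3, 2): one row per element, one column per code
diffs = np.abs(values[:, None] - codes)

print(diffs.shape)        # (3, 2)
print(diffs.min(axis=1))  # [6295 6785 7145]
```

Grouping those per-element minima by the original row index is what collapses them back to one value per row.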
Upvotes: 0
Reputation: 20450
Write a helper, def find_min(lst): ... (it is clear you know how to do that). The helper will consult a global named code.
Then apply it:
df['my_min'] = df.my_list.apply(find_min)
The advantage of breaking out a helper is you can write separate unit tests for it.
If you prefer to avoid globals, you will find functools.partial quite helpful.
https://docs.python.org/3/library/functools.html#functools.partial
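As a rough sketch of the approach described above, assuming the globals-free variant: the second parameter codes is an addition for illustration (the answer only names find_min(lst)), and the sample data is borrowed from the other answers.

```python
from functools import partial

import pandas as pd


def find_min(lst, codes):
    """Smallest absolute difference between any value in lst and any value in codes."""
    return min(abs(int(c) - int(v)) for c in codes for v in lst)


df = pd.DataFrame({'my_list': [[81310, 81800], [82160], [75001, 75002]]})
code = ['75008', '75015']

# Bind codes once, so apply only has to pass each row's list
df['my_min'] = df.my_list.apply(partial(find_min, codes=code))
print(df)
```

Because find_min takes plain Python arguments, it can be unit tested without building a DataFrame, for example by checking that find_min([75001], ['75008']) gives 7.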
Upvotes: 1
Reputation: 19005
You could do this with a list comprehension:
import pandas as pd
import numpy as np
df = pd.DataFrame({'my_list':[[81310, 81800],[82160]]})
code = ['75008', '75015']
pd.DataFrame({'my_min':[min([abs(int(i) - j) for i in code for j in x])
for x in df.my_list]})
returns
my_min
0 6295
1 7145
You could also use pd.Series.apply instead of the outer list comprehension, for example:
df.my_list.apply(lambda x: min([abs(int(i) - j) for i in code for j in x]))
Upvotes: 1