Reputation: 11
I am struggling with a problem for a while now. I have a list consisting of other lists which looks like this when printed element by element:
[120, 2, 'germany']
[122, 1, 'germany']
[123, 1, 'france']
[122, 1, 'germany']
[120, 2, 'germany']
[123, 1, 'france']
I would like to drop the sublists which have the same value in the last "column" (index[2]), but only the ones that have a lower value in the middle "column" (index[1]) than the rows with the same last index[2]. No worries, I am aware that this is a list so far and has no actual columns. Hence, the output should like like:
[120, 2, 'germany']
[123, 1, 'france']
[120, 2, 'germany']
[123, 1, 'france']
I have tried multiple approaches so far, from single loops over simultaneous loops up to creating a copy of the list and iterate again. After a while, I have transformed the list into a pandas dataframe (although I will have to transform it back to list for further processing) and tried several methods again, without success so far.
The closest I got is:
dfList = pd.DataFrame(list, columns =['ID',"status","country"])
dfList = dfList.groupby('country').filter(lambda g: len(g) > 1).drop_duplicates(subset=['country', 'status'], keep="first")
However, this will only provide me with each unique value within the dataframe. The difficulty lies in the fact that the code needs to be reusable with other values as well, hence I cannot filter with the actual values.
I would be very thankful for some thoughts or ideas.
Upvotes: 0
Views: 101
Reputation: 9941
Here's with transform
to get max status within each group:
dfList[
dfList['status'].eq(
dfList.groupby('country')['status'].transform('max'))]
Output:
ID status country
0 120 2 germany
2 123 1 france
4 120 2 germany
5 123 1 france
Upvotes: 1
Reputation: 15872
You can try the following:
>>> (dfList.groupby('country', group_keys=False)
.apply(lambda g: g[g.status == g.status.max()])
.sort_index())
ID status country
0 120 2 germany
2 123 1 france
4 120 2 germany
5 123 1 france
Upvotes: 1
Reputation: 690
if I understood the question correctly, i can suggest this approach as a basis for your code
rows=[[120, 2, 'germany'],
[122, 1, 'germany'],
[123, 1, 'france'],
[122, 1, 'germany'],
[120, 2, 'germany'],
[123, 1, 'france']]
rd={}
for row in rows:
if row[2] not in rd:
rd[row[2]]=row[1]
print(row)
elif row[1] >= rd[row[2]]:
rd[row[2]]=row[1]
print(row)
The output will be
[120, 2, 'germany']
[123, 1, 'france']
[120, 2, 'germany']
[123, 1, 'france']
just like you wanted
Upvotes: 0
Reputation: 269
try to use this code
data = [[120, 2, 'germany']
, [122, 1, 'germany']
, [123, 1, 'france']
, [122, 1, 'germany']
, [120, 2, 'germany']
, [123, 1, 'france']]
d = {}
for i in data:
if not d.__contains__(i[2]):
d[i[2]] = i
elif d[i[2]][1] < i[1]:
d[i[2]] = i
for i in d.values():
print(i)
Upvotes: 0