How to find rows in a dataframe based on other rows and other dataframes

Question

From the question I asked here I took a JSON response looking similar to this:

(please note: id's in my sample data below are numeric strings but some are alphanumeric)

data=↓**

{
  "state": "active",
  "team_size": 20,
  "teams": {
    "id": "12345679",
    "name": "Good Guys",
    "level": 10,
    "attacks": 4,
    "destruction_percentage": 22.6,
    "members": [
      {
        "id": "1",
        "name": "John",
        "level": 12
      },
      {
        "id": "2",
        "name": "Tom",
        "level": 11,
        "attacks": [
          {
            "attackerTag": "2",
            "defenderTag": "4",
            "damage": 64,
            "order": 7
          }
        ]
      }
    ]
  },
  "opponent": {
    "id": "987654321",
    "name": "Bad Guys",
    "level": 17,
    "attacks": 5,
    "damage": 20.95,
    "members": [
      {
        "id": "3",
        "name": "Betty",
        "level": 17,
        "attacks": [
          {
            "attacker_id": "3",
            "defender_id": "1",
            "damage": 70,
            "order": 1
          },
          {
            "attacker_id": "3",
            "defender_id": "7",
            "damage": 100,
            "order": 11
          }
        ],
        "opponentAttacks": 0,
        "some_useless_data": "Want to ignore, this doesn't show in every record"
      },
      {
        "id": "4",
        "name": "Fred",
        "level": 9,
        "attacks": [
          {
            "attacker_id": "4",
            "defender_id": "9",
            "damage": 70,
            "order": 4
          }
        ],
        "opponentAttacks": 0
      }
    ]
  }
}

I loaded this using:

df = json_normalize([data['team'], data['opponent']],
                     'members',
                     ['id', 'name'],
                     meta_prefix='team.',
                     errors='ignore')
print(df.iloc(1))
attacks              [{'damage': 70, 'order': 4, 'defender_id': '9'...
id                                                                   4
level                                                                9
name                                                              Fred
opponentAttacks                                                      0
some_useless_data                                                  NaN
team.name                                                     Bad Guys
team.id                                                      987654321
Name: 3, dtype: object

I have a 3 part question in essense.

How do I get a row like the one above using the member tag? I've tried:

member = df[df['id']=="1"].iloc[0]
#Now this works, but am I correctly doing this?
#It just feels weird is all.

How would I retrieve a member's defenses based only given that only attacks are recorded and not defenses (even though defender_id is given)? I have tried:
```
df.where(df['tag']==df['attacks'].str.get('defender_id'), df['attacks'], axis=0)
#This is totally not working.. Where am I going wrong?
```
Since I am retrieving new data from an API, I need to check vs the old data in my database to see if there are any new attacks. I can then loop through the new attacks where I then display to the user the attack info.

This I honestly cannot figure out, I've tried looking into this question and this one as well that I felt were anywhere close to what I needed and am still having trouble wrapping my brain around the concept. Essentially my logic is as follows:
```
def get_new_attacks(old_data, new_data)
    '''params
         old_data: Dataframe loaded from JSON in database
         new_data: Dataframe loaded from JSON API response
                   hopefully having new attacks
       returns:
         iterator over the new attacks
    '''

    #calculate a dataframe with new attacks listed
    return df.iterrows()
```

I know the function above shows little to no effort other than the docs I gave (basically to show my desired input/output) but trust me I've been wracking my brain over this part the most. I've been looking into merging all attacks then doing reset_index() and that just raises an error due to the attacks being a list. The map() function in the second question I linked above has me stumped.

a_guest · Accepted Answer

Referring to your questions in order (code below):

I looks like id is a unique index of the data and so you can use df.set_index('id') which allows you to access data by player id via df.loc['1'] for example.
As far as I understand your data, all the dictionaries listed in each of the attacks are self-contained in a sense that the corresponding player id is not needed (as attacker_id or defender_id seems to be enough to identify the data). So instead of dealing with a rows that contains lists I recommend swapping that data out in its own data frame which makes it easily accessible.
Once you store attacks in its own data frame you can simply compare indices in order to filter out the old data.

Here's some example code to illustrate the various points:

# Question 1.
df.set_index('id', inplace=True)
print(df.loc['1'])  # For example player id 1.

# Question 2 & 3.
attacks = pd.concat(map(
    lambda x: pd.DataFrame.from_dict(x).set_index('order'),  # Is 'order' the right index?
    df['attacks'].dropna()
))

# Question 2.
print(attacks[attacks['defender_id'] == '1'])  # For example defender_id 1.

# Question 3.
old_attacks = attacks.iloc[:2]  # For example.
new_attacks = attacks[~attacks.index.isin(old_attacks.index)]
print(new_attacks)

How to find rows in a dataframe based on other rows and other dataframes

Answers (1)

Related Questions