Python, is there a way to assign df.drop to a new variable?

Question

I've been practicing Python for a while now and just got into pandas to start learning dataframes. I understand that df.drop() removes columns/rows based on certain requirements and returns a new df. I was wondering, is there a way to assign those dropped columns/rows to a new variable for logging purposes?

import pandas as pd
L = ["a","b","c","d","a","a"]
df1 = pd.DataFrame(L)
df1.columns = ['letter']
#print(df1)

df2 = df1.drop(df1.letter == "a", axis=0)
print(df2)

  letter
2      c
3      d
4      a #why is this row not removed?
5      a #why is this row not removed?

However, this doesn't even give a df2 where all the rows with "a" are removed (a separate problem; I'm not sure why that is happening).

Assigning the removed rows to a new df doesn't work because it is using the initial dataframe df1. I am just unsure of how to make two dataframes, one with ONLY the removed rows and one where the removed rows are edited out.

I would want a df3 that prints:

  letter
0      a
4      a
5      a

Alexander · Accepted Answer

Create a mask for your condition. Select the rows to be removed based on the condition using boolean indexing. Then reassign df1 to the remaining rows by inverting the mask using ~ (not).

mask = df1['letter'] == 'a'
removed_rows = df1[mask]
df1 = df1[~mask]

>>> df1
  letter
1      b
2      c
3      d

>>> removed_rows
  letter
0      a
4      a
5      a
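
As for the separate problem in the question: df.drop() expects index labels, not a boolean mask. The output shown there is consistent with True and False being matched against the index labels 1 and 0, which would explain why rows 0 and 1 disappeared, although the exact behavior may depend on the pandas version. If you would rather keep using drop, a minimal sketch (assuming the same data as in the question; labels_to_drop and kept_rows are illustrative names) is to turn the mask into index labels first:

```python
import pandas as pd

# Same data as in the question
df1 = pd.DataFrame({"letter": ["a", "b", "c", "d", "a", "a"]})

# Boolean mask marking the rows to remove
mask = df1["letter"] == "a"

# drop() works on index labels, so look up the labels of the masked rows
labels_to_drop = df1[mask].index

# Keep a copy of the removed rows for logging
removed_rows = df1.loc[labels_to_drop]

# New dataframe without those rows; df1 itself is left unchanged
kept_rows = df1.drop(labels_to_drop, axis=0)

print(removed_rows)
print(kept_rows)
```

Both variants give the same two dataframes; the mask version above is shorter, while this one keeps df1 untouched, so the original data stays available next to the removed and remaining rows.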
