Reputation: 1117
I've been practicing Python for a while now and just got into pandas to start learning DataFrames. I understand that df.drop() removes columns/rows based on certain requirements and returns a new df. Is there a way to assign the dropped columns/rows to a new variable for logging purposes?
import pandas as pd
L = ["a","b","c","d","a","a"]
df1 = pd.DataFrame(L)
df1.columns = ['letter']
#print(df1)
df2 = df1.drop(df1.letter == "a", axis=0)
print(df2)
letter
2 c
3 d
4 a #why is this row not removed?
5 a #why is this row not removed?
However, this doesn't even give me a df2 with all the "a" rows removed (a separate problem; I'm not sure why that is happening).
Assigning the removed rows to a new df doesn't work because the result is still built from the initial dataframe df1. I am unsure how to make two dataframes: one with ONLY the removed rows and one with the removed rows taken out.
I would want a df3 that prints:
letter
0 a
4 a
5 a
Upvotes: 3
Views: 2785
Reputation: 109696
Create a mask for your condition. Select the rows to be removed using boolean indexing with that mask. Then reassign df1 to the remaining rows by inverting the mask with ~ (not).
mask = df1['letter'] == 'a'
removed_rows = df1[mask]
df1 = df1[~mask]
>>> df1
letter
1 b
2 c
3 d
>>> removed_rows
letter
0 a
4 a
5 a
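On the separate problem in the question: drop() expects index labels, not a boolean mask, so passing df1.letter == "a" does not filter by condition. A likely explanation for the output shown is that the True/False values were treated as the labels 1 and 0, which would be why rows 0 and 1 disappeared. If you would rather keep using drop(), here is a minimal sketch that converts the mask into labels first. The names labels_to_drop and kept_rows are only illustrative, and the sketch assumes the index labels are unique.

```python
import pandas as pd

df1 = pd.DataFrame({"letter": ["a", "b", "c", "d", "a", "a"]})
mask = df1["letter"] == "a"

# drop() works on index labels, so translate the boolean mask into labels.
labels_to_drop = df1.index[mask]

# Rows matching the condition, kept aside for logging.
removed_rows = df1.loc[labels_to_drop]

# Everything else; df1 itself is left unchanged.
kept_rows = df1.drop(index=labels_to_drop)

print(removed_rows)
print(kept_rows)
```

Boolean indexing with ~mask gives the same split without the extra label lookup, so this is mainly useful if the rest of your code is already built around drop().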
Upvotes: 2
Reputation: 313
I would just select the specific rows before dropping them:
df2 = df1.loc[df1.letter == "a"]
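To round this out, a small sketch of the full select-then-drop sequence, under the assumption that the index labels are unique (with duplicate labels, drop() would remove every row sharing a label). The variable names removed and remaining are mine, chosen to avoid clashing with the df2/df3 naming in the question.

```python
import pandas as pd

df1 = pd.DataFrame({"letter": ["a", "b", "c", "d", "a", "a"]})

# Step 1: select the rows that are about to be dropped.
removed = df1.loc[df1["letter"] == "a"]

# Step 2: drop those same rows by reusing their index labels.
remaining = df1.drop(index=removed.index)

print(removed)
print(remaining)
```

Here removed holds the rows you wanted as df3 (index 0, 4, 5) and remaining holds the rest (index 1, 2, 3).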
Upvotes: 2