How to filter elements containing only specific repeated characters in a dataframe

Question

I am looking to create a new dataframe that filters out redundant information from a previous dataframe. The original dataframe is created from looking through many file folders and providing a column of elements each containing a string of the full path to access each file. Each file is named according to trial number and score in a corresponding test folder. I need to remove all reiterations of scores that are 100 for each trial, however, the first score of 100 for each trial must remain.

With python Pandas, I am aware of using df[df[col_header].str.contains('text')] to specifically filter out what is needed and the use of '~' as a boolean NOT.

The unfiltered dataframe column with redundant scores looks like this

\desktop\Test_Scores	est1	rial1-98
\desktop\Test_Scores	est1	rial2-100
\desktop\Test_Scores	est1	rial3-100       #<- must remove
\desktop\Test_Scores	est2	rial1-95
\desktop\Test_Scores	est2	rial2-100
\desktop\Test_Scores	est2	rial3-100       #<- must remove
\desktop\Test_Scores	est2	rial3-100       #<- must remove
.
.
.
n

The expected result after using some code as a filter would be a dataframe that looks like this

\desktop\Test_Scores	est1	rial1-98
\desktop\Test_Scores	est1	rial2-100
\desktop\Test_Scores	est2	rial1-95
\desktop\Test_Scores	est2	rial2-100
.
.
.
.
n

secretive · Accepted Answer

This one line should solve your problem.

df = df.loc[df["col"].shift().str.contains("-100") != df["col"].str.contains("-100")]

Update:

df["col"] = df["col"].str.replace('	','\t')
df['test_number'] = df.col.str.split('-').str[0].str.split('\').str[-2]
df['score'] = df.col.str.split('-').str[1]
df.drop_duplicates(["test_number","score"], inplace = True)
df.drop(["test_number","score"],1,inplace = True)

Check this solution out. The reason why I am doing the replace in very first line is your data contains which in programming is a tab delimiter.

How to filter elements containing only specific repeated characters in a dataframe

Answers (1)

Related Questions