Reputation: 55
To keep my machine learning algorithm from being biased toward overrepresented values, I want to reduce the frequency differences in my dataset, which is a pandas DataFrame.
For example, is there a way to keep at most 1250 rows of each unique value in column X?
Upvotes: 2
Views: 91
Reputation: 128
A solution assuming you may have an unknown number of unique values:
import pandas as pd

# Create a pandas DataFrame with the example value counts
d = {'X': 1500*["A"] + 3000*["B"] + 1300*["C"]}
df = pd.DataFrame(data=d)

# Split into a dictionary with one DataFrame per unique value
dfDict = dict(iter(df.groupby('X')))

# Keep only the first 1250 rows for each value, then recombine
for unique_val in dfDict:
    dfDict[unique_val] = dfDict[unique_val][:1250]
filtered = pd.concat(dfDict, ignore_index=True)
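With the example data above, each value should now appear at most 1250 times, which you can check with value_counts:

# Check the per-value counts after filtering
print(filtered['X'].value_counts().sort_index())
# A    1250
# B    1250
# C    1250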
Upvotes: 0
Reputation: 763
You can group the table by the column whose frequency you want to cap ("X" in your example) and take as many rows as you want per group with the head function (if a value occurs fewer times than the limit you give, head takes all of its rows):
df = df.groupby('X').head(1250)
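Note that head takes the first rows of each group in their original order. If you would rather draw a random subset per value (a minimal variation, assuming random undersampling fits your use case), shuffle the frame first:

# Shuffle the rows, then keep at most 1250 per value of X
df = df.sample(frac=1, random_state=42).groupby('X').head(1250)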
Upvotes: 1
Reputation: 4608
Can you try this (pd.concat expects a list of DataFrames):
df2 = pd.concat([df[df['X']=='A'][:1250], df[df['X']=='B'][:1250], df[df['X']=='C'][:1250]])
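Note that this only works when you know the unique values ('A', 'B', 'C') in advance; the groupby-based answers above handle an arbitrary set of values.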
Upvotes: 1