Emre Oz

Reputation: 55

Get a specific number of rows for each value in a column in pandas

To keep my machine learning algorithm from becoming biased toward certain values, I want to reduce the frequency differences in my dataset, which is a pandas DataFrame.

For example, in column X:

Is there a way to keep 1250 rows for each of these values?

Upvotes: 2

Views: 91

Answers (3)

Anton B

Reputation: 128

A solution assuming you may have an unknown number of unique values:

import pandas as pd

# Create a pandas DataFrame with an uneven number of elements per value
d = {'X': 1500*["A"] + 3000*["B"] + 1300*["C"]}
df = pd.DataFrame(data=d)

# Create a dictionary containing one DataFrame for each unique value
dfDict = dict(iter(df.groupby('X')))

# Keep only the first n rows of each group
for unique_val in dfDict:
    dfDict[unique_val] = dfDict[unique_val].iloc[:1250]

# Combine the truncated groups once, after the loop, into the filtered DataFrame
filtered = pd.concat(dfDict.values(), ignore_index=True)
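The same idea can be wrapped in a small reusable function. This is a minimal sketch assuming the goal is at most `n` rows per distinct value; the helper name `cap_per_value` is hypothetical and not part of pandas. Checking `value_counts()` afterwards confirms that every group was capped.

```python
import pandas as pd


def cap_per_value(df: pd.DataFrame, column: str, n: int) -> pd.DataFrame:
    """Hypothetical helper: keep at most the first n rows for each value of `column`."""
    # One sub-DataFrame per unique value, truncated to its first n rows
    parts = [group.iloc[:n] for _, group in df.groupby(column)]
    # Concatenate once, instead of on every loop iteration
    return pd.concat(parts, ignore_index=True)


# Same example data as above: A=1500, B=3000, C=1300
df = pd.DataFrame({'X': 1500 * ["A"] + 3000 * ["B"] + 1300 * ["C"]})
filtered = cap_per_value(df, 'X', 1250)
print(filtered['X'].value_counts().sort_index())
```

Because the number of groups is taken from the data, this works for any number of unique values without listing them by hand.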

Upvotes: 0

AomineDaici

Reputation: 763

You can group the table by the column whose frequencies you want to limit ("X" in your example) and take as many rows as you want from each group with the head function. If a value occurs fewer times than the number you pass, all of its rows are kept.

df = df.groupby('X').head(1250)
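Note that head keeps the first rows of each group in their existing order, so if the data is sorted (for example by time), the kept rows may not be representative. A minimal sketch of one way around that, assuming a random subset per value is acceptable: shuffle all rows with `DataFrame.sample(frac=1)` first, then apply the same `groupby(...).head(...)`. The data below is illustrative only, including a hypothetical value D that occurs fewer than 1250 times, and `random_state=0` is an arbitrary seed chosen for reproducibility.

```python
import pandas as pd

# Illustrative data: D occurs fewer than 1250 times
df = pd.DataFrame({'X': 1500 * ["A"] + 3000 * ["B"] + 1300 * ["C"] + 800 * ["D"]})

# Shuffle all rows so head() picks a random subset of each group
# rather than the first rows in their original order
shuffled = df.sample(frac=1, random_state=0)

# Keep at most 1250 rows per value of X; smaller groups are kept whole
balanced = shuffled.groupby('X').head(1250).reset_index(drop=True)

print(balanced['X'].value_counts().sort_index())
```

Value D keeps all 800 of its rows, while A, B and C are each reduced to 1250.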

Upvotes: 1

Bushmaster

Reputation: 4608

Can you try this? Note that pd.concat takes a list of DataFrames as its first argument:

df2 = pd.concat([df[df['X'] == 'A'][:1250], df[df['X'] == 'B'][:1250], df[df['X'] == 'C'][:1250]])
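The version above hard-codes the values 'A', 'B' and 'C'. A minimal sketch that builds the same list of slices for whatever values actually appear in column X, assuming the example data used in the other answers:

```python
import pandas as pd

df = pd.DataFrame({'X': 1500 * ["A"] + 3000 * ["B"] + 1300 * ["C"]})

# One boolean-filtered slice per distinct value instead of hard-coding each value
slices = [df[df['X'] == value].iloc[:1250] for value in df['X'].unique()]

# pd.concat expects a list (or other iterable) of DataFrames as its first argument
df2 = pd.concat(slices)
print(len(df2))
```

This filters the whole column once per distinct value, so for columns with many distinct values the groupby-based answers are usually the more efficient choice.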

Upvotes: 1
