funwide
funwide

Reputation: 25

Repeat each row of a DataFrame N times to create a new DataFrame (N is defined by another column)

I am a beginner when it comes to Pandas and I recently had a problem which I could not solve with MS Excel. I have a DataFrame df like:

    Jobtype  JobCount Degree
0   BI Analyst  3     Winf
1   BI Analyst  2     Wing
2   BI Analyst  5     Wiwi
3   Marketing   1     Winf
4   Marketing   2     Wing
5   Marketing   6     Wiwi

(Side note: these are German abbreviations for the degree)

What I want to do is create a new, empty DataFrame which has the columns "JobType" and "Degree" in it with the same values BUT instead of having a column "JobCount" I want the same row to be JobCount-times added to the new, empty DataFrame. For example, I want to have a new DataFrame looking like this:

    Jobtype     Studiengang
0   BI Analyst  Winf
1   BI Analyst  Winf
2   BI Analyst  Winf
3   BI Analyst  Wing
4   BI Analyst  Wing
...

So copy a row JobCount-times to the new DataFrame, without the JobCount column being in it (JobCount-times means the value for JobCount in that specific row). I actually solved this issue but it took me too long and I don't think that my solution is the right way, there has to be a more simple way to do so and therefore I ask for help since I couldn't find a proper solution.

My Code (which works but is way too complicated):

df = pd.read_csv("tete.csv", sep = ";")

cols = ["Jobtype", "Degree"]
new = pd.DataFrame(columns = cols)

df_row_count = range(0, len(df.index))
new_row_count = 0

for i in df_row_count:
    job_count = 0
    while job_count < df.iloc[i]["JobCount"]:
        new.loc[new_row_count] = df.loc[I]
        job_count += 1 
        new_row_count += 1

print(new)

Upvotes: 1

Views: 64

Answers (2)

wwnde
wwnde

Reputation: 26676

Use numpy repeat to create new datframe with the same columns names as original and drop the column JobCount

 import numpy as np
newdf = pd.DataFrame(np.repeat(df.values,df.JobCount,axis=0), columns=df.columns).drop(columns=['JobCount'])#Replicate rows as per condition


    Jobtype   Degree
0   BIAnalyst   Winf
1   BIAnalyst   Winf
2   BIAnalyst   Winf
3   BIAnalyst   Wing
4   BIAnalyst   Wing
5   BIAnalyst   Wiwi
6   BIAnalyst   Wiwi
7   BIAnalyst   Wiwi
8   BIAnalyst   Wiwi
9   BIAnalyst   Wiwi
10  Marketing   Winf
11  Marketing   Wing
12  Marketing   Wing
13  Marketing   Wiwi
14  Marketing   Wiwi
15  Marketing   Wiwi
16  Marketing   Wiwi
17  Marketing   Wiwi
18  Marketing   Wiwi

Upvotes: 1

cs95
cs95

Reputation: 403128

Let's try repeat and reindex:

(df.drop('JobCount', 1)
   .reindex(df.index.repeat(df['JobCount']))
   .reset_index(drop=True))

Or, if you want something a bit shorter, you can pop while repeating:

(df.reindex(df.index.repeat(df.pop('JobCount')))
   .reset_index(drop=True))
       Jobtype Degree
0   BI Analyst   Winf
1   BI Analyst   Winf
2   BI Analyst   Winf
3   BI Analyst   Wing
4   BI Analyst   Wing
5   BI Analyst   Wiwi
6   BI Analyst   Wiwi
7   BI Analyst   Wiwi
8   BI Analyst   Wiwi
9   BI Analyst   Wiwi
10   Marketing   Winf
11   Marketing   Wing
12   Marketing   Wing
13   Marketing   Wiwi
14   Marketing   Wiwi
15   Marketing   Wiwi
16   Marketing   Wiwi
17   Marketing   Wiwi
18   Marketing   Wiwi
 

Upvotes: 1

Related Questions