apds1088
apds1088

Reputation: 60

Split values in rows in one column while duplicating other column data

I'm trying to split the values in each row of a column into multiple rows while duplicating the corresponding values of other columns. I'm fairly new to python and trying to figure out a way to implement this solution to a bigger dataset.

This is the input file:

Name    Year    Subject                                        State
Jack    2003    Math, Sci, Music                               MA
Sam     2004    Math, PE, Language, Social                     CA
Nicole  2005    Math, Life Sci, Geography, Music, Computer Sci NY

This is what I want as output:

Name    Year    Subject            State
Jack    2003    Math               MA
Jack    2003    Sci                MA
Jack    2003    Music              MA
Sam     2004    Math               CA
Sam     2004    PE                 CA
Sam     2004    Language           CA
Sam     2004    Social             CA
Nicole  2005    Math               NY
Nicole  2005    Life Sci           NY
Nicole  2005    Geography          NY
Nicole  2005    Music              NY
Nicole  2005    Computer Sci       NY

I tried this code:

import pandas as pd 

df= pd.read_csv('C:/Users/3216140/Desktop/test.csv', delimiter=',', skiprows = 1, names = ["Name","Year","Subject","State"] ) 
print(df) 
sub = df['Subject'].str.split(',').apply(pd.Series, 1).stack() 
sub.index = sub.index.droplevel(-1) 
sub.name = 'Subject' 
print (sub) 
del df['Subject'] 
df.join(sub) 
print(df) 

But the join doesn't seem to have worked. I just get the input file without the 'Subject' as output.

Upvotes: 2

Views: 87

Answers (1)

cs95
cs95

Reputation: 402852

You can use np.repeat and itertools.chain here.

from itertools import chain

v = df.pop('Subject').str.split(r'\s*,\s*')
df_new = pd.DataFrame(
    df.values.repeat(v.str.len(), axis=0),
    columns=df.columns
)
df_new['Subject'] = list(itertools.chain.from_iterable(v))

df_new

      Name State Year       Subject
0     Jack  2003   MA          Math
1     Jack  2003   MA           Sci
2     Jack  2003   MA         Music
3      Sam  2004   CA          Math
4      Sam  2004   CA            PE
5      Sam  2004   CA      Language
6      Sam  2004   CA        Social
7   Nicole  2005   NY          Math
8   Nicole  2005   NY      Life Sci
9   Nicole  2005   NY     Geography
10  Nicole  2005   NY         Music
11  Nicole  2005   NY  Computer Sci

Upvotes: 1

Related Questions