Split Multiple Values into New Rows

Question

I have a dataframe where a few columns may have multiple values in a single observation. Each value in these columns has a "/" at the end, regardless of whether or not there are multiple. This means that some of the values look like this: 'OneThing/' while others look like this: 'OneThing/AnotherThing/'

I need to take the values where there is more than one value in an observation and split them into individual rows.

This is a general example of what the dataframe looks like before:

ID  Date   Name ColA   ColB   Col_of_Int                        ColC   ColD
1   09/12  Ann  String String OneThing/                         String String
2   09/13  Pete String String OneThing/AnotherThing/            String String
3   09/13  Ann  String String OneThing/AnotherThing/ThirdThing/ String String
4   09/12  Pete String String OneThing/                         String String

What I want the output to be:

ID  Date   Name ColA   ColB   Col_of_Int                        ColC   ColD
1   09/12  Ann  String String OneThing                         String String
2   09/13  Pete String String OneThing                         String String
2   09/13  Pete String String AnotherThing                     String String
3   09/13  Ann  String String OneThing                         String String
3   09/13  Ann  String String AnotherThing                     String String
3   09/13  Ann  String String ThirdThing                       String String
4   09/12  Pete String String OneThing                         String String

I've tried the following:

df = df[df['Column1'].str.contains('/')]
df_split = df[df['Column1'].str.contains('/')]
df1 = df_split.copy()
df2 = df_split.copy()

split_cols = ['Column1']

for c in split_cols:
    df1[c] = df1[c].apply(lambda x: x.split('/')[0])
    df2[c] = df2[c].apply(lambda x: x.split('/')[1])

new_rows = df1.append(df2)
df.drop(df_split.index, inplace=True)
df = df.append(new_rows, ignore_index=True)

This runs, but I think it is creating a new row after every '/'. That means one new row is created for every observation with only one value (where I want zero new rows), two new rows are created for every observation with two values (where I only need one), and so on.

This is particularly frustrating where there are three or more values in an observation because I am getting several unnecessary rows.
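A small check of how str.split treats the trailing '/' may explain the extra rows. This is only a sketch using the sample values from the example above; the variable names are made up for illustration. Because every value ends in '/', splitting on '/' always leaves an empty string as the last piece, so indexing [1] on a single-value observation returns '' rather than failing.

```python
# Sample values taken from the example table; names here are illustrative only.
values = ["OneThing/", "OneThing/AnotherThing/", "OneThing/AnotherThing/ThirdThing/"]

# Splitting directly: the trailing "/" produces an empty last element.
raw_parts = [v.split("/") for v in values]

# Stripping the trailing "/" first gives exactly one piece per real value.
clean_parts = [v.rstrip("/").split("/") for v in values]

for raw, clean in zip(raw_parts, clean_parts):
    print(raw, "->", clean)
```

With the trailing separator removed before splitting, the number of pieces matches the number of actual values in each observation.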

Is there any way to fix this so that new rows are only added for observations with more than one value?
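One possible approach, sketched under the assumption that pandas 0.25 or later is available (for DataFrame.explode): strip the trailing '/', split each value into a list, then explode the list into one row per element. The helper name split_into_rows and the reduced sample frame below are hypothetical and only mirror the example table; this is not necessarily the only or best way to do it.

```python
import pandas as pd

# Reduced version of the example table (assumed data, for illustration only).
df = pd.DataFrame({
    "ID": [1, 2, 3, 4],
    "Date": ["09/12", "09/13", "09/13", "09/12"],
    "Name": ["Ann", "Pete", "Ann", "Pete"],
    "Col_of_Int": [
        "OneThing/",
        "OneThing/AnotherThing/",
        "OneThing/AnotherThing/ThirdThing/",
        "OneThing/",
    ],
})


def split_into_rows(frame, column, sep="/"):
    """Return a copy of frame with one row per sep-delimited value in column."""
    out = frame.copy()
    # Drop leading/trailing separators so no empty pieces are produced,
    # then turn each string into a list of its values.
    out[column] = out[column].str.strip(sep).str.split(sep)
    # explode repeats the other columns once per list element.
    return out.explode(column).reset_index(drop=True)


result = split_into_rows(df, "Col_of_Int")
print(result)
```

Rows with a single value stay as one row, and a row with n values becomes n rows, with the other columns repeated. To handle several columns, the helper could be applied to each one in turn, though that multiplies rows across columns, which may or may not be what is wanted.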
