Vishnubhatla Saivamsi
Vishnubhatla Saivamsi

Reputation: 43

Split rows to create new rows in Pandas Dataframe

I have a pandas dataframe in which one column of text strings contains multiple comma-separated values. I want to split each field and create a new row per entry only where the number of commas is equal to 2. My entire dataframe has only values with either no. of commas =1 or 2. For example, a should become b:

In [7]: a
Out[7]: 
    var1   var2   var3
0  a,b,c     1     X
1  d,e,f     2     Y
2  g,h       3     Z
In [8]: b
Out[8]: 
     var1  var2   var3
0    a,c     1     X
1    b,c     1     X
2    d,f     2     Y
3    e,f     2     Y
4    g,h     3     Z

Upvotes: 0

Views: 148

Answers (4)

rhug123
rhug123

Reputation: 8768

Here is another way using itertools.pairwise()

df.assign(var1 = df['var1'].map(lambda x: list(map(','.join,itertools.pairwise(x.split(',')))))).explode('var1')

Old answer:

df2 = df.loc[df['var1'].str.count(',').eq(2)]

s = (df2.assign(var1 = df2['var1'].str.split(','))
     .explode('var1').groupby(level=0)
     .agg(one = ('var1',lambda x: x.iloc[0] +','+ x.iloc[-1]),
          two = ('var1',lambda x: x.iloc[1] +','+ x.iloc[-1]))
     .stack().droplevel(1))

df2 = (pd.concat([df.loc[s.index].assign(var1 = s.to_numpy()),
                  df.loc[df['var1'].str.count(',').eq(1)]],ignore_index=True))

Upvotes: 0

Andrej Kesely
Andrej Kesely

Reputation: 195428

Based on your comment that var1 column has only 1 or 2 commas:

def fn(x):
    x = x.split(",")
    if len(x) == 2:
        return [",".join(x)]
    return ["{},{}".format(x[0], x[2]), "{},{}".format(x[1], x[2])]


df = df.assign(var1=df["var1"].apply(fn)).explode("var1").reset_index(drop=True)
print(df)

Prints:

  var1  var2 var3
0  a,c     1    X
1  b,c     1    X
2  d,f     2    Y
3  e,f     2    Y
4  g,h     3    Z

Upvotes: 1

joao
joao

Reputation: 2293

I'm doing in two steps: first, transform the first column when there are two commas, introducing a tuple of strings (this is done by applying func to the first column, each s is a cell string contents). Then use explode to turn those tuples into a couple of rows.

def func(s):
    t = s.split(',')
    return s if len(t) == 2 else (f'{t[0]},{t[2]}', f'{t[1]},{t[2]}')

df.var1 = df.var1.apply(func)
df = df.explode('var1').reset_index(drop=True)

Upvotes: 0

Rob Raymond
Rob Raymond

Reputation: 31156

  • have taken the approach that you want combinations of constituent parts
  • specifically there is a combination you want to exclude
  • have used an additional column just for purpose of transparency of solution
import io
import itertools

df = pd.read_csv(io.StringIO("""    var1   var2   var3
0  a,b,c     1     X
1  d,e,f     2     Y
2  g,h       3     Z"""), sep="\s+")

df["var1_2"] = df["var1"].str.split(",").apply(lambda x: [",".join(list(c)) 
                                                          for c in itertools.combinations(x, 2) 
                                                          if len(x)<=2 or list(c) != x[:2]])
df.explode("var1_2")

var1 var2 var3 var1_2
a,b,c 1 X a,c
a,b,c 1 X b,c
d,e,f 2 Y d,f
d,e,f 2 Y e,f
g,h 3 Z g,h

Upvotes: 1

Related Questions