Reputation: 63
I have a dataframe like this:
ID, Values
1 10, 11, 12, 13
2 14
3 15, 16, 17, 18
I want to create a new dataframe like this:
ID COl1 Col2
1 10 11
1 11 12
1 12 13
2 14
3 15 16
3 16 17
3 17 18
Please help me in how to do this??? Note: The rows in Values column of input df are str type.
Upvotes: 3
Views: 523
Reputation: 1921
Tried slightly different approach. Created a function which will return numbers in pairs from the initial comma separated string.
def pairup(mystring):
"""Function to return paired up list from string"""
mylist = mystring.split(',')
if len(mylist) == 1: return [mylist]
splitlist = []
for index, item in enumerate(mylist):
try:
splitlist.append([mylist[index], mylist[index+1]])
except:
pass
return splitlist
Now let's create the new data frame.
# https://stackoverflow.com/a/39955283/3679377
new_df = df[['ID']].join(
df.Values.apply(lambda x: pd.Series(pairup(x)))
.stack()
.apply(lambda x: pd.Series(x))
.fillna("")
.reset_index(level=1, drop=True),
how='left').reset_index(drop=True)
new_df.columns = ['ID', 'Col 1', 'Col 2']
Here's the output of print(new_df)
.
ID Col 1 Col 2
0 1 10 11
1 1 11 12
2 1 12 13
3 2 14
4 3 15 16
5 3 16 17
6 3 17 18
Upvotes: 0
Reputation: 863291
Use list comprehension with flattening and small change - if i > 0:
to if i == 2:
for correct working with one element values:
from collections import deque
#https://stackoverflow.com/a/36586925
def chunks(iterable, chunk_size=2, overlap=1):
# we'll use a deque to hold the values because it automatically
# discards any extraneous elements if it grows too large
if chunk_size < 1:
raise Exception("chunk size too small")
if overlap >= chunk_size:
raise Exception("overlap too large")
queue = deque(maxlen=chunk_size)
it = iter(iterable)
i = 0
try:
# start by filling the queue with the first group
for i in range(chunk_size):
queue.append(next(it))
while True:
yield tuple(queue)
# after yielding a chunk, get enough elements for the next chunk
for i in range(chunk_size - overlap):
queue.append(next(it))
except StopIteration:
# if the iterator is exhausted, yield any remaining elements
i += overlap
if i == 2:
yield tuple(queue)[-i:]
L = [[x] + list(z) for x, y in zip(df['ID'], df['Values']) for z in (chunks(y.split(', ')))]
df = pd.DataFrame(L, columns=['ID','Col1','Col2']).fillna('')
print (df)
ID Col1 Col2
0 1 10 11
1 1 11 12
2 1 12 13
3 2 14
4 3 15 16
5 3 16 17
6 3 17 18
Upvotes: 1