Yi Du
Yi Du

Reputation: 525

Explode one row into multiple rows

I have data frame with only one row.

from datetime import datetime
spark = SparkSession.builder.getOrCreate()
as_of_date = datetime.strptime('2013-01-01', '%Y-%m-%d')
columns = ['id', 'row', 'month']
vals = [('A', 1, as_of_date)]
df = spark.createDataFrame(vals, columns)

I want to create more rows based on 'row' column's value 1 so that the new data frame will look like

id, row, month
A, -2, 2012-10-01
A, -1, 2012-11-01
A, 0, 2012-12-01
A, 1, 2013-01-01

Look up explode but still can't figure out exactly how to do this.

Thanks.

Upvotes: 0

Views: 110

Answers (1)

furas
furas

Reputation: 142641

I don't know if it will work with spark.

In pandas first I tried to use append() to add new rows, and later reverse order

import pandas as pd
from datetime import datetime
from datetime import timedelta

as_of_date = datetime.strptime('2013-01-01', '%Y-%m-%d')
columns = ['id', 'row', 'month']
vals = [('A', 1, as_of_date)]
df = pd.DataFrame(vals, columns=columns)

#print(df.dtypes)
#print(df)

item = df.iloc[0].copy()  # original value

for x in range(3):
    item['row'] = -x
    item['month'] = item['month'] - timedelta(days=1)
    item['month'] = item['month'].replace(day=1)
    df = df.append(item)

# reverse and reset index
df = df[::-1].reset_index(drop=True)

print(df)

But later I released I can do it directly with DataFrame

import pandas as pd
from datetime import datetime

as_of_date = datetime.strptime('2013-01-01', '%Y-%m-%d')
columns = ['id', 'row', 'month']
vals = [('A', 1, as_of_date)]

ID, ROW, MONTH = vals[0]
N = 4 # number of rows

df = pd.DataFrame({
    'id': [ID for _ in range(N)],
    'row': range(ROW, ROW-N, -1),
    'month': pd.date_range(MONTH, periods=N, freq='-1M')
})

# reverse and reset index
df = df[::-1].reset_index(drop=True)                     

print(df)

Upvotes: 1

Related Questions