nayak0765

Reputation: 193

Difference between pandas and Spark in terms of 'in-memory' processing in Python

I have learned about Spark's in-memory processing, which is described as an advantage over pandas. But compare the pandas and Spark programs below, which both create a DataFrame and concatenate two columns. In both cases the processing happens 'in-memory', since the data must be in RAM to be processed. So how does Spark give an advantage over pandas in this scenario, if both process in-memory? Also, when should we go for Spark and when for pandas?

spark :-

from datetime import date
from pyspark.sql.functions import col, concat

df = spark.createDataFrame([
        ("Red", 1, "Apple", date(2021, 1, 1), ''),
        ("Black", 2, "Grape", date(2021, 2, 3), ''),
        ("Yellow", 3, "Banana", date(2022, 2, 4), '')
        ], schema="color string, sr_no long, fruit string, orderDate date, desc string")
df2 = df.withColumn("desc", concat(col("color"), col("fruit")))
df2.show()  # show() prints the table itself and returns None, so don't wrap it in print()

pandas :-

import pandas as pd

data = {'color': ['Red', 'Black', 'Yellow'],
        'sr_no': [1, 2, 3],  # integers, to match the Spark schema's long type
        'fruit': ['Apple', 'Grape', 'Banana'],
        'orderDate': ['2021-01-01', '2021-02-03', '2022-02-04']
        }
df = pd.DataFrame.from_dict(data)
df['desc'] = df['color'] + df['fruit']
print(df)

o/p:-

color,sr_no,fruit,orderDate,desc
Red,1,Apple,2021-01-01,RedApple
Black,2,Grape,2021-02-03,BlackGrape
Yellow,3,Banana,2022-02-04,YellowBanana

Upvotes: 2

Views: 730

Answers (1)

Maurice

Reputation: 13092

(Py)Spark is designed for big datasets, i.e., from multiple gigabytes up to petabytes. Pandas can natively handle only data that fits into your local memory, which at the time of writing is usually a few gigabytes.

The costs in PySpark are complexity and money: you need a cluster of machines that needs to be managed. This is why it's often a good idea to stick to Pandas until you need more parallelization or process more data in a timeframe than can be handled through chunking locally.
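To illustrate the local chunking mentioned above, here is a minimal pandas sketch. It assumes the data arrives as CSV; a small in-memory buffer stands in for what would normally be a large file on disk:

```python
import io
import pandas as pd

# Stand-in for a large CSV file on disk
csv_data = io.StringIO(
    "color,fruit\n"
    "Red,Apple\n"
    "Black,Grape\n"
    "Yellow,Banana\n"
)

descs = []
# chunksize=2 means only two rows are held in RAM at a time
for chunk in pd.read_csv(csv_data, chunksize=2):
    chunk["desc"] = chunk["color"] + chunk["fruit"]
    descs.append(chunk["desc"])

result = pd.concat(descs, ignore_index=True)
print(result.tolist())  # ['RedApple', 'BlackGrape', 'YellowBanana']
```

Once a single pass like this no longer finishes in an acceptable time, or the per-chunk state itself outgrows memory, that is the point where a cluster framework like Spark starts to pay off.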

Note that PySpark is not a drop-in replacement for pandas: there are some syntactic differences, but the code will look similar.
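A small sketch of those differences, using the question's own column-concatenation example (the pandas part runs as-is; the PySpark equivalent is shown in comments since it needs a running Spark session):

```python
import pandas as pd

df = pd.DataFrame({"color": ["Red", "Black"], "fruit": ["Apple", "Grape"]})

# pandas: column arithmetic executes eagerly and assigns in place
df["desc"] = df["color"] + df["fruit"]

# PySpark equivalent (not run here): withColumn returns a NEW DataFrame
# and only builds a lazy plan; nothing executes until an action runs.
#   from pyspark.sql.functions import concat, col
#   df2 = df.withColumn("desc", concat(col("color"), col("fruit")))
#   df2.show()  # the action that triggers execution

print(df["desc"].tolist())  # ['RedApple', 'BlackGrape']
```

The eager-versus-lazy execution model is the main conceptual difference to keep in mind when porting pandas code to Spark.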

There's also the Dask library for Python that allows you to have distributed computing using a mostly pandas-compatible syntax.

Upvotes: 2
