Jason

Reputation: 71

How to multiply column values in a DataFrame using PySpark (Python)

I have the following dataframe with numeric values for each column:

    Total.show(5)
    
    
    +----------+----------------+------------+----------------------+---------------------+---------------+---------------------+-----------------+---------------------------+--------------------------+
    |id_cliente|consumo_datos_MB|sms_enviados|minutos_llamadas_movil|minutos_llamadas_fijo|sum(id_cliente)|sum(consumo_datos_MB)|sum(sms_enviados)|sum(minutos_llamadas_movil)|sum(minutos_llamadas_fijo)|
    +----------+----------------+------------+----------------------+---------------------+---------------+---------------------+-----------------+---------------------------+--------------------------+
    |         2|             611|           0|                    41|                   38|              2|                  611|                0|                         41|                        38|
    |         8|             284|           5|                    71|                   31|              8|                  284|                5|                         71|                        31|
    |        14|            1324|           0|                    28|                   29|             14|                 1324|                0|                         28|                        29|
    |        21|            1748|           0|                    81|                   12|             21|                 1748|                0|                         81|                        12|
    |        25|            1555|           0|                    60|                    6|             25|                 1555|                0|                         60|                         6|
    +----------+----------------+------------+----------------------+---------------------+---------------+---------------------+-----------------+---------------------------+--------------------------+

What I need to do is create another DataFrame with each of those values multiplied by a coefficient, as follows:

0.4 --> minutos_llamadas_movil
0.3 --> consumo_datos_MB
0.2 --> minutos_llamadas_fijo
0.1 --> sms_enviados

That means I would have to multiply each item in each column by a different value, i.e. every item under minutos_llamadas_movil would have to be multiplied by 0.4, each one under consumo_datos_MB by 0.3, and so on.

This is what I have tried:

    Sumas = Total["consumo_datos_MB", "sms_enviados", "minutos_llamadas_movil", "minutos_llamadas_fijo"] =
    0.3 * Total["consumo_datos_MB"],
    0.4 * Total["minutos_llamadas_movil"],
    0.2 * Total["minutos_llamadas_fijo"],
    0.1 * Total["sms_enviados"]

    print(Sumas)

I've gotten the following error message when trying to run the code above:

    TypeError: 'DataFrame' object does not support item assignment

Everything I've seen online about this error talks about date formats, but that's not the case here, so I'm not sure what the problem might be.

Can anyone help please?

Thanks a lot in advance!

Upvotes: 0

Views: 1179

Answers (1)

werner

Reputation: 14845

Spark DataFrames are immutable, so item assignment is not supported. You can instead use withColumn, which returns a new DataFrame with the multiplied values:

    Sumas = Total.withColumn("consumo_datos_MB", 0.3 * Total["consumo_datos_MB"]) \
        .withColumn("minutos_llamadas_movil", 0.4 * Total["minutos_llamadas_movil"]) \
        .withColumn("minutos_llamadas_fijo", 0.2 * Total["minutos_llamadas_fijo"]) \
        .withColumn("sms_enviados", 0.1 * Total["sms_enviados"])

Upvotes: 2
