Python_Learner

Reputation: 1637

Multiply String in Dataframe?

My desired output is the following:

   count  tally
1      2  //
2      3  ///
3      5  /////
4      3  ///
5      2  //

My code:

import pandas as pd

my_list = [1,1,2,2,2,3,3,3,3,3,4,4,4,5,5]
my_series = pd.Series(my_list)
values_counted = pd.Series(my_series.value_counts(),name='count') 
# other calculated columns left out for SO simplicity
df = pd.concat([values_counted], axis=1).sort_index()
df['tally'] = values_counted * '/'

With the code above I get the following error:

masked_arith_op
    result[mask] = op(xrav[mask], y)
numpy.core._exceptions.UFuncTypeError: ufunc 'multiply' did not contain a loop with signature matching types (dtype('<U21'), dtype('<U21')) -> dtype('<U21')

In searching for solutions I found one on SO that said to try:

values_counted * float('/')

But that did not work.

In 'normal' Python outside of Dataframes the following code works:

10 * '/'

and returns

//////////

How can I achieve the same functionality in a Dataframe?

Upvotes: 1

Views: 832

Answers (2)

Mustafa Aydın

Reputation: 18306

You can group the series by itself and then aggregate:

new_df = my_series.groupby(my_series).agg(**{"count": "size",
                                             "tally": lambda s: "/" * s.size})

to get

>>> new_df

   count  tally
1      2     //
2      3    ///
3      5  /////
4      3    ///
5      2     //

Upvotes: 1

jezrael

Reputation: 862761

Use a lambda function to repeat the string for each count; your solution can also be simplified:

my_list = [1,1,2,2,2,3,3,3,3,3,4,4,4,5,5] 
df1 = pd.Series(my_list).value_counts().to_frame('count').sort_index()

df1['tally'] = df1['count'].apply(lambda x: x * '/')
print(df1)
   count  tally
1      2     //
2      3    ///
3      5  /////
4      3    ///
5      2     //
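
As a possible alternative to `apply`, here is a minimal sketch that uses `Series.str.repeat`, which accepts a sequence of repeat counts. The names `df2` and `slashes` are only illustrative and are not part of the original code:

```python
import pandas as pd

my_list = [1, 1, 2, 2, 2, 3, 3, 3, 3, 3, 4, 4, 4, 5, 5]
df2 = pd.Series(my_list).value_counts().sort_index().to_frame("count")

# Start from a column holding a single "/" per row (hypothetical helper Series),
# then repeat each entry by the count on the same row.
slashes = pd.Series("/", index=df2.index)
df2["tally"] = slashes.str.repeat(df2["count"].tolist())
print(df2)
```

This should give the same `tally` column as the `apply` version. Whether it is any faster on large data is not something this sketch demonstrates.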

Upvotes: 1
