sophocles

Reputation: 13821

python pandas - Convert a column of tuples to string column

This should be a relatively simple question.

Below is the sample of my df column:

             title2
1      (, 2 ct, , )
2      (, 1 ct, , )
3      (, 2 ct, , )
4               NaN
5      (, 2 ct, , )
6     (, 5 ct, , )
7  (, 7 ounce, , )
8    (, 1 gal, , )
9              NaN
10             NaN

I would like to convert the whole column to a proper string column - i.e. my desired output would be:

    title2
1      2ct
2      1ct
3      2ct
4      NaN
5      2ct
6      5ct
7  7 ounce
8     1gal
9      NaN
10     NaN

I have tried the following commands, but none seem to work:

title['title3'] = title['title2'].agg(' '.join)
title['title3'] = title['title2'].apply(lambda x: ''.join(x))
title['title3'] = title['title2'].astype(str)
title['title3'] = title['title2'].values.astype(str)

The answer given in the post "Convert a pandas column containing tuples to string" unfortunately does not help me either.

Can someone shed some light on this? Thank you all.

Upvotes: 1

Views: 1072

Answers (3)

IoaTzimas

Reputation: 10624

Try the following. I assume that the tuples and NaNs are stored as strings in your column; if not, let me know and I will adjust the solution:

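def clear(x):
    # Pass missing values through unchanged ('NaN' stored as text, per the assumption above)
    if x == 'NaN':
        return 'NaN'
    # Split the stringified tuple on commas and trim each piece
    parts = [i.strip() for i in str(x).split(',')]
    # Keep the first piece that contains a digit, e.g. '2 ct'
    matches = [i for i in parts if any(k.isdigit() for k in i)]
    # Fall back to the original value if no piece contains a digit
    return matches[0] if matches else x

df['title2'] = df['title2'].apply(clear)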
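
If the column actually holds Python tuples and float NaN rather than strings (an assumption, since the question does not say), a minimal sketch could join each tuple directly and pass everything else through. The sample frame and the helper name join_tuple are made up for illustration:

```python
import numpy as np
import pandas as pd

# Hypothetical sample: real tuples of strings plus a float NaN
df = pd.DataFrame({
    "title2": [("", "2 ct", "", ""), np.nan, ("", "7 ounce", "", "")]
})

def join_tuple(value):
    # Only tuples are joined; NaN and anything else is returned unchanged
    if isinstance(value, tuple):
        return "".join(part.strip() for part in value)
    return value

df["title3"] = df["title2"].apply(join_tuple)
```

This keeps the space inside each value ("2 ct"). The original attempt with ''.join fails on the NaN rows because a float is not iterable, which is what the isinstance check avoids.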

Upvotes: 1

ExistingAsMike

Reputation: 37

Using regex:

import re

df['title3'] = df['title2'].apply(lambda x: re.sub('[^A-Za-z0-9]', '', str(x)))
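
Note that str(x) turns a missing value into the text 'nan'. If the missing values should stay missing, one possible variant is to mask them back in after cleaning. This is only a sketch on a made-up sample frame, assuming the column holds real tuples and float NaN:

```python
import numpy as np
import pandas as pd

# Hypothetical sample: real tuples of strings plus a float NaN
df = pd.DataFrame({
    "title2": [("", "2 ct", "", ""), np.nan, ("", "1 gal", "", "")]
})

# Stringify every value and drop all non-alphanumeric characters
cleaned = df["title2"].astype(str).str.replace(r"[^A-Za-z0-9]", "", regex=True)

# Restore NaN wherever the original value was missing
df["title3"] = cleaned.where(df["title2"].notna())
```

Like the one-liner above, this also removes the space inside each value, so "2 ct" becomes "2ct".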

Upvotes: 1

This will do the trick:

# Stringify each value, then trim tuple/list punctuation from both ends
demo_data['title2'] = demo_data['title2'].astype(str).str.strip("()[]', ")
# Collapse any remaining "', '" separators between non-empty elements
demo_data['title2'] = demo_data['title2'].str.replace("', '", ",", regex=False)
demo_data['title2'] = demo_data['title2'].str.strip("()[]', ")
# Drop the spaces inside each value, e.g. "2 ct" -> "2ct"
demo_data['title2'] = demo_data['title2'].str.replace(" ", "", regex=False)

which gives:

   ID  title2
0   1     2ct
1   2     1ct
2   3     2ct
3   4     nan
4   5     2ct
5   6     5ct
6   7  7ounce
7   8    1gal
8   9     nan
9  10     nan

Upvotes: 1
