Reputation: 11
I am trying to flatten some columns in my dataframe, but unfortunately it does not work. What would be the correct way of doing this?
created_at | tweet_hashtag | tweet_cashtag |
---|---|---|
2022-07-23 | [{'start': 16, 'end': 27, 'tag': 'blockchain'}, {'start': 28, 'end': 32, 'tag': 'btc'}, {'start': 33, 'end': 37, 'tag': 'eth'}, {'start': 38, 'end': 42, 'tag': 'eth'}] | [{'start': 0, 'end': 4, 'tag': 'Act'}, {'start': 7, 'end': 11, 'tag': 'jar'}] |
2022-04-24 | [{'start': 6, 'end': 7, 'tag': 'chain'}, {'start': 8, 'end': 3, 'tag': 'btc'}, {'start': 3, 'end': 7, 'tag': 'eth'}] | [{'start': 4, 'end': 8, 'tag': 'Act'}, {'start': 7, 'end': 9, 'tag': 'aapl'}] |
And my preferred result would be:
created_at | tweet_hashtag.tag | tweet_cashtag.tag |
---|---|---|
2022-07-23 | blockchain, btc, eth, eth | Act, jar |
2022-04-24 | chain, btc, eth | Act, aapl |
Thanks in advance!
I tried to flatten the columns with the solution from "How to apply json_normalize on entire pandas column", but it does not work.
Upvotes: 1
Views: 63
Reputation: 4608
You can use:
def get_values(a, b):
    # collect the 'tag' value of every dict in each list
    x_values = []
    for i in range(0, len(a)):
        x_values.append(a[i]['tag'])
    y_values = []
    for j in range(0, len(b)):
        y_values.append(b[j]['tag'])
    return ','.join(x_values), ','.join(y_values)

# result_type='expand' splits the returned tuple into two columns
df[['tweet_hashtag', 'tweet_cashtag']] = df[['tweet_hashtag', 'tweet_cashtag']].apply(lambda x: get_values(x['tweet_hashtag'], x['tweet_cashtag']), axis=1, result_type='expand')
or:
def get_hashtags(a):
    x_values = []
    for i in range(0, len(a)):
        x_values.append(a[i]['tag'])
    return ','.join(x_values)

def get_cashtags(b):
    y_values = []
    for i in range(0, len(b)):
        y_values.append(b[i]['tag'])
    return ','.join(y_values)

df['tweet_hashtag'] = df['tweet_hashtag'].apply(lambda x: get_hashtags(x))
df['tweet_cashtag'] = df['tweet_cashtag'].apply(lambda x: get_cashtags(x))
print(df)
'''
   created_at           tweet_hashtag tweet_cashtag
0  2022-07-23  blockchain,btc,eth,eth       Act,jar
1  2022-04-24           chain,btc,eth      Act,aapl
'''
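If it helps, here is a more compact sketch of the same idea that also writes the result into the `tweet_hashtag.tag` / `tweet_cashtag.tag` columns from the question. The helper name `join_tags` is made up for this example. The sample frame is rebuilt from the table in the question. Two things are assumptions on my side: that empty cells hold `None`/`NaN` rather than a list, and that the cells may be stored as strings (for example after reading a CSV), in which case they are parsed with `ast.literal_eval` first.

```python
import ast

import pandas as pd

# Sample data rebuilt from the table in the question
df = pd.DataFrame({
    "created_at": ["2022-07-23", "2022-04-24"],
    "tweet_hashtag": [
        [{"start": 16, "end": 27, "tag": "blockchain"}, {"start": 28, "end": 32, "tag": "btc"},
         {"start": 33, "end": 37, "tag": "eth"}, {"start": 38, "end": 42, "tag": "eth"}],
        [{"start": 6, "end": 7, "tag": "chain"}, {"start": 8, "end": 3, "tag": "btc"},
         {"start": 3, "end": 7, "tag": "eth"}],
    ],
    "tweet_cashtag": [
        [{"start": 0, "end": 4, "tag": "Act"}, {"start": 7, "end": 11, "tag": "jar"}],
        [{"start": 4, "end": 8, "tag": "Act"}, {"start": 7, "end": 9, "tag": "aapl"}],
    ],
})


def join_tags(entries, sep=", "):
    """Join the 'tag' value of every dict in a list (hypothetical helper)."""
    # Assumption: a cell may be the string form of a list, e.g. after read_csv
    if isinstance(entries, str):
        entries = ast.literal_eval(entries)
    # Assumption: cells without entities are None/NaN, not a list
    if not isinstance(entries, list):
        return ""
    return sep.join(entry["tag"] for entry in entries)


# Add one flattened '<column>.tag' column per source column
for col in ["tweet_hashtag", "tweet_cashtag"]:
    df[f"{col}.tag"] = df[col].apply(join_tags)

result = df[["created_at", "tweet_hashtag.tag", "tweet_cashtag.tag"]]
print(result)
```

This keeps the original columns untouched, so you can still get at `start` and `end` later if you need them. Pass `sep=","` to `join_tags` (for example through `apply(join_tags, sep=",")`) if you prefer the separator without a space.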
Upvotes: 0