haneulkim

Reputation: 4928

Efficient way to append a dictionary value to a column depending on the row's value

MRE:

import pandas as pd

dictionary = {'2018-10': 50, '2018-11': 76}
df = pd.DataFrame({
    "date":["2018-10", "2018-10", "2018-10", "2018-11","2018-11"]
})

which looks like this (I have millions of rows and multiple columns):

      date
0   2018-10
1   2018-10
2   2018-10
3   2018-11
4   2018-11

Each date has a number associated with it in the dictionary. I want to append that number to the date column (using vectorization).

so my desired dataframe would look like:

         date
0   2018-10 (50)
1   2018-10 (50)
2   2018-10 (50)
3   2018-11 (76)
4   2018-11 (76)

My date column has the string datatype.

Current solution: I can use apply with a lambda:

 df["date"] = df["date"].apply(lambda row:row + f" ({dictionary[row]})")

However, I am wondering whether there is a vectorized way to do this, since I have millions of rows and do not want to go row by row.

EDIT: Now that I think about it, I don't think there can be a vectorized way, since I need to concatenate a different number depending on the date.

Upvotes: 0

Views: 119

Answers (2)

BallpointBen

Reputation: 13750

pd.Series.map can take a dict as the mapping, and strings and string columns can be added, so it's actually as easy as

df['date'] = df['date'] + ' (' + df['date'].map(dictionary).astype(str) + ')'
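For completeness, here is a self-contained sketch of that one-liner run against the question's sample data. The second approach (the `labels` dict) is my own variation, not something from the question: it assumes the dictionary is small relative to the frame, so the label strings are built once per key and applied with a single map. One caveat worth knowing: if a date is missing from the dictionary, `map` yields NaN for it, the mapped column becomes float, and `astype(str)` then produces strings like `50.0` and `nan`.

```python
import pandas as pd

dictionary = {"2018-10": 50, "2018-11": 76}
df = pd.DataFrame({"date": ["2018-10", "2018-10", "2018-10", "2018-11", "2018-11"]})

# The one-liner from above: look up each date, cast the number to str,
# and concatenate whole columns at once.
combined = df["date"] + " (" + df["date"].map(dictionary).astype(str) + ")"

# Variation (an assumption on my part): build each label once per dictionary
# key, then do a single map over the column.
labels = {key: f"{key} ({value})" for key, value in dictionary.items()}
via_labels = df["date"].map(labels)

print(combined.tolist())
print(via_labels.tolist())
```

Both produce the same strings for this sample; which one is faster on millions of rows is something you would need to measure on your own data.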

Upvotes: 1

kpie

Reputation: 11100

So I'm not 100% sure that this is the fastest way to do things, but it is fairly simple.

import pandas as pd

data = {'2018-10': 50, '2018-11': 76}
df = pd.DataFrame({
    "date":["2018-10", "2018-10", "2018-10", "2018-11","2018-11"]
})
df["data"] = df.date.apply(lambda x: data[x])

Which yields:

      date  data
0  2018-10    50
1  2018-10    50
2  2018-10    50
3  2018-11    76
4  2018-11    76

As an alternative to df.date.apply(lambda x: data[x]), you could use

df.apply(lambda x: data[x['date']], axis=1)

which I believe would perform similarly, but it's less readable in my opinion.
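If you want to check the performance rather than take my word for it, a rough timing sketch along these lines should do. The frame size, the function names, and the `number=3` repeat count are arbitrary choices for illustration, and I've included `Series.map` with the dict as a third candidate. Row-wise `apply` with `axis=1` builds a Series for every row, so it may well turn out slower than the column-wise version.

```python
import timeit

import pandas as pd

data = {"2018-10": 50, "2018-11": 76}
# Hypothetical larger frame, built by repeating the sample rows.
df = pd.DataFrame(
    {"date": ["2018-10", "2018-10", "2018-10", "2018-11", "2018-11"] * 4_000}
)


def by_series_apply():
    # Calls the lambda once per value in the column.
    return df["date"].apply(lambda x: data[x])


def by_row_apply():
    # Calls the lambda once per row, passing each row as a Series.
    return df.apply(lambda row: data[row["date"]], axis=1)


def by_map():
    # Lets pandas do the dictionary lookup itself.
    return df["date"].map(data)


for fn in (by_series_apply, by_row_apply, by_map):
    seconds = timeit.timeit(fn, number=3)
    print(f"{fn.__name__}: {seconds:.3f}s for 3 runs")
```

The absolute numbers will depend on your machine and pandas version, so treat them only as a relative comparison.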

Upvotes: 0
