Moses

Reputation: 177

Add values to a column from the same column

I have a data frame like this:

Row  Author     Cit_Handle    Year    Title        Handle
 1  Carlos Hi                2017    how to be    ReP:55:er45
 2  Boris Sla                2018    what it it?  ReP:ef5:ag4g 
 3  Dante Ur                 2017    is it true?  ReP:f9gj:sfona9:2039
 4            ReP:fb9:d93    
 5  Jure Les                 2016    ¡it is true! ReP:odjva:ejewojaef:advon
 6  Mark Cas                 2018    How do       ReP:apnvb:qt42rwb:203
 7            ReP:gjh:59f     

I want to paste each Cit_Handle value into the rows above it, until it reaches another Cit_Handle value or the column name, like this:

Row  Author     Cit_Handle   Year  Title         Handle
 1   Carlos Hi  ReP:fb9:d93  2017  how to be     ReP:55:er45
 2   Boris Sla  ReP:fb9:d93  2018  what it it?   ReP:ef5:ag4g
 3   Dante Ur   ReP:fb9:d93  2017  is it true?   ReP:f9gj:sfona9:2039
 4   Jure Les   ReP:gjh:59f  2016  ¡it is true!  ReP:odjva:ejewojaef:advon
 5   Mark Cas   ReP:gjh:59f  2018  How do        ReP:apnvb:qt42rwb:203

If you want to see a sample of the real data you can see it here.

Any idea how I can do it?

Upvotes: 1

Views: 39

Answers (1)

Haleemur Ali

Reputation: 28313

The output you describe can be achieved by backfilling Cit_Handle and then removing the rows where the other fields are empty.

The code on line In [5]: does all the processing.

In [1]: import pandas as pd

In [2]: text ="""Author,Cit_Handle,Year,Title,Handle
   ...: Carlos Hi,,2017,how to be,ReP:55:er45
   ...: Boris Sla,,2018,what it it?,ReP:ef5:ag4g
   ...: Dante Ur,,2017,is it true?,ReP:f9gj:sfona9:2039
   ...: ,ReP:fb9:d93,,,
   ...: Jure Les,,2016,¡it is true!,ReP:odjva:ejewojaef:advon
   ...: Mark Cas,,2018,How do,ReP:apnvb:qt42rwb:203
   ...: ,ReP:gjh:59f,,,"""

In [3]: from io import StringIO

In [4]: df = pd.read_csv(StringIO(text),sep=',')

In [5]: df.bfill()[df.Author.notnull()]
Out[5]:
      Author   Cit_Handle    Year         Title                     Handle
0  Carlos Hi  ReP:fb9:d93  2017.0     how to be                ReP:55:er45
1  Boris Sla  ReP:fb9:d93  2018.0   what it it?               ReP:ef5:ag4g
2   Dante Ur  ReP:fb9:d93  2017.0   is it true?       ReP:f9gj:sfona9:2039
4   Jure Les  ReP:gjh:59f  2016.0  ¡it is true!  ReP:odjva:ejewojaef:advon
5   Mark Cas  ReP:gjh:59f  2018.0        How do      ReP:apnvb:qt42rwb:203

One small note: the default int dtype in pandas can't hold NaNs, so the Year column, which has missing values in the Cit_Handle-only rows, is upcast to float.
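If you would rather fill only Cit_Handle and get Year back as an integer, here is a minimal sketch of the same two steps as a plain script. It assumes, as in the sample above, that the Cit_Handle-only rows are exactly the rows with an empty Author; the name `result` is just illustrative.

```python
from io import StringIO

import pandas as pd

text = """Author,Cit_Handle,Year,Title,Handle
Carlos Hi,,2017,how to be,ReP:55:er45
Boris Sla,,2018,what it it?,ReP:ef5:ag4g
Dante Ur,,2017,is it true?,ReP:f9gj:sfona9:2039
,ReP:fb9:d93,,,
Jure Les,,2016,¡it is true!,ReP:odjva:ejewojaef:advon
Mark Cas,,2018,How do,ReP:apnvb:qt42rwb:203
,ReP:gjh:59f,,,"""

df = pd.read_csv(StringIO(text))

# Backfill only Cit_Handle, so each row takes the next handle below it.
df["Cit_Handle"] = df["Cit_Handle"].bfill()

# Assumption: rows without an Author are the handle-only marker rows.
result = (
    df.dropna(subset=["Author"])
      .reset_index(drop=True)
      # No NaNs remain in Year after the drop, so it can go back to int.
      .assign(Year=lambda d: d["Year"].astype(int))
)

print(result)
```

Filling a single column keeps the backfill from touching Year, Title or Handle, which matters if the real data has genuine gaps in those columns.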

Upvotes: 1
