Intro

If you don’t have time for this but are still interested, skip to the bottom.

HTTP Headers

Which one of these POST requests results in a PDF being downloaded to your filesystem?

Request number 1

POST /scholar_url?url=http%3A%2F%2Fwww.issmge.org%2Ffilemanager%2Farticle%2F509%2FTC217Workshop19ICSMGE%2FCAI_Korea_ICSMGE_Nov_29_2017.pdf&hl=en&sa=T&oi=gga&ct=gga&cd=26&d=16502395451115221649&ei=AFK9YvfLOoGvmAHdh4vgCw&ws=1920x961&at=Combination%20of%20vacuum%20preloading%20and%20lime%20treatment%20for%20improvement%20of%20dredged%20fill HTTP/2
Host: scholar.google.com
Content-Length: 0
Accept: */*
Origin: https://scholar.google.com
Referer: https://scholar.google.com/scholar?start=20&q=water+preloading&hl=en&as_sdt=0,5
Accept-Encoding: gzip, deflate
Accept-Language: en-US,en;q=0.9

Request number 2

POST /scholar_url?url=https%3A%2F%2Fwww.researchgate.net%2Fprofile%2FM-Vinay-Kumar%2Fpost%2FCan-anyone-suggest-me-case-study-of-ground-improvement-through-vacuum-consolidation%2Fattachment%2F5c4e868e3843b0544e62b27d%2FAS%253A719904953094144%25401548650124561%2Fdownload%2FGround_Improvement_Case_Histories_hANSBO.pdf%23page%3D112&hl=en&sa=T&oi=gga&ct=gga&cd=38&d=5223665999451473974&ei=a1O9Yq2MJYj4mQGmrILgCw&ws=1920x961&at=Application%20of%20the%20vacuum%20preloading%20method%20in%20soil%20improvement%20projects HTTP/2
Host: scholar.google.com
Content-Length: 0
Accept: */*
Origin: https://scholar.google.com
Referer: https://scholar.google.com/scholar?start=30&q=water+preloading&hl=en&as_sdt=0,5
Accept-Encoding: gzip, deflate
Accept-Language: en-US,en;q=0.9

The answer is request number 2. As for why, I don’t know yet.

It generates a few questions though.

  • Is Google Scholar responsible for the behaviour?
  • Are some of the HTTP headers controlling the behaviour?
  • Is Google Scholar a red herring and this testing should be performed locally?
  • Can I get the two different behaviours of view vs download to occur by loading a PDF file directly in my browser?

Request #1 leads to a GET request for the PDF, which then opens in the browser’s PDF viewer.

Request #2 results in the PDF being downloaded.
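
Both POST requests carry the real destination in the url query parameter, percent-encoded. As a minimal sketch (the function name target_url is my own, and the request path is trimmed to the url and hl parameters for brevity), Python's standard library can pull it out:

```python
from urllib.parse import parse_qs, urlsplit

# Request path from request number 1, trimmed to the url and hl
# parameters for brevity.
REQUEST_PATH = (
    "/scholar_url?url=http%3A%2F%2Fwww.issmge.org%2Ffilemanager%2Farticle"
    "%2F509%2FTC217Workshop19ICSMGE%2FCAI_Korea_ICSMGE_Nov_29_2017.pdf&hl=en"
)


def target_url(request_path: str) -> str:
    """Return the decoded value of the `url` query parameter."""
    query = urlsplit(request_path).query
    # parse_qs percent-decodes each value once and returns a list per key.
    return parse_qs(query)["url"][0]


if __name__ == "__main__":
    print(target_url(REQUEST_PATH))
```

The same function should work on request number 2, whose target sits on www.researchgate.net.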

Request number 1 follow-up request

GET /filemanager/article/509/TC217Workshop19ICSMGE/CAI_Korea_ICSMGE_Nov_29_2017.pdf HTTP/1.1
Host: www.issmge.org
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9
Referer: https://scholar.google.com/
Accept-Encoding: gzip, deflate
Accept-Language: en-US,en;q=0.9
Connection: close

Request number 2 follow-up request

POST /safebrowsing/clientreport/download?key=AIzaSyBOti4mM-6x9WDnZIjIeyEU21OpBXqWBgw HTTP/1.1
Host: sb-ssl.google.com
Content-Length: 1354
Content-Type: application/octet-stream
Accept-Encoding: gzip, deflate
Connection: close


https://www.researchgate.net/profile/M-Vinay-Kumar/post/Can-anyone-suggest-me-case-study-of-ground-improvement-through-vacuum-consolidation/attachment/5c4e868e3843b0544e62b27d/AS%3A719904953094144%401548650124561/download/Ground_Improvement_Case_Histories_hANSBO.pdf#page=112

Maybe Google Scholar is doing something to the requests? No, this is just Google Chrome’s way of checking whether the file you are attempting to download is in its set of known malicious files.

Request number 3

I am going to request the PDF file directly in my browser instead of loading it through Google Scholar, so that the request is easier to analyse.

GET /profile/M-Vinay-Kumar/post/Can-anyone-suggest-me-case-study-of-ground-improvement-through-vacuum-consolidation/attachment/5c4e868e3843b0544e62b27d/AS:719904953094144@1548650124561/download/Ground_Improvement_Case_Histories_hANSBO.pdf HTTP/1.1
Host: www.researchgate.net
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9
Accept-Encoding: gzip, deflate
Accept-Language: en-US,en;q=0.9
Connection: close

Response number 3

content-disposition: attachment; filename="Ground_Improvement_Case_Histories_hANSBO.pdf"
content-encoding: identity
content-length: 25512129
content-type: application/pdf
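
A header block like this can also be read programmatically. The sketch below is one way to do it, assuming the headers have been copied into a string; it uses Python's email.parser, which understands this header syntax, and the summarize helper is a name I made up for illustration.

```python
from email.parser import HeaderParser

# The response headers from response number 3, copied as text.
RAW_HEADERS = (
    'content-disposition: attachment; '
    'filename="Ground_Improvement_Case_Histories_hANSBO.pdf"\n'
    "content-encoding: identity\n"
    "content-length: 25512129\n"
    "content-type: application/pdf\n"
)


def summarize(raw_headers: str) -> dict:
    """Parse a raw header block and pick out the fields of interest."""
    msg = HeaderParser().parsestr(raw_headers)
    return {
        "content_type": msg.get_content_type(),
        # "attachment", "inline", or None when the header is absent.
        "disposition": msg.get_content_disposition(),
        # The filename parameter of Content-Disposition, unquoted.
        "filename": msg.get_filename(),
    }


if __name__ == "__main__":
    print(summarize(RAW_HEADERS))
```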

View vs Download

The difference between the response headers of the PDF that opened in the browser’s PDF viewer and the one that was downloaded hints at why the behaviour differs.

View request

GET /filemanager/article/509/TC217Workshop19ICSMGE/CAI_Korea_ICSMGE_Nov_29_2017.pdf HTTP/2
Host: www.issmge.org
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9
Accept-Encoding: gzip, deflate
Accept-Language: en-US,en;q=0.9

View response

HTTP/2 200 OK
Content-Type: application/pdf
Content-Length: 2714170
Server: cloudflare

%PDF-1.5
%âãÏÓ

Download request

GET /profile/M-Vinay-Kumar/post/Can-anyone-suggest-me-case-study-of-ground-improvement-through-vacuum-consolidation/attachment/5c4e868e3843b0544e62b27d/AS:719904953094144@1548650124561/download/Ground_Improvement_Case_Histories_hANSBO.pdf HTTP/1.1
Host: www.researchgate.net
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9
Accept-Encoding: gzip, deflate
Accept-Language: en-US,en;q=0.9
Connection: close

Download response

content-disposition: attachment; filename="Ground_Improvement_Case_Histories_hANSBO.pdf"
content-encoding: identity
content-length: 25512129
content-type: application/pdf

By the power vested in me, I generate a diff.

content-disposition: attachment; filename="Ground_Improvement_Case_Histories_hANSBO.pdf"
content-encoding: identity
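
For anyone who would rather not diff by eye, here is a rough Python sketch of the same comparison. It compares header names only, and the header_names helper is hypothetical, not part of any library.

```python
def header_names(raw: str) -> set[str]:
    """Return the lower-cased header names found in a raw header block."""
    names = set()
    for line in raw.splitlines():
        name, sep, _value = line.partition(":")
        if sep:  # skip lines without a colon, such as the status line
            names.add(name.strip().lower())
    return names


# Response headers copied from the view and download responses.
VIEW = (
    "HTTP/2 200 OK\n"
    "Content-Type: application/pdf\n"
    "Content-Length: 2714170\n"
    "Server: cloudflare\n"
)
DOWNLOAD = (
    'content-disposition: attachment; '
    'filename="Ground_Improvement_Case_Histories_hANSBO.pdf"\n'
    "content-encoding: identity\n"
    "content-length: 25512129\n"
    "content-type: application/pdf\n"
)

only_in_download = sorted(header_names(DOWNLOAD) - header_names(VIEW))

if __name__ == "__main__":
    print(only_in_download)
```

Run against the two responses, the set difference leaves content-disposition and content-encoding as the headers present only in the download response.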

Outro

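The Content-Disposition response header is what decides whether a file is downloaded to your filesystem or displayed inline (viewable) in your browser. ResearchGate sends it with the value attachment and a filename, so the browser saves the PDF. The ISSMGE response has no such header, so the browser opens the PDF in its viewer.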
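
This also answers the earlier question about testing locally. Below is a minimal sketch of a local server using Python's http.server that serves the same bytes on two paths, with and without the header. The paths /view and /download, the filename example.pdf, the port, and the placeholder body are all my own assumptions for illustration; swap BODY for the bytes of a real PDF to see the viewer actually render something.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

# Placeholder body: only the PDF magic line, not a complete PDF.
BODY = b"%PDF-1.5\n"


class PdfHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path not in ("/view", "/download"):
            self.send_error(404)
            return
        self.send_response(200)
        self.send_header("Content-Type", "application/pdf")
        self.send_header("Content-Length", str(len(BODY)))
        if self.path == "/download":
            # The only difference between the two paths.
            self.send_header(
                "Content-Disposition", 'attachment; filename="example.pdf"'
            )
        self.end_headers()
        self.wfile.write(BODY)

    def log_message(self, format, *args):
        pass  # keep the console quiet


def make_server(port: int = 8000) -> HTTPServer:
    """Build a server bound to localhost; port 0 picks a free port."""
    return HTTPServer(("127.0.0.1", port), PdfHandler)
```

To try it, call make_server().serve_forever() and open http://127.0.0.1:8000/view and http://127.0.0.1:8000/download in a browser. I would expect the first to open in the PDF viewer and the second to be saved to disk, though the exact behaviour depends on the browser and its settings.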

Remember you can save 5 minutes reading the docs by spending an hour of debugging. A small price to pay.
