Square brackets are turned into question marks during text extraction

ranza · January 27, 2021, 5:26pm

Can you please make square brackets work in extracted texts?
Thanks!

Support-Team · January 27, 2021, 6:56pm

Hello

Would you please upload some screenshots or videos describing it in detail? That way I can help you better.

Kind Regards,
MarginNote-Edward
Support Team

ranza · January 27, 2021, 7:45pm

Sure, here’s one:

As you can see, the square brackets around 1,3,4 are missing. Instead we get “?”

Support-Team · January 28, 2021, 12:46pm

Hello

Let's first rule out a problem with the PDF's own text layer: copy the text containing the square brackets in another PDF reader, then paste it into any text box.

Check whether the pasted result contains the square brackets.

Kind Regards,
MarginNote-Edward
Support Team

ranza · January 28, 2021, 10:06pm

@Support-Team when I copy with Preview there are no brackets, but also no “?” signs.

It seems like you have two different versions of “OCR”, one which gets text excerpts (probably not an OCR) and one which can extract text from Rect Excerpt (this one seems to work better).

I thought about telling you to just use the latter, but perhaps having different options is also a plus!

QSD_Support-Team · January 29, 2021, 8:13pm

Hi.

I run into this problem myself. Sometimes the ≥ and { signs in my maths lecture notes also turn into question marks.

This is mainly down to how such symbols are encoded: they are not the usual “[” character but something more like a special LaTeX string (just my personal guess), so MarginNote does not recognize them correctly and shows a “?” instead.

Right now, I can only suggest that you edit your cards manually. Double-click the card you would like to modify and you can edit anything in it, including changing the question mark to the “[” you want.
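
If there are many cards to fix, a find-and-replace along these lines might save some of the manual editing. This is only a sketch: it assumes the extracted text is available as a plain string outside the app, and that a lost citation looks like “?1,3,4?” (a question mark on each side of a number list). The name `restore_brackets` is made up for this example and is not a MarginNote feature.

```python
import re

# Assumed pattern: "?" + numbers separated by commas or hyphens + "?",
# e.g. "?1,3,4?" where the PDF originally showed "[1,3,4]".
_CITATION = re.compile(r"\?(\d+(?:\s*[,-]\s*\d+)*)\?")

def restore_brackets(text: str) -> str:
    """Turn ?n,m,...? into [n,m,...]; every other "?" is left alone."""
    return _CITATION.sub(r"[\1]", text)

print(restore_brackets("as shown in ?1,3,4? and ?12?. Is this right?"))
# prints: as shown in [1,3,4] and [12]. Is this right?
```

It only touches question marks that wrap a number list, so ordinary questions in the text stay as they are. Other lost symbols (≥, { and so on) cannot be recovered this way, because a bare “?” does not say which character it replaced.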

This is not a MarginNote issue. Even when I open the PDF in PDF Expert and copy the selected text, what I get is still:

a m16 a m26 a

which is a total mess.

We can only copy what is stored in the PDF onto your card; we cannot decode it. That is why the result is sometimes not what you want.
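
To picture the difference between copying and decoding, here is a toy sketch. It assumes (a guess, not something confirmed for this particular PDF) that the text layer stores glyph codes plus a lookup table to Unicode, and that the table has no entries for the brackets. The table `TO_UNICODE` and the function `extract_text` are invented for illustration only.

```python
# Hypothetical glyph-code -> character table standing in for a font's mapping.
# Codes 91 and 93 (the brackets) are deliberately left out.
TO_UNICODE = {49: "1", 44: ",", 51: "3", 52: "4"}

def extract_text(glyph_codes, table=TO_UNICODE, fallback="?"):
    """Look up each code; anything the table does not cover becomes the fallback."""
    return "".join(table.get(code, fallback) for code in glyph_codes)

# The page draws "[1,3,4]", but only the digits and commas can be looked up.
print(extract_text([91, 49, 44, 51, 44, 52, 93]))  # prints: ?1,3,4?
```

Under that assumption, any reader that relies on the stored text has nothing to work with for the missing entries, whichever app does the copying.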

Sorry for the inconvenience.

Very best
QSD - Support Team

ranza · January 29, 2021, 9:41pm

@QSD_Support-Team I made a mistake earlier when I said there were two mechanisms. Actually there’s only one, which uses the pre-extracted text from the PDF itself.

An active OCR like Tesseract would perhaps solve this issue.