Fixing hyphenated text?

When I create note cards with the Text Excerpt function, the original quotation often contains words broken by hyphens at the end of a line. MarginNote reproduces these, so in effect a single word gets broken in two, which means I can’t easily search the mindmap for keywords.

For example, if the original text has the word “pre- determined”, this also appears in the note and I can’t find it by searching for “predetermined”. Of course I can go through and delete all the hyphens by hand, but this is sort of tedious, especially when the spelling checker already “knows” these words are broken.

Is there some setting that can tell MarginNote to automatically fix up these broken words?

I investigated this a little. OS X has a text replacement feature in “System Preferences > Keyboard > Text”, but it refuses to accept “- ”. I don’t know why, but that’s how it is. There is also a “smart dashes” feature, but that is for changing double dashes to em dashes. Upshot: there seems to be no built-in OS X text service for cleaning up this very common issue.

I hacked up an OS X Automator script that runs sed 's/- //g' as a shell script, but I still have to select each quotation that I create with MarginNote and then run the script on it.
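For reference, a minimal sketch of that Automator approach, assuming a Quick Action that receives text, has “Output replaces selected text” ticked, and contains one Run Shell Script action with “Pass input: to stdin”. The function name clean_selection is hypothetical. It first turns real line breaks into spaces, so it handles both a hyphen at the end of a line and the “pre- determined” form that ends up in the note.

```sh
#!/bin/sh
# Sketch of a "Run Shell Script" body for an Automator Quick Action
# (assumed settings: Pass input "to stdin", Output replaces selected text).
# clean_selection is a hypothetical name, not part of macOS or MarginNote.
clean_selection() {
  # 1. tr: turn line breaks into spaces, so "pre-<newline>determined"
  #    becomes "pre- determined".
  # 2. sed: delete every hyphen-plus-space pair (the original 's/- //g'),
  #    then trim the trailing space left over from the final newline.
  tr '\n' ' ' | sed -e 's/- //g' -e 's/ *$//'
}

# In the Automator action itself, the last line would simply be:
#   clean_selection
```

Note that this removes every “- ”, so a genuine compound such as “self-aware” that happens to break at its hyphen comes out as “selfaware”, and separate paragraphs in the selection are merged into one line.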

Since MarginNote creates the note card by copying the selected text from the PDF, and the selected text spans separate lines, IMHO the best way to attack this would be a new option inside MarginNote that checks for a hyphen at the end of each line and removes it when the lines are joined. This feature could be enabled or disabled via a Preferences setting and would (for me, at least) greatly improve the workflow for creating searchable note cards.
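A rough sketch of the proposed option, written as an awk filter purely to illustrate the logic; it is not MarginNote’s implementation, and join_lines is a hypothetical name. While concatenating the lines of a selection, a line that ends in a hyphen (optionally followed by spaces) is joined to the next line with the hyphen dropped and no space; every other line break becomes a single space.

```sh
#!/bin/sh
# Illustration of the proposed option: join the lines of a PDF selection,
# dropping a hyphen only when it ends a line. join_lines is hypothetical.
join_lines() {
  awk '
    {
      line = $0
      # Normal line break: separate from the previous text with one space.
      # After a line that ended in a hyphen, join directly instead.
      if (out != "" && !pending) out = out " "
      # sub() returns 1 if a trailing hyphen (plus any spaces) was removed.
      pending = sub(/- *$/, "", line)
      out = out line
    }
    END { print out }
  '
}
```

Because only a hyphen at the very end of a line is touched, hyphens inside a line stay as they are. The same caveat applies as with any automatic rule (“self-” / “aware” would become “selfaware”), which is one reason to make it switchable in Preferences.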

I don’t have the details at hand, but I would think a text service could be created to run the sed script on the selection.

Also, maybe the “- ” sequence needs to be escaped to be recognized?


JJW

I googled for information about escaping characters in “System Preferences > Keyboard > Text” text replacements, but I can’t find anything. RegEx doesn’t work. If the “With” field is empty (i.e., replace something with nothing), the rule just gets ditched.

Also, I’ve noticed that the “Replace” field cannot include a space; OS X shows a warning about this.

I even tried editing a plist file directly and then dragging it into the Text Replacements field, but it just gets ignored.

It would be nice to do this with OS X, but its service design seems just too limited.

EDIT: Upon further experimentation, the replacements in “System Preferences > Keyboard > Text” only seem to work when entering text with the keyboard, not when copying and pasting text. So, again, unless there is some deeper text service in OS X that we can fiddle with, enhancing MarginNote seems like the best option to me.

Do you know RegEx well enough to capture the character before the hyphen together with the hyphen? You could then replace the match with just that one character.
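Text Replacements does not accept regular expressions, but the capture-and-replace idea itself is easy to express with sed, for example inside the Automator script mentioned earlier. A minimal sketch; rejoin_words is a hypothetical name:

```sh
#!/bin/sh
# Capture the letter before the hyphen, then replace "letter + hyphen + space"
# with just that letter. rejoin_words is a hypothetical name.
# [[:alpha:]] requires a letter right before the hyphen, so a free-standing
# dash ("word - word") or a number such as "1990- 2000" is left alone.
# -E (extended regex) is understood by both BSD sed (macOS) and GNU sed.
rejoin_words() {
  sed -E 's/([[:alpha:]])- /\1/g'
}
```

Compared with a plain 's/- //g', requiring a letter before the hyphen avoids deleting dashes that are used as punctuation.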


JJW

Do you mean a RegEx expression in “System Preferences > Keyboard > Text” text replacements?

I was unable to get that to work. Also, the text replacement only seems to be applied when typing from the keyboard.

AFAICT, the intervention needs to happen when MarginNote copies the lines of text selected in the PDF file and concatenates them to create the text for a note card. I.e., the code inside MarginNote needs to be modified to strip the hyphen+space at the end of each line.

Bump.

Dear @Lanco_Support-Team, @marginnote, @Support-Team,

Do you have a fix for this?

For me, it is the single biggest issue with MarginNote. Almost every time I create a note card, it contains words broken up by hyphens and I have to fix them.

It’s a constant annoyance, but something that I believe could be fixed very easily.

Please advise.

Edit: actually, a good solution might be a user-definable text substitution applied when a note card is created (i.e., replace all “- ” with “”). If we could add several of these, this would be a simple way to clean up unwanted hyphens and also to fix recurring OCR errors in PDFs (e.g., replace “1he” with “The”).
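To make the idea concrete, a sketch of such a user-definable list, assuming the substitutions live in a plain text file with one sed -E expression per line. The file format and the function names are hypothetical, not a MarginNote feature; the two example rules correspond to the ones above.

```sh
#!/bin/sh
# Hypothetical user-editable rules file: one sed -E expression per line.
# Rule 1 rejoins words split as "pre- determined"; rule 2 fixes an OCR error.
write_example_rules() {
  cat > "$1" <<'EOF'
s/([[:alpha:]])- /\1/g
s/1he/The/g
EOF
}

# Apply every rule in the file, in order, to the text on stdin.
apply_rules() {
  sed -E -f "$1"
}

# Example:
#   write_example_rules "$HOME/note-rules.sed"
#   printf '1he pre- determined plan\n' | apply_rules "$HOME/note-rules.sed"
#   # prints: The predetermined plan
```

A literal rule like 's/1he/The/g' fires wherever that string occurs, so real rules would need some care.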

Sorry, we have only just noticed this issue.

Bump.

Dear @Lanco_Support-Team, @marginnote, @Support-Team,

Any thoughts about a fix for this?

Bumping again.

@Lanco_Support-Team, @marginnote, @Support-Team,

Any thoughts about a fix for this?

Bump.

@Lanco_Support-Team, could you kindly tell us whether you intend to fix this problem, or not?

It seems that advanced search via the API could do this. We are looking for testers with AppleScript experience to help us check what the API can do and what could be optimized. Please click the link to join: Extstester - Marginnote

Regards,
Lanco

Well, it depends how you define your API.

The problem is that once notes contain hyphens, the search functionality breaks. I.e., if you search for “Keyword”, you’ll get no results for “Key- word”. As a workaround, you could do something like this in SQLite:

SELECT * FROM ZBOOKNOTE
WHERE replace(ZHIGHLIGHT_TEXT, '- ', '') LIKE '%perception%'
  AND replace(ZHIGHLIGHT_TEXT, '- ', '') LIKE '%codes%'
  AND ZBOOKMD5 IN ('659de1cbc6d1db66d3fd3930d7e3890736462f59db52c4a8d022b36d9ce518da');

But this seems inefficient, there could be ‘false positives’, and it would still require a change to your application code, and possibly an optional parameter in the API to replace hyphens selectively.

A more effective solution, I submit, would be what I suggested last August (see above): provide a way for users to easily get rid of hyphens when notes are first created.

Thanks for pointing this out. I’ve passed your request on to Kevin. Once he has confirmed with the developer, he will give you a response.

Regards,
Lanco