Best Practices for Search Indexing PDF Files in Magic Search

Hi team, a few questions about search indexing and PDF files:

  • We have a large number of PDF files that we want to be searchable (and, especially, processed through Magic Search for AI summaries). Is storing them as Notes or as Insights generally considered best practice?

  • I've tried running the same file through both formats. OCR keyword search and AI summaries perform more as expected with Notes, but the overall file is given more weight as an Insight.

  • It seems to me that it would be better to have these files as Insights, but we may have to rethink our approach if we're not getting consistently relevant hits in search results.

Any advice here? Much appreciated!

  • Benjamin Humphrey

    Hey Jared Forney! Great question. Sergey works on our search and discover team, so he can hopefully weigh in.

  • Jeremie Gluckman

    Thank you, Benjamin Humphrey! I'll seek out advice on this on my end too. More soon, Jared!

  • Jeremie Gluckman

    Hi Jared Forney, I have an update here. The team recommends adding PDF files as Insights.

  • Jared Forney

    Thanks, Jeremie Gluckman! Are there plans to allow search/tagging of PDF content? That's the main thing I think we're missing here, to allow for richer searches.

  • Jeremie Gluckman

    You can currently highlight and tag PDF content in a note.

  • Jeremie Gluckman

    I just applied a tag to this PDF report. Does that help with your use case?

  • Jared Forney

    Jeremie Gluckman, I think our issue is more that we can't apply additional tags or OCR search to PDFs when they're Insights; search only covers the content generated by the AI summary (which is helpful, but limits our stakeholders' ability to do deep searches). The other reason we're exploring these two options is that our researchers typically upload final presentations (which are effectively research reports) as PDFs, and we want to ensure these, above all else, are easy to find.